Unevenly spaced time-series forecasting and anomaly detection for an industrial use case. Any ideas?

Iker Otxoa De Latorre @Iker_Otxoa_De_Latorre

17 September 2021

Hello everybody,

I am currently working on a PhD project for a car manufacturing company, which basically consists of creating a predictive maintenance application for the machines that are currently used to fill the air conditioning circuits of vehicles. In essence, each cycle consists of two phases designed to perform checks on the circuit followed by a last one in which the corresponding refrigerant gas is charged. Specifically, in a first phase the circuit is pressurized in order to detect leaks from inside to outside the circuit, in a second phase a vacuum is exerted on the circuit in order to detect leaks from the outside to the inside, and finally, if no leak is detected, the circuit is filled. Regarding the data collected, for each of the phases different readings are taken of the pressure reached inside the circuit, except for the gas loading phase:

- First phase (pressurization): A total of three pressure readings are taken at different times (pressurization, stabilization and control).

- Second phase (vacuum): A total of four pressure readings are taken at different times (release of the circuit pressure to atmospheric pressure, vacuum, vacuum stabilization and control).

- Third phase (charge): Grams of gas charged in the circuit.

The attached FillingCurve.png file shows the typical theoretical curve of a filling cycle with the three aforementioned phases. As for the data, the attached SampleDataTable.png table presents a small sample of them.

The objective proposed to me is to model and monitor these variables so that, following a predictive maintenance strategy, their trends can be predicted and possible anomalies detected in real time, making it possible to anticipate failures in the pressurization of the machines or in the pump in charge of the vacuum. With regard to the results of the cycles, only those NOKs associated with the filling console have been taken into account; cycles that were NOK because the vehicle circuit itself had defects (leaks, bad connections, etc.) have been discarded. In any case, the factory does not fully trust the assigned NOK labels, so it may be better to consider only the OK samples.

As far as I understand, these data constitute time series, a completely new field for me. I have some experience in supervised and unsupervised classification problems using classical machine learning algorithms, as well as in computer vision using deep learning, but none with time series. One of the problems I have encountered is that the classical techniques for dealing with this type of data, such as ARIMA and its variants, are only valid for equispaced time series. This does not hold in my case because of the industrial context the data come from: the machine is not filling continuously; there are line stops, breaks, vacations, maintenance stops, etc.

Can anyone guide me on the way to go? Does anyone know the techniques that can be applied to this type of time series? I would appreciate any kind of help, idea or suggestion, because although I thought it would not be so complicated and that the modeling of the time series was in a very mature state, the truth is that I am quite lost.

I believe that in order to apply the classical techniques, one option would be to summarize the data in new time intervals (hourly, for example), although this is not an alternative with which I feel very comfortable.
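The hourly summarization mentioned above could look roughly like the following. This is only a minimal sketch using the Python standard library; the function name `aggregate_hourly`, the choice of the mean as summary statistic and the sample readings are all assumptions made for illustration, not data from the factory.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def aggregate_hourly(records):
    """Group (timestamp, value) pairs by clock hour and average each group.

    Hours without any cycle are simply absent from the result; how to treat
    them (skip, mark as missing, interpolate) is a separate modelling choice.
    """
    buckets = defaultdict(list)
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}

# Hypothetical control-pressure readings from four filling cycles.
cycles = [
    (datetime(2021, 9, 17, 8, 5), 10.0),
    (datetime(2021, 9, 17, 8, 40), 12.0),
    (datetime(2021, 9, 17, 9, 15), 11.0),
    (datetime(2021, 9, 17, 13, 2), 9.0),  # first cycle after a line stop
]

hourly = aggregate_hourly(cycles)
for hour, average in hourly.items():
    print(hour, average)
```

Note that the result is still not equispaced: the hours between 10:00 and 12:00 have no entry, which is one reason this option may feel uncomfortable.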

Thank you so much in advance.

Qamar Ul Islam

Dear Iker Otxoa De Latorre

Anomaly detection is the identification of rare events, items, or observations that are suspicious because they differ significantly from standard behaviors or patterns. Anomalies in data are also called deviations, outliers, noise, novelties, and exceptions.

Kind Regards

Qamar Ul Islam

Andrey Davydenko

What stops you from applying machine learning methods here?

If you want to predict pressure, I guess you will need to apply a data transformation (such as a log transform).

Regarding the evaluation of models, please let me suggest the following works:

Article Assessing Point Forecast Bias Across Multiple Time Series: M...

Chapter Forecast Evaluation Techniques for I4.0 Systems

Chapter Forecast Error Measures: Critical Review and Practical Recommendations

If you apply a data transformation for non-negative variables and then back-transform the forecasts, you need to use MAE-based metrics (such as the RelMAE or the AvgRelMAE) for accuracy and the Overestimation Percentage corrected (OPc) for bias; see the first paper for detailed guidelines.
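As a rough illustration of the MAE-based metrics mentioned above, here is a minimal sketch. It assumes that RelMAE is the MAE of the model divided by the MAE of a benchmark, and that AvgRelMAE is the geometric mean of the per-series RelMAE values weighted by the number of forecasts; please check the cited works for the exact definitions. OPc is not shown. The pressure values, the "mean of logs" model and the "last value" benchmark are hypothetical.

```python
import math

def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rel_mae(actual, forecast, benchmark):
    """MAE of the model divided by MAE of a benchmark (below 1: model wins)."""
    return mae(actual, forecast) / mae(actual, benchmark)

def avg_rel_mae(series):
    """Weighted geometric mean of per-series RelMAE values.

    `series` is a list of (actual, forecast, benchmark) triples; each series
    is weighted by its number of forecasts (an assumption, see lead-in).
    """
    total = sum(len(actual) for actual, _, _ in series)
    log_sum = sum(len(a) * math.log(rel_mae(a, f, b)) for a, f, b in series)
    return math.exp(log_sum / total)

# Hypothetical positive pressure history: model the logs, then back-transform.
history = [2.0, 2.2, 2.1, 2.3]
log_level = sum(math.log(x) for x in history) / len(history)
model_forecast = [math.exp(log_level)] * 2   # back-transformed forecast
naive_forecast = [history[-1]] * 2           # "last value" benchmark
actual = [2.2, 2.4]

print("RelMAE:", rel_mae(actual, model_forecast, naive_forecast))
```

The sketch breaks down if a benchmark or model MAE is exactly zero (division by zero or log of zero), so real data would need a guard for that case.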

How many observations have you got? I would suggest applying a range of machine learning techniques and then comparing their performance using the methods described in the above works. Given the specific nature of your data, I do not think ARIMA or similar methods will show better results than, e.g., random forests.

Marcus Neuer

Just try to prepare your data in the right way. 90% of the work in data analytics/machine learning is getting your data right. Data with different timescales can be transformed so that all time series have the same sampling frequency. What programming language do you use? In Python this would be a job for scipy or numpy using rescaling and interpolation; R and Matlab have similar functions for this.
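A minimal sketch of the interpolation idea with numpy, assuming linear interpolation via `np.interp` onto a regular grid. The helper name `resample_to_grid`, the step size and the sample values are made up for illustration.

```python
import numpy as np

def resample_to_grid(t, y, step):
    """Linearly interpolate samples (t, y) onto a regular grid with spacing `step`.

    `t` must be strictly increasing. Returns the grid and the interpolated values.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.arange(t[0], t[-1] + step / 2, step)
    return grid, np.interp(grid, t, y)

# Hypothetical unevenly spaced pressure readings (time in seconds).
t = [0.0, 1.0, 4.0, 6.0]
y = [0.0, 10.0, 40.0, 20.0]

grid, values = resample_to_grid(t, y, step=2.0)
print(grid)
print(values)
```

Interpolating across a long gap (a line stop or a vacation) invents values that were never measured, so it is probably safer to interpolate only within continuous production runs.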

Looking at your table and seeing different engine types, I would emphasise that for any machine learning approach, I would start with a simple problem first and work towards more elaborate cases. So you could start by separating the scenarios: consider just one engine type, then expand later on, once the simple problem has been solved.

Huy Le

Hello Iker Otxoa De Latorre,

In case you would like to use machine learning, I think you should consider an unsupervised method, because supervised learning such as classification is not always practical in an industrial environment due to the lack of labeled data.

One solution you can consider is to use an autoencoder for anomaly detection on time series data.

You can refer to this paper: https://doi.org/10.1016/j.egyai.2021.100065

In case you like a programming example: https://www.tensorflow.org/tutorials/generative/autoencoder#third_example_anomaly_detection

Yes, your data is a time series, but with a deep learning approach like an autoencoder, the series does not always need to be equally spaced (just accumulate data while the machines run).

The general idea is to collect time series data of the normal state and train the autoencoder until it can reproduce the input with an acceptable gap (small reconstruction errors). You then use the trained model to monitor real-time data. If the trained model cannot reconstruct new data (large reconstruction errors), an anomaly is suspected.

Of course, some data preparation (e.g. normalization…) must be done when training as well as monitoring.
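The idea described above can be sketched in a few lines. This is not the TensorFlow tutorial code: to stay self-contained it uses a tiny numpy autoencoder trained by plain gradient descent. The synthetic "OK" cycles, the 8 features per cycle (three pressurization readings, four vacuum readings, grams charged), the network size and the 99th-percentile threshold are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for OK cycles (assumption): 8 values per cycle,
# driven by 2 hidden factors plus small noise.
n_cycles, n_features, n_hidden = 500, 8, 2
latent = rng.normal(size=(n_cycles, 2))
mixing = rng.normal(size=(2, n_features))
ok_cycles = latent @ mixing + 0.1 * rng.normal(size=(n_cycles, n_features))

# Normalize with statistics taken from the training (OK) data only.
mu, sigma = ok_cycles.mean(axis=0), ok_cycles.std(axis=0)
X = (ok_cycles - mu) / sigma

# One-hidden-layer autoencoder: 8 -> 2 (tanh) -> 8 (linear).
W1 = 0.1 * rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.normal(size=(n_hidden, n_features))
b2 = np.zeros(n_features)

def reconstruction_error(Z):
    """Mean squared reconstruction error of each row of Z."""
    out = np.tanh(Z @ W1 + b1) @ W2 + b2
    return np.mean((out - Z) ** 2, axis=1)

# Full-batch gradient descent on the mean squared error.
learning_rate = 0.1
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)
    out = H @ W2 + b2
    d_out = 2.0 * (out - X) / X.size            # gradient w.r.t. the output
    d_hidden = (d_out @ W2.T) * (1.0 - H ** 2)  # back through tanh
    W2 -= learning_rate * (H.T @ d_out)
    b2 -= learning_rate * d_out.sum(axis=0)
    W1 -= learning_rate * (X.T @ d_hidden)
    b1 -= learning_rate * d_hidden.sum(axis=0)

# Threshold: 99th percentile of the errors on the OK training cycles.
train_errors = reconstruction_error(X)
threshold = np.percentile(train_errors, 99)

# Hypothetical faulty cycle: the first reading is 15 standard deviations off.
faulty = mu.copy()
faulty[0] += 15.0 * sigma[0]
faulty_error = reconstruction_error(((faulty - mu) / sigma)[None, :])[0]
print("threshold:", threshold, "faulty cycle error:", faulty_error)
print("flagged as anomaly:", faulty_error > threshold)
```

Each row is one filling cycle, so the spacing between cycles never enters the model; the same normalization (`mu`, `sigma`) must be reused when monitoring new cycles.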

I am applying this method for my Master's thesis (Trabajo Fin de Máster, on an industrial predictive maintenance topic) and the preliminary results are quite positive.

With this approach, in the short term, you may provide quick results for factory improvement. At the same time, from my search on Google Scholar, this approach is still new (most publications are from 2019 to 2021), which means there is still plenty of room for your academic research in the long term. I think you're right that time series modelling is mature, but not too mature for your further ideas.

Of course, it is just the way I know and there are many possible methods.

I am finalizing my Master's thesis using this method. You can contact me for further discussion if you wish.

Best regards,

Marcos Huy Le

Eugene Veniaminovich Lutsenko

To do this, you can divide the numerical scales into adaptive intervals that each contain an equal number of observations (according to Kotelnikov's theorem). This is possible in the Eidos system, which I developed. I can show you how this is done using a numerical example, if you send the source data. There is information about the system here and in my numerous publications on RG. http://lc.kubagro.ru/aidos/The_Eidos_en.htm
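Intervals with an equal number of observations can be illustrated with plain equal-frequency (quantile) binning. The sketch below is a generic illustration only, not the Eidos implementation; the function names and sample values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def equal_frequency_edges(values, n_bins):
    """Return n_bins + 1 edges so that each bin holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[0]]
    for k in range(1, n_bins):
        edges.append(ordered[(k * n) // n_bins])
    edges.append(ordered[-1])
    return edges

def assign_bin(value, edges):
    """Index of the bin [edges[i], edges[i + 1]) containing value; the last bin is closed."""
    index = bisect_right(edges, value) - 1
    return min(max(index, 0), len(edges) - 2)

# Hypothetical, strongly skewed readings.
readings = [1, 2, 3, 4, 10, 20, 30, 100]
edges = equal_frequency_edges(readings, n_bins=4)
counts = Counter(assign_bin(r, edges) for r in readings)
print(edges)
print(sorted(counts.items()))
```

With many tied values the bins can no longer be perfectly balanced, so real data may need a tie-handling rule.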

Marius Bendsen

Huy Le - the autoencoder can be further extended with probabilities by using a variational autoencoder. New research suggests that modelling the latent layer with a Student's t-distribution achieves more robust training.

To achieve anomaly detection, the generated/estimated values can be compared to the true values and if they differ enough, the data point is marked as anomalous.

After each new prediction, the model is retrained on the new point in time.
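The "compare the estimate with the true value, flag it if they differ enough, then update the model" loop can be sketched as follows. To keep it short, an exponentially weighted moving average stands in for the variational autoencoder, so this only illustrates the loop, not the model from the linked paper. The class name, `alpha`, `k`, `warmup` and the readings are assumptions.

```python
import math

class OnlineResidualDetector:
    """Flag a value when it is far from the current prediction, then update."""

    def __init__(self, alpha=0.2, k=4.0, warmup=5):
        self.alpha = alpha    # smoothing factor of the moving average
        self.k = k            # threshold in residual standard deviations
        self.warmup = warmup  # points to observe before flagging anything
        self.level = None     # current prediction for the next value
        self.var = 0.0        # exponentially weighted residual variance
        self.n = 0            # number of points seen so far

    def update(self, value):
        """Score `value` against the current prediction, then learn from it."""
        if self.level is None:
            self.level, self.n = value, 1
            return False
        residual = value - self.level
        is_anomaly = (self.n >= self.warmup
                      and abs(residual) > self.k * math.sqrt(self.var))
        # "Retrain" on the new point, as described above (even if it was flagged).
        self.var = (1 - self.alpha) * self.var + self.alpha * residual ** 2
        self.level += self.alpha * residual
        self.n += 1
        return is_anomaly

# Hypothetical control-pressure readings from consecutive cycles.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 15.0, 10.0]
detector = OnlineResidualDetector()
flags = [detector.update(r) for r in readings]
print(flags)
```

Updating on flagged points follows the description above, but it also inflates the variance estimate right after an anomaly; skipping the update for flagged points is a possible alternative.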

Check this out: https://www.ijcai.org/Proceedings/2018/0374.pdf

Huy Le

Thanks Marius Bendsen for sharing. This approach looks great.
