April 24, 2024

Avoid Costs and Inefficiencies With Predictive Maintenance


The most recent evolution in Statistical Process Control (SPC) is predictive maintenance (PdM). In predictive maintenance, sensors capture data on the state of different components of a system in operation. The data is analyzed for behaviors that correlate with eventual problems or equipment failures. Based on alerts produced by a predictive maintenance model, operators can perform preemptive repairs, avoiding costly downtimes or catastrophic failures. Moreover, by monitoring conditions continually and automatically, predictive maintenance avoids many costs and inefficiencies of routine maintenance.

What is Predictive Maintenance
Types of Maintenance Strategies
Predictive Maintenance Requirements and Challenges
Remaining Useful Life Estimates on Test Data
Summary

What is Predictive Maintenance

Predictive maintenance represents something of a paradigm shift in SPC. Traditionally, SPC infers problems in the manufacturing process from faults detected in its output, which indicate that something inside the process has gone wrong (i.e., the process is out of control). The classical SPC charts are shown below: the Control Chart and the Pareto Chart for root cause analysis.

Control Chart

Fault Detection
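As a rough illustration of the classical approach, the sketch below flags output measurements that fall outside control limits estimated from an in-control baseline. The three-sigma rule, the function names, and the sample values are assumptions made for illustration only.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread

def out_of_control(samples, limits):
    """Indices of samples that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]

# Hypothetical measurements of a process output
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)
flagged = out_of_control([10.0, 10.1, 11.5, 9.9], limits)
```

Note that the fault is only detected after a bad output has been produced, which is the limitation predictive maintenance tries to address.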

By embedding sensors within the production system that measure and monitor various characteristics, predictive maintenance aims to predict out-of-control situations before they happen. This is an ambitious idea, but it is more feasible today because of Industry 4.0 and the Internet of Things (IoT).

These are recently coined buzzwords that refer to the gamut of technological advances that enable devices to gather and transmit data to the internet and between machines, in near real time. This landscape now includes self-driving vehicles, power plants, airplanes, traffic lights, home thermostats and security cameras, smart phones, and more.

In fact, Tesla already uses a form of predictive maintenance on its vehicles. If the vehicle’s computer determines that a component is bound to need replacing, whether due to wear or age, the vehicle will pre-order the part and notify its owner.

Types of Maintenance Strategies

Reactive maintenance means running machines until they fail and fixing problems only as they occur. Proactive maintenance includes scheduled maintenance, wherein components are serviced or replaced during scheduled inspections.

Condition-Based Maintenance

Condition-based maintenance (CBM) monitors different characteristics of a system looking for conditions that indicate degradation of equipment. CBM relies on sensors to provide real time or near real time data on the state of the equipment.

Predictive Maintenance

Predictive maintenance combines sensor technology and predictive modeling to predict the time to failure or remaining useful life (RUL) of different components of a system. Both condition-based maintenance and predictive maintenance proactively prevent catastrophic failures, and both avoid the costs and inefficiencies of routine maintenance by monitoring conditions continually and automatically.

Maintenance Strategy	Description	Notes
Run To Failure (R2F)	Reactive maintenance performed after failure	Simple but costly
Preventative Maintenance (PM)	Maintenance is performed periodically according to a schedule	Proactive but costly and inefficient because it is performed whether needed or not
Condition-Based Maintenance (CBM)	Performs maintenance when conditions indicate degradation of equipment	Proactive and more efficient. Depends on sensors to monitor conditions
Predictive Maintenance (PdM)	Applies predictive analytics and other tools to predict remaining useful life or time to failure. Maintenance performed when indicated	Similar to CBM but uses predictive models and machine learning. Depends on sensors and more involved data processing

Remaining Useful Life

Remaining useful life (RUL) is an estimate of the number of hours of normal usage of a component before it fails. For example, the chart below shows a hypothetical Performance-Failure chart for a particular gear in an aircraft engine. The y-axis is the measured condition of the gear. Initially pristine, after hours (or miles) of use, the condition indicator naturally declines until the gear fails (e.g., one or more teeth break or are too worn).

Remaining useful life

Note that in the case of aircraft engines, where a gear failure is potentially catastrophic, data for building or training PdM models is collected from experiments on the ground under simulated conditions. With a good PdM system in place, real-time alerts can be generated while a plane is flying and maintenance scheduled accordingly.

Predictive Maintenance Requirements and Challenges

In predictive maintenance, sensors measure behaviors of different components during operation. Not surprisingly, this generates a massive amount of data that will eventually contain a history of conditions wherein failures occur. From among these data streams, predictive maintenance uses statistical analysis to find Condition Indicators, i.e., sensor measurements that indicate a failure condition or are predictive of a failure condition.

Predictive Maintenance Requirements

Expert engineering knowledge to design the predictive maintenance apparatus
High-tech sensors
Data capture and storage
Analytical software (close to the data) for processing and analyzing the data
User interfaces to monitor conditions and alerts

To generate alerts, predictive maintenance may use any appropriate machine learning or predictive model. As with any modeling task, successful models need the right kind of data. Models to predict failures or the remaining useful life of a component need a history of relevant sensor data paired with known anomalies and failures.

Example: Predictive Maintenance for Filter Replacement

Here we formulate a data-driven PdM model for air filter replacement. The data is from the Kaggle website (DOI: 10.34740/kaggle/dsv/3183137). Our aim is to estimate remaining useful life using a purely data-driven approach.

The data consists of a training set of 50 filters run on a test bed under different conditions. The settings for the test apparatus were:

Dust feed: 5 discernable levels, from Low to High

Dust particle size: Fine, Medium, Coarse

Flow rate: Low, High

The training set includes 11 filters for fine dust, 13 for coarse dust, and 26 for medium dust, with varying settings of dust feed and flow rate.

Data Preprocessing

Differential air pressure across the filter rises as the filter becomes more clogged with dust. Thus, differential pressure is the condition indicator for the health of the filter. The filter fails when differential pressure reaches 600 Pa.

Flow Rate and Dust Feed

The chart above shows differential pressure for the 50 filters in the training set. A filter fails when differential pressure reaches a measurement of 600.

Differential Pressure vs. Time

In general, filters with fine dust fail faster than filters with medium or coarse dust. An exception to this is seen in the lower right corner of the chart. For this filter, flow rate and dust feed were both low.

Horizontal lines under 600 indicate right-censored failure times: we know only that the failure time lies to the right of (is greater than) the last observation time. Only a few of the filters in the training data set were “run to failure.”

Filter Number	Dust Feed	Flow Rate	Dust Size	Failure Time
11	158.5	58	Fine	62.2
43	79.25	81	Fine	102.2
44	158.5	83	Fine	65.6
46	118	83	Fine	62.6
47	79	59	Fine	104.6
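The distinction between run-to-failure and right-censored filters can be sketched as follows. This is a minimal illustration, assuming each filter is stored as parallel lists of times and differential pressures; the function name and the sample values are hypothetical.

```python
FAILURE_THRESHOLD = 600.0  # differential pressure at which a filter is considered failed

def failure_or_censoring(times, pressures, threshold=FAILURE_THRESHOLD):
    """Return (time, observed) for one filter run.

    observed is True when the threshold was reached (run to failure).
    Otherwise the last observation time is returned as a lower bound
    on the failure time (right-censored).
    """
    for t, p in zip(times, pressures):
        if p >= threshold:
            return t, True
    return times[-1], False

# Hypothetical short runs, for illustration only
run_to_failure = failure_or_censoring([0.0, 0.1, 0.2, 0.3], [5.0, 200.0, 610.0, 650.0])
censored = failure_or_censoring([0.0, 0.1, 0.2], [5.0, 120.0, 300.0])
```

Only filters with an observed failure provide a true RUL to compare estimates against, which is why the table above is limited to those filters.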

Looking at a single path, for filter 11 (below), we can see there is noise in the measurements. We assume that visible decreases in the paths are due to artifacts of the test setup we aren’t interested in modeling. For these reasons, we first create a smoothed version that is not allowed to decrease.

Filter 11

The smoothed version (shown in red) is a left-sided moving-average with a window size of 50 (coinciding with 5 seconds) with monotonicity enforced. The smoothing operation must be left sided because in real-time operation of the PdM, we will not have future data points.
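A minimal sketch of such a smoother is shown below. It assumes that monotonicity is enforced with a running maximum and that the first few points use a shorter window; the text does not specify either detail, and the function name is hypothetical.

```python
def smooth_monotone(values, window=50):
    """Left-sided moving average followed by a running maximum.

    Each output uses only the current and earlier samples, so it can be
    computed in real time. The running maximum keeps the curve from
    decreasing (an assumed way of enforcing monotonicity).
    """
    smoothed = []
    running_sum = 0.0
    highest = float("-inf")
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]  # drop the sample leaving the window
        count = min(i + 1, window)             # shorter window at the start
        highest = max(highest, running_sum / count)
        smoothed.append(highest)
    return smoothed

noisy = [1.0, 3.0, 2.0, 1.0, 5.0, 4.0]
print(smooth_monotone(noisy, window=2))  # [1.0, 2.0, 2.5, 2.5, 3.0, 4.5]
```

With the default `window=50`, each smoothed point averages the most recent 5 seconds of measurements.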

To generate RULs from the smoothed curve, we use a simple linear projection based on an estimate of slope at each time t, solving for the RUL as follows:

$$ RUL_t = \frac{600-\Delta P_t^s}{\beta_t}$$

where $\Delta P_t^s$ is the smoothed version of differential pressure and $\beta_t$ is an estimate of the slope at time t. For the example below, we use a lag-50 difference ratio:

$$\beta_t = \frac{\Delta P_t^s -\Delta P_{t-50}^s}{50}$$

The linear projection will be more accurate on the log scale, so instead we use the natural log for the response,

$$\beta_t = \frac{\log(\Delta P_t^s) - \log(\Delta P_{t-50}^s)}{50}$$
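The log-scale projection can be sketched in a few lines. This sketch assumes the numerator of the RUL formula is also taken in logs (log 600 minus the log of smoothed pressure), which follows from projecting a straight line on the log scale but is not written out above. The function name is hypothetical, and the result is in samples because the slope is per sample.

```python
import math

def rul_log_projection(smoothed, t, lag=50, threshold=600.0):
    """Project the log of smoothed pressure linearly to the failure threshold.

    Returns the estimated number of remaining samples at index t, or None
    when the estimate is undefined (not enough history, or a flat slope).
    """
    if t < lag:
        return None
    slope = (math.log(smoothed[t]) - math.log(smoothed[t - lag])) / lag
    if slope <= 0.0:
        return None
    return (math.log(threshold) - math.log(smoothed[t])) / slope

# Synthetic curve growing 1% per sample, for illustration only
curve = [100.0 * 1.01 ** i for i in range(60)]
rul_samples = rul_log_projection(curve, 50)
```

Since a window of 50 samples corresponds to 5 seconds, dividing the result by 10 converts it to seconds. A flat stretch in the smoothed curve gives a zero slope, in which case this sketch returns no estimate.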

We show the log of the smoothed differential pressure path (top panel) and the RUL estimates obtained using the above method for Filter 11 (shown in red in the lower panel). The black line in the lower panel is the actual RUL at each time point, inferred from the failure time of 62.2.

Remaining Useful Life for Filter 11

Similarly, we repeat the method for the remaining filters from the table above.

Filter 43 and 44

Filter 46 and 47

Filter Number	Dust Feed	Flow Rate	Dust Size	Failure Time	RMSE
11	158.5	58	Fine	62.2	5.63
43	79.25	81	Fine	102.2	15.88
44	158.5	83	Fine	65.6	5.22
46	118	83	Fine	62.6	8.00
47	79	59	Fine	104.6	24.97

RMSE for Filters Run to Failure

The root mean squared error is calculated after discarding the first 100 estimates for each filter. A reasonable warm-up period could be even longer in practice since there is more noise in the early measurements and predictive maintenance monitoring usually kicks in after some period of operation.
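The error calculation with a warm-up period can be sketched as below. The function name is hypothetical, and skipping time points without an estimate is an assumption of this sketch.

```python
import math

def rmse_after_warmup(estimated, actual, warmup=100):
    """RMSE between estimated and actual RUL after a warm-up period.

    The first `warmup` pairs are discarded, as are time points where no
    estimate was available (None).
    """
    pairs = [(e, a) for e, a in list(zip(estimated, actual))[warmup:] if e is not None]
    if not pairs:
        raise ValueError("no estimates left after the warm-up period")
    return math.sqrt(sum((e - a) ** 2 for e, a in pairs) / len(pairs))

# Tiny hypothetical series with a 2-point warm-up
score = rmse_after_warmup([None, 50.0, 12.0, 9.0], [13.0, 12.0, 10.0, 8.0], warmup=2)
```

Raising `warmup` is the simplest way to try the longer warm-up period mentioned above.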

Remaining Useful Life Estimates on Test Data

To apply the method to the test data, we first fit a predictive model to the smoothed differential pressure in the training data. Then, using the trained model, we predict the curves for each filter in the test data and calculate RUL using the linear projection method described above. Since the test data contains the actual RUL for each filter, we can obtain root mean squared errors and compare the performance of the models.

Log Linear 1	log(Smoothed_diff_press) ~ FR + DF + Time + Fine_Dust + Medium_Dust + logLag20 + logLag50
Log Linear 2	log(Smoothed_diff_press) ~ FR + DF + Time + Fine_Dust + Medium_Dust + logLag20
Log Linear 3	log(Smoothed_diff_press) ~ FR + DF + Time + Fine_Dust + Medium_Dust + logLag50
SVR (eps-reg) 1	log(Smoothed_diff_press) ~ FR + DF + Time + Fine_Dust + Medium_Dust + logLag20 + logLag50
SVR (eps-reg) 2	log(Smoothed_diff_press) ~ FR + DF + Time + Fine_Dust + Medium_Dust + logLag20
SVR (eps-reg) 3	log(Smoothed_diff_press) ~ FR + DF + Time + Fine_Dust + Medium_Dust + logLag50

Model Candidates

FR and DF are the flow rate and dust feed measures; Fine_Dust and Medium_Dust are indicator variables. (Note that Coarse_Dust is accounted for when Fine_Dust and Medium_Dust are both 0.)

logLag20 and logLag50 are lag 20 (2-second lag) and lag 50 (5-second lag) of the log smoothed differential pressure. Using the lagged variables greatly improved predictions, but it also implies that the predictive maintenance process has at most a 2-second window (if lag 20 or both lags are used) or a 5-second window (if only lag 50 is used) for sending, processing, and analyzing data.
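The predictors described above can be assembled one time point at a time. The sketch below builds a single row with the indicator and lag variables; the function name and the dictionary layout are hypothetical, and the sample curve is synthetic.

```python
import math

def design_row(log_smoothed, t, flow_rate, dust_feed, time, dust_size):
    """Build one row of predictors for time index t.

    Returns None when t is too early for the lag-50 value to exist.
    Coarse dust is the baseline: both indicators are 0.
    """
    if t < 50:
        return None
    return {
        "FR": flow_rate,
        "DF": dust_feed,
        "Time": time,
        "Fine_Dust": 1 if dust_size == "Fine" else 0,
        "Medium_Dust": 1 if dust_size == "Medium" else 0,
        "logLag20": log_smoothed[t - 20],
        "logLag50": log_smoothed[t - 50],
    }

# Synthetic log-pressure curve, for illustration only
log_curve = [math.log(10.0 + i) for i in range(60)]
row = design_row(log_curve, 55, flow_rate=58, dust_feed=158.5, time=5.5, dust_size="Fine")
```

Rows like this one, paired with the log smoothed differential pressure at the same index as the response, could then be passed to any least squares or support vector regression routine. Dropping `logLag20` or `logLag50` from the row gives the reduced candidate models.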

Histograms of the root mean squared error for each of the six models are shown below.

Histograms of RMSE for Six Different Models

Among the six models, Log Linear 3 produces the smallest RMSE for all but a few of the filters.

Smallest RMSE for each filter
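A per-filter comparison of this kind can be sketched as follows. The filter labels and RMSE values are placeholders rather than results from the analysis, and the function name is hypothetical.

```python
from collections import Counter

def best_model_per_filter(rmse_by_model):
    """Map each filter id to the model name with the smallest RMSE."""
    filters = next(iter(rmse_by_model.values())).keys()
    return {f: min(rmse_by_model, key=lambda m: rmse_by_model[m][f]) for f in filters}

# Placeholder values only, not results from the test data
rmse_by_model = {
    "Log Linear 3": {"A": 1.0, "B": 4.0},
    "SVR (eps-reg) 3": {"A": 2.0, "B": 3.0},
}
best = best_model_per_filter(rmse_by_model)
wins = Counter(best.values())  # how many filters each model wins
```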

There are four sources of error in our RUL estimation approach. Any except possibly the first one (measurement error) could be refined to improve the estimates.

Raw data measurement (sensor error, noise in system)
Smoothing estimate (window size, method)
Predicted curve error (model choice, parameter settings)
Linear projection error (slope estimate, window size)

Results for individual filters vary. RUL estimates using the log linear model 3 for Filter 3 track the true RUL closely. Filter 3 had a high Dust_feed, low Flow_rate, and medium dust size.

Filter 3 Remaining Useful Life Estimates

Filters 13 and 15 each had low flow rate, low dust feed, and medium dust size. The log differential pressure curves rise slowly and (even after smoothing) have step increases and lengthy flat spots. These cases suggest using a strictly increasing constraint in the smoothing method.

Filter 13

Filter 15
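One possible way to impose a strictly increasing constraint is sketched below. It is an assumption of this sketch, not a method from the analysis: each smoothed value is forced to exceed its predecessor by a small minimum step, so flat spots become very shallow ramps and the slope estimate never reaches zero. The function name and the `min_step` parameter are hypothetical.

```python
def enforce_strictly_increasing(values, min_step=1e-6):
    """Raise each value, where needed, to exceed its predecessor by min_step."""
    result = []
    for v in values:
        if result and v < result[-1] + min_step:
            v = result[-1] + min_step
        result.append(v)
    return result

flat = [1.0, 1.0, 1.0, 2.0]
ramped = enforce_strictly_increasing(flat, min_step=0.5)
print(ramped)  # [1.0, 1.5, 2.0, 2.5]
```

The choice of `min_step` matters: a very small step produces a very small slope and therefore a very large projected RUL, so it would need tuning against the measurement scale.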

Summary

In this blog post, we provided an overview of predictive maintenance and compared it with other maintenance strategies. Using a data set on air filter degradation, we described a data-driven approach to estimating remaining useful life (RUL).

In connection with sensors and appropriate data feeds, such an approach and similar ones are achievable using IMSL Numerical Library for C or Java as the analytical engine.

See what IMSL can do for your project. Evaluate IMSL libraries for free during your free trial.
