Data-driven Predictive Analytics for a Spindle's Health
Divya Sardana and Raj Bhatnagar (University of Cincinnati)
Radu Pavel and Jon Iverson (TechSolve, Inc., Cincinnati)
Motivation
Goal in the manufacturing industry:
• Improve machine performance.
• Decrease machine downtime.

A potential step towards achieving this goal: early detection of emerging faults and degradation trends.

Existing systems: condition-based maintenance (CBM) systems continuously monitor machines and collect data related to machine status and performance (e.g., vibration, temperature, and wear debris data).

Challenge: managing and analyzing the enormous amount of CBM data to accurately detect equipment degradation and predict remaining useful life.
What We Do: Problem Definition
• We present a methodology for exploiting large quantities of CBM data and building purely data-driven models of failure to predict the remaining useful life of a spindle.
– Investigate the monitored data (temperature, current, and vibration) to see if it is sufficient for building a model.
– Extract features from the monitored data.
– Build prediction models using features aggregated over 24-72 hour windows of time to predict the remaining useful life of the spindle.
Related Work: Machine Health Prognostic Techniques

Model-based approaches:
• Assume the availability of a physical model of failure progression.
• E.g., crack propagation models [1][2].

Data-driven approaches:
• Based on analysis of CBM data such as temperature, vibration, and current.
• Time-, frequency-, and time-frequency-domain analyses (e.g., FFT [3], wavelet transform analysis [4]) are used to extract features from the raw data.
Related Work: Data Driven Approaches
• Gebraeel and Lawley [5] developed a neural network based prediction model.
• Huang et al. [6] used self-organizing maps and back propagation neural network-based failure prediction models.
• In 2014, Dong et al. [7] used Support Vector Machine (SVM) to predict bearing degradation using time-frequency domain analysis of the vibration signal.
Our Proposed Solution: Broad Overview

Data monitoring unit:
• Collection of spindle health data using sensors (setup at TechSolve, Inc.).
• ~400 GB of monitored angular acceleration data.

Prognostic unit:
• Data exploration for feature discovery.
• Training of regression models for spindle failure prediction.
Data Monitoring Unit
• Testbed setup
• Data collection details
Testbed setup
Fig. 1. Spindle Testbed at Techsolve
Figure labels: Poly-V belt, thermocouple, load sensor
Fig. 2. Bottom View of the Loading Mechanism
Sensor Type      Manufacturer       Type                    Number
Accelerometer    IMI Sensors        607A11                  One
Thermocouples    Watlow             J-type, Teflon-coated   Three
Current sensor   Ohio Semitronics   SCT 050CX5              One
Fig 3. Data Channels Diagram for Spindle Testbed
Table 1. Description of sensors used in the Spindle Testbed
Data Collection Details
Dataset   Start Date      Failure Date    Total Life   Bearings Used   Size on Disk
4         7 Sept 2011     4 March 2012    180 days     NSK             170 GB
5         15 May 2012     21 Sept 2012    129 days     BARDEN          155 GB
6         29 April 2013   2 August 2013   96 days      NSK             99 GB
• The spindle was run at approximately 9100 rpm.
• Data were collected as 1-minute samples recorded at least once every five minutes.
• The accelerometer samples data at a rate of 25,000 samples per second.
• Data were saved in a National Instruments format called 'tdms'.
• Three artificially degraded experiments (Datasets 1, 2, and 3) and three run-to-failure experiments (Datasets 4, 5, and 6) were conducted.
Table 2. Description of Spindle datasets used
Prognostic Unit
• Data exploration for feature discovery
• Training of regression models for spindle failure prediction
Data Exploration for Feature Discovery
Primitive Frequency-Domain Features
0.4 seconds' worth of a data signature (containing 10,000 values)
Fig 4. DFT for spindle acceleration during early life
Fig 5. DFT for spindle acceleration just prior to failure
DFT
Primitive Features per signature
Level-1: Energy in the lowest third of the spectrum
Level-2: Energy in the middle third of the spectrum
Level-3: Energy in the highest third of the spectrum
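As a rough illustration of the primitive-feature step, the sketch below (Python/NumPy; the function name and the simple power-spectrum normalization are our own assumptions, not the original implementation) splits a signature's DFT spectrum into three equal bands and sums the energy in each:

```python
import numpy as np

def band_energies(signature):
    """Split a signature's DFT spectrum into three equal bands and return
    the energy in each: Level-1 = lowest third, Level-2 = middle third,
    Level-3 = highest third. A sketch; the exact windowing and
    normalization used in the original system may differ."""
    spectrum = np.abs(np.fft.rfft(signature)) ** 2   # power spectrum
    thirds = np.array_split(spectrum, 3)             # low / mid / high thirds
    return [float(band.sum()) for band in thirds]

# Example: 0.4 s of data at 25,000 samples/s = 10,000 values.
# A 150 Hz tone puts nearly all energy in the lowest third.
sig = np.sin(2 * np.pi * 150 * np.arange(10_000) / 25_000)
e1, e2, e3 = band_energies(sig)
```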
Fig 6. A Compressed Lifetime Historical Record of Primitive Energy Features in Dataset-4 [8].
Burst region: When the average of the energy values in any one partition (low, medium, or high frequencies) of the spectrum remains consistently above a selected burst-threshold-energy for longer than a selected minimum burst-threshold-duration, we consider that a burst of energy has been observed in the signatures.

Calm region: There are time periods during which the energy in each of the three frequency zones is low. We call these periods of calm operation.
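A minimal sketch of burst detection under these definitions (the threshold and minimum-duration values below are illustrative assumptions; the selection procedure used in the study is not reproduced here):

```python
import numpy as np

def find_bursts(energy, threshold, min_duration):
    """Find burst regions in one frequency zone's energy series: runs
    where energy stays above `threshold` for at least `min_duration`
    consecutive samples. Returns (start_index, length) pairs."""
    above = energy > threshold
    bursts, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                                # run begins
        elif not flag and start is not None:
            if i - start >= min_duration:
                bursts.append((start, i - start))    # run long enough: a burst
            start = None
    if start is not None and len(above) - start >= min_duration:
        bursts.append((start, len(above) - start))   # run reaching the end
    return bursts

energy = np.array([0.1, 0.2, 0.9, 0.95, 0.9, 0.2, 0.8, 0.1])
bursts = find_bursts(energy, threshold=0.5, min_duration=3)
# the lone 0.8 exceeds the threshold but is too short to count as a burst
```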
Primitive Frequency-Domain Features

Primitive features:
Level-1: Energy in the lowest third of the spectrum
Level-2: Energy in the middle third of the spectrum
Level-3: Energy in the highest third of the spectrum
Lifetime Variation
High Level Aggregate Features
Aggregate features for 24/48/72 hour period windows of signatures
Primitive features:
Level-1: Energy in the lowest third of the spectrum
Level-2: Energy in the middle third of the spectrum
Level-3: Energy in the highest third of the spectrum

Aggregate features per window:
1. Average energy in low frequency zone
2. Average energy in med. frequency zone
3. Average energy in high frequency zone
4. Max. energy in low frequency zone
5. Max. energy in med. frequency zone
6. Max. energy in high frequency zone
7. No. of bursts in low frequency zone
8. No. of bursts in med. frequency zone
9. No. of bursts in high frequency zone
10. Avg. duration of bursts in low frequency zone
11. Avg. duration of bursts in med. frequency zone
12. Avg. duration of bursts in high frequency zone
Target value to predict per window: percentage time-to-failure at the end of each window.

Feature matrix: all feature vectors with 12 columns and 1 target value (FM_24, FM_48, FM_72).
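The per-window aggregation can be sketched as follows (the burst threshold and minimum duration are assumed parameters, not values from the study):

```python
import numpy as np

def window_features(window, threshold, min_duration):
    """Build the 12-dimensional aggregate feature vector for one
    24/48/72-hour window. `window` is an (n_signatures, 3) array of
    Level-1/2/3 energies, one row per signature."""
    avg = [float(window[:, z].mean()) for z in range(3)]    # features 1-3
    mx = [float(window[:, z].max()) for z in range(3)]      # features 4-6
    counts, durs = [], []
    for z in range(3):
        # runs where the zone's energy stays above threshold long enough
        flags = np.append(window[:, z] > threshold, False)
        runs, start = [], None
        for i, f in enumerate(flags):
            if f and start is None:
                start = i
            elif not f and start is not None:
                if i - start >= min_duration:
                    runs.append(i - start)
                start = None
        counts.append(len(runs))                            # features 7-9
        durs.append(float(np.mean(runs)) if runs else 0.0)  # features 10-12
    return np.array(avg + mx + counts + durs)

window = np.zeros((5, 3))
window[:, 0] = 1.0   # sustained energy in the low-frequency zone only
fv = window_features(window, threshold=0.5, min_duration=3)
```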
Training of Regression Models for Spindle Failure Prediction
Training Models

Training Phase 1: Build
A.) Individual regression models for each dataset (4, 5, and 6) and each window size (24, 48, and 72).
B.) A combined regression model for all the datasets together, one for each window size (24, 48, and 72).
Training Phase 2: Improve feature vector selection by using clustering to extract the feature vectors corresponding to bursts.
• Feature vector similarity calculated using the correlation coefficient (cutoff = 0.5).
• MCL clustering algorithm [9] used.
• Get the feature vectors corresponding to 'burst' clusters to construct an enhanced feature matrix.
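Training Phase 2 can be sketched end to end: build a similarity graph with an edge wherever the correlation coefficient exceeds 0.5 (the cutoff from the text), then cluster it with MCL. The MCL below is a deliberately minimal toy version (expansion + inflation, with inflation = 3 as reported later in the text); the actual analysis used an existing MCL implementation, and the example data is our own.

```python
import numpy as np

def mcl(adjacency, inflation=3.0, iters=50):
    """Minimal Markov Cluster (MCL) sketch: alternate expansion (matrix
    squaring) and inflation (element-wise power + column normalization).
    Convergence handling is simplified compared to real implementations.
    Clusters are read off as the support of the surviving rows."""
    M = adjacency + np.eye(len(adjacency))          # add self-loops
    M = M / M.sum(axis=0)                           # column-stochastic
    for _ in range(iters):
        M = np.linalg.matrix_power(M, 2)            # expansion
        M = M ** inflation                          # inflation
        M = M / M.sum(axis=0)
    return {tuple(np.nonzero(row > 1e-6)[0]) for row in M if row.sum() > 1e-6}

# Toy feature vectors: rows 0,1 are strongly correlated, as are rows 2,3.
X = np.array([[1., 2., 3.], [2., 4., 6.1], [3., 2., 1.], [6., 4., 2.2]])
C = np.corrcoef(X)
A = (C > 0.5).astype(float) - np.eye(len(X))       # edge iff correlation > 0.5
clusters = mcl(A)
```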
Training Phase 3: Retrain the combined regression models using the enhanced feature matrix corresponding to the burst vectors from Training Phase 2.
• Individual regression models
– Generate 3 feature matrices (FM) for each dataset, one per window size, along with the target vectors:
  • 4-FM_24, 5-FM_24, 6-FM_24
  • 4-FM_48, 5-FM_48, 6-FM_48
  • 4-FM_72, 5-FM_72, 6-FM_72
  • Target vector containing 'Time to Failure' values.
• Combined regression model
– Normalize the feature vectors for each dataset (for window size 72) and combine them to make a combined FM (C-FM_72).
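The combined-matrix step can be sketched as follows. Min-max scaling is an assumption on our part; the text only says the feature vectors were normalized per dataset before being combined.

```python
import numpy as np

def combine_feature_matrices(fms):
    """Normalize each dataset's feature matrix column-wise (min-max
    scaling assumed), then stack them into a combined matrix (C-FM_72)."""
    normed = []
    for fm in fms:
        lo, hi = fm.min(axis=0), fm.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
        normed.append((fm - lo) / span)
    return np.vstack(normed)

# Two toy per-dataset matrices with different scales
fms = [np.array([[0., 10.], [5., 20.]]), np.array([[1., 1.], [3., 1.]])]
combined = combine_feature_matrices(fms)
```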
Input for Regression models
Significance Study of Regression Models
Each cluster was categorized as belonging to either primarily bursty or primarily calm regions. A total of three clusters were labeled as bursty clusters. Out of these three bursty clusters, only two contained feature vectors from all three datasets; this tells us that this type of signature is common to all three datasets. The cluster that did not contain feature vectors from all the datasets signifies characteristics not shared by all of them. Based upon this intuition, the feature vectors corresponding to the two bursty clusters were combined in a matrix called Clus-FM_72, and the training of the regression model was repeated using this short-listed feature matrix Clus-FM_72 as input. The significance results show that this trained model has a much better goodness of fit than the previous set of models. We conclude that feature vectors close to the prototypes of the two selected clusters are much better at predicting the remaining life of a spindle.
V. EMPIRICAL EVALUATION OF THREE REGRESSION MODELS
A. Significance Study of Regression Models

All the regression analysis experiments were performed using an Excel plugin called NumXL. We conducted significance tests for the constructed models, including goodness of fit and the F-test. Goodness of fit describes how well a model fits the set of observations; such measures summarize the discrepancy between the observed values and the values expected under the model in question. The F-test is a statistical test most commonly used to test the applicability of the regression model for the given problem. A p-value cutoff of 5% was used to reject the null hypothesis.
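The overall F-test performed by NumXL can be sketched as follows (a sketch of the standard overall-significance F-statistic; computing the p-value itself would additionally need an F-distribution CDF):

```python
import numpy as np

def f_statistic(y, y_hat, n_params):
    """Overall F-statistic for a regression fit, testing the null
    hypothesis that the model explains none of the variance.
    n_params is the number of explanatory terms in the model."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)        # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)     # total variation
    v = n - n_params - 1                  # residual degrees of freedom
    return float(((sst - sse) / n_params) / (sse / v))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.0, 4.2, 4.8]
F = f_statistic(y, y_hat, n_params=1)
```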
We have used three standard measures for testing the goodness of fit of the model, namely the coefficient of determination R², the adjusted R², and the sum of squared errors (SSE) [21]. We briefly introduce these quantities here. Let the values predicted (fitted) by the model be ŷⱼ and the actual observations be yⱼ, and let n be the total number of observations. The coefficient of determination R² measures the proportion of the total variation in the dependent variable y that can be explained by the regression model, and is defined as

    R² = 1 − SSE / SST.
Here, the total sum of squares (SST) measures the total amount of variation in the observed y values and is defined as

    SST = Σⱼ₌₁ⁿ (yⱼ − ȳ)².
The sum of squared residuals (SSE) measures the amount of variability that the regression model cannot explain and is defined as

    SSE = Σⱼ₌₁ⁿ (yⱼ − ŷⱼ)².
The adjusted R² is a modification of R² that accounts for the residual degrees of freedom. It adjusts for the number of explanatory terms in the model relative to the number of data points, since R² automatically (and spuriously) increases when extra explanatory variables are added to the model. The smaller the difference between R² and the adjusted R², the better the fit of the model. Mathematically,

    Adjusted R² = 1 − (SSE (n − 1)) / (SST · v),

where v is the residual degrees of freedom in the model.
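The goodness-of-fit measures defined above can be computed directly; a minimal sketch (NumXL computes these internally):

```python
import numpy as np

def goodness_of_fit(y, y_hat, n_params):
    """Compute SSE, SST, R², and adjusted R² as defined above.
    n_params is the number of explanatory terms in the model."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sse = float(np.sum((y - y_hat) ** 2))       # unexplained variation
    sst = float(np.sum((y - y.mean()) ** 2))    # total variation
    r2 = 1 - sse / sst
    v = n - n_params - 1                        # residual degrees of freedom
    adj_r2 = 1 - (sse * (n - 1)) / (sst * v)
    return sse, sst, r2, adj_r2

y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.0, 4.2, 4.8]
sse, sst, r2, adj_r2 = goodness_of_fit(y, y_hat, n_params=1)
```

Note that adj_r2 is always at most r2, and the gap widens as explanatory terms are added without improving the fit.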
B. Individual Dataset Models

In order to study the effect of changing the window size for aggregating the features, a regression model was trained for three different window sizes: 24 hours, 48 hours, and 72 hours. Such models were trained for each of the three datasets 4, 5, and 6. The regression statistics for these nine trained models are summarized in Table III, where each model is named as the dataset followed by the window size in hours (e.g., 4_72). For each dataset, similar trends can be seen as the window size is increased from 24 hours to 72 hours: the R² value increases and the SSE value decreases. This follows the intuition that a larger window better captures aggregate statistics, such as average burst duration and number of bursts, for disturbance periods that last a long time. For all nine test cases in Table III, the trained models are significant as per the F-test, and the null hypothesis can be rejected. This provides good evidence that a polynomial relationship between the aggregated features and the predicted time-to-failure values does exist for the chosen window sizes. The difference between the R² and the adjusted R² values is quite small in all cases, further indicating that the models fit well.
TABLE III. REGRESSION RESULTS USING INDIVIDUAL MODELS FOR DIFFERENT WINDOW SIZES

Model   R² (%)   Adj. R² (%)   Std. Error   Obs.   F       p-value (%)
4_72    87.8     75.0          0.15         48     6.87    0.0
4_48    77.4     65.5          0.18         71     6.55    0.0
4_24    57.5     48.6          0.22         139    6.44    0.0
5_72    87.7     69.2          0.16         41     4.75    0.1
5_48    71.6     53.1          0.20         62     3.88    0.0
5_24    55.0     43.8          0.22         121    4.89    0.0
6_72    99.0     92.9          0.08         29     16.35   0.7
6_48    92.1     82.2          0.13         44     9.28    0.0
6_24    83.4     77.0          0.14         88     13.16   0.0
C. Feature Vector Clustering Results

Feature vectors in the combined feature matrix Comb-FM_72 (for window size 72 hours and all datasets) were clustered to see whether we could separate the feature vectors corresponding to bursty-region windows from those corresponding to calm-region windows for all the datasets. As described above, a graph clustering algorithm called MCL was used for this purpose.
Coefficient of Determination (R²)
Total Sum of Squares (SST)
Sum of Squared Residuals (SSE)
Adjusted R²
Goodness of Fit
F-test
• A statistical test used to assess the applicability of the regression model for the given problem.
• A p-value cutoff of 5% was used to reject the null hypothesis.
Individual Models: Significance Results
categorized as belonging to either primarily bursty or primarilycalm regions. A total of three clusters were labeled as burstyclusters. Out of these three bursty clusters, only two clusterscontained feature vectors from all the three datasets. Thisconveys to us that this type of signatures are common amongall three types of datasets. The cluster that did not containfeature vectors from all the datasets signifies characteristicsnot shared by all the datasets. Based upon this intuition, thefeature vectors corresponding to the two bursty clusters werecombined in a matrix called Clus-FM 72. The training of theregression model was repeated using this short-selected featurematrix Clus-FM 72 as input. The significance results showthat this trained model has much better goodness of fit thanthe previous set of models. So, the conclusion from this is thatthe feature vectors that are close to the prototypes of the twoselected clusters are much better at predicting the remaininglife of a spindle.
V. EMPIRICAL EVALUATION OF THREE REGRESSIONMODELS
A. Significance Study of Regression Models.All the Regression analysis experiments were performed
using an excel plugin called NUMXL. We conducted signif-icance tests for the constructed models, including, goodnessof fit and F-test. Goodness of fit describes how well a modelfits the set of observations. Such measures are mostly usedto summarize the discrepancy between observed values andthe actual expected values under the model in question. F-Test is a statistical test that is most commonly used to test theapplicability of the regression model for the given problem. Ap-value cutoff of 5% was used to discard the null hypothesis.
We have used three standard measures for testing thegoodness of fit of the model, namely the Coefficient ofDetermination, R2, Adjusted R2, and Sum of Squared Error(SSE) [21]. We briefly introduce these quantities here. Letthe predicted values or fitted values by the model be yj andthe actual observations be yj . Let n be the total number ofobservations. Coefficient of determination, R2 measures theproportion of the total variations in the dependent variable y,that can be explained by the regression model and it is definedas
R2 = 1� SSE
SST.
Here, the total sum of squares (SST) measures the totalamount of variation in observed y values and is defined as
SST =nX
j=1
(yj � y)2.
The sum of squared residuals (SSE) measures the amountof variability that the regression model can not explain and isdefined as
SSE =nX
j=1
(yj � yj)2.
The Adjusted R2 is a modification of R2, that has beenadjusted based upon the residual degrees of freedom. It adjusts
for the number of explanatory terms in a model relativeto the number of data points since the R2 automaticallyand spuriously increases when extra explanatory variablesare added to the model. The smaller the difference betweenR2 and the adjusted R2, the better is the fit of the model.Mathematically,
AdjustedR2 = 1� SSE(n� 1)
SST (v).
where v is the residual degrees of freedom in the model.
B. Individual Dataset ModelsIn order to study the effect of changing the window size for
aggregating the features, a regression model was trained forthree different window sizes: 24 hours, 48 hours and 72 hours.Such models were trained for each of the three datasets 4, 5and 6. The regression statistics for these nine trained modelshave been summarized in table III. In this table, the modelshave been named as the dataset followed by the window size inhours (e.g. 4 72). For each of the dataset, similar trends couldbe seen as the window size was increased from 24 hours to 72hours. It can be seen that the R2 value increases on increasingthe window size from 24 hours to 72 hours and the SSE valuereduces. It follows the intuition that larger the window size, itbetter captures the aggregate statistics like avg. burst duration,number of bursts for disturbance periods which last for a longtime. For all the nine test cases in table III, it can be seen thatthe trained models are significant as per the F-test and the nullhypothesis can be rejected. This provides good evidence thatthe polynomial relationship between the aggregated featuresand the predicted time-to-failure values does exist for thechosen window sizes. The difference between the R2 and theadjusted R2 values for all the cases is quite small, furtherindicating that the models fit well.
TABLE III
REGRESSION RESULTS USING INDIVIDUAL MODELS FOR DIFFERENT WINDOW SIZES

Model | R² % | Adj. R² % | Std. Error | Obs. | F     | p-value %
4_72  | 87.8 | 75.0      | 0.15       | 48   | 6.87  | 0.0
4_48  | 77.4 | 65.5      | 0.18       | 71   | 6.55  | 0.0
4_24  | 57.5 | 48.6      | 0.22       | 139  | 6.44  | 0.0
5_72  | 87.7 | 69.2      | 0.16       | 41   | 4.75  | 0.1
5_48  | 71.6 | 53.1      | 0.20       | 62   | 3.88  | 0.0
5_24  | 55.0 | 43.8      | 0.22       | 121  | 4.89  | 0.0
6_72  | 99.0 | 92.9      | 0.08       | 29   | 16.35 | 0.7
6_48  | 92.1 | 82.2      | 0.13       | 44   | 9.28  | 0.0
6_24  | 83.4 | 77.0      | 0.14       | 88   | 13.16 | 0.0
C. Feature Vector Clustering Results
Feature vectors in the combined feature matrix Comb-FM_72 (window size 72 hours, all datasets) were clustered to see if the feature vectors corresponding to bursty region windows could be separated from those corresponding to calm region windows across all the datasets. As described above, a graph clustering algorithm called MCL
was used for this purpose. The results for MCL clustering were obtained using its implementation in the ClusterMaker plugin [14], available in the graph analysis tool Cytoscape [?]. MCL has a single parameter, called inflation, which can be used to tune the granularity of the clusters; it was set to 3 in our analysis. Out of a total of 119 feature vectors, MCL assigned 110 feature vectors into clusters of size >= 3, resulting in a total of six clusters. In order to characterize each cluster as belonging to the bursty region or the calm region, we define a measure called Cluster Burst Duration (CBD). Before describing CBD, we define what we mean by Total Burst Duration (TBD) for a feature vector.
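The MCL procedure referenced above, with its single inflation parameter, can be sketched in a few lines. This is a minimal textbook NumPy sketch, not the ClusterMaker/Cytoscape implementation used in the paper, and the toy similarity matrix is illustrative only.

```python
import numpy as np

def mcl(adjacency, inflation=3.0, expansion=2, max_iter=100, tol=1e-6):
    """Minimal sketch of Markov Clustering (MCL) on a similarity matrix."""
    m = adjacency + np.eye(len(adjacency))        # add self-loops
    m = m / m.sum(axis=0)                         # make columns stochastic
    for _ in range(max_iter):
        prev = m.copy()
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow
        m = m ** inflation                        # inflation: sharpen strong flow
        m = m / m.sum(axis=0)                     # re-normalize columns
        if np.abs(m - prev).max() < tol:
            break
    # read clusters off the converged matrix: each row with surviving mass
    # defines one cluster via its non-zero columns
    clusters = []
    for row in m:
        members = set(np.nonzero(row > 1e-8)[0].tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# toy similarity graph: nodes {0, 1} are mutually similar, as are {2, 3}
sim = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
clusters = mcl(sim, inflation=3.0)
```

Higher inflation values split the flow more aggressively and yield finer-grained clusters, which is the granularity knob mentioned above.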
Total Burst Duration (TBD): For a feature vector, let the total burst duration in each frequency zone be b1, b2 and b3. These quantities are computed for each feature vector by multiplying its avg. burst duration by the number of bursts in the corresponding frequency zone. Depending upon the dataset a feature vector belonged to, these burst durations were normalized by the maximum burst durations that occurred in the corresponding zones over that complete dataset. Let the normalized burst durations in the three frequency zones be nb1, nb2 and nb3. Based upon these values, the measure TBD for a feature vector was defined as follows.
TBD = (nb1 + nb2 + nb3).
The above defined value of TBD was calculated for eachfeature vector belonging to a cluster. A high value of TBD fora feature vector implies a large amount of burst activity takingplace within the window corresponding to that feature vector.Based upon these TBD values, the quantity CBD is definedas follows.
Cluster Burst Duration (CBD): The median of TBD valuesfor all the feature vectors belonging to a cluster is computedto be the Cluster Burst Duration or CBD for that cluster.
A non-zero CBD value for a cluster implies that more thanhalf of the time windows belonging to that cluster contain alarge amount of energy burst activity within them. A CBDvalue of zero implies that more than half of the time windowsin that cluster had no energy burst activity. Therefore, thisvalue of CBD can be used as a measure to characterize acluster as belonging to primarily burst windows or primarilycalm windows.
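The TBD and CBD definitions above translate directly into code. The sketch below follows those definitions; the per-zone durations, burst counts and dataset maxima in the example are hypothetical.

```python
import numpy as np

def total_burst_duration(avg_dur, n_bursts, max_dur):
    """TBD for one feature vector: nb1 + nb2 + nb3 over the three zones.

    avg_dur  : per-zone average burst duration for this window
    n_bursts : per-zone number of bursts for this window
    max_dur  : per-zone maximum burst duration over the whole dataset
    """
    b = np.asarray(avg_dur, dtype=float) * np.asarray(n_bursts, dtype=float)
    md = np.asarray(max_dur, dtype=float)
    # normalize per zone; a zone with no bursts anywhere contributes 0
    nb = np.divide(b, md, out=np.zeros_like(b), where=md > 0)
    return float(nb.sum())

def cluster_burst_duration(tbd_values):
    """CBD: median of the TBD values of all feature vectors in a cluster."""
    return float(np.median(tbd_values))

# hypothetical example: one bursty window and two calmer ones in a cluster
tbd = total_burst_duration([2.0, 0.0, 1.0], [3, 0, 2], [6.0, 5.0, 4.0])
cbd = cluster_burst_duration([tbd, 0.0, 0.2])
```

Because CBD is a median, it is non-zero exactly when more than half of the cluster's windows show burst activity, which is the labeling rule described above.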
A summary of the CBD values obtained for the six clusters returned by the MCL algorithm is shown in Table IV. Based upon these CBD values, the top three clusters can be characterized as belonging to bursty time windows, whereas the last three clusters can be characterized as belonging to calm time windows. The table also lists, for each cluster, the number of feature vectors coming from each of the three datasets. Focusing on the bursty clusters (1, 2 and 3), clusters 1 and 3 contain feature vectors from all three datasets, whereas cluster 2 has feature vectors only from datasets 4 and 6. Based upon this observation, attention was narrowed down to studying clusters 1 and 3 in more detail. This decision was made because it would help identify the characteristics of the bursts that occur in all three datasets, which in turn helps in designing a more generic prognostic framework.
The feature vectors belonging to clusters 1 and 3 are presented in Figures 7 and 8. In these figures, the feature vectors belonging to different datasets are labeled and colored differently. The width of each node is drawn proportionally to the TBD value of that feature vector. Further, each node label includes the actual percentage time-to-failure value for the window to which the node belongs. For example, Figure 7 shows the composition of cluster 1: it contains a feature vector from dataset 5 with 0.0% time-to-failure, a feature vector from dataset 4 with a time-to-failure of 26.8%, and a feature vector from dataset 6 with a time-to-failure of 23.5%. The feature vectors in cluster 1 are strongly correlated with each other.
TABLE IV
CLUSTERS OBTAINED USING MCL CLUSTERING ON ALL THE FEATURE VECTORS IN C-FM_72

Cluster # | # Nodes | #4 | #5 | #6 | CBD
1         | 3       | 1  | 1  | 1  | 300
2         | 10      | 8  | 0  | 3  | 53.4
3         | 47      | 6  | 23 | 8  | 11.83
4         | 31      | 20 | 1  | 10 | 0
5         | 11      | 6  | 0  | 5  | 0
6         | 8       | 0  | 6  | 2  | 0
On further evaluation of cluster 1, it is found that the three feature vectors (5_0.0, 4_26.8 and 6_23.5) in this cluster correspond to the peak values of TBD for their respective datasets. From this, it can be concluded that the peaks of energy bursts belonging to different datasets do share some similar features. Further, it strengthens our hypothesis that the choice of aggregated features used in the regression model helps in grouping similar feature vectors together. It also reveals that, in the lifetime of a spindle, the peak burst does not necessarily occur towards the very end of the lifetime. Even if a peak burst is followed by relatively calmer regions, as was the case for datasets 4 and 6, the spindle may not have much remaining life left after such a peak burst occurs.
D. Regression Model Using Combined Feature Vectors
In the preceding discussion it was observed that, out of all window sizes, the regression models trained for the window size of 72 hours were the most significant for all three datasets. We denote this class of regression models trained on individual datasets as indv_Model. However, in order to make the regression model more generic, it was retrained using the combined feature matrix Comb-FM_72; let this regression model be called comb_Model. Further, the feature vectors from clusters 1 and 3 of the clustering output were combined into a feature matrix called Clus-FM_72, and the regression model was retrained using this feature matrix as input; let this regression model be called clus_Model. R^2 values were
Fig. 7. Cluster #1 obtained after clustering of C-FM 72.
Fig. 8. Cluster #3 obtained after clustering of C-FM 72.
computed for each dataset to evaluate the predictions made using comb_Model and clus_Model. As discussed earlier, Table III lists the R^2 values obtained for all datasets with their respective indv_Model. In Table V, we compare the R^2 values obtained for each dataset using indv_Model with those obtained using comb_Model and clus_Model. Further, for each of the three datasets, plots comparing the actual time-to-failure (TTF) values and the corresponding predicted values from all three trained regression models are shown in Figures 9, 10 and 11.
As is evident from these plots, the regression models trained on the individual feature matrices 4-FM_72, 5-FM_72 and 6-FM_72 obtain the highest values of R^2. When the regression is performed on the combined matrix Comb-FM_72, the R^2 value goes down for each dataset. This was expected, as the matrix Comb-FM_72 contains feature vectors from all three datasets. The MCL clustering of the feature vectors in Comb-FM_72 helped identify the highly similar feature vectors corresponding to bursts from all the datasets. When the regression model is retrained using only the feature matrix corresponding to the burst clusters (Clus-FM_72), the R^2 value improves considerably. This improvement is significant for datasets 4 and 6, but less so for dataset 5. Also, for each of
Fig. 9. Actual and Predicted TTF values for dataset 4 for window size of 72 hours.
Fig. 10. Actual and Predicted TTF values for dataset 5 for window size of 72 hours.
the datasets, it can be seen that clus_Model predicts the peak bursts with very good accuracy.
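The per-dataset comparison summarized in Table V amounts to scoring each model's predictions with R^2. The sketch below uses hypothetical TTF values and predictions for the three model types; the numbers are illustrative, not the paper's results.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination for one dataset's predictions."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst

# hypothetical actual TTF values for one dataset, and predictions from
# the three model types (values invented for illustration)
y = np.array([80.0, 55.0, 30.0, 5.0])
preds = {
    "indv_Model": np.array([78.0, 57.0, 29.0, 6.0]),   # individual dataset
    "comb_Model": np.array([70.0, 60.0, 40.0, 15.0]),  # all datasets combined
    "clus_Model": np.array([79.0, 56.0, 31.0, 4.0]),   # burst clusters only
}
scores = {name: r2(y, p) for name, p in preds.items()}
```

In this toy setup, as in Table V, the combined model scores lowest while restricting training to the burst clusters recovers most of the individual model's accuracy.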
TABLE V
R² VALUES FOR THE REGULAR, COMBINED, AND CLUSTERED REGRESSION MODELS FOR WINDOW SIZE 72

Dataset # | indv TTF R² % | comb TTF R² % | clus TTF R² %
4         | 87.8          | 75.0          | 96.8
5         | 87.7          | 39.0          | 52.0
6         | 99.0          | 35.0          | 90.2
VI. SIGNIFICANCE AND IMPACT.
In the manufacturing industry, the spindle is a key component of the machine tool. Any breakdown occurring in the spindle leads to a failure of the machine's function until a replacement is installed. Therefore, any attempt to predict its time-to-failure even a few weeks before its actual failure can be of great help. Several model-based and data-driven solutions have been proposed in the literature for this task.
Conclusion
• Using the huge amount of CBM data for predicting the remaining useful life of the spindle can be of great help to the manufacturing industry.
• We proposed a data-driven prognostic methodology to exploit large quantities of CBM data for developing regression-based spindle failure models.
• We deployed these models to predict the remaining useful life of the spindle for three datasets generated by a testbed setup at Techsolve, Inc.
Future Work
• The regression and clustering based methodology presented in the paper can be seen as a proof of concept.
• By performing more test-bed experiments, our proposed methodology can be used to build a generic framework for predicting the remaining useful life of the spindle.
References
[1] James A. Harter, Comparison of contemporary FCG life prediction tools, International Journal of Fatigue 21 (1999), S181–S185.
[2] Sean Marble and Brogan P. Morton, Predicting the remaining life of propulsion system bearings, IEEE Aerospace Conference, 2006.
[3] Jihong Yan, Chaozhong Guo, and Xing Wang, A dynamic multi-scale Markov model based methodology for remaining life prediction, Mechanical Systems and Signal Processing 25 (2011), no. 4, 1364–1376.
[4] Hasan Ocak, Kenneth A. Loparo, and Fred M. Discenzo, Online tracking of bearing wear using wavelet packet decomposition and probabilistic modeling: A method for bearing prognostics, Journal of Sound and Vibration 302 (2007), no. 4, 951–961.
[5] Nagi Z. Gebraeel, Mark Lawley, et al., A neural network degradation model for computing and updating residual life distributions, IEEE Transactions on Automation Science and Engineering 5 (2008), no. 1, 154–163.
[6] Runqing Huang, Lifeng Xi, Xinglin Li, C. Richard Liu, Hai Qiu, and Jay Lee, Residual life predictions for ball bearings based on self-organizing map and back propagation neural network methods, Mechanical Systems and Signal Processing 21 (2007), no. 1, 193–207.
[7] Shaojiang Dong, Shirong Yin, Baoping Tang, Lili Chen, and Tianhong Luo, Bearing degradation process prediction based on the support vector machine and Markov model, Shock and Vibration 2014 (2014).
[8] Divya Sardana, Raj Bhatnagar, Radu Pavel, and Jon Iverson, Investigations on spindle bearings health prognostics using a data mining approach, Proceedings of the Society for Machinery Failure Prevention Technology (MFPT), 2014.
[9] Stijn Van Dongen, Graph clustering via a discrete uncoupling process, SIAM Journal on Matrix Analysis and Applications 30 (2008), no. 1, 121–141.