Simple Metrics for Evaluating and Conveying Prognostic Model Performance To Users With Varied Backgrounds

Michael Eric Sharp
Submission Type: 
Full Paper
phmc_13_081.pdf (446.88 KB), September 16, 2013, 12:15pm

The need for standardized methods to compare and evaluate new models and algorithms has been recognized for nearly as long as there have been models and algorithms to evaluate. Conveying the results of these comparisons to people not intimately familiar with the methods and systems involved can also present many challenges, because nomenclature and the values considered representative may vary from case to case. Many predictive models rely primarily on minimizing simple error measures such as the Mean Squared Error (MSE) for their performance evaluation. This, however, may not provide all the necessary information when the criticality, or importance, of a model’s predictions changes over time. Such is the case with prognostic models: a prediction made early in life can carry a relatively large error with less impact on the operation of a system than a similar error made near end of life. For example, an error of 10 hours in the prediction of Remaining Useful Life (RUL) is far less significant when the predicted value is 1000 hours than when it is 25 hours. Because the importance of a prognostic prediction depends on where it falls within the query unit’s lifetime, any evaluation metric should capture and reflect this evolution of importance.
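As a small illustration of why MSE alone can hide this effect, the sketch below compares the squared error (the per-prediction term that MSE averages) with a simple relative error for the 10-hour example. The true RUL values of 990 and 15 hours, and the function names `squared_error` and `relative_error`, are assumptions chosen only so that each prediction is off by exactly 10 hours; this is not a metric from the paper.

```python
# Same 10-hour RUL error, once early in life and once near end of life.
# The true RUL values below are assumed for illustration only.


def squared_error(true_rul: float, predicted_rul: float) -> float:
    """Per-prediction term that MSE averages; blind to when the error occurs."""
    return (predicted_rul - true_rul) ** 2


def relative_error(true_rul: float, predicted_rul: float) -> float:
    """Absolute error expressed as a fraction of the predicted RUL."""
    return abs(predicted_rul - true_rul) / predicted_rul


early = (990.0, 1000.0)  # (true, predicted) in hours: 10 h error at 1000 h
late = (15.0, 25.0)      # (true, predicted) in hours: 10 h error at 25 h

for label, (true_rul, predicted_rul) in (("early", early), ("late", late)):
    print(label, squared_error(true_rul, predicted_rul),
          relative_error(true_rul, predicted_rul))
```

Both cases produce the same squared error (100.0), so they contribute identically to MSE, while the relative error is 0.01 early in life and 0.4 near end of life.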

This work briefly explores some of the existing metrics and algorithms for evaluating prognostic models, then offers a series of alternative metrics that provide clear, intuitive measures that fully represent the quality of model performance on a scale that is independent of the application. This allows performance to be communicated to users and evaluators with a wide range of backgrounds and expertise without requiring specific knowledge of the system in question, aiding collaboration and the cross-field use of prognostic methodologies. Four primary evaluation metrics can be used to capture both the timely precision and the accuracy of any series or set of RUL predictions: the Weighted Error Bias, the Weighted Prediction Spread, the Confidence Interval Coverage, and the Confidence Convergence Horizon. All four are detailed in this work, and they are designed so that they can easily be combined into a single representative “score” of the overall performance of a prediction set and, by extension, of the prognostic model that produced it. Whether used separately or as a group, these metrics can be used to quickly compare different prognostic prediction sets, not only for the same query set but just as easily across different query data sets, because all predictions and metrics are scaled to relative values based on the individual query cases.
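The abstract does not give the formula for the Weighted Error Bias, so the following is only a hypothetical sketch of the two ideas described above: scaling each error relative to its own query case, and weighting errors more heavily as end of life approaches. The function name `weighted_relative_bias`, the normalization by true RUL, and the inverse-RUL weights are all assumptions, not the paper's definition.

```python
from typing import Sequence


def weighted_relative_bias(
    true_rul: Sequence[float], predicted_rul: Sequence[float]
) -> float:
    """Hypothetical weighted mean of signed relative RUL errors.

    Assumptions (not from the paper): each error is divided by the true RUL
    at that query point, and weighted by 1 / true RUL so that predictions
    made close to end of life dominate the result.
    """
    if len(true_rul) != len(predicted_rul) or len(true_rul) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    weighted_sum = 0.0
    weight_total = 0.0
    for t, p in zip(true_rul, predicted_rul):
        if t <= 0:
            raise ValueError("true RUL values must be positive")
        rel_error = (p - t) / t  # > 0 means RUL was over-predicted
        weight = 1.0 / t  # assumed weighting: grows as failure approaches
        weighted_sum += weight * rel_error
        weight_total += weight
    return weighted_sum / weight_total


# Usage: a 10% over-prediction early and a 10% under-prediction late.
true_rul = [100.0, 50.0, 10.0]
predicted_rul = [110.0, 50.0, 9.0]
print(weighted_relative_bias(true_rul, predicted_rul))
```

An unweighted mean of these relative errors would be zero, but because the late under-prediction carries the largest weight, this sketch returns a negative bias of roughly -0.069. Dividing by the true RUL makes the value unitless, which is one possible way to compare prediction sets drawn from different query data sets.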

Submission Topic Areas: 
Verification and validation