Mine Planning and Equipment Selection 2002

Modeling condition and performance of mining equipment

Tad S. Golosinski and Hui Hu

Department of Mining Engineering, University of Missouri-Rolla, Rolla, MO 65409-0450, USA

ABSTRACT: This paper follows an earlier MPES publication that suggested the use of data mining techniques to model the operation of mining equipment. It reports on new developments that concentrated on modeling the performance and condition of mining trucks, based on analysis of digital data collected in the field by the truck vital information management system (VIMS). The models developed as a result of this work allow truck condition and performance to be projected into the future with reasonably high accuracy. As such, they allow for better control of mining operations and are expected to find numerous applications in mines worldwide.

1 INTRODUCTION

Modern mining equipment is fitted with numerous sensors that monitor its condition and performance. Data collected by these sensors is used to alert the operator to abnormal operating conditions and to perform an emergency shutdown if the pre-set upper or lower limits of the monitored parameters are reached. This data is also used for post-failure diagnostics and for reporting and analysis of equipment performance. The availability of this voluminous data, together with the availability of sophisticated data processing methods and tools, allows for extraction of additional information contained in the data. One method that may permit this is data mining (Golosinski, 2001; Golosinski and Hu, 2001).

The research presented in this paper investigates the use of data collected from various sensors installed on a mining truck for construction of a truck model that allows for reliable prediction of both truck performance and truck condition. The subject of the research was data collected by a variety of sensors installed on off-highway mining trucks that together constitute the Caterpillar Vital Information Management System (VIMS) (Caterpillar 1999). The data mining tool was the IBM Intelligent Miner for Data (IBM 2000).

2 DATA DESCRIPTION

The data used in this research consists of snapshot (event recorder) and datalogger records, each containing values of 70 truck parameters measured over a period of time. The data was collected from 6 Caterpillar 789B trucks during their operation in a surface mine.

The snapshot stores a segment of truck history that contains values of all 70 monitored parameters recorded over a period of six minutes. Each parameter value is recorded once per second. The snapshot recording is triggered by one of a set of predefined events, usually the occurrence of an abnormal situation indicated by a critical value of a monitored parameter. A snapshot record describes truck conditions from five minutes before the event to one minute after the event (Caterpillar 1999). In this paper, every snapshot record is called an “event” for simplicity.
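The snapshot window described above can be sketched as a slice of a one-second data stream. This is only an illustration: the function name, the half-open slice (300 samples before the trigger, 60 from the trigger onward) and the assumption that the stream is a plain list are not taken from the VIMS documentation.

```python
SECONDS_BEFORE = 5 * 60   # five minutes of history before the event
SECONDS_AFTER = 1 * 60    # one minute recorded after the event


def snapshot_window(samples, event_index):
    """Slice the six-minute snapshot around the sample that triggered the event.

    Assumption: `samples` holds one reading per second and the slice is
    half-open, so the result always contains 360 readings.
    """
    start = event_index - SECONDS_BEFORE
    end = event_index + SECONDS_AFTER
    if start < 0 or end > len(samples):
        raise ValueError("not enough data around the event for a full snapshot")
    return samples[start:end]
```

With this convention the triggering sample sits at position 300 of the returned window, that is, at the start of the sixth minute.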

Unlike the snapshot, the datalogger records values of all truck parameters monitored by VIMS over varying periods of time, also at one-second intervals (Caterpillar 1999). Recording is started and stopped manually, with individual records covering periods of up to 30 minutes of truck operation. Datalogger records do not have to be associated with any events.

Of the 70 truck parameters monitored in the field, values of 26 were recorded as categorical and the remaining 44 as numeric values. Examples of basic statistical descriptions of both the categorical and numerical parameter values are presented in Tables 1 and 2. Previous research was confined to analysis of numerical data only (Hu and Golosinski, 2002). The approach presented in this paper analyzes both types of data, with the values analyzed being statistical parameters of the recorded values, defined for one- to three-minute time intervals. The statistical parameters include, for example, the minimum, maximum, range, and regression statistics that appear as suffixes of the attribute names in the rules of Section 4.2 (e.g. ENG_SPD_MAX, GEAR_SELECT_RANGE).

Table 1. Example of categorical parameter values

Parameter Name / Modal Value / Modal Frequency (%)
ACTUAL_GEAR_352 / Neutral / 41.55
AFTRCLR_LVL_137 / OK / 98.95
BODY_LVR_727 / Not Moving / 95.4
BODY_POS_726 / Down / 93.7

Table 2. Example of numerical parameter values

Parameter Name / Minimum Value / Maximum Value / Mean Value / Standard Deviation
AFTRCLR_TEMP_110 / 0 / 95 / 41.8 / 12.8
AMB_AIR_TEMP_791 / 0 / 38.5 / 21.9 / 7.0
ATMOS_PRES_790 / 0 / 93 / 89.4 / 9.1
BOOST_PRES_105 / 0 / 164 / 31.0 / 50.1
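A minimal sketch of how one-second records might be condensed into interval statistics is given below. The set of statistics is an assumption pieced together from Tables 1 and 2 and from the attribute suffixes used in the decision rules later in the paper (MAX, MIN, RANGE, REGR_SYY, REGR_INTERCEPT). The regression statistics are read here as the usual SQL regression aggregates, with elapsed seconds as the regressor, and the population form of the standard deviation is used; neither choice is confirmed by the source.

```python
from collections import Counter
from statistics import fmean, pstdev


def split_into_intervals(records, minutes):
    """Cut a list of one-second readings into full intervals of `minutes` minutes."""
    size = 60 * minutes
    return [records[i:i + size] for i in range(0, len(records) - size + 1, size)]


def numeric_stats(values):
    """Summarise one interval of readings of a numeric parameter."""
    times = range(len(values))            # assumed regressor: elapsed seconds
    mean_y, mean_t = fmean(values), fmean(times)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx if sxx else 0.0     # a single reading has no trend
    return {
        "MIN": min(values),
        "MAX": max(values),
        "RANGE": max(values) - min(values),
        "MEAN": mean_y,
        "STD": pstdev(values),
        "REGR_SYY": syy,
        "REGR_INTERCEPT": mean_y - slope * mean_t,
    }


def categorical_stats(values):
    """Modal value and its frequency in percent, as in Table 1."""
    value, count = Counter(values).most_common(1)[0]
    return {"MODE": value, "MODE_FREQ_PCT": 100.0 * count / len(values)}
```

Each six-minute snapshot would then yield six, three, or two rows per parameter for the one-, two-, and three-minute models respectively.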

Figure 1 illustrates prediction of one VIMS event, “high engine speed”. To predict the occurrence of this event, statistical data is defined for each three-minute interval of VIMS records. If a set of three-minute data has statistical characteristics similar to those of the first three minutes of a “high engine speed” snapshot, there is a probability that high engine speed will be reported after another two minutes of truck operation.

Figure 1. VIMS event prediction model

As shown in Figure 1, the one-minute model can predict events that will occur within the next four minutes of truck operation. Similarly, the two-minute model can predict events that will occur within the following three minutes. The three-minute model can only predict events that will occur within the following two minutes.
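The lead times quoted above follow from the snapshot layout: the event falls five minutes into the snapshot, so a model that looks at the first k minutes leaves 5 - k minutes of warning. A small sketch, with a hypothetical function name:

```python
EVENT_OFFSET_MIN = 5  # a snapshot starts five minutes before the triggering event


def lead_time_minutes(window_minutes):
    """Minutes between the end of the first statistical window and the event."""
    if not 0 < window_minutes <= EVENT_OFFSET_MIN:
        raise ValueError("window must fit within the pre-event part of the snapshot")
    return EVENT_OFFSET_MIN - window_minutes
```

The trade-off is visible directly: a longer window gives the classifier more data per row but shortens the warning.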

3 MODEL DESIGN

3.1. Objective

Modeling was intended to evaluate and quantify the patterns of change in parameter values that are associated with various events. As the sensors installed on the truck activate the snapshot recorder when the predefined limit of a parameter is reached, the objective was to identify any patterns in parameter values that may allow for early failure recognition. These patterns were then used for prediction of future events by building a decision tree classification model of the truck. The model was to predict the occurrence of a selected event based on the pattern of changes in the values of other parameters.

The “high engine speed” events, which were the most numerous in the analyzed data, were chosen as the main targets of analysis. In addition, data collected during normal operation was selected for comparative analysis and labeled “Other”. The “engine speed” is defined as the actual rotational speed of the crankshaft. For the modeled truck, this event is activated when the engine speed reaches 2250 rpm and deactivated when the speed drops to 1900 rpm.
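The activation and deactivation thresholds form a hysteresis band. The following sketch shows how such an event could be detected in a series of one-second engine speed readings; the inclusive comparisons and the function name are assumptions, and this is not a description of the VIMS on-board logic.

```python
ACTIVATE_RPM = 2250    # event switches on when speed reaches this value (from the text)
DEACTIVATE_RPM = 1900  # event switches off when speed drops to this value (from the text)


def high_speed_intervals(rpm_samples):
    """Return (start, end) index pairs during which the event is active.

    `start` is the sample that activated the event and `end` is the sample
    at which it was deactivated (or the series length if still active).
    """
    intervals = []
    start = None
    for i, rpm in enumerate(rpm_samples):
        if start is None and rpm >= ACTIVATE_RPM:
            start = i
        elif start is not None and rpm <= DEACTIVATE_RPM:
            intervals.append((start, i))
            start = None
    if start is not None:               # event still active at the end of the record
        intervals.append((start, len(rpm_samples)))
    return intervals
```

Because of the band between 1900 and 2250 rpm, a speed that hovers just below the activation value does not repeatedly switch the event on and off.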

3.2. Data Mining Tools

The IBM Intelligent Miner software package was used as the data mining tool. The basic algorithm used was SPRINT, a modified CART (Classification and Regression Tree) algorithm. It was chosen in preference to the neural network classification algorithm because its output is easier to interpret and understand, which facilitates analysis of the truck failure pattern (IBM 1999).

The workings of SPRINT are similar to those of most popular decision tree algorithms, such as C4.5 (Quinlan, 1993); the major distinction is that SPRINT induces strictly binary trees and uses re-sampling techniques for error estimation and tree pruning, while C4.5 partitions according to attribute values (Jang and Sun, 1997). SPRINT uses the GINI index to measure the misclassification at a candidate split point. For a data set S containing examples from n classes, gini(S) is defined by Eq. (1), where p_j is the relative frequency of class j in S. If a split divides S into two subsets S_1 and S_2, the index of the divided data, gini_split(S), is given by Eq. (2), where |S|, |S_1| and |S_2| denote the numbers of examples in the respective sets. The advantage of this approach is that the index calculation requires only knowledge of the distribution of the class values in each of the partitions (Breiman and Friedman, 1984).

$$\mathrm{gini}(S) = 1 - \sum_{j=1}^{n} p_j^{2} \qquad (1)$$

$$\mathrm{gini}_{\mathrm{split}}(S) = \frac{|S_1|}{|S|}\,\mathrm{gini}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{gini}(S_2) \qquad (2)$$
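The Gini index and the index of a binary split can be sketched in a few lines. The threshold search below tries the midpoints between consecutive distinct attribute values and keeps the one with the lowest split index; this is a simplified illustration rather than the Intelligent Miner implementation, and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a collection of class labels: 1 minus the sum of squared class shares."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini index of a binary split into `left` and `right`."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_threshold(values, labels):
    """Return (threshold, split index) for the best split of one numeric attribute.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumption of this sketch); returns None if all values are equal.
    """
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [c for v, c in zip(values, labels) if v < threshold]
        right = [c for v, c in zip(values, labels) if v >= threshold]
        score = gini_split(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best
```

Only the class counts on each side of the split enter the calculation, which is the property noted in the text.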

The tree accuracy is estimated by testing the classifier on subsequent cases whose correct classification has been observed (Quinlan, 1993). The v-fold cross-validation technique is used to estimate the tree error rate. This estimate of the error rate is used to prune the tree and choose the best classifier. More detail about this algorithm can be found elsewhere (Shafer, 1996).
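A generic sketch of v-fold cross-validation follows. The `fit` and `predict` callables are hypothetical placeholders for whatever tree builder is used; the shuffling seed and the way cases are dealt into folds are likewise assumptions of this example.

```python
import random


def v_fold_indices(n_cases, v, seed=0):
    """Shuffle case indices and deal them into v folds of near-equal size."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::v] for i in range(v)]


def cross_validated_error(cases, labels, v, fit, predict):
    """Misclassification rate estimated over v train/hold-out rounds.

    `fit(train_cases, train_labels)` must return a model and
    `predict(model, case)` must return a class label.
    """
    errors = 0
    for held_out in v_fold_indices(len(cases), v):
        held = set(held_out)
        train = [i for i in range(len(cases)) if i not in held]
        model = fit([cases[i] for i in train], [labels[i] for i in train])
        errors += sum(predict(model, cases[i]) != labels[i] for i in held_out)
    return errors / len(cases)
```

Every case is held out exactly once, so the estimate uses all available data without testing a tree on cases it was trained on.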

3.3. Modeling Procedures

The two main procedures of data mining are training, also called model construction, and testing, also called model validation. In training mode, the function builds a model based on the selected input data. This model is later used as a classifier. In test mode, the function uses a set of data to verify that the model created in training mode produces results with satisfactory precision.

In this work, all available data was split into two parts. The bulk of the data, 86.4%, was used for model training. The remainder, 13.6% of the available data, was used for model testing. The test data includes dataset #1 (random selection) and dataset #2 (whole snapshot and datalogger records).

After three models were built on the one-, two-, and three-minute statistical data sets, the error rate was defined and used to evaluate the performance of the training and testing processes.

4 RESULTS AND DISCUSSION

The model built on three-minute statistical data has a training error rate of less than 5%, a 19% error rate on test #1, and a 14% error rate on test #2. It predicts unseen VIMS events better than the one- and two-minute models (Table 3). The tradeoff is that this model can only provide a two-minute early prediction, with three classes: “Eng1”, “Eng2” and “Other”.

Representative three-minute model output is shown as confusion matrices in Figures 2 to 4 for the training data set, the test #1 data and the test #2 data. A confusion matrix for the pruned tree shows the distribution of the misclassifications. In every matrix, the numbers on the diagonal are the counts of correct classifications; the others are the counts of misclassifications.
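For readers unfamiliar with confusion matrices, the sketch below shows how one could be tallied and how the overall and per-class error rates follow from it. The orientation (rows for actual classes, columns for predicted classes) is an assumption, since the figures are not reproduced here, and the function names are hypothetical.

```python
CLASSES = ["Eng1", "Eng2", "Other"]


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Count cases by (actual class, predicted class); rows are actual classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def error_rate(matrix):
    """Share of all cases that lie off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


def per_class_error(matrix):
    """Off-diagonal share of each row, i.e. of each actual class."""
    return [(sum(row) - row[i]) / sum(row) for i, row in enumerate(matrix)]
```

The per-class rates matter here because the classes are unbalanced: a low overall error rate can hide a poorly predicted minority class.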

Table 3. Model Performance Comparison

Figure 2. Training: three-minute statistical model

Figure 3. Test#1: three-minute statistical model

Figure 4. Test#2: three-minute statistical model

4.1. Model Performance Analysis

After the VIMS data is aggregated into statistical data at three-minute intervals, the number of rows is dramatically reduced and the data mining process is much faster than mining the one-second data. Error rates on test data (unseen events) are reduced to 19% and 14% (Figures 3 and 4) for test #1 and test #2 respectively, which indicates that the model is more robust than those built on statistical data at one- and two-minute intervals.

In addition, the error rate is calculated for every class, together with the mean and standard deviation of these per-class error rates. As shown in Figures 5 and 6, the three-minute model gives the best prediction performance, with average error rates of 3% and 21% and standard deviations of 3% and 26% for the training and test datasets, respectively.
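The two summary figures compared here can be computed from per-class error rates as sketched below. The use of the population standard deviation is an assumption, and any input values would have to come from Table 4, which is not reproduced in this text.

```python
from statistics import fmean, pstdev


def summarise_error_rates(per_class_rates):
    """Return (mean, standard deviation) of the per-class error rates of one model."""
    return fmean(per_class_rates), pstdev(per_class_rates)
```

A low mean combined with a high standard deviation signals that at least one class is predicted much worse than the others.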

Figure 5. Comparison of average error rate

Figure 6. Error rate: comparison of standard deviations

Table 4. Error rate statistics.

While the three-minute model has a low average prediction error rate, the high standard deviation of the error rate on the test data sets makes the prediction unstable. For example, in the three-minute model the 14% error rate of “Other” event prediction (Table 4) implies a 14% probability of a false alarm indicating excessive engine speed. Likewise, the 50% error rate of class “Eng1” in the two-minute model (Table 4) might imply a 50% probability of a false high engine speed alarm. A model with such error rates is rather unreliable and cannot be used for prediction of events.

4.2. Three-Minute Decision Tree Classification Model

This approach resulted in a simpler, more informative decision tree that can be presented as a binary decision tree (Figure 7). Each interior node of the binary decision tree tests an attribute of a record. If the attribute value satisfies the test, the record is sent down the left branch of the node. If the attribute value does not satisfy the test, the record is sent down the right branch of the node. The three classes are marked with different colors in the upper left corner. The solid circles are the decision nodes. The binary decision tree consists of the root node on top, followed by non-leaf nodes and leaf nodes. Branches connect a node to two other nodes. Root and non-leaf nodes are represented as pie charts. Leaf nodes are represented as rectangles.
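The traversal just described can be sketched with a small node structure. The class and field names are hypothetical, and the test at each node is assumed to be of the form "attribute value below threshold", which matches the root rule quoted later (records with ENG_SPD_MAX at or above the threshold go right).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Interior node if `attribute` is set, otherwise a leaf carrying a class label."""
    attribute: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # taken when record[attribute] < threshold
    right: Optional["Node"] = None   # taken otherwise
    label: Optional[str] = None


def classify(node, record):
    """Walk from the root to a leaf and return the leaf's class label."""
    while node.attribute is not None:
        if record[node.attribute] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Toy two-leaf tree for illustration only; the tree in Figure 7 has more levels.
toy_tree = Node(
    attribute="ENG_SPD_MAX",
    threshold=2184.25,
    left=Node(label="Other"),
    right=Node(label="Eng2"),
)
```

Each path from the root to a leaf corresponds to one if-then rule, which is what makes the tree easy to inspect.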

Figure 7. Decision Tree Structure for Three-Minute Statistical Data

In this decision tree (Figure 7) the root node, named “ENG_SPD_MAX”, classifies nearly all “Eng2” events (69 out of 70) into the right leaf, displayed as the yellow rectangle. This reflects the fact that the “high engine speed” event is activated and recorded when the engine speed reaches the predefined limit during the second three-minute interval of the snapshot. The rule for this classification is:

If (ENG_SPD_MAX>=2184.25)

then class=ENG

The activation value of the “high engine speed” event defined by VIMS is 2250 rpm. This differs from the value determined by the decision tree model, 2184.25 rpm. [...] are misclassified as “Eng2” events.

The rest of events are further classified using more complex rules. One of these rules classifies the “Eng1” event as follows (the circled leaf in figure 7):

If (ENG_SPD_MAX<2184.25)

and (TRBO_IN_PRES_MIN<87.75)

and (ENG_SPD_REGR_SYY<44991960)

and (RTR_LTR_SUSPCYL_REGR_INTERCEPT<-1688.9)

and (GEAR_SELECT_RANGE>=100.5)

then class=ENG1

Of the 15 “Eng1” events analyzed, this rule classified 14 correctly, with only one misclassification. As such, it allows the event in question to be predicted two minutes before it occurs.
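The two rules quoted in this section translate directly into predicates over a row of three-minute statistics. The thresholds are copied from the rules; the dictionary layout, the function names, and the decision to return `None` when neither rule fires (the remaining rules of the tree are not listed in the text) are assumptions of this sketch.

```python
def is_eng2(stats):
    """Root rule; the text associates it with the class Eng2."""
    return stats["ENG_SPD_MAX"] >= 2184.25


def is_eng1(stats):
    """The quoted Eng1 rule: all five conditions must hold."""
    return (
        stats["ENG_SPD_MAX"] < 2184.25
        and stats["TRBO_IN_PRES_MIN"] < 87.75
        and stats["ENG_SPD_REGR_SYY"] < 44991960
        and stats["RTR_LTR_SUSPCYL_REGR_INTERCEPT"] < -1688.9
        and stats["GEAR_SELECT_RANGE"] >= 100.5
    )


def classify_interval(stats):
    """Apply the two quoted rules; None means neither rule applies."""
    if is_eng2(stats):
        return "Eng2"
    if is_eng1(stats):
        return "Eng1"
    return None
```

With 14 of 15 events classified correctly, the accuracy of the Eng1 rule on the analyzed events is 14/15, or about 93%.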

5 CONCLUSIONS

This approach compresses the information into a statistical table and provides predictions with a certain accuracy. It also makes it possible to predict an event two minutes before it occurs. However, the prediction accuracy needs further improvement and the results need to be verified with more test data. A possible way to improve the prediction accuracy is to add more statistical parameters and use more VIMS data.

REFERENCES

Breiman, L. and Friedman, J. 1984, Classification and regression trees. Wadsworth International Group.

Caterpillar, Inc. 1999, Vital Information Management System (VIMS): system operation testing and adjusting. Company publication.