A Neural Network Approach to ECG Classification

ECE 539

Final Project

12/20/2001

Joe Krachey

Table of Contents

Introduction

Background Information

Project Work

MLP

KNN

Discussion

Conclusion

Introduction

Cardiologists examine electrocardiograms (ECGs) to detect heart disease and other cardiac abnormalities. This diagnosis is a subjective evaluation of the patient's condition: cardiologists look at an ECG and judge visually whether there are signs of a problem. Because every person's ECG is unique, diagnosis is difficult, and cardiologists are left to make decisions based on experience.

The purpose of this project is to construct a neural network that classifies ECG beats into two distinct classes. The neural network could then act as a pre-screener for cardiologists or be used as a method of diagnosis outside a hospital environment. If the network analyzed an ECG and detected an abnormality, it could help the cardiologist make a more accurate diagnosis. It could also be used in conjunction with existing monitoring equipment to inform nurses and other hospital staff of the current condition of a patient.

Background Information

ECGs are a measurement of the electrical activity of the heart.

ECGs are an electrical signal measured at the surface of the body. Mechanical changes in the heart lead to changes in how the electrical excitation spreads over the heart and therefore change the body-surface ECG. Cardiology is based on recording the ECGs of thousands of patients over many years and observing the relationships between various waveforms in the signal and different abnormalities. Thus clinical electrocardiology is largely empirical, based mostly on experiential knowledge: a cardiologist learns the meanings of various parts of the ECG signal from experts who have learned from other experts [1].

Because cardiology is based on experiential knowledge, there is often inconsistency in diagnosing a patient's condition from one cardiologist to the next. Prof. Tompkins estimates that there is only about a 70% likelihood that two cardiologists will make the same diagnosis. A cardiologist may even make a different diagnosis when looking at the same ECG one week later. For this reason, it would be advantageous to develop a computer algorithm that could take some of the ambiguity out of diagnosis.

The ECG waveform is broken down into several components: the P, Q, R, S, and T waves. These are labeled in Figure 1.

Figure 1: ECG Components

Analyzing the ECG allows doctors to classify abnormalities into different categories. Two categories of abnormality are analyzed in this project: R-on-T ventricular premature beats and ventricular escape beats. An R-on-T ventricular premature beat occurs when a premature R beat starts within the preceding T wave. A ventricular escape beat occurs when the sinus node fails to pace or to conduct to the ventricles; the ventricles then "escape" by picking up a rhythm from an alternative pacemaker site.

Surekha Palreddy, Ph.D., University of Wisconsin, classified ECG beats into two categories: R-on-T ventricular premature beats and ventricular escape beats formed one category, while all other beats were placed in a second category. To do this, Palreddy developed C code to extract 9 features from ECGs in the MIT/BIH (Massachusetts Institute of Technology/Beth Israel Hospital) database [2].

The following is a description of each of those 9 features:

Feature F0 (RR0): R-R interval between the previous beat and the beat before it.

Feature F1 (RR1): R-R interval between the current beat and the previous beat.

Feature F2 (RR2): R-R interval between the current beat and the next beat.

Feature F3 (RR3): Ratio RR0/RR1.

Figure 2: Features 1-3

Feature F4: Correlation of the previous beat with the current beat.

Feature F5: Correlation of the current beat with the next beat (see the formula sketch after this list).

Figure 3: Cross-Correlation Formula

Feature F6: Percentage of the ECG signal above 0.2 of a predetermined threshold.

Feature F7: Percentage of the ECG signal above 0.6 of a predetermined threshold.

Feature F8: Percentage of the ECG signal above 0.8 of a predetermined threshold.
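The cross-correlation formula in Figure 3 is not reproduced in this text. As a sketch, a standard normalized cross-correlation between two beat windows $x$ and $y$ of length $N$, which features F4 and F5 presumably compute, is:

$$r_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^{2}\,\sum_{i=1}^{N}(y_i - \bar{y})^{2}}}$$

where $\bar{x}$ and $\bar{y}$ are the window means. A value of $r_{xy}$ near 1 indicates that two beats have nearly identical morphology, so an abnormal beat would presumably correlate poorly with its neighbors.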

The feature data produced by this C program was obtained from:

http://www.cae.wisc.edu/~ece539/data/ecg/

Project Work

MLP

ECG classification is important in determining the health of an individual. Cardiologists are trained to examine an ECG and make a diagnosis, but a trained cardiologist is not always on hand. Airports, amusement parks, and shopping malls are just a few of the places where computers are used to assess a person's cardiac condition when a life-threatening event occurs. This project attempts to mimic a cardiologist's diagnosis by implementing a neural network classification algorithm. The scope of this project does not cover all possible cardiac abnormalities; instead, it classifies an ECG beat into one of the two categories listed above.

With the 9 features per beat already extracted from the MIT/BIH database, the next step was to develop a neural network to classify each sample. The network tested was a multilayer perceptron (MLP) trained with back-propagation. This architecture was selected because of the high dimensionality of the input space. Development of the MLP was aided by Prof. Hu's bp.m program.

The following table summarizes the constants used to determine the 'optimal' MLP. These constants were established in several preliminary runs and then held fixed for all successive MLP models (a code sketch follows the table).

Alpha (learning rate) / 0.1
Momentum / 0.1
Input Scaling / [-5, 5]
Output Scaling / [0.2, 0.8]
Number of Epochs / 700
Epochs before convergence test / 10
Exit after # of epochs with no improvement / 15
Type 1 Training Samples / 179
Type 2 Training Samples / 81
Type 1 Testing Samples / 110
Type 2 Testing Samples / 91

Figure 4: MLP information
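Prof. Hu's bp.m is not reproduced here. As a rough, minimal sketch of how the constants above could map onto an MLP, using scikit-learn as a stand-in for bp.m (the file name and column layout are assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Hypothetical file layout: 9 feature columns followed by a class label.
data = np.loadtxt("ecg_features.txt")
X, y = data[:, :9], data[:, 9].astype(int)

# Input scaling to [-5, 5], matching the table above.
X = MinMaxScaler(feature_range=(-5, 5)).fit_transform(X)

# A 9-6-1 network: one hidden layer of 6 logistic (sigmoid) units.
# The [0.2, 0.8] output scaling is a sigmoid-target trick used by bp.m
# that has no direct MLPClassifier equivalent, so it is omitted here.
mlp = MLPClassifier(hidden_layer_sizes=(6,),
                    activation="logistic",
                    solver="sgd",
                    learning_rate_init=0.1,  # alpha
                    momentum=0.1,
                    max_iter=700,            # number of epochs
                    n_iter_no_change=15,     # exit after no improvement
                    random_state=0)
mlp.fit(X, y)
print(f"training accuracy: {mlp.score(X, y):.4f}")
```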

In order to get the highest classification rate, several network configurations were analyzed. The following table summarizes the classification rates of the various configurations; each reported rate is the average over 15 trials of that configuration.

Configuration / Classification Rate (%)
9-1-1 / 87.5954
9-2-1 / 93.3665
9-3-1 / 91.7081
9-4-1 / 94.6932
9-5-1 / 91.7745
9-6-1 / 94.4610
9-7-1 / 83.0846
9-8-1 / 82.8192
9-9-1 / 78.5406
9-1-1-1 / 67.0978
9-2-1-1 / 75.1575
9-3-2-1 / 94.0580
9-4-2-1 / 82.8192
9-5-3-1 / 94.1625
9-6-3-1 / 90.2156
9-7-4-1 / 88.8889
9-8-4-1 / 92.6401
9-9-5-1 / 90.3151
9-1-1-1-1 / 64.9088
9-2-1-1-1 / 75.4561
9-3-2-1-1 / 73.6318
9-4-2-1-1 / 78.0763
9-5-3-1-1 / 87.8275
9-6-3-2-1 / 81.7247
9-7-4-2-1 / 80.1658
9-8-4-2-1 / 85.5390
9-9-5-3-1 / 90.6136

Figure 5: MLP Configuration and Classification Results

For the configurations tested, the best turned out to be 9-6-1, with a classification rate of 94.46%.
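The configuration search itself can be sketched as a loop over candidate hidden-layer layouts, averaging 15 random restarts per layout (same scikit-learn stand-in as above; X_train, y_train, X_test, and y_test are assumed to hold the 260-sample training and 201-sample testing splits described in Figure 4):

```python
from statistics import mean
from sklearn.neural_network import MLPClassifier

# Hidden-layer layouts to try, e.g. (6,) is 9-6-1 and (5, 3) is 9-5-3-1.
configs = [(n,) for n in range(1, 10)] + [(3, 2), (5, 3), (9, 5, 3)]

def trial(hidden, seed):
    mlp = MLPClassifier(hidden_layer_sizes=hidden, activation="logistic",
                        solver="sgd", learning_rate_init=0.1, momentum=0.1,
                        max_iter=700, n_iter_no_change=15, random_state=seed)
    return mlp.fit(X_train, y_train).score(X_test, y_test)

for hidden in configs:
    # Average over 15 random initializations, as reported in Figure 5.
    rate = mean(trial(hidden, seed) for seed in range(15))
    print(hidden, f"{100 * rate:.2f}%")
```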

This classification rate, though, was measured on only a small sample of the entire database. At this point, the best configuration above (9-6-1) was given all the data points in the file, which amounted to classifying all 72,000+ samples. The following was the output:

Actual \ Classified / Type2 / Type1
Type2 / 6036 / 576
Type1 / 1062 / 66508

Table 1: Entire-File Classification Matrix (rows are the actual class; columns are the class assigned by the MLP)

Classification Rate = 97.7919%
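For reference, this overall rate follows directly from Table 1:

$$\frac{6036 + 66508}{6036 + 576 + 1062 + 66508} = \frac{72544}{74182} \approx 97.79\%$$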

The next phase of the project was to determine how relevant each feature was to classifying Type1 and Type2 beats. To accomplish this, one feature at a time was removed, and the original training and testing data were run on an 8-6-1 configuration. The following are the test classification results for each removed feature (a code sketch follows the table).

Feature Removed / Classification Rate (%)
F8 / 93.3333
F7 / 88.2388
F6 / 72.4577
F5 / 91.1841
F4 / 87.5423
F3 / 90.6468
F2 / 84.1393
F1 / 81.5124
F0 / 87.8010
Original / 94.46

Table 2: Feature Removed vs Classification Rate
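A minimal sketch of this ablation loop (same scikit-learn stand-in and assumed X_train/X_test splits as the earlier MLP sketches):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

for f in range(9):
    # Remove feature f, leaving an 8-dimensional input (an 8-6-1 network).
    Xtr = np.delete(X_train, f, axis=1)
    Xte = np.delete(X_test, f, axis=1)
    mlp = MLPClassifier(hidden_layer_sizes=(6,), activation="logistic",
                        solver="sgd", learning_rate_init=0.1, momentum=0.1,
                        max_iter=700, n_iter_no_change=15, random_state=0)
    rate = mlp.fit(Xtr, y_train).score(Xte, y_test)
    print(f"F{f} removed: {100 * rate:.2f}%")
```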

KNN

The second classifier applied was the K-nearest-neighbor (KNN) algorithm. Knndemo.m, which was provided by Prof. Hu, was modified to handle the data used in this project.
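Knndemo.m is not reproduced here; a minimal stand-in using scikit-learn, under the same assumed X_train/X_test splits as before, is:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

for k in (1, 2, 3):
    # Fit a k-nearest-neighbor classifier and report its confusion matrix.
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    pred = knn.predict(X_test)
    print(f"k={k}")
    print(confusion_matrix(y_test, pred))  # rows: actual, cols: predicted
```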

The KNN algorithm was first run using the same training and testing sets as were used in the MLP development.

Type 1 Training Samples / 179
Type 2 Training Samples / 81
Type 1 Testing Samples / 110
Type 2 Testing Samples / 91

Table 3: KNN Data Information

The results were very good: KNN classified both Type1 and Type2 data with 100% accuracy for the 1st nearest neighbor.

Actual \ Classified / Type2 / Type1
Type2 / 91 / 0
Type1 / 0 / 110

Table 4: KNN Results for Sampled Data (1st Nearest Neighbor)

Actual \ Classified / Type2 / Type1
Type2 / 91 / 0
Type1 / 0 / 110

Table 5: KNN Results for Sampled Data (2nd Nearest Neighbor)

Actual \ Classified / Type2 / Type1
Type2 / 90 / 1
Type1 / 0 / 110

Table 6: KNN Results for Sampled Data (3rd Nearest Neighbor)

From this point, the entire data file was run through KNN. The results for the first three nearest-neighbor values are as follows:

Actual \ Classified / Type2 / Type1
Type2 / 6074 / 538
Type1 / 3294 / 64276

Table 7: KNN Results for Entire Data Set (1st Nearest Neighbor)

Actual \ Classified / Type2 / Type1
Type2 / 6210 / 402
Type1 / 5623 / 61947

Table 8: KNN Results for Entire Data Set (2nd Nearest Neighbor)

Actual \ Classified / Type2 / Type1
Type2 / 5800 / 812
Type1 / 1925 / 65645

Table 9: KNN Results for Entire Data Set (3rd Nearest Neighbor)

Discussion

The first topic that needs to be discussed is the data that was sampled to train and test the MLP. The first time data was sampled, 2000 samples were used for training and 500 for testing. Analysis of this data yielded a classification rate of over 97%. Upon further investigation, this classification rate was found to be meaningless. The training data had been chosen randomly from all 72,000+ samples, so almost all of it was Type1, since Type1 vastly outnumbers Type2. As a result, the MLP classified everything as Type1. The classification rate was so high only because the testing data also contained very few instances of Type2. Suppose only 20 of the 500 testing samples were Type2: if every sample were classified as Type1, the classification rate would still be 480/500, or 96%. The rate is therefore misleading, because the MLP never learned to do anything except label all data as Type1.

Examining the feature database, it is clear that there are significantly fewer Type2 beats (escape beats and R-on-T premature beats) than Type1 beats; Type1 accounts for roughly 91% of the data. In order to train the network properly, 30% of the training samples were Type2 and the remaining 70% were Type1. This distribution was selected to maintain the natural majority of Type1 beats while providing enough Type2 beats to train a robust network. In the end, a much smaller data set was used for training, but it was certainly a more meaningful one. The testing set was constructed with approximately equal numbers of Type1 and Type2 points (110 vs. 91), which leads to a more meaningful classification rate. After these adjustments, a classification rate of 94% was achieved on the testing data.
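A minimal sketch of constructing such a class-balanced split (X and y are assumed to hold the full feature matrix and labels; the label values 1 and 2 are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_class(label, n_train, n_test):
    # Shuffle the indices of one class, then carve out disjoint
    # training and testing subsets of the requested sizes.
    idx = rng.permutation(np.flatnonzero(y == label))
    return idx[:n_train], idx[n_train:n_train + n_test]

# Sample counts taken from Figure 4.
tr1, te1 = split_class(1, 179, 110)
tr2, te2 = split_class(2, 81, 91)

X_train, y_train = X[np.r_[tr1, tr2]], y[np.r_[tr1, tr2]]
X_test, y_test = X[np.r_[te1, te2]], y[np.r_[te1, te2]]
```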

Once the 'optimal' MLP was found, it was tested on the entire data set to see how robust it was. The overall classification rate was 97.79%. This rate is somewhat deceiving, though, because there are so many more instances of Type1. A better indication is how well the MLP classified only Type2 beats: 6036/(6036+576), or 91.29%. This is felt to be a satisfactory classification rate.

The second stage of the project was to determine whether all 9 features are actually relevant to the classification process. The features dealing with R-R intervals (F0-F3) seemed applicable given the nature of the abnormalities being classified, and the correlation features (F4-F5) also seemed pertinent. The features in question were F6-F8: it was unclear from Palreddy's paper how the threshold features were determined. The code used to generate the data set also could not be found, because the files specified in his paper were not readable, and there was no indication of whether the signals were filtered for noise or baseline drift.

Both of these issues could have affected the threshold features, so their relevance was tested. As it turned out, feature F8 (percentage above 0.8 of the threshold) has very little effect on the classification rate; removing it reduced the rate by only about 1%. Removing features F0-F3 had the effects I would expect: the classification rate went down by an average of 8.5% when any of the four was removed. The surprising result came from removing F6, which reduced the classification rate by 22%. This shows that a threshold measurement is indeed important in detecting abnormalities in ECGs. I would guess that much more testing would be needed to determine exactly which features are and are not necessary to classify ECGs; in all, there are 2^9 = 512 different subsets of the 9 features that would need to be tested. Removing a feature may indicate that it is not relevant, but it could also mean that an entirely different MLP configuration would need to be developed.

Further examination of the MLP results showed that training the MLP was very volatile. Upon examining the best configuration (9-6-1), I found that the overall classification rate could vary by quite a bit: some runs achieved a 100% classification rate on the smaller testing set, while a repeated run of the same configuration could yield only 62% on the same data. This volatility was only exacerbated when the MLP was run on the entire data set, where the classification rate for Type2 data was sometimes almost 0%.

For this reason, a second type of classifier was tested. The KNN algorithm proved much more stable in classifying the ECG data: it achieved a 100% classification rate on the smaller test set (1st nearest neighbor). Analyzing the results for the entire data set, the Type2 classification rate is 91.86%, slightly higher than the Type2 rate of the MLP network. On the other hand, the Type1 classification rate is much lower: almost three times as many Type1 data points are mislabeled as Type2 (3294 vs. 1062).

Conclusion

The analysis of the given data shows that it is indeed possible to develop an MLP or KNN classifier to diagnose abnormalities in ECG signals. The thing to remember is that only two of at least 20 different types of ECG arrhythmia were classified here. In order to have a marketable product, the system must be able to diagnose the variety of arrhythmias that a cardiologist would, and it would have to achieve a classification rate high enough to pass the various government regulations. To build such a robust classifier, several steps would need to be taken. First, the algorithm Palreddy used to derive the features would have to be analyzed to see whether it applies to all types of arrhythmia; more features might be needed to accommodate an increased number of arrhythmia types. Second, a more knowledgeable weighting structure would need to be implemented. Arrhythmias such as tachycardia are far more dangerous than PVCs. In such a scenario, it would be better to have a network with a lower overall classification rate as long as it classified tachycardia with the utmost accuracy. This means there might be more false positives (classifying a healthy ECG as tachycardia), but it would minimize or eliminate false negatives (classifying tachycardia as normal).

The classification rates of the MLP and KNN algorithms are very similar for this data set. The KNN algorithm shows lower variance from run to run, but the MLP is more versatile and adaptable. It is hard to say which approach would prove superior if the feature space grew to cover more arrhythmia types. Based only on my limited experience and the results of this project, I would guess that an MLP might provide superior results.


[1] Tompkins, Willis. Biomedical Digital Signal Processing. Prentice-Hall, 1993.

[2] Palreddy, Surekha. ECG Beats Database Description. 1996.