Fuzzy Database for Heart Disease Diagnosis

Rehana Parvin and Dr. Abdolreza Abhari

Department of Computer Science, Ryerson University

Toronto, Ontario M5B2K3, Canada

Keywords: fuzzy database, inaccurate data, membership function, linguistic representation, SQL.

Abstract

Databases are a well-known means of storing information. In regular database systems, the sheer volume of data sometimes makes it impossible to satisfy the user's criteria and to provide exactly the information needed to make a decision. In this paper, a medical fuzzy database is introduced to help users obtain accurate information when the database contains inaccurate data. Inaccuracy in data refers to imprecise or vague values (like the words used in human conversation) or to uncertainty in using the available information required for decision making. The paper presents a fuzzy database management system that diagnoses the severity of a patient's heart disease by using existing data in common database systems.

1. Introduction

Information plays an important role in modern civilization, driving progress in every sphere, from the earth to the sky, the planets and the oceans. People are making discoveries and inventions at a rapid pace based on information. It is therefore necessary that the information obtained be accurate.

Data volumes grow day by day, so dealing with the large amounts of data stored in databases is a challenge. A regular database is based on Boolean logic, which means that a piece of information is either completely true or completely false. Classical database management systems therefore cannot deal with imprecise information. It is important to find convenient ways to store and manage human-perception-based data, which is often vague and uncertain, in a regular database system.

Health care systems have many data management tools available, but analysis tools are not sufficient to discover hidden relationships in the data. Most medical information is vague, imprecise and uncertain. Extracting correct information from such data is considered an art, because it is complicated by many factors and its solution draws on virtually all of a human's abilities, including intuition and the subconscious.

Medical diagnosis is a complicated task that must be performed accurately and efficiently. According to the World Health Organization, 12 million deaths occur each year due to heart disease, which is the primary cause of death in adults. In the United States, 50% of deaths are due to cardiovascular disease; in fact, one person dies of cardiovascular disease every 34 seconds. Similarly, in other developed countries heart disease is one of the main causes of adult death. To decrease the mortality rate of cardiovascular disease, the disease must be diagnosed at an early stage.

A database needs fuzzy data management capability in order to store vague data. Ignoring vague data risks losing important information that may be useful for some applications. A database that supports vague, imprecise and uncertain information is called a fuzzy database. It is based on fuzzy logic and fuzzy set theory, which were introduced by Zadeh [1].

This paper is organized as follows. Section 2 describes related work. Section 3 presents the expert's opinion regarding the risk factors of heart disease, together with the fuzzy sets of the system. Section 4 covers the methodology and the proposed system, including the algorithm, the membership functions, the inputs and output, the SQL (Structured Query Language) queries, several snapshots of the system, and system testing and forms. Finally, Section 5 concludes and discusses future work.

2. Related Works

There are many papers published related to the diagnosis of various diseases.

Shrandhanjali [2] developed a fuzzy Petri net application for heart disease diagnosis. The rule base is associated with transitions for certainty factors. A fuzzy Petri net is drawn for the rule base, and truth-value propositions are used to reach a decision about the disease.

Lavanya et al. [3] designed a fuzzy rule-based inference system for the detection and diagnosis of lung cancer. The dataset, obtained from domain experts, covers symptoms, stages and treatment facilities, and provides an efficient and easy method to diagnose lung cancer.

Adeli et al. [4] proposed a Fuzzy expert system for the Heart Disease Diagnosis. The developed system uses fuzzy logic. In their system the crisp value is fuzzified to get fuzzy values. The expert system uses those fuzzy values and the output is also fuzzy. The fuzzy output is defuzzified to get a crisp output.

Sony et al. [5] designed an Intelligent and Effective Heart Disease Prediction System using weighted associative classifiers. They used Java as the front end and MS Access as the back-end tool. They consider only two cases for prediction (Heart Disease and No Heart Disease).

Neshat et al. [6] developed a fuzzy expert system for the diagnosis of liver disorders. They considered two cases, people with a healthy liver and people with an unhealthy liver, along with a calibrated measure of disease risk intensity. The fuzzy inference system was developed in MATLAB.

Kadhim et al. [7] implemented a fuzzy expert system to diagnose back pain. The rules were developed by experts, and the decision sequence is illustrated by a decision tree.

All the works described above are systems that use fuzzy logic to diagnose disease. Most of them are based on a fuzzy inference engine developed in MATLAB.

The difference between our work and these works is that we build our system on the data already available in the database. By focusing on database concepts, we developed a system that uses SQL on common database management systems such as Microsoft Access.

3. Expert's Opinion regarding the risk factors of Heart Disease and Fuzzy sets of the system

Here a very brief explanation of fuzzy theory in medical applications is presented. A crisp property P can be defined by a function µ: x → {0, 1}, whereas a fuzzy property can be described by µ: x → [0, 1]. µ(x) indicates the degree to which x has the property. To clarify fuzzy logic in medical science, a simple example described in [8] and [9] is explained below.

For example, in the case of fever, the linguistic variable is 'High Fever'. If the body temperature x is greater than 39 degrees Centigrade, then µ(x) for the medical concept 'High Fever' is 1, which means x surely has 'High Fever'. If x is less than 38.5 degrees Centigrade, then µ(x) for 'High Fever' is 0, which means x surely does not have 'High Fever'. If x lies in the interval [38.5, 39] degrees Centigrade, then x has the property 'High Fever' to some degree (i.e., the membership function takes a value) in [0, 1].
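The fever example can be sketched in a few lines of Python. The text fixes only the two thresholds (38.5 and 39 degrees Centigrade); the linear ramp between them and the function name `high_fever_membership` are assumptions made for illustration.

```python
def high_fever_membership(temp_c: float) -> float:
    """Degree (0..1) to which a body temperature in deg C is 'High Fever'."""
    low, high = 38.5, 39.0  # thresholds taken from the example in the text
    if temp_c <= low:
        return 0.0  # surely not 'High Fever'
    if temp_c >= high:
        return 1.0  # surely 'High Fever'
    # Assumption: membership rises linearly between the two thresholds.
    return (temp_c - low) / (high - low)


for temperature in (38.0, 38.75, 39.5):
    print(temperature, high_fever_membership(temperature))
```

Any other monotone curve between the two thresholds would fit the description equally well; the linear form is simply the easiest to read.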

In our proposed system, each input is assigned to fuzzy sets according to the range in which its value falls. Then, to achieve a more accurate result, the input membership values are combined and multiplied by the output membership function to generate the rule strength.

The union of all rule strengths covers 44 rules. The rule strengths are generated according to the input, and we take the maximum of the union fuzzy set to produce the Heart Disease Condition as the desired output.

According to the American Heart Association [10], the major risk factors are those that significantly increase the risk of heart and blood vessel (cardiovascular) disease. The more risk factors a person is exposed to, the greater the chance of developing heart disease. Some risk factors, such as increasing age and gender, are inherent and cannot be changed. People aged 65 or older die of heart disease more often than people of other ages. Men have a greater risk of developing heart disease than women; however, older women have a risk similar to that of men. Cholesterol level is also affected by age and gender.

Blood sugar is another contributing factor that increases the risk of developing heart disease. If blood sugar is not controlled then a high risk remains. Blood pressure is another risk factor which increases the heart's workload and causes the heart to be thickened and stiffened. This stiffening of the heart decreases normal function. Stress, inactive lifestyle, smoking, alcohol, diet and nutrition are other risk factors that influence heart disease in the long run.

Therefore, in our system the input variables are: Age, Gender, Chest Pain, Cholesterol, Blood Pressure, Blood Sugar, Heart Beat, Electrocardiography (ECG), Exercise, Old Peak and Thallium Scan. The output is the Heart Disease Condition, expressed in linguistic terms.

4. Methodology and Proposed System

In this section, the algorithm and methodology are explained first, and then snapshots of the system functions and testing are shown.

4.1 Algorithm

I. 11 inputs: Age, Gender, ChestPain, Cholesterol, BloodPressure, BloodSugar, HeartBeat, ECG, Exercise, OldPeak, ThalliumScan, Category of Heart Disease.

II. Output: Heart Disease Condition, shown as linguistic terms.

III. Each input has fuzzy variables.

IV. Each fuzzy variable is associated with a membership function.

V. The membership function is calculated for each fuzzy variable.

VI. The rule strength is calculated based on the membership functions of the fuzzy variables.

VIII. Heart Disease condition is calculated by taking the maximum of the selected output set as the final result.
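The last two steps of the algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual rule base: the rule strength is taken here as the product of the input membership degrees, rules that share an output set are merged by keeping the larger strength, and the rules in `fired_rules` are hypothetical values invented for the example.

```python
from math import prod


def rule_strength(memberships):
    """Strength of one rule, assumed to be the product of its input degrees."""
    return prod(memberships)


def diagnose(rules):
    """Pick the output set with the largest rule strength.

    `rules` is a list of (output_label, [input membership degrees]) pairs.
    Returns the winning label and the strength kept for every label.
    """
    strength_by_label = {}
    for label, memberships in rules:
        strength = rule_strength(memberships)
        # Union of rules with the same output set: keep the maximum strength.
        strength_by_label[label] = max(strength, strength_by_label.get(label, 0.0))
    best_label = max(strength_by_label, key=strength_by_label.get)
    return best_label, strength_by_label


# Hypothetical fired rules, for illustration only.
fired_rules = [
    ("Mild", [0.6, 0.5]),
    ("Severe", [0.9, 0.8]),
    ("Mild", [0.7, 0.7]),
]
print(diagnose(fired_rules))
```

Replacing the product with a minimum is a common alternative for combining input degrees; the sketch uses the product only because the text speaks of multiplying membership values.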

4.2 Membership Function, Input and Output

For the input variable Cholesterol, we use Low Density Lipoprotein (LDL). However, it is also possible to use High Density Lipoprotein (HDL). In the case of Blood Pressure, Systolic Blood Pressure is used.

A membership function is required for each fuzzy variable, and the rule strength is calculated from the membership functions.

i. Age: This input consists of four fuzzy sets, i.e. the linguistic variables Young, Mid, Old and Very Old. Each linguistic variable has a membership function associated with it. The ranges of the fuzzy sets for Age are shown in Table 1.

Table 1 Age

Input Field / Range / Linguistic Representation
Age / <38 / Young
Age / 33-45 / Mid
Age / 40-58 / Old
Age / >52 / Very Old

Figure 1 Membership function for Age

ii. Chest Pain: This input field has four Chest Pain types: Typical Angina, Atypical Angina, Non-Angina and Asymptomatic. A patient can have only one type of Chest Pain at a time. Chest Pain is coded as 1 = Typical Angina, 2 = Atypical Angina, 3 = Non-Angina and 4 = Asymptomatic.

iii. Cholesterol: This input field influences the result much more than the other input fields. Cholesterol can be Low Density Lipoprotein (LDL) or High Density Lipoprotein (HDL). In our system, we consider only LDL. However, it is possible to consider HDL instead of LDL. We use only one type at a time. This field has four fuzzy sets.

Each fuzzy variable is associated with a membership function. The ranges of the fuzzy sets for Cholesterol are given in Table 2.

Table 2 Cholesterol

Input Field / Range / Linguistic Representation
Cholesterol / <197 / Low
Cholesterol / 188-250 / Medium
Cholesterol / 217-307 / High
Cholesterol / >281 / Very High

iv. Gender: This input field has two representations (Male and Female): 1 represents male and 0 represents female.

v. Blood Pressure: Another important risk factor is Blood Pressure. It can be of Systolic, Diastolic or Mean type. In our system, we consider Systolic Blood Pressure, although any type of Blood Pressure could be chosen. This field has four fuzzy sets. The ranges of the linguistic variables are given in Table 3. The membership function is calculated based on the range.

Table 3 Blood Pressure

Input Field / Range / Linguistic Representation
Blood Pressure / <134 / Low
Blood Pressure / 127-153 / Medium
Blood Pressure / 142-172 / High
Blood Pressure / >154 / Very High

Figure 2 Membership function for Blood Pressure
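Because the ranges in Table 3 overlap, one reading can belong to two fuzzy sets at once. The sketch below uses only the ranges from Table 3 to list the sets in which a systolic blood pressure value can have a non-zero membership. The names `BLOOD_PRESSURE_SETS` and `candidate_sets` are hypothetical, and treating every range boundary as exclusive is an assumption.

```python
import math

# Ranges copied from Table 3; open-ended sets use +/- infinity.
BLOOD_PRESSURE_SETS = {
    "Low": (-math.inf, 134),
    "Medium": (127, 153),
    "High": (142, 172),
    "Very High": (154, math.inf),
}


def candidate_sets(value, ranges):
    """Return the names of all fuzzy sets whose range contains `value`.

    Assumption: boundaries are exclusive, so a value exactly on the edge
    of a range is not counted as a member of that set.
    """
    return [name for name, (lower, upper) in ranges.items() if lower < value < upper]


for reading in (130, 150, 160, 180):
    print(reading, candidate_sets(reading, BLOOD_PRESSURE_SETS))
```

The degree of membership within each candidate set still has to come from that set's membership function, which Table 3 alone does not determine.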

vi. Heart rate: This field has three fuzzy sets. Each linguistic representation is associated with a membership function. The ranges for each linguistic representation are given in Table 4.

Table 4 Linguistic Representation for Heart rate

Input Field / Range / Fuzzy sets
Heart rate / <141 / Low
Heart rate / 111-194 / Medium
Heart rate / >152 / High

vii. Blood Sugar: This field plays an important role in changing the results. It has two linguistic representations. Each fuzzy variable is associated with a membership function based on the range. The ranges of the fuzzy sets are given in Table 5.

Table 5 Blood Sugar

Input Field / Range / Linguistic Representation
Blood Sugar / >=120 / Yes (1)
Blood Sugar / <120 / No (0)

viii. Electrocardiography (ECG): ECG is recorded on graph paper as several waves (such as the T wave, Q wave, P wave and S wave) of the electrical impulse of the heart muscle. Normally the waves stay in the upper part of the graph. If any wave goes down, it is regarded as an abnormality. To develop our system, we assume that the S wave and T wave going down represents the abnormality, which is named ST_Tabnormal in our system. This input field has three fuzzy sets: Normal, ST_Tabnormal and Hypertrophy. The ranges of the fuzzy sets are given in Table 6.

Table 6 ECG

Input Field / Range / Fuzzy sets
Electrocardiography (ECG) / <0.4 / Normal
Electrocardiography (ECG) / 0.4-1.8 / ST_Tabnormal
Electrocardiography (ECG) / >1.8 / Hypertrophy

Figure 3 Membership function for ECG

ix. Exercise: This field indicates whether the patient needs an exercise test. This input field has two fuzzy sets. If the patient requires an exercise test, the value 1 is entered; if the patient does not need an exercise test, the value 0 is entered. The linguistic representations are "Yes" for 1 and "No" for 0. Each fuzzy set has a membership function associated with it.

x. Old Peak: This field is the ST depression induced by exercise relative to rest. ST depression is related to the ECG field: it means that the patient's T wave and S wave on the ECG graph paper were previously down. Old Peak is needed to confirm the present condition of the T wave and S wave of the ECG. It has three fuzzy sets. Each fuzzy variable is associated with a membership function. The ranges of the fuzzy sets are given in Table 7.

Table 7 Old Peak

Input Field / Range / Fuzzy sets
Old Peak / <2 / Low
Old Peak / 1.5-4.2 / Risk
Old Peak / >2.5 / Terrible

xi. Thallium Scan: A thallium scan shows the redistribution of the heart image. This input field has three linguistic representations: Normal, Reversible Defect and Fixed Defect. It depends on the number of hours after which the heart image appears on the screen of the gamma camera, which is able to detect radioactive dye in the body. To develop our system, we assume that in the Normal case the heart image appears within 3 hours, in the Fixed Defect case within 6 hours, and in the Reversible Defect case within 7 hours. The linguistic representation for Thallium Scan is given in Table 8.

Table 8 Thallium Scan

Input Field / Range / Fuzzy sets
Thallium Scan / 3 / Normal
Thallium Scan / 6 / Fixed Defect
Thallium Scan / 7 / Reversible Defect
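Several inputs described above are coded values rather than ranges (Chest Pain, Gender, Blood Sugar, Exercise and Thallium Scan). A minimal sketch of decoding them into their linguistic representations is given below. The codes are the ones stated in the text; the dictionary names, the record keys and the `describe` helper are hypothetical.

```python
# Codes as stated in the text; all names below are illustrative.
CHEST_PAIN = {1: "Typical Angina", 2: "Atypical Angina", 3: "Non-Angina", 4: "Asymptomatic"}
GENDER = {1: "Male", 0: "Female"}
YES_NO = {1: "Yes", 0: "No"}
THALLIUM_SCAN = {3: "Normal", 6: "Fixed Defect", 7: "Reversible Defect"}


def blood_sugar_flag(blood_sugar):
    """Return 1 when blood sugar is at least 120, else 0 (Table 5)."""
    return 1 if blood_sugar >= 120 else 0


def describe(record):
    """Translate the coded fields of one patient record into linguistic terms."""
    return {
        "ChestPain": CHEST_PAIN[record["ChestPain"]],
        "Gender": GENDER[record["Gender"]],
        "BloodSugar": YES_NO[blood_sugar_flag(record["BloodSugar"])],
        "Exercise": YES_NO[record["Exercise"]],
        "ThalliumScan": THALLIUM_SCAN[record["ThalliumScan"]],
    }


# Hypothetical patient record, for illustration only.
patient = {"ChestPain": 4, "Gender": 1, "BloodSugar": 135, "Exercise": 0, "ThalliumScan": 7}
print(describe(patient))
```

A code outside the listed values raises a `KeyError`, which makes invalid entries visible instead of silently mapping them to a default.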

Output: The output is the presence of heart disease, valued from 0 (no presence, i.e. healthy condition) to 4. As the integer value increases, the heart disease risk increases. We divide the output into the fuzzy sets {Healthy, Mild, Moderate, Severe, VerySevere}. The ranges and membership function for the output variable are given below:

Table 9 Output

Output Field / Range / Fuzzy sets
Result / <=1 / Healthy
Result / 0-2 / Mild
Result / 1-3 / Moderate
Result / 2-4 / Severe
Result / >3 / VerySevere

Figure 4 Membership function for Output

4.3 SQL (Structured Query Language)

In our system, all the inputs are connected through SQL to reach the desired output. We build one Patient table containing all the inputs and then connect the inputs with SQL queries to obtain the output. One SQL query from our system is given below:

SELECT Patient.Age,
  IIf([Age]<=29, 1, IIf([Age] Between 29 And 38, Format((38-[Age])/9,"#.00"), 0)) AS YoungMF,
  IIf([Age]=38, 1, IIf([Age] Between 33 And 38, Format(([Age]-33)/5,"#.00"), IIf([Age] Between 38 And 45, Format((45-[Age])/7,"#.00"), 0))) AS MidMF,
  IIf([Age]=48, 1, IIf([Age] Between 40 And 48, Format(([Age]-40)/8,"#.00"), IIf([Age] Between 48 And 58, Format((58-[Age])/10,"#.00"), 0))) AS OldMF,
  IIf([Age]>=60, 1, IIf([Age] Between 52 And 60, Format(([Age]-52)/8,"#.00"), 0)) AS VeryOldMF
FROM Patient;

The above query computes the membership value of Age in each of the fuzzy variables Young, Mid, Old and Very Old.
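For readers without Microsoft Access, the same Age membership values can be reproduced in Python. This is a sketch that mirrors the breakpoints of the query above (29, 33, 38, 40, 45, 48, 52, 58, 60); the function names are hypothetical, and the values are returned as numbers rounded to two decimals rather than as formatted strings.

```python
def _triangle(x, left, peak, right):
    """Triangular membership: 0 at `left` and `right`, 1 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def age_memberships(age):
    """Return the degree of membership of `age` in each Age fuzzy set."""
    # Young: 1 up to age 29, then falls linearly to 0 at age 38.
    young = 1.0 if age <= 29 else max(0.0, (38 - age) / 9)
    # Very Old: rises linearly from 0 at age 52 to 1 at age 60 and beyond.
    very_old = 1.0 if age >= 60 else max(0.0, (age - 52) / 8)
    degrees = {
        "Young": young,
        "Mid": _triangle(age, 33, 38, 45),
        "Old": _triangle(age, 40, 48, 58),
        "Very Old": very_old,
    }
    # Two decimals, matching the "#.00" format used in the query.
    return {name: round(value, 2) for name, value in degrees.items()}


print(age_memberships(36))
```

With these breakpoints a 36-year-old patient is partly Young (0.22) and partly Mid (0.6), which shows how the overlapping ranges of Table 1 translate into simultaneous memberships.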

The following figure indicates the overall summary of our system:

Figure 5 Developed Fuzzy Expert Database Systems

4.4 Snapshots of the System

Some snapshots of the developed system are shown below.

Figure 6 Input field Age with membership function

4.5 System Testing

We tested our system with the following values and also with the Cleveland Clinic Foundation's processed heart dataset. For more details, please see [11].

Figure 7 Diagnostic of Heart Disease with risk factors

Figure 8 The Input Form for the Fuzzy Diagnostic of Heart Disease with all the risk factors

5. Conclusion and Future work