International Journal of Science, Engineering and Technology Research (IJSETR)

Volume 1, Issue 1, May 2014


Applying Case-Based Reasoning for Medical Diagnosis System

Myint Myint Moe, Aung Myint Aye

Abstract— In the medical field, it is very difficult to reach a correct diagnosis because many diseases are possible in each case. Case-based reasoning (CBR) has become a successful technique for developing medical diagnosis systems. CBR is a problem-solving method that remembers one or a small set of previous cases and bases decisions on comparisons between the new and old situations. This paper designs and implements a diagnosis system for Heart disease, Dengue Hemorrhagic Fever, and Kidney disease using the CBR approach, based on the symptoms of the patients. In the system, old cases are retrieved to solve user input problems (new cases). To complete the CBR cycle, a case adaptation process revises the new case and retains it for future use when no matching case is found in the case base. The case base contains 316 training records, and 32 symptoms are used to describe a new case. The Manhattan distance measure is used for case retrieval. If the result for an input case is not satisfactory, case adaptation is performed using the Decision Tree algorithm. The performance of the system is evaluated with the cross-validation method. The system is implemented in the C# programming language.

Index Terms— Case Adaptation, Case-Based Reasoning, Cross Validation, Medical Diagnosis System, Problem Solving.

I. INTRODUCTION

Today, people usually solve real-world problems by drawing on experience that was successful in solving similar problems in the past. This approach is used in many areas, such as medical diagnosis, pattern recognition, and planning. In CBR, experiences are modeled as concrete problems together with their solutions (cases). Case-based reasoning is a popular problem-solving methodology that generates the solution of a new problem from the solution of a previous similar problem stored in the knowledge base (case base). Case-based reasoning systems have been applied to identify previous situations that match aspects of a current problem and to provide guidance on how to solve problems and make decisions.

Case-based reasoning (CBR) has been used to create numerous applications in a wide range of domains, including financial analysis, medical diagnosis, design, object classification, help desks, and decision support systems [1].

The case-based reasoning (CBR) method solves problems by comparing a problem situation (a case) with previously experienced ones. When a new problem is presented to a case-based system, it first retrieves cases with similar problem descriptions from the case base. The solutions of these cases are used to propose a solution for the new problem. If the new problem differs from the cases in the case base, the proposed solution may need to be adapted. Finally, once the solution of the new problem is confirmed, the new case and its solution are retained in the case base for future use. In this system, when users input the symptoms (attributes) of the disease they suffer from, the system outputs the class (disease name). The system is therefore intended for application in the medical field.

This system is proposed to solve problems in medical diagnosis and to investigate the integration of classification algorithms with case-based reasoning. It aims to produce the correct result for a new case by using the Manhattan distance measure. The Decision Tree algorithm is used to improve the quality of decisions. Successful case solutions are retained for future use, and the performance of the proposed system is estimated by using the cross-validation method. The medical diagnosis system is implemented using the CBR approach.

II. RELATED WORK

S. Rainer, M. Stefania and B. Riccardo [2] presented a study of case-based reasoning for medical knowledge-based systems. In many domains, case-based reasoning (CBR) has become a successful technique for knowledge-based systems. In the medical field, however, attempts to apply the complete CBR cycle are rather exceptional. Some recently developed systems use only parts of the CBR method, mainly retrieval. Others enrich the method with a generalisation step that fills the knowledge gap between the specificity of single cases and general rules. Still other systems integrate CBR with other problem-solving methodologies. The study showed that CBR is appropriate for medical knowledge-based systems.

A. Klaus-Dieter and B. Ralph [3] noted that Russia has more intoxication cases every year than any other country in Europe. They described an approach for developing knowledge-based medical decision support systems based on the rather new technology of case-based reasoning. Mario Lenz [4] observed that artificial intelligence has been applied in numerous applications in the health science domain. Case-based reasoning (CBR) emerged as an interesting alternative for building medical AI applications and has since become established in the field. One intuitively attractive feature of CBR in medicine is that the concepts of patient and disease lend themselves naturally to a case representation.

III.  BACKGROUND THEORY

Technical aspects of case-based reasoning (CBR) are applied to develop the medical diagnosis system. In the case retrieval process, the Manhattan distance measure is used to calculate the distance between the new case and each old case in the case base. In the case revision process, the Decision Tree algorithm is used to generate decision rules for the new case.

A.  Case-based Reasoning

A case-based reasoner solves new problems by adapting solutions to older problems. CBR therefore involves reasoning from prior examples: it retains a memory of previous problems and their solutions, and it solves new problems by reference to that knowledge. Generally, a case-based reasoner is presented with a problem, either by a user or by a program or system. The reasoner then searches its memory of past cases (called the case base) and attempts to find a case with the same problem specification as the case under analysis. If it cannot find an identical case in its case base, it attempts to find one or more cases that most closely match the current case.

If an identical previous case is retrieved and its solution was successful, that solution can be offered as the solution to the current problem. More often, the retrieved case is not identical to the current case, and an adaptation phase occurs. During adaptation, the differences between the current and retrieved cases are first identified. The solution associated with the retrieved case is then modified to take these differences into account. The solution returned for the current problem specification may then be tried in the appropriate domain setting.

Figure 1. Two major components of a CBR system


CBR Life Cycle: The problem-solving life cycle in a CBR system consists of the following four stages:

•  Retrieving: finding previously experienced cases (e.g., problem-solution-outcome triples) whose problems are judged similar to the new problem

•  Reusing: copying or integrating the solutions from the retrieved cases

•  Revising (adapting): modifying the retrieved solution(s) in an attempt to solve the new problem

•  Retaining: storing the new solution once it has been confirmed or validated [5].

Figure 2. CBR Life Cycle
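The following Python sketch shows one possible implementation of the four stages. It is a minimal illustration under stated assumptions, not the authors' C# implementation. It assumes the following:

- Each case is a 0/1 symptom vector paired with a disease name.
- Retrieval uses the Manhattan distance.
- An identical case (distance 0) is reused directly.
- `revise` is a hypothetical caller-supplied function standing in for the adaptation step, which in this system is the Decision Tree classifier.

```python
from typing import Callable, List, Tuple

# Hypothetical case layout: (symptom vector, disease name), symptoms encoded as 0/1.
Case = Tuple[List[int], str]


def manhattan(x: List[int], y: List[int]) -> int:
    """Manhattan distance: sum of absolute attribute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cbr_cycle(case_base: List[Case], query: List[int],
              revise: Callable[[List[int], Case], str]) -> str:
    """One pass through retrieve, reuse, revise and retain for a new case."""
    # Retrieve: the stored case nearest to the query.
    nearest = min(case_base, key=lambda case: manhattan(case[0], query))
    # Reuse: an identical case (distance 0) lends its solution unchanged.
    if manhattan(nearest[0], query) == 0:
        return nearest[1]
    # Revise: adapt the retrieved solution; `revise` is assumed to return a
    # solution that has already been confirmed or validated.
    solution = revise(query, nearest)
    # Retain: store the new case so later queries can retrieve it.
    case_base.append((list(query), solution))
    return solution


# Usage with made-up data; the lambda simply copies the nearest case's diagnosis.
cases = [([1, 1, 0], "Dengue"), ([0, 0, 1], "Kidney")]
print(cbr_cycle(cases, [1, 1, 0], lambda q, c: c[1]))  # Dengue (identical case)
print(cbr_cycle(cases, [1, 0, 0], lambda q, c: c[1]))  # Dengue (revised, retained)
print(len(cases))                                      # 3
```

Only revised cases are appended. An exact match is already in the case base, so retaining it again would only duplicate it.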

B.  Classification

Classification is a form of data analysis that can be used to extract models describing important data classes or to predict future data trends. Classification predicts categorical labels (classes). Various classification methods exist, including the following:

•  K-Nearest Neighbour Classifier

•  Naive Bayesian Classifier

•  Support Vector Machine (SVM)

•  Manhattan Distance Measure Method

•  Decision Tree Classifier

Among them, the proposed system uses the Manhattan distance measure method and the Decision Tree classifier.

C. Manhattan Distance Measure Method

A distance measure plays a central role in retrieving similar cases. The dissimilarity (or similarity) between objects is typically computed from the distance between each pair of objects. A well-known distance measure is the Manhattan distance, which is defined as [6]:

$$ d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}| \quad (1) $$

Here, i and j are data objects described by p dimensions (attributes), d(i, j) is the distance between them, and $x_{ip}$ is the measurement of object i in dimension p.
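Here is a worked example of Eq. (1). Assume, hypothetically, that each symptom is encoded as 1 (present) or 0 (absent), with p = 5 symptoms. Object i has measurements (1, 0, 1, 1, 0) and object j has (1, 1, 0, 1, 0):

```latex
\begin{aligned}
d(i, j) &= |1 - 1| + |0 - 1| + |1 - 0| + |1 - 1| + |0 - 0| \\
        &= 0 + 1 + 1 + 0 + 0 = 2
\end{aligned}
```

Under such a 0/1 encoding, the Manhattan distance counts the symptoms on which the two cases disagree. The retrieved case is therefore the one with the fewest mismatched symptoms.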

D.  Decision Tree Classifier

In this system, the Decision Tree classifier is used in the case revision process. A decision tree is a flowchart-like tree structure that is constructed top-down. The information gain measure is used to select the test attribute at each node in the tree. The decision tree is built using the attribute selection measure and the following decision tree algorithm [6].

1)  Decision Tree Algorithm: The decision tree algorithm is as follows [8]:

Algorithm: Generate_decision_tree(samples, attribute-list).

Step 1: create a node N;

Step 2: if samples are all of the same class C, then return N as a leaf node labeled with the class C;

Step 3: if attribute-list is empty then return N as a leaf node labeled with the most common class in samples;

Step 4: select test-attribute, the attribute among attribute-list with the highest information gain;

Step 5: label node N with test-attribute;

Step 6: for each known value ai of test-attribute

-  grow a branch from node N for the condition test-attribute=ai;

Step 7: let si be the set of samples in samples for which test-attribute=ai;

Step 8: if si is empty then attach a leaf labeled with the most common class in samples;

else attach the node returned by Generate_decision_tree(si, attribute-list − test-attribute);
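Steps 1 to 8 translate fairly directly into a recursive function. The Python sketch below is an illustration, not the authors' C# code. It rests on three assumptions:

- Each sample is a dictionary of categorical attribute values, with the label stored under a hypothetical `"class"` key.
- `domains` is a hypothetical mapping listing the known values of every attribute, which Step 6 needs.
- Test attributes are chosen with the information gain measure from the next subsection.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Sample = Dict[str, str]                 # attribute -> value, plus the "class" label
Tree = Union[str, Dict[str, Dict]]      # a leaf label or {test_attr: {value: subtree}}


def info(samples: List[Sample]) -> float:
    """Expected information I(s1, ..., sm) in bits, as in Eq. (2)."""
    total = len(samples)
    counts = Counter(s["class"] for s in samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def gain(samples: List[Sample], attr: str) -> float:
    """Gain(A) = I(s1, ..., sm) - E(A), as in Eqs. (3) to (5)."""
    total = len(samples)
    expected = 0.0
    for value in {s[attr] for s in samples}:
        subset = [s for s in samples if s[attr] == value]
        expected += len(subset) / total * info(subset)
    return info(samples) - expected


def most_common_class(samples: List[Sample]) -> str:
    return Counter(s["class"] for s in samples).most_common(1)[0][0]


def generate_decision_tree(samples: List[Sample], attribute_list: List[str],
                           domains: Dict[str, List[str]]) -> Tree:
    classes = {s["class"] for s in samples}
    if len(classes) == 1:                        # Step 2: pure node -> leaf
        return classes.pop()
    if not attribute_list:                       # Step 3: no attributes left
        return most_common_class(samples)
    # Step 4: the attribute with the highest information gain.
    test_attr = max(attribute_list, key=lambda a: gain(samples, a))
    node: Dict[str, Dict] = {test_attr: {}}      # Steps 1 and 5
    remaining = [a for a in attribute_list if a != test_attr]
    for value in domains[test_attr]:             # Step 6: each known value
        subset = [s for s in samples if s[test_attr] == value]  # Step 7
        if not subset:                           # Step 8: empty branch
            node[test_attr][value] = most_common_class(samples)
        else:                                    # Step 8: recurse on the subset
            node[test_attr][value] = generate_decision_tree(subset, remaining, domains)
    return node


# Hypothetical toy data: two symptoms, two diagnoses.
data = [
    {"fever": "yes", "rash": "yes", "class": "Dengue"},
    {"fever": "yes", "rash": "no", "class": "Dengue"},
    {"fever": "no", "rash": "no", "class": "Heart"},
    {"fever": "no", "rash": "yes", "class": "Heart"},
]
domains = {"fever": ["yes", "no"], "rash": ["yes", "no"]}
print(generate_decision_tree(data, ["fever", "rash"], domains))
# {'fever': {'yes': 'Dengue', 'no': 'Heart'}}
```

The tree is returned as nested dictionaries. Each root-to-leaf path reads as one decision rule, for example "if fever = yes then Dengue".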

2)  Attribute Selection Measure: The information gain measure is used to select the test attribute at each node in the tree. The attribute with the highest information gain is chosen as the test attribute for the current node. Let S be a set consisting of s data samples. Suppose the class label attribute has m distinct values defining m distinct classes, Ci (for i = 1, …, m). Let si be the number of samples of S in class Ci. The expected information needed to classify a given sample is given by

$$ I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2(p_i) \quad (2) $$

where pi is the probability that an arbitrary sample belongs to Ci and is estimated by $s_i / s$.

Let attribute A have v distinct values, {a1, a2, …, av}. Attribute A can be used to partition S into v subsets, {S1, …, Sv}, where Sj contains those samples in S that have value aj of A. Let sij be the number of samples of class Ci in a subset Sj. The entropy, or expected information based on the partitioning into subsets by A, is given by

$$ E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj}) \quad (3) $$

The term $\frac{s_{1j} + \cdots + s_{mj}}{s}$ acts as the weight of the jth subset and is the number of samples in the subset divided by the total number of samples in S. For a given subset Sj,

$$ I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2(p_{ij}) \quad (4) $$

where $p_{ij} = s_{ij} / |S_j|$ is the probability that a sample in Sj belongs to class Ci.

The encoding information that would be gained by branching on A is

$$ \mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A) \quad (5) $$

The feature with the highest information gain is chosen as the test feature for the given set S. A node is created and labeled with the feature, branches are created for each value of the feature, and the samples are partitioned accordingly [6].
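A small hand-worked example may help in reading Eqs. (2) to (5). Suppose, hypothetically, that S holds s = 4 samples: two labeled Dengue (C1) and two labeled Heart (C2). Attribute A = fever splits S into two subsets:

- S1 (fever = yes): two Dengue samples and one Heart sample.
- S2 (fever = no): one Heart sample.

Taking $0 \log_2 0 = 0$:

```latex
\begin{aligned}
I(2, 2) &= -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \\
I(2, 1) &= -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} \approx 0.918 \\
I(0, 1) &= 0 \\
E(\text{fever}) &= \tfrac{3}{4}\, I(2, 1) + \tfrac{1}{4}\, I(0, 1) \approx 0.689 \\
\mathrm{Gain}(\text{fever}) &= I(2, 2) - E(\text{fever}) \approx 1 - 0.689 = 0.311
\end{aligned}
```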

E.  Cross Validation Method

Estimating classifier accuracy is important because it allows one to evaluate how accurately a given classifier will label future data, that is, data on which the classifier has not been trained. In k-fold cross-validation, the data are randomly partitioned into k mutually exclusive subsets, or folds, of approximately equal size. The learning algorithm is trained and tested k times. Each time, it is tested on one of the k folds and trained on the remaining k − 1 folds. The cross-validation estimate of accuracy is the overall number of correct classifications from the k iterations, divided by the number of examples in the initial data.

The classification accuracy Ai of a classifier depends on the number of samples correctly classified and is evaluated by the formula:

$$ A_i = \frac{t}{n} \times 100 \quad (6) $$

where, t = the number of sample cases correctly classified, and n = the total number of sample cases [8].
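The k-fold procedure and Eq. (6) can be sketched as follows. This illustrative Python version rests on three assumptions:

- A learner is any hypothetical function that takes training examples and returns a predict function.
- Folds are formed by shuffling and then striding, so fold sizes differ by at most one.
- The 1-nearest-neighbour learner with Manhattan distance is only a stand-in classifier for the usage example, not the paper's CBR pipeline.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[int], str]                 # (attribute vector, class label)
Learner = Callable[[List[Example]], Callable[[Sequence[int]], str]]


def accuracy(t: int, n: int) -> float:
    """Eq. (6): percentage of the n sample cases that were classified correctly."""
    return t / n * 100


def k_fold_accuracy(data: List[Example], learner: Learner, k: int = 10,
                    seed: int = 0) -> float:
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)           # random partition of the data
    folds = [shuffled[i::k] for i in range(k)]      # k mutually exclusive folds
    correct = 0
    for i in range(k):
        # Train on the other k - 1 folds, test on fold i.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = learner(train)
        correct += sum(predict(x) == y for x, y in folds[i])
    # Correct classifications over all k iterations / number of examples.
    return accuracy(correct, len(data))


def one_nn(train: List[Example]) -> Callable[[Sequence[int]], str]:
    """Stand-in learner: label of the nearest training case (Manhattan distance)."""
    def predict(x: Sequence[int]) -> str:
        return min(train, key=lambda ex: sum(abs(a - b) for a, b in zip(ex[0], x)))[1]
    return predict


data = [((0, 0), "A"), ((0, 1), "A"), ((0, 0), "A"), ((0, 1), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((5, 5), "B"), ((5, 6), "B")]
print(k_fold_accuracy(data, one_nn, k=4))           # 100.0
```

Every example is tested exactly once across the k iterations. Dividing the total number of correct predictions by the size of the whole data set therefore gives the single accuracy figure described above.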

IV.  SYSTEM DESIGN

This section describes the proposed system design (Figure 3), the implementation of the system, and its experimental results.

A.  Proposed System Design

The proposed system implements the diagnosis system using case-based reasoning. First, the system checks whether the user is an administrator (physician) or a patient. A physician can insert new disease cases into the case base (database), update existing data, and delete useless data. If the user is a patient, the system displays the common symptoms of the diseases.
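The role check described above can be sketched as a small case-base wrapper. This is a hypothetical Python illustration, not the system's C# design. Three details are assumptions:

- The role strings.
- The storage layout.
- The reading of "common symptoms" as the symptoms recorded most often for a disease.

```python
from collections import Counter
from typing import Dict, List, Tuple


class CaseBase:
    """Hypothetical case base: case id -> (symptoms, disease name)."""

    def __init__(self) -> None:
        self.cases: Dict[int, Tuple[List[str], str]] = {}

    @staticmethod
    def _require_physician(role: str) -> None:
        # Only the administrator (physician) may change the stored cases.
        if role != "physician":
            raise PermissionError("only the physician may modify the case base")

    def insert(self, role: str, case_id: int, symptoms: List[str], disease: str) -> None:
        self._require_physician(role)
        self.cases[case_id] = (list(symptoms), disease)

    def update(self, role: str, case_id: int, symptoms: List[str], disease: str) -> None:
        self._require_physician(role)
        if case_id not in self.cases:
            raise KeyError(case_id)
        self.cases[case_id] = (list(symptoms), disease)

    def delete(self, role: str, case_id: int) -> None:
        self._require_physician(role)
        self.cases.pop(case_id, None)

    def common_symptoms(self, disease: str, top: int = 5) -> List[str]:
        """Patient view: the symptoms recorded most often for a disease."""
        counts = Counter(s for symptoms, d in self.cases.values()
                         if d == disease for s in symptoms)
        return [s for s, _ in counts.most_common(top)]


cb = CaseBase()
cb.insert("physician", 1, ["fever", "rash"], "Dengue")
cb.insert("physician", 2, ["fever", "headache"], "Dengue")
print(cb.common_symptoms("Dengue", top=1))   # ['fever']
```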