A Data Mining Of Leukemia Cancer Detection Using Genetic Algorithm and Neural Network

Pawandeep1, Arshdeep Singh2

1 Department of Information Technology

2 Department of Computer Science & Engineering

1,2 Adesh Institute of Engineering & Technology, Faridkot

ABSTRACT

Medical imaging has become one of the most important visualization and interpretation methods in biology and medicine over the past decade. The most challenging aspect of medical imaging lies in developing integrated systems for clinical use. Cancer is one of the most feared human diseases. Leukemia is a type of blood cancer that can be fatal if it is detected late. It occurs when the bone marrow produces a large number of abnormal white blood cells. In this paper, we propose a data mining framework that detects leukemia using a genetic algorithm and a neural network. In the proposed methodology, the genetic algorithm reduces the large leukemia data set (gene set) to a reduced gene set by searching for the best fitness value. The neural network then classifies the samples as matched or unmatched. Finally, accuracy is assessed by evaluating the false rejection rate (FRR) and the false acceptance rate (FAR).

Keywords: Leukemia, Cancer, Neural Network, Genetic Algorithm, Blood Cells.

1  INTRODUCTION

Data mining is the process of extracting valuable information from large datasets. It has grown rapidly over the past few years. Because data mining approaches have proved useful in health care, they have become an established technology in that domain, which has led to a rapid increase in the number of such approaches [1]. Medical data mining can expose hidden patterns in voluminous medical data that would otherwise remain undiscovered. Data mining techniques applied to medical data include association rule mining for finding frequent patterns, prediction, classification, and clustering.

Previous research on data mining in the medical field includes the following. Evans et al. [2] proposed a system based on data mining techniques to detect hereditary syndromes. Pradhan and Prabhakaran [3] proposed an association rule mining approach for mining high-dimensional, time-series medical data to discover high-confidence patterns. Doron Shalvi and Nicholas DeClaris [4] discussed medical data mining through unsupervised neural networks, together with a method for data visualization. They also emphasized the need for preprocessing prior to medical data mining. In 2000, Krzysztof J. Cior [5], a bioengineering professor, identified the need for data mining methods to mine medical multimedia content. Tsumoto [6] identified problems in medical data mining, including missing values, storage of temporal and multi-valued data, and the different medical coding systems used in Hospital Information Systems (HIS). Brameier and Banzhaf [7] explored and analyzed two models for medical data mining: neural networks and linear genetic programming. Abidi and Hoe [8] proposed and implemented a symbolic rule extraction workbench for generating emerging rule-sets. Abidi et al. [9] explored the use of rule-sets produced by data mining for building rule-based expert systems. Olukunle and Ehikioya [10] proposed an algorithm for extracting association rules from medical image data. Association rule mining discovers frequently occurring items in a given dataset.

Data mining techniques have traditionally been used in various domains. However, they were introduced relatively late into the healthcare domain.

Blood is an essential part of the human body. It performs many vital functions, such as transporting oxygen, carbon dioxide, and minerals throughout the body to sustain metabolism. Blood consists of three main cellular components: red blood cells (RBC), white blood cells (WBC), and platelets. An insufficient amount of blood can critically affect metabolism, which can be very dangerous if treatment is not given early. One of the common blood disorders is leukemia, which is the most common type of cancer in children. All cancers start in body cells, and leukemia is a type of cancer that starts in blood cells [11].

In this paper, we propose a data mining framework that detects leukemia using a genetic algorithm and a neural network. The next section gives a brief overview of leukemia, its causes and risk factors, and its symptoms.

2  BACKGROUND OF LEUKEMIA

Leukemia is a type of cancer of the blood or bone marrow characterized by an abnormal increase of immature white blood cells called "blasts." It is a broad term covering a spectrum of diseases. According to the American Cancer Society, an estimated 48,610 people (27,880 men and 20,730 women) will be diagnosed with leukemia and 23,720 men and women will die of it in 2013 alone. In turn, leukemia is part of the even broader group of diseases affecting the blood, bone marrow, and lymphoid system, which are all known as hematological neoplasms. Over time, leukemia cells can crowd out the normal blood cells. This can lead to serious problems such as anemia, bleeding, and infections. Leukemia cells can also spread to the lymph nodes or other organs and cause swelling or pain. There are several different types of leukemia:

·  Acute lymphoblastic leukemia, or ALL.

·  Acute myelogenous leukemia, or AML.

·  Chronic lymphocytic leukemia, or CLL.

·  Chronic myelogenous leukemia, or CML.

In general, leukemia is grouped by how fast it gets worse and what kind of white blood cell it affects. Acute lymphoblastic leukemia (ALL) is the most common type of leukemia in young children, whereas acute myelogenous leukemia (AML) occurs more often in adults than in children, and more often in men than in women [12]. The immature WBCs can also accumulate in a variety of extramedullary sites, especially the meninges, gonads, thymus, liver, spleen, and lymph nodes. Because of the excess lymphoblasts or myeloblasts in the marrow, these cells also spill over into the peripheral bloodstream. AML is also known by other names, including acute myeloid leukemia, acute myelocytic leukemia, acute granulocytic leukemia, and acute non-lymphocytic leukemia. "Acute" means that this leukemia can progress rapidly and, if not treated, would probably be fatal within a few months. "Myeloid" refers to the type of cell in which this leukemia begins. In most cases, AML develops from cells that would turn into white blood cells (other than lymphocytes), but in some cases it develops in other types of blood-forming cells.

AML starts in the bone marrow (the soft inner part of certain bones, where new blood cells are made), but in most cases it quickly moves into the blood. It can sometimes spread to other parts of the body, including the lymph nodes, liver, central nervous system (brain and spinal cord), and testicles. Other types of cancer can start in these organs and then spread to the bone marrow. However, cancers that start elsewhere and then spread to the bone marrow are not leukemias [13].

The diagnosis of leukemia is based on the fact that the white cell count is increased with immature blast (lymphoid or myeloid) cells, while neutrophils and platelets are decreased. The presence of an excess number of blast cells in peripheral blood is a significant sign of leukemia. Hematologists therefore routinely inspect blood smears under a microscope to identify and classify blast cells [14].

2.1  Causes and Risk Factors of Leukemia

The exact causes of leukemia are unknown, and in most cases it is unclear why leukemia has developed. Research into possible causes is ongoing. Like other cancers, leukemia is not infectious and cannot be passed on to other people. Several factors may increase a person's risk of developing leukemia. Having a particular risk factor does not mean that a person will definitely get the disease, and people without any known risk factors can still develop it. The known risk factors for leukemia are described here.

·  Exposure to radiation: People who have been exposed to high levels of radiation, such as in nuclear industrial accidents, have a higher risk of developing leukemia than people who have not been exposed. However, very few people in the UK will be exposed to radiation levels high enough to increase their risk.

·  Smoking: Smoking increases the risk of developing leukemia. This may be due to the high levels of benzene in cigarette smoke.

·  Exposure to benzene: In very rare cases, leukemia may develop due to long-term exposure to benzene (and possibly other solvents) used in industry.

·  Cancer treatments: Occasionally, anti-cancer treatments such as chemotherapy or radiotherapy can cause leukemia to develop some years after the treatment. The risk increases when certain types of chemotherapy drugs are combined with radiotherapy. When leukemia develops because of earlier anti-cancer treatment, it is called secondary leukemia or treatment-related leukemia.

·  Blood disorders: People with certain blood disorders, such as myelodysplasia or myeloproliferative disorders, have an increased risk of developing AML.

·  Genetic disorders: People with certain genetic disorders, including Down's syndrome and Fanconi anemia, have an increased risk of developing leukemia.

Other, less common symptoms may be caused by a buildup of leukemia cells in a particular area of the body. Bones may ache because of the pressure from a buildup of immature cells in the bone marrow. There may also be raised, bluish-purple areas under the skin due to leukemia cells in the skin, or swollen gums caused by leukemia cells in the gums [12]. Cancer starts when cells in a part of the body begin to grow out of control and can spread to other areas of the body.

2.2  Symptoms

·  Looking pale and feeling tired and breathless, which is due to anemia caused by a lack of red blood cells.

·  Having more infections than normal, because of a lack of healthy white blood cells.

·  Abnormal bleeding caused by too few platelets. This may include bruising (bruises may appear without any obvious injury), heavy periods in women, bleeding gums, nosebleeds, and blood spots or rashes on the skin (petechiae).

·  Feeling generally unwell and run down.

·  Having a fever and sweats, which may be due to an infection or the leukemia itself.

·  A new lump or swollen gland in your neck, under your arm, or in your groin.

·  Frequent fevers.

·  Bone pain.

·  Unexplained appetite loss or recent weight loss.

·  Swelling and pain on the left side of the belly.

3  PROPOSED FRAMEWORK

3.1  Methodology

Step 1 :  Start by uploading the samples, which are stored in a database file. Select the database file and upload it through the GUI home page.

Step 2 :  The uploaded file contains gene information about patients who have leukemia.

Step 3 :  After the data has been uploaded, create a graphical representation of the data for visualization.

Step 4 :  Apply the genetic algorithm (GA) to reduce the gene set.

Step 5 :  To reduce the gene set, carry out the following GA steps (Steps 6 to 9):

Step 6 :  Treat the total gene information as the population.

Step 7 :  Apply the fitness function to the gene data.

Step 8 :  Select a sub-population and evaluate the fitness function again.

Step 9 :  Finally, determine the best fitness value.

Step 10 :  The resulting data set is the reduced gene set.

Step 11 :  Classify the reduced gene set using a neural network trained with back propagation.

Step 12 :  Categorize the values as matched or unmatched.

Step 13 :  Compare the matched and unmatched values and evaluate the accuracy using the accuracy parameters.

Step 14 :  The accuracy parameters calculated are the false acceptance rate (FAR) and the false rejection rate (FRR).
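The accuracy parameters of Step 14 can be illustrated with a minimal Python sketch. It assumes the usual definitions, which the paper does not state explicitly: FAR is the fraction of non-leukemia samples wrongly classified as matched, and FRR is the fraction of leukemia samples wrongly classified as unmatched. The function name `far_frr` and the label encoding (1 = matched, 0 = unmatched) are illustrative choices, and the sample labels are made up.

```python
def far_frr(actual, predicted):
    """Return (FAR, FRR) for binary labels: 1 = matched, 0 = unmatched.

    Assumed definitions (not taken from the paper):
      FAR = false acceptances / number of actual negatives
      FRR = false rejections  / number of actual positives
    """
    false_accepts = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    false_rejects = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    negatives = sum(1 for a in actual if a == 0)
    positives = sum(1 for a in actual if a == 1)
    far = false_accepts / negatives if negatives else 0.0
    frr = false_rejects / positives if positives else 0.0
    return far, frr


# Made-up example: 4 leukemia samples and 4 non-leukemia samples.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]

far, frr = far_frr(actual, predicted)
accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
print(f"FAR={far:.2f} FRR={frr:.2f} accuracy={accuracy:.2f}")
```

In this toy case one negative sample is accepted and one positive sample is rejected, so both rates are 1/4 and the overall accuracy is 6/8.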

3.2  Flowchart

Fig. 1 Flowchart of Methodology

This flowchart shows the methodology for reducing the large leukemia data set (gene set) to a reduced gene set using the genetic algorithm, which searches for the best fitness value. The neural network then classifies the samples as matched or unmatched. Finally, accuracy is assessed by evaluating FRR and FAR.

The microarray gene classification technique involves three major steps: (i) dimensionality reduction, (ii) feature selection, and (iii) gene classification. The GA performs the dimensionality reduction to obtain a smaller dataset. Dimensionality reduction is carried out first on the microarray cancer gene dataset to reduce the complexity of gene classification. This step is needed because the dataset is high-dimensional, which increases the processing time and reduces classification accuracy. The fitness function is used to choose the best chromosomes among those generated.
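The GA-based reduction described above can be sketched in Python as follows. This is only an illustration under stated assumptions, since the paper does not specify the chromosome encoding, the fitness function, or the GA operators: each chromosome is taken to be a bit mask over the genes, the fitness (hypothetical) rewards genes whose mean expression differs between the two classes and penalizes large subsets, and selection keeps the fitter half of the population. All names, parameter values, and the toy data are illustrative.

```python
import random


def gene_score(values, labels):
    """Hypothetical relevance score: absolute difference between class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def fitness(mask, scores, penalty=0.1):
    """Assumed fitness: total relevance of selected genes minus a size penalty."""
    return sum(s for s, bit in zip(scores, mask) if bit) - penalty * sum(mask)


def select_genes(data, labels, pop_size=20, generations=30,
                 mutation_rate=0.05, seed=0):
    """Return the indices of the genes kept by the best chromosome found."""
    rng = random.Random(seed)
    n_genes = len(data[0])
    scores = [gene_score([row[g] for row in data], labels)
              for g in range(n_genes)]
    # Each chromosome is a bit mask: 1 keeps the gene, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, scores), reverse=True)
        parents = population[:pop_size // 2]      # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)       # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability (mutation).
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda m: fitness(m, scores))
    return [g for g, bit in enumerate(best) if bit]


# Toy expression matrix (rows = patients, columns = genes); values are made up.
data = [
    [5.1, 1.0, 2.0, 7.9, 3.0],
    [4.9, 1.2, 2.1, 8.1, 3.1],
    [5.0, 0.9, 1.9, 8.0, 2.9],
    [1.0, 1.1, 2.0, 2.0, 3.0],
    [1.2, 1.0, 2.1, 2.1, 3.1],
    [0.9, 1.0, 1.9, 1.9, 2.9],
]
labels = [1, 1, 1, 0, 0, 0]

reduced_gene_set = select_genes(data, labels)
print("Selected gene indices:", reduced_gene_set)
```

The fixed seed makes the run repeatable. In a real setting the fitness would more plausibly be tied to classifier performance on the selected genes, at a higher computational cost.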

The next step is classification of the reduced gene set using a neural network (NN). Based on the values acquired in the training phase, the performance of the NN is analyzed to obtain appropriate values for the testing phase. To find the optimum structure, the NN performance is analyzed with respect to the number of hidden nodes and the number of epochs. First, the number of epochs is set to a fixed value and the NN is trained over a range of hidden-node counts. The number of hidden nodes that gives the best performance is selected as the optimum. Then, with the number of hidden nodes fixed at this optimum, the number of epochs is analyzed in the same way to find the value that gives the highest accuracy.
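The two-stage search over hidden nodes and epochs can be sketched in Python as follows. The sketch assumes a callable `evaluate(hidden_nodes, epochs)` that trains a back-propagation network and returns its accuracy. Here that callable is replaced by a made-up stand-in, `toy_accuracy`, so the example runs without a dataset; the candidate ranges and the fixed epoch value are likewise illustrative and not taken from the paper.

```python
def tune_network(evaluate, hidden_candidates, epoch_candidates, fixed_epochs):
    """Two-stage search: choose hidden nodes first, then epochs."""
    # Stage 1: keep the epochs fixed and vary the number of hidden nodes.
    best_hidden = max(hidden_candidates,
                      key=lambda h: evaluate(h, fixed_epochs))
    # Stage 2: keep the chosen hidden nodes fixed and vary the epochs.
    best_epochs = max(epoch_candidates,
                      key=lambda e: evaluate(best_hidden, e))
    return best_hidden, best_epochs, evaluate(best_hidden, best_epochs)


def toy_accuracy(hidden_nodes, epochs):
    """Made-up stand-in for 'train the NN and return its test accuracy'."""
    return 0.95 - 0.01 * abs(hidden_nodes - 8) - 0.0001 * abs(epochs - 300)


best_hidden, best_epochs, best_accuracy = tune_network(
    toy_accuracy,
    hidden_candidates=range(2, 21, 2),      # 2, 4, ..., 20 hidden nodes
    epoch_candidates=range(100, 1001, 100),  # 100, 200, ..., 1000 epochs
    fixed_epochs=500,
)
print(best_hidden, best_epochs, round(best_accuracy, 4))
```

Searching one parameter at a time needs far fewer training runs than a full grid over both parameters, but it can miss combinations in which the best number of hidden nodes depends on the number of epochs.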

4  RESULTS & IMPLEMENTATION

Fig. 2 GUI Panel

The above figure shows the graphical user interface (GUI) panel of the proposed system. The panel provides controls for uploading the data set, applying the genetic algorithm, initializing the neural network, and testing.

The panel contains push buttons labelled "click here to upload data set", "apply genetic algorithms", "initialize neural network", and "test leukaemia data".

Code information is generated at the bottom left of the panel.