Identification of Benign and Malignant Lesions by Feature Extraction on Mammographic Images

IDENTIFICATION OF BENIGN AND MALIGNANT BREAST LESIONS BY IMAGE SHAPE ANALYSIS

Aura Conci, Luciana Marinho Soares and Leonardo Hiss Monteiro

Computação Aplicada e Automação – CAA

Universidade Federal Fluminense - UFF

Rua Passo de Pátria, 156, CEP 240210-240, Niterói, RJ, Brasil

{aconci, marinho, hiss}@caa.uff.br

ABSTRACT

An implementation for identifying breast calcifications and masses as benign or malignant, based on discrete shape analysis of mammograms, is presented. A set of features is extracted using shape characterization. A database for use by the mammographic analysis research community has been established. From it, fifty-two cases with confirmed diagnosis were used for training. After extensive experimentation, a set of discriminant functions and a nearest-neighbor classifier were combined for classification in a diagnosis system. The recognition rate for tumors with spicules was 100% on low-density breasts. The main contribution of this work is its database design, which expands easily whenever a new image with proven diagnosis is introduced.

Keywords: Biomedical image processing, classification of breast tissue, mass classifier in digitized mammograms, discrete shape analysis.

1. INTRODUCTION

Breast tumors are the leading cancers in women and one of the major causes of death among middle-aged and older women. Early detection is very important for reducing breast cancer mortality, and mammographic screening is the most effective method of early detection. Mammograms can delineate most of the significant changes in breast tissue. The earliest radiographic signs of breast cancer are clustered microcalcifications (Kopans, 1989). The predominant breast carcinomas consist of a central tumor mass surrounded by spicules (Tavassoly, 1992). Spicules appear as stellate regions with fine star-shaped lines that emanate from the central mass. Medical diagnosis from mammograms is based on the identification of spiculations or more diffuse stellate masses, which characterize malignant breast cancer (De Paredes, 1989). Therefore, in the development of computer algorithms to aid diagnosis, these patterns (spicular shapes and clusters) can also be used for (a) identifying tumors as benign or malignant and (b) classifying tissue as suspicious or normal. Unfortunately, these patterns are very difficult to include in automated image analysis algorithms, for four reasons. First, x-ray images vary widely in size and degree of definition. Second, a stellate tumor has an irregular shape, with borders radiating spicules that may extend from a few millimeters to many centimeters. Third, the breast tissue around a mass may vary from fatty to dense: in the former, tumors show well-defined borders, while in the latter they appear as poorly contrasted gray-level regions with ill-defined borders. Finally, microcalcifications are very small, with an average diameter of 0.3 mm (Wodds and Bowyers, 1996).

Early computer-aided algorithms concentrated mainly on enhancing mammograms for the radiologist (Laine et al., 1994). Chang and Laine (1997) enhanced mammograms using multiscale wavelets and oriented information. Few works consider the detection of microcalcifications on mammograms. Kim and Park (1997) detect clustered microcalcifications using a texture analysis method called Surrounding Region Dependence, a statistical texture analysis based on the second-order histogram of two surrounding regions. They use four extracted features to classify each region of interest as positive (containing calcification clusters) or negative (normal breast tissue), with a three-layer backpropagation neural network as the classifier. They analyzed 120 X-ray mammograms and investigated 172 regions of interest. Their reported classification performance relates the number of neurons used in the network to the false positive and false negative ratios.

Recent works consist of feature extraction followed by classification (Méndez et al., 1996). Lorey et al. (1995) classified healthy and tumor tissue in mammograms using feature extraction based on fractal dimension, incorporating density estimation, and classification based on discriminant analysis. Jiang et al. (1997) presented a two-step method for automatic detection of spicule shadows in mammograms: enhancement and feature recognition. The enhancement removes noise and builds a directional map. Two features are then selected for recognizing tumors with spicules: the direction of the spicules and the density of the tumors. The reported recognition rate for spicules was 100% without false positives, on 24 test samples that included seven tumors with spicules. Li et al. (1997) developed a combined method to enhance and extract suspicious masses. They used morphological operations, a Finite Generalized Gaussian Mixture (to model the histogram of the image) and a Contextual Bayesian Relaxation Labeling technique. They chose area, compactness (or circularity, based on area and boundary perimeter) and difference entropy as features for classifying masses. Their classifier is a mixture of experts: one expert is trained to detect true masses and another to detect false masses. Training used fifty mammograms with biopsy-proven masses and fifty normal cases. Forty-six single-view mammograms were used for testing (23 normal cases and 23 biopsy-proven masses). Their classifier was reported to detect suspicious masses with a sensitivity of 84% and 1.6 false positives. Liu and Delp (1997) used multiresolution detection of stellate lesions. They first obtain a multiresolution representation of the original mammogram using a linear-phase 2-D wavelet transform, and then extract five features at each resolution. Images from the MIAS (Mammographic Image Analysis Society) database were used in their experiments. This database contains a total of 19 mammograms with stellate lesions. These and another 19 normal mammograms were divided into two sets, one used for training and the other for classification.

The objective of this paper is to present an implementation for the automatic detection of suspicious areas in mammograms. The implemented structure consists of the three blocks shown in figure 1: feature extraction, shape classification and an expandable database. On the cases tested so far, it has shown a detection rate of 100% without false positives or false negatives. In the next section, we describe the databases used for pattern identification and the seven features extracted from the input images to be tested. Section 3 presents the combined classification approach. Then, we present experimental results and conclusions.

2. SHAPE PATTERNS

A project to establish a database for use by the mammographic image analysis community is a collaborative effort involving the Antonio Pedro University Hospital (HUAP), the Radiology Department of the Faculty of Medicine, the postgraduate course on Computer Application and Automation (CAA, Medical Images Research Program) of the Federal Fluminense University (UFF) and the IRSA-Institute of Radiology S.A. at Niterói. The primary purpose of the database is to facilitate research on computer algorithms to aid in screening and diagnosis. Secondary purposes may include the development of algorithms to aid in diagnosis and of teaching or training aids. The database contains cases collected over three decades by the Radiology Department of the Faculty of Medicine and the IRSA-Institute of Radiology. Each study includes a breast image with diagnostic information from expert radiologists of these institutes. Both benign and malignant cases are included. The digitized images are made available to the research community. From these, 52 images from different patients with proven diagnosis (each either biopsy proven or with at least 3 years of subsequent follow-up without change) were identified by expert radiologists as the most representative cases. These benign (27) and malignant (25) cases were used for feature extraction and for the pattern classifier.

Figure 1 – Simplified diagram of the implementation

Features are extracted within a certain neighborhood. In this work a region of 200x200 pixels is analyzed. All images in the database are scanned at 8 bits per pixel (256 gray levels). Morphological opening and erosion operations are used to remove small objects that are not masses. If G(i,j) represents the gray level (from 0 to 255) of pixel (i,j), then the most important gray level is the threshold between the nodule gray level, gn, and the gray level of its neighborhood. This threshold has to be determined beforehand from the image histogram (Hussain, 1991). The parameters used for classification combine 7 features: the number of nodules or calcification areas, the nodule boundary length, the nodule area, and the invariants of its inertial tensors of order two (2 features) and order three (2 features). Figure 2 illustrates the features extracted by the implementation. They are calculated from the following shape parameters of the digitized image G(i,j):

(a) The nodule area (central images in each column of figure 2) is defined as:

$$A = \sum_i \sum_j B(i,j)$$

where $B(i,j) = 1$ if $G(i,j) \ge g_n$ and $B(i,j) = 0$ if $G(i,j) < g_n$.

(b) Consider the centroid of the area, $(i_0, j_0)$, where $i_0 = m_{10}/A$ and $j_0 = m_{01}/A$, with $m_{10} = \sum_i \sum_j i\,B(i,j)$ and $m_{01} = \sum_i \sum_j j\,B(i,j)$ the first-order moments about the origin. The central moment of $B(i,j)$ of order $p+q$ is defined as

$$m_{pq} = \sum_i \sum_j (i - i_0)^p (j - j_0)^q B(i,j).$$

The second-order central moments of $B(i,j)$ define its inertial tensor of order two:

$$\begin{pmatrix} m_{20} & m_{11} \\ m_{11} & m_{02} \end{pmatrix}$$

This tensor has two invariants, its trace and its determinant. The former is the polar moment of inertia around the centroid: $I_1 = m_{20} + m_{02}$. The eccentricity is also an invariant: $I_2 = (m_{20} - m_{02})^2 + 4 m_{11}^2$. The third-order central moments of $B(i,j)$ define a tensor of order three, which has several invariants. In this work we have used:

$$I_3 = (m_{30} - 3m_{12})^2 + (3m_{21} - m_{03})^2$$

$$I_4 = (m_{30} + m_{12})^2 + (m_{21} + m_{03})^2$$

(c) The number of pixels on the nodule edge (top images in each column of figure 2):

$$E = \sum \partial B,$$

where $\partial B$ represents the edge pixels of the nodule.

(d) To classify small and large tumors with the same accuracy, we divide each feature by a power of the area: $E/A^{1/2}$; $I_1/A^2$; $I_2/A^4$; $I_3/A^5$; and $I_4/A^5$. These normalized features are computed by the implemented program and used for shape classification of the mammograms.
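As an illustration of items (a) to (d), the following Python sketch computes the area, the moment invariants and the edge count from a gray-level array, and then applies the area normalization. It is a minimal sketch under stated assumptions, not the program used in this work: the function name shape_features, the use of NumPy, and the definition of an edge pixel as a nodule pixel with at least one 4-neighbor outside the nodule are all assumptions introduced here.

```python
import numpy as np


def shape_features(G, g_n):
    """Area-normalized shape features of the region with gray level >= g_n.

    Hypothetical helper: G is a 2-D gray-level array, g_n the nodule threshold.
    """
    B = (np.asarray(G) >= g_n).astype(float)   # binary mask B(i, j)
    A = B.sum()                                # (a) nodule area
    if A == 0:
        raise ValueError("no pixel reaches the threshold g_n")

    i, j = np.indices(B.shape)
    i0 = (i * B).sum() / A                     # centroid, from first-order moments
    j0 = (j * B).sum() / A

    def m(p, q):
        """Central moment of order p + q of the mask B."""
        return (((i - i0) ** p) * ((j - j0) ** q) * B).sum()

    # (b) invariants of the second- and third-order moment tensors
    I1 = m(2, 0) + m(0, 2)
    I2 = (m(2, 0) - m(0, 2)) ** 2 + 4 * m(1, 1) ** 2
    I3 = (m(3, 0) - 3 * m(1, 2)) ** 2 + (3 * m(2, 1) - m(0, 3)) ** 2
    I4 = (m(3, 0) + m(1, 2)) ** 2 + (m(2, 1) + m(0, 3)) ** 2

    # (c) edge pixels. Assumption: a nodule pixel is on the edge when at least
    # one of its four neighbors is background (outside the image counts as background).
    P = np.pad(B, 1)
    interior = P[:-2, 1:-1] * P[2:, 1:-1] * P[1:-1, :-2] * P[1:-1, 2:]
    E = (B * (1 - interior)).sum()

    # (d) division by powers of the area
    return {
        "A": float(A),
        "E_norm": E / A ** 0.5,
        "I1_norm": I1 / A ** 2,
        "I2_norm": I2 / A ** 4,
        "I3_norm": I3 / A ** 5,
        "I4_norm": I4 / A ** 5,
    }
```

For a filled square of 4x4 pixels, for example, the sketch gives A = 16, E = 12 and I1/A^2 = 40/256, while I2, I3 and I4 vanish by symmetry.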

To compute the number of calcifications we use a fast scheme. The eroded image is scanned and the pixels of each connected object are numbered from 1 to n (where n is the calcification area, A); pixels that belong to no object receive 0. The number of 1s in the image is then the number of calcifications. Figure 3 illustrates this simple and efficient process. To restrict the count to microcalcifications, we count only objects with small n (or A).
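The numbering scheme can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in this work: the function names, the breadth-first search, the choice of 4-connectivity and the parameter max_area (standing for the area limit applied to microcalcifications) are assumptions. Each connected object is numbered from 1 to n in raster order, so the count of 1s equals the number of objects.

```python
from collections import deque


def number_objects(binary):
    """Number the pixels of each 4-connected object 1..n in raster order.

    Returns the numbered image and the list of object areas.
    """
    rows, cols = len(binary), len(binary[0])
    numbered = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Collect every pixel connected to (r, c) by breadth-first search.
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Number the object's pixels 1..n; background pixels keep 0.
            for k, (y, x) in enumerate(sorted(pixels), start=1):
                numbered[y][x] = k
            areas.append(len(pixels))
    return numbered, areas


def count_microcalcifications(binary, max_area):
    """Count the objects whose area n does not exceed max_area."""
    _, areas = number_objects(binary)
    return sum(1 for n in areas if n <= max_area)


# The grid of figure 3, used here only as sample input (non-zero = object pixel).
FIGURE_3 = [
    "0" * 31,
    "0123000000012000120000001230000",
    "0450012300000000034001000400001",
    "0000045670000100000002300000023",
    "0" * 31,
]
mask = [[ch != "0" for ch in row] for row in FIGURE_3]
numbered, areas = number_objects(mask)
print(sum(row.count(1) for row in numbered))  # number of objects in the grid
```

Applied to the mask of figure 3, the sketch finds eight objects. Passing a small max_area keeps only the small ones, which is how the count is restricted to microcalcifications.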

3. CLASSIFICATION SCHEME

The ultimate goal of a pattern recognition system is to achieve the best possible classification performance. Traditionally, this led to the development of different classification schemes for the problem at hand; then, after experimental assessment, one classifier was chosen as the final solution. Recent studies (Kittler et al., 1998) have observed that, although one classifier yields the best performance, the sets of patterns misclassified by different classifiers do not necessarily overlap. This suggests that different classifier designs offer complementary information about the patterns to be classified, which can be harnessed to improve the performance of the system. This observation motivates the recent interest in combining classifiers: the idea is not to rely on a single decision scheme but to combine several to derive a consensus decision. We use this idea in this work to improve efficiency and accuracy. Here, the classifier assigns the feature vector to one of two classes: benign (B) or malignant (M). The decision boundary between the classes is expressed by combining discriminant functions and a nearest-neighbor classifier.

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2 3 0 0 0 0 0 0 0 1 2 0 0 0 1 2 0 0 0 0 0 0 1 2 3 0 0 0 0
0 4 5 0 0 1 2 3 0 0 0 0 0 0 0 0 0 3 4 0 0 1 0 0 0 4 0 0 0 0 1
0 0 0 0 0 4 5 6 7 0 0 0 0 1 0 0 0 0 0 0 0 2 3 0 0 0 0 0 0 2 3
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Figure 3 – Computing the number of microcalcifications: each 1 corresponds to a microcalcification if $A \le A_{mc}$

In the nearest-neighbor classifier, we use all benign and malignant images in the database. We decide whether a feature vector belongs to class B or M based on its nearest neighbor under the Euclidean distance: if the closest database element is in class B (or M), the vector is assigned to class B (or M) (Nene and Nayar, 1997). This classifier is simple and powerful, but it suffers from high computational cost if the number of elements in the database grows too large. We assume in this project that the database expands whenever a new image with proven diagnosis is introduced. Therefore, reducing the computational complexity as much as possible is an important aspect. One method of doing this is to reorganize the search procedure by preprocessing the representation vectors. Another method, which can be applied before the first one, is to reduce the number of representation vectors through a proper selection or training procedure (called editing). We combined these two methods. A subsequent extension of this work will improve this aspect by implementing several fast search algorithms, in particular using triangle inequality elimination (Lee and Chae, 1998).

To specify the discriminant functions explicitly, we first carried out a large number of analyses and experiments to determine the decision boundaries. This learning procedure showed that, for all images in the database, the two classes are separable by a plane in the 3-dimensional space defined by the features $E/A^{1/2}$, $I_1/A^2$ and $I_4/A^5$. Figure 4 represents this. The classification is then performed using threshold logic (Li et al., 1997). All these decision rules are used in the shape classification block of the implementation (figure 1).

Figure 4 – Partition of the classification space used in one of the combined classification procedures
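The two decision rules can be sketched in Python as follows. This is a hedged illustration only: the function names, the plane coefficients w and b, the convention that the positive side of the plane is class M, and the rule that reports "undecided" when the two classifiers disagree are assumptions made here, since the text does not detail how the rules are combined in the implementation.

```python
import math


def nearest_neighbor_label(x, database):
    """Label ('B' or 'M') of the database entry closest to x (Euclidean distance).

    database is a list of (feature_vector, label) pairs.
    """
    _, label = min(database, key=lambda entry: math.dist(x, entry[0]))
    return label


def plane_label(x3, w, b):
    """Threshold logic on a separating plane w . x + b = 0 in the 3-D feature space.

    Assumption: the positive side of the plane is the malignant class.
    """
    s = sum(wi * xi for wi, xi in zip(w, x3)) + b
    return "M" if s > 0 else "B"


def combined_label(x3, database, w, b):
    """Hypothetical consensus rule: accept a label only when both rules agree."""
    nn = nearest_neighbor_label(x3, database)
    pl = plane_label(x3, w, b)
    return nn if nn == pl else "undecided"


# Toy values for illustration only; these are not features from the database.
database = [((0.0, 0.0, 0.0), "B"), ((1.0, 1.0, 1.0), "M")]
w, b = (1.0, 0.0, 0.0), -0.5
print(combined_label((0.1, 0.2, 0.1), database, w, b))
print(combined_label((0.4, 1.0, 1.0), database, w, b))

# Expanding the database with a newly proven case is a simple append.
database.append(((0.9, 0.8, 1.0), "M"))
```

In this sketch the feature vector holds the three normalized features that define the separating plane, and math.dist requires Python 3.8 or later. A linear scan over the database is used for clarity; the preprocessing and editing steps mentioned above would replace it as the database grows.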

4. CONCLUSIONS

In this paper, we present a scheme for classifying lesions in mammograms based on discrete shape patterns. The training process uses an image database (currently 52 images: 27 benign and 25 malignant cases). The database consists of digitized images like the one shown in figure 5. Experimental classification uses images completely different from those of the training process: the Nijmegen mammographic images. The results show that the scheme correctly classified all cases tested so far. The Nijmegen database contains 40 images, seven of them benign cases. All these images were correctly classified, with no false positives and no false negatives. The recognition rate for tumors with spicules was 100% on low-density breasts. A subsequent extension of this implementation will include shape information from the two pairs of breast images in the analysis for better characterization (Vujonic and Brzakovic, 1997).

ACKNOWLEDGMENT

This work was supported, in part, by project FINEP/RECOPE SAGE #0626/96. The authors also acknowledge CNPq (project number: 302649/87-5), FAPERJ (E26/150.771/96) and CAPES.

Figure 5 – Database mammogram

REFERENCES

Boyd, N. F., 1997, “Analysis of Digitized Mammograms for the Prediction of Breast Cancer Risk”.

Chang, C. -M. and Laine, A., 1997, “Enhancement of Mammograms from Oriented Information”, ICIP’97- IEEE Proceedings of International Conference on Image Processing, October 26-29, Santa Barbara, Vol. 3, pp. 524-527, No 607.

De Paredes, E. S., 1989, Atlas of Film-Screen Mammography.

Horiuchi, T., 1998, “Decision Rule for Pattern Classification by Integrating Interval Feature Value”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, No. 4, April, pp. 440-448.

Hussain, Z., 1991, Digital Image Processing: Practical Applications of Parallel Processing Techniques, Ellis Horwood.

Jiang, H., Tiu, W., Yamamoto, S. and Iisaku, S-I., 1997, “Detection of Spicules in Mammograms”, ICIP’97- Proceedings of IEEE International Conference on Image Processing, October 26-29, Santa Barbara, Vol. 3, pp. 520-523, No 380.

Kim, J.K. and Park, H.W., 1997, “Surrounding Region Dependence Method for Detection of Clustered Microcalcifications on Mammograms”, ICIP’97- Proceedings of IEEE International Conference on Image Processing, October 26-29, Santa Barbara, Vol. 3, pp. 535-538, No 508

Kittler, J., Hatef, M., Duin, R.P.W. and Matas, J., 1998, “On Combining Classifiers”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, No. 3, March, pp. 226-239.

Kopans, D.B., 1989, Breast Imaging, J. B. Lippincott Co.

Laine, A.F., Schuler, S., Fan, J. and Huda, W., 1994, “Mammographic Features Enhancement by Multiscale Analysis”, IEEE Trans. on Medical Imaging, Vol. 13, No. 4, December, pp. 263-274.

Lee, E.W. and Chae, S.I., 1998, “Fast Design of Reduced-Complexity Nearest-Neighbor Classifiers Using Triangular Inequality”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, No. 5, May, pp. 562-566.

Li, H., Liu, K.J.R., Lo, S-C B., Wang, Y., 1997, “Stochastic Model and Probabilistic Decision-Based Classifier for Mass Detection in Digital Mammograms”, ICIP’97- Proceedings of IEEE International Conference on Image Processing, October 26-29, Santa Barbara, Vol. 3, pp. 539-542, No 460

Liu, S., Delp, E.J., 1997, “Multiresolution Detection of Stellate Lesions in Mammograms”, ICIP’97- Proceedings of IEEE International Conference on Image Processing, October 26-29, Santa Barbara, Vol. 2, pp. 109-112, No 614

Lorey, R. A., Solka, J. L., Rogers, G. W., Marchette, D. J. and Priebe, C. E., 1995, “Mammographic Computer-Assisted Diagnosis using Computational Statistics Pattern Recognition”, Real-Time Imaging, Academic Press, Vol.1, 95-104.

Méndez, A. J., Tahoces, P. G., Lado, M. J., Souto, M. and Vidal, J. J., 1996, “Computer-Aided Diagnosis: Detection of Masses on Digital Mammograms”, Proceedings IWISP’96, 4-7 November, Manchester, U. K., 465-468.

Nene, S. A. and Nayar, S. K., 1997, “A Simple Algorithm for Nearest Neighbor Search in High Dimensions”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, No. 9, September, pp. 989-1003.

Shen, L., Rangayyan, R.M. and Desautels, J.E.L., 1994, “Application of Shape Analysis to Mammographic Calcification”, IEEE Transactions on Medical Imaging, Vol. 13, No. 2, June, pp. 263-274.

Tavassoly, F. A., 1992, Pathology of the Breast.

Vujonic, N., and Brzakovic, D., 1997, “Establishing the Correspondence Between Control Points in Pairs of Mammographic Images”, IEEE Transaction on Image Processing, Vol. 6, No. 10, October, pp. 1388-1399.

Wodds K. and Bowyers K., 1996, “A general view of detection algorithms”, Proceedings of 3rd International Workshop on Digital Mammography, June 9-12, Chicago, pp. 385-390.