A Novel Approach For Mass Detection In Digitized Mammograms

NOHA YOUSSRY, FATMA E.Z. ABOU-CHADI

Department of Electronics and Communications Engineering

Mansoura University

EGYPT

Abstract:- A novel method for early detection of circumscribed masses in digitized mammograms is introduced.

A wavelet enhancement technique is used for contrast enhancement of the mammograms. The breast tissue is then scanned using a variable window size. For each sub-image, co-occurrence matrices in four different orientations (θ = 0º, 45º, 90º and 135º) are calculated and texture features are estimated for each co-occurrence matrix. The features are then used to train a neuro-fuzzy model. The classification results reach 75% for abnormal cases and 90% for normal ones.

Index terms: Mammography, Mass detection, Wavelet Enhancement, Texture analysis, Neuro-Fuzzy.

1 Data Set

The data set used in this paper is obtained from the Mammographic Image Analysis Society (MIAS) [8]. The MiniMIAS database provides mammograms digitized at a 200 μm pixel edge, giving images of 1024×1024 pixels with 8-bit grey levels. Twenty-two mammograms contain circumscribed lesions: eleven show fatty tissue, eight show glandular tissue and only three show dense tissue.

The mammograms contain twenty-four lesions, of which twenty are benign and only four are malignant. The diameters of the abnormalities range from 36 pixels to 396 pixels. Table 1 summarizes the data set used. Fig. 1 shows an example of the mammograms used.

Table 1 The data set used.

Finding / Fatty / Glandular / Dense / Total
Benign / 9 / 6 / 3 / 18
Malignant / 2 / 2 / 0 / 4
Total / 11 / 8 / 3 / 22

Fig. 1 A typical example of the used mammograms

2 Image Preprocessing

Image preprocessing consists mainly of two steps: Image segmentation to isolate the breast tissue from the background image and image enhancement to increase the contrast between the mammographic lesions and the surrounding breast tissue.

2.1 Image Segmentation

Image segmentation is used to isolate the breast tissue from its background. Segmentation is accomplished in two steps. First, histogram equalization is used to raise the grey level of the tissue near the surface of the body, creating a clear difference between the tissue and the background; this makes the edge detection process easier. Second, a threshold is applied to the equalized image to obtain a binarized image with only two grey levels: 255 representing the breast tissue and 0 representing the background.

2.1.1 Histogram Equalization

The histogram of an image represents the relative frequency of occurrences of the various gray levels in the image. Histogram modeling techniques (e.g. histogram equalization) provide a sophisticated method for modifying the dynamic range and contrast of an image by altering that image such that its intensity histogram has a desired shape. Unlike contrast stretching, histogram modeling operators may employ non-linear and non-monotonic transfer functions to map between pixel intensity values in the input and output images. Histogram equalization employs a monotonic, non-linear mapping which re-assigns the intensity values of pixels in the input image such that the output image contains a uniform distribution of intensities. This can be done as shown in [1].

Consider an image pixel value u ≥ 0 to be a random variable with continuous probability density $p_u(u)$ and cumulative probability distribution $F_u(u) = P[U \le u]$. Then the random variable

$$v = F_u(u) = \int_0^u p_u(x)\,dx$$

will be uniformly distributed over (0,1).

Let the input u have L gray levels $x_i$, i = 0, 1, ..., L-1, with probabilities $p_u(x_i)$.

These probabilities can be determined from the histogram of the image, which gives $h(x_i)$, the number of pixels with gray level $x_i$:

$$p_u(x_i) = \frac{h(x_i)}{\sum_{j=0}^{L-1} h(x_j)}, \qquad v = \sum_{x_i \le u} p_u(x_i)$$

The quantized output v', also assumed to have L levels, is given as follows:

$$v' = \mathrm{Int}\left[\frac{v - v_{\min}}{1 - v_{\min}}\,(L-1) + 0.5\right]$$

where $v_{\min}$ is the smallest positive value of v. Thus, v' will be uniformly distributed only approximately because v is not a uniformly distributed variable. Fig. 2 shows the effect of histogram equalization on digital mammograms.

a) The original mammogram / b) The equalized mammogram

Fig 2 The effect of equalization on digitized mammograms
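The quantized equalization described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `equalize` and the `levels` parameter are hypothetical, `Int[.]` is taken as the integer part, and the input is assumed to be an integer image with grey levels 0 to `levels - 1`.

```python
import numpy as np

def equalize(image, levels=256):
    """Histogram-equalize an integer image with grey levels 0..levels-1."""
    hist = np.bincount(image.ravel(), minlength=levels)   # h(x_i)
    prob = hist / hist.sum()                               # p_u(x_i)
    v = np.cumsum(prob)                                    # discrete cumulative distribution
    v_min = v[np.nonzero(hist)[0][0]]                      # smallest positive value of v
    if v_min >= 1.0:                                       # constant image: nothing to spread
        return image.copy()
    # v' = Int[(v - v_min) / (1 - v_min) * (L - 1) + 0.5]
    v_out = np.floor((v - v_min) / (1.0 - v_min) * (levels - 1) + 0.5)
    lut = np.clip(v_out, 0, levels - 1).astype(image.dtype)
    return lut[image]                                      # apply the look-up table per pixel

demo = np.array([[10, 10], [20, 30]], dtype=np.uint8)
print(equalize(demo))
```

For the small demo image the occupied grey levels 10, 20 and 30 are spread over the full range and map to 0, 128 and 255.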

2.1.2 Thresholding

One of the simplest and most important approaches for segmenting an image is to divide the grey scale into bands and use thresholds to determine regions or obtain boundary points [2]. This technique is called “Grey Level Thresholding” and it is based on dividing the histogram of an image into two bands B1 and B2 separated by a threshold T. The band B1 contains levels associated with the background and band B2 represents the object. Suppose the grey level histogram shown in Fig 3 corresponds to an image f (x,y) composed of light objects on a dark background, just like the mammographic images. A simple way to extract the objects from their background is then to scan the image; a change from band B1 to band B2 marks a boundary.

Fig 3 A histogram for an image with threshold T

Then, for the mammograms used, any point f (x,y) for which f (x,y)>T is given a value of 255 in the binarized image representing the breast tissue, otherwise f (x,y) is given a value of zero representing a background point. Segmentation is accomplished by scanning the whole mammogram pixel by pixel and labeling each pixel as object or background according to its binarized grey level. Fig 4 shows a typical example of the binarized mammograms obtained.

Fig 4 The binarized mammogram
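The thresholding rule above translates directly into code. The sketch below is illustrative only: the threshold value is not stated in the text, so `threshold` is left as a parameter, and the helper names `binarize` and `boundary_mask` are hypothetical.

```python
import numpy as np

def binarize(image, threshold):
    """Label pixels above the threshold as tissue (255) and the rest as background (0)."""
    return np.where(np.asarray(image) > threshold, 255, 0).astype(np.uint8)

def boundary_mask(binary):
    """Mark pixels where a row-wise scan crosses between background and tissue."""
    change = np.diff(binary.astype(np.int16), axis=1) != 0
    mask = np.zeros(binary.shape, dtype=bool)
    mask[:, 1:] = change          # flag the first pixel after each transition
    return mask

image = np.array([[10, 200, 220],
                  [50,  51,  40]])
binary = binarize(image, threshold=50)
print(binary)
print(boundary_mask(binary))
```

Scanning along rows is one possible scan order; a column-wise scan would use `axis=0` in the same way.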

2.2 Image enhancement

Mammographic image analysis is a challenging task due to poor illumination and high noise levels in the image [3]. Therefore, a wavelet enhancement technique is used to increase the contrast of the mammograms and facilitate discrimination between normal and abnormal tissues.

The wavelet enhancement technique is based on the dyadic wavelet transform and consists of three steps. First, the wavelet transform is used to decompose the image into detail coefficients and a coarse image. Second, a nonlinear enhancement function is applied to the coefficients for contrast enhancement. Finally, the enhanced image is reconstructed from its coefficients. Fig 5 shows a block diagram of the enhancement technique.

Fig 5 A block diagram of the enhancement technique
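The three steps (decompose, apply a nonlinear gain, reconstruct) can be illustrated with a simplified stand-in for the dyadic wavelet transform. The sketch below is not the filter bank of the paper: it swaps in an additive undecimated ("à trous") decomposition with a B3-spline smoothing kernel, and a piecewise-linear gain function. The kernel, the gain `k`, the threshold rule and all function names are assumptions made for illustration.

```python
import numpy as np

# B3-spline smoothing kernel (an assumption, not the paper's h filter).
KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def smooth(image, level):
    """Separable smoothing with kernel taps spaced 2**level pixels apart."""
    step = 2 ** level
    out = image.astype(float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (2 * step, 2 * step)
        padded = np.pad(out, pad, mode="reflect")
        acc = np.zeros_like(out)
        for tap, weight in enumerate(KERNEL):
            idx = [slice(None), slice(None)]
            idx[axis] = slice(tap * step, tap * step + out.shape[axis])
            acc += weight * padded[tuple(idx)]
        out = acc
    return out

def decompose(image, levels=3):
    """Step 1: split the image into detail planes and a coarse image."""
    coarse = image.astype(float)
    details = []
    for level in range(levels):
        smoothed = smooth(coarse, level)
        details.append(coarse - smoothed)
        coarse = smoothed
    return details, coarse

def gain(w, threshold, k):
    """Step 2: multiply small coefficients by k, shift large ones (continuous, monotonic)."""
    return np.where(np.abs(w) <= threshold, k * w,
                    w + np.sign(w) * (k - 1.0) * threshold)

def enhance(image, levels=3, k=2.0, threshold_fraction=0.2):
    """Step 3: add the modified detail planes back onto the coarse image."""
    details, coarse = decompose(image, levels)
    out = coarse
    for d in details:
        out = out + gain(d, threshold_fraction * np.abs(d).max(), k)
    return out
```

With `k = 1.0` the gain is the identity and the reconstruction returns the input image, which is a convenient sanity check on the decomposition.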

The filters and the enhancement functions shown in the figure are chosen as follows:

Mallat et al.[3] proposed the filter h(ω) to be

Laine et al.[5] have chosen the Laplacian filter as a proper choice for g (ω)


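The filters h(ω), g (ω) and k (ω) satisfy the condition

$$|h(\omega)|^2 + g(\omega)\,k(\omega) = 1$$

and the filter l(ω) is given by

$$l(\omega) = \frac{1 + |h(\omega)|^2}{2} \qquad (7)$$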

Fig 6 shows the original and enhanced mammograms.


The original image / The Enhanced Image

Fig 6 Wavelet enhancement of digitized mammograms


3 Feature Extraction

Texture features or, more precisely, Grey Level Co-occurrence Matrix (GLCM) features are used to distinguish between normal and abnormal tissues. Four co-occurrence matrices are constructed, one for each spatial orientation: horizontal, right diagonal, vertical and left diagonal (0º, 45º, 90º and 135º). A fifth matrix is constructed as the mean of the preceding four matrices [4].
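As an illustration of how the five matrices might be built, the sketch below computes a co-occurrence matrix for each orientation and then their mean. Several details are assumptions not stated in the text: a pixel distance of one, symmetric counting (each pair counted in both directions), normalization so that the entries sum to one, and the function names `glcm` and `all_glcms`.

```python
import numpy as np

# (row, column) offsets to the neighbour for each orientation, at distance 1 (assumed).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, angle, levels):
    """Symmetric, normalized co-occurrence matrix of an integer image for one orientation."""
    dr, dc = OFFSETS[angle]
    rows, cols = image.shape
    # Reference pixels whose neighbour at (r + dr, c + dc) lies inside the image.
    r0, r1 = max(0, -dr), rows - max(0, dr)
    c0, c1 = max(0, -dc), cols - max(0, dc)
    ref = image[r0:r1, c0:c1]
    nbr = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)   # accumulate pair counts
    counts += counts.T                                    # count each pair in both directions
    return counts / counts.sum()

def all_glcms(image, levels):
    """Four directional matrices plus their mean, keyed by angle and 'mean'."""
    mats = {angle: glcm(image, angle, levels) for angle in OFFSETS}
    mats["mean"] = sum(mats[angle] for angle in OFFSETS) / len(OFFSETS)
    return mats

sub_image = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [0, 2, 2, 2],
                      [2, 2, 3, 3]])
print(glcm(sub_image, 0, levels=4) * 24)   # pair counts for the horizontal direction
```

In practice the grey levels of each sub-image would usually be quantized to a small number of levels first, so that the matrices stay compact.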

3.1 Texture Features (Grey Level Co-occurrence Matrix Features)

From each co-occurrence matrix, a set of nine features is extracted for training the neuro-fuzzy model.

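Let P be the N×N co-occurrence matrix calculated for each sub-image, with entries P(i,j) normalized to sum to one. The features, as given by Byer [9], are as follows:

1. Maximum probability

$$\max_{i,j} P(i,j)$$

2. Contrast

$$\sum_{i,j} P(i,j)\,(i-j)^2$$

3. Inverse Difference Moment (Homogeneity)

$$\sum_{i,j} \frac{P(i,j)}{1+(i-j)^2}$$

4. Angular Second Moment (ASM)

$$\sum_{i,j} P(i,j)^2$$

5. Dissimilarity

$$\sum_{i,j} P(i,j)\,|i-j|$$

6. Grey Level Co-occurrence Mean (GLCM)

$$\mu_i = \sum_{i,j} i\,P(i,j)$$

7. Variance

$$\sigma_i^2 = \sum_{i,j} P(i,j)\,(i-\mu_i)^2$$

8. Correlation Coefficient

$$\sum_{i,j} P(i,j)\,\frac{(i-\mu_i)(j-\mu_j)}{\sigma_i\,\sigma_j}$$

where

$$\mu_j = \sum_{i,j} j\,P(i,j), \qquad \sigma_j^2 = \sum_{i,j} P(i,j)\,(j-\mu_j)^2$$

9. Entropy

$$-\sum_{i,j} P(i,j)\,\ln P(i,j)$$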
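The nine features listed above can be computed from a single normalized co-occurrence matrix as sketched below. The sketch assumes the commonly used textbook definitions of these features (the exact variants used by the authors are not recoverable from the text), natural logarithms for the entropy, and a hypothetical function name `glcm_features`.

```python
import numpy as np

def glcm_features(p):
    """Nine texture features of a normalized co-occurrence matrix p (entries sum to 1)."""
    p = np.asarray(p, dtype=float)
    i, j = np.indices(p.shape)
    mu_i = (i * p).sum()
    mu_j = (j * p).sum()
    var_i = (p * (i - mu_i) ** 2).sum()
    var_j = (p * (j - mu_j) ** 2).sum()
    denom = np.sqrt(var_i * var_j)
    nonzero = p[p > 0]                      # skip empty cells so that log is defined
    return {
        "max_probability": p.max(),
        "contrast": (p * (i - j) ** 2).sum(),
        "homogeneity": (p / (1.0 + (i - j) ** 2)).sum(),
        "asm": (p ** 2).sum(),
        "dissimilarity": (p * np.abs(i - j)).sum(),
        "glcm_mean": mu_i,
        "variance": var_i,
        # Correlation is undefined for a constant sub-image (zero variance).
        "correlation": (p * (i - mu_i) * (j - mu_j)).sum() / denom if denom > 0 else float("nan"),
        "entropy": -(nonzero * np.log(nonzero)).sum(),
    }

features = glcm_features([[0.5, 0.0],
                          [0.0, 0.5]])
print(features)
```

For the diagonal test matrix every pixel pair has equal grey levels, so the contrast and dissimilarity are zero, the homogeneity and correlation are one, and the entropy equals ln 2.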

3.2 Feature selection

Feature selection concerns the reduction of the dimensionality of the pattern space and the identification of features that contain most of the essential information needed for discriminating between normal and abnormal cases. Selection of efficient features can reduce significantly the difficulty of the classifier design. Therefore feature selection based on the correlation coefficient between features is performed.

Any two features whose correlation coefficient exceeds 0.9 in both spaces can be combined and treated as one feature, reducing the dimensionality of the feature space by one. Therefore, according to the tables listed, the maximum probability and the contrast can be removed, reducing the number of features to seven.
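The selection rule can be sketched as follows. The sketch assumes that "both spaces" refers to the normal and abnormal pattern sets, that the magnitude of the correlation coefficient is compared with the limit, and that each feature matrix has one row per pattern and one column per feature. The function name `redundant_pairs` and the toy data are hypothetical.

```python
import numpy as np

def redundant_pairs(normal, abnormal, names, limit=0.9):
    """Feature pairs whose |correlation| exceeds the limit in both pattern sets."""
    corr_n = np.corrcoef(normal, rowvar=False)     # columns are features
    corr_a = np.corrcoef(abnormal, rowvar=False)
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            if abs(corr_n[a, b]) > limit and abs(corr_a[a, b]) > limit:
                pairs.append((names[a], names[b]))
    return pairs

# Toy data: the second column is a multiple of the first, the third is unrelated.
normal = np.array([[1, 2, 1], [2, 4, -1], [3, 6, 1], [4, 8, -1], [5, 10, 1]], dtype=float)
abnormal = np.array([[2, 6, 1], [4, 12, -1], [6, 18, 1], [8, 24, -1], [10, 30, 1]], dtype=float)
print(redundant_pairs(normal, abnormal, ["f1", "f2", "f3"]))
```

One feature from each reported pair can then be dropped, which is how the nine features would shrink to seven.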

4 Neuro-Fuzzy Classifier

A neuro-fuzzy classifier is used to detect candidate circumscribed lesions. Generally, the input layer consists of seven neurons corresponding to the seven features, the output layer consists of one neuron indicating whether the tissue is a candidate circumscribed lesion or not, and the size of the hidden layer changes according to the number of rules that gives the best recognition rate for each group of features. The different models used and the results obtained are shown in Tables 4 and 5.

Table 4 The classification results for different models in different directions

Direction of co-occurrence matrix / No. of membership functions / Classification rate for abnormal cases / Classification rate for normal cases
0º / 3 / 75% / 80%
0º / 4 / 100% / 80%
45º / 4 / 87.5% / 70%
45º / 6 / 75% / 80%
90º / 3 / 87% / 60%
90º / 4 / 75% / 80%
90º / 6 / 62.5% / 70%
135º / 2 / 62.5% / 80%
135º / 4 / 62.5% / 80%

Table 5 The results obtained from using the features of the mean co-occurrence matrix

No. of membership functions / Classification rate for abnormal cases / Classification rate for normal cases
4 / 37.5% / 60%
5 / 62.5% / 80%

The preceding results are based on a training set of 32 patterns (16 normal and 16 abnormal) and a testing set of 18 patterns (10 normal and 8 abnormal). It has been found that the best features for discriminating patterns are those extracted from the co-occurrence matrix at angle zero, and the best neuro-fuzzy model is the one that uses four membership functions, as shown in Fig 7.

This model gives 100% correct classification for abnormal cases and 80% correct classification for normal ones.

Fig 7 The neuro-fuzzy model used for classification
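The classification rates quoted above are per-class percentages over the testing set. The short sketch below shows the arithmetic for a test set of 8 abnormal and 10 normal patterns; the label coding (1 for abnormal, 0 for normal), the function name and the example predictions are assumptions chosen to reproduce the reported 100% and 80%.

```python
def classification_rates(truth, predicted):
    """Percentage of correctly classified patterns per class (1 = abnormal, 0 = normal)."""
    rates = {}
    for label, name in ((1, "abnormal"), (0, "normal")):
        indices = [k for k, t in enumerate(truth) if t == label]
        correct = sum(1 for k in indices if predicted[k] == label)
        rates[name] = 100.0 * correct / len(indices)
    return rates

truth = [1] * 8 + [0] * 10                   # 8 abnormal and 10 normal test patterns
predicted = [1] * 8 + [0] * 8 + [1] * 2      # two normal patterns flagged as abnormal
print(classification_rates(truth, predicted))
```

With only 8 abnormal and 10 normal test patterns, each abnormal pattern accounts for 12.5 percentage points and each normal pattern for 10, which explains the coarse steps in the tabulated rates.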

5 Conclusion

A neuro-fuzzy model for detecting candidate circumscribed masses in digitized mammograms is presented. Wavelet enhancement is used to increase the contrast between the mammographic lesions and the surrounding breast tissue. Texture features are used in the training of the neuro-fuzzy model. Co-occurrence matrices in different directions are calculated and Grey Level Co-occurrence Matrix (GLCM) features are extracted from the matrices. The best results are obtained when using the texture features of the co-occurrence matrix at angle zero.

The percentage of correct classification reaches 100% for abnormal cases and 80% for normal cases. The results obtained are promising, comparable to those obtained by Sahiner et al. [5] and Kokkinakis et al. [6], and better than those reported in the previous work [7].

References

[1] A. Jain, Fundamentals of Digital Image Processing, Prentice-Hall Inc., 1989.

[2] R. Gonzalez and R. Woods, Digital Image Processing, Addison-Wesley Publishing Company, 1992.

[3] S. Mallat and S. Zhong, “Characterization of signals from multiscale edges”, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 14, pp 710-732, 1992.

[4] K. Bovis and S. Singh, “Detection of masses in mammograms using texture features”, International Conference on Pattern Recognition, Vol. 2, Sept. 2000.

[5] B. Sahiner, H.P. Chan and N. Petrick, “Classification of mass and normal breast tissue: A convolution neural network classifier with spatial domain and texture images”, IEEE Trans. Medical Imaging, Vol. 15, No. 5, pp 598-610, Oct. 1996.

[6] G. Kokkinakis, I. Christoyianni and E. Dermatas, “Fast detection of masses in computer aided mammography”, IEEE Signal Processing Magazine, pp 54-64, Jan. 2000.

[7] N. Youssry, F.E.Z. Abou-Chadi and A. El Sayad, “A Neural Network Approach For Mass Detection In Digitized Mammograms”, submitted for publication in the 1st Annual Conference of Biomedical Engineering, Dec. 2002.

[8]

[9]