Abstract- The percentage of radiodense tissue in the breast has been correlated to an increased risk of breast cancer. This paper presents an automated method to quantify the amount of breast tissue found in a digital mammogram. The process involves creating a segmentation mask using a radial basis function, spatially varying Neyman-Pearson threshold depending on breast size parameters, and density percentage estimation. Typical research results are presented – these have been independently validated by a radiologist.

I. INTRODUCTION

Mammography has emerged as a reliable technique for the early detection of breast cancer – the second leading cause of cancer-related mortality among American women [1]. The radiographic appearance of female breast consists of radiolucent (dark) regions due to fat and radiodense (light) regions due to connective and epithelial tissue. The amount of radiodense tissue can be used as a marker for predicting breast cancer risk. In the mid 1970s, studies conducted by J. Wolfe[2] suggested that women with high radiodense tissue were at greater risk for breast cancer. In fact, it has been shown that women with more than 60-75% of radiodense tissue are at four to six times greater risk of developing breast cancer than those with lesser densities [2]. Cancers are also detected at later stages in dense breast and therefore have lower diagnostic accuracy.

The estimation of radiodense tissue has traditionally been a subjective determination by trained radiologists; with very few published work describing quantitative measures. There are several problems with using the Wolfe Classification. First is the problem with the variability associated with subjective nature of radiologists. The same image may produce different results depending on the radiologist, and sometimes for the same radiologist. A computer aided analysis of breast tissue would give an objective method to quantifying the amount of radiodense tissue in a patient.

Later work by Boyd, Jensen, et al improved on the subjectivity of Wolfe’s work but was still based on observer decisions [3]. A completely automated system for estimating the amount of radiodense tissue indicated in mammogram images has considerable potential to provide a rapid, objective and cost-effective procedure for estimating breast cancer risk. There have been several attempts by previous researchers for arriving at such a system. Byng, Boyd, Fishell, Jong and Yaffe presented an interactive computer program, called the “Toronto”

method [4]; later, they proposed image properties that are useful for designing an automated technique [5].

More recently, Saha, Udupa, Conant, Chakraborty and Sullivan proposed an automatic system that requires initial knowledge of the region to be segmented [6]. The authors of this paper have previously presented initial work towards such an automatic system [7][8]. To date, a technique for a completely automated process that does not rely on subjective evaluations is not to be found. The problem is exacerbated by the fact that no objective measures of breast density exist.

There are many issues that must be taken into consideration when analyzing a mammography image. The main issue deals with the fact that the quality of the image depends mostly on the technician who acquires the image. Factors such as kilovoltage peak, milliampere-seconds, anode/filter combinations, focal spot, positioning, and compression all play a pivotal role in the contrast, sharpness, and resolution of an image. Even if these factors can be neutralized with a somewhat automated procedure, there is still the issue of diverse patients. Patients come in all shapes and sizes and can affect the quality of the mammogram. Because of these varying sizes, the standardized compression value of the breast cannot always be met. The portion of the breast that is closer to the chest wall will always be compressed more, creating an artificial density that will alter the image.

This paper draws upon previously published work in this area and explores novel mechanisms for objectively quantifying radiodense tissue density in a mammogram. Unlike previous work done by [7] where only one global threshold is chosen for tissue segmentation, the current model will vary the threshold spatially depending on parameters that are dependant on the size of the breast mass.

II. APPROACH

The analysis of the breast tissue requires several non-trivial procedures before the actual percentages can be estimated. The block diagram in Fig. 1 outlines the procedure.

A. Tissue Segmentation using Radial Basis Function Neural Network

The first procedure in the automated process is the creation of a mask, procedure 1 in the block diagram

Fig. 1. Block Diagram of Breast Segmentation Procedure

shown in Fig. 1. The mask segments the breast tissue from the rest of the X-ray, but it also provides a reference for the amount of pixels located within the breast tissue, which will be used later on to determine the percentages.

The mask template is a binary matrix of size theoretically equal to that of the original image.

The initial step in the segmentation algorithm is to perform an adaptive thresholding technique throughout the image. The image is subdivided into 100 x 100 pixel regions. The adaptive thresholding algorithm predicts the likelihood that a given region lays within the breast tissue, within the film, or on the boundary. The original image consists of approximately 1,000 coarse boundary points. This coarse image is then sub sampled and used as the input to a radial basis function (RBF) neural network. The output of the RBF neural network is an optimal prediction of the breast tissue edge, which is used to generate the binary segmentation mask. This mask is then placed over the original image and effectively removes all information outside of the breast tissue region. This process is outlined in Fig. 1.

(a) (b)

(b) (d)

Fig. 2. Mask generation technique showing (a) original image, (b) tissue/film boundary, (c) mask template and (d) segmented image for a typical mammogram.

B. Tissue Location Identification using Gaussian Parametric Model

There are three variables unknown after the segmentation of the breast tissue. These variables include the global threshold and the functions modeling the chest wall boundary and the edge of the breast. Fig. 3 represents the single curve to represent the edge boundary of the breast.

Fig. 3. Chest wall and Breast Boundary Functions

In order to represent the tissue location within the mammogram, a family of curves needs to be created between the breast edge and the chest wall. The curves will represent the changes in perceived density due to the changing compression in a mammogram. To create the curves, the homotopy continuation algorithm is used to interpolate the two functions, the chest wall (fchest) and the breast edge (fedge). The function used to create the curves if shown below:

(1)

Once applied, the equation produces the family of curves, demonstrated in Fig. 4 with N = 25 and using the fchest and fedge as shown in Fig. 3. This is assumed as a representation of the contour map of the pressure distribution caused by tissue compression.

Fig. 4 Family of curves generated between the chest wall and the edge of the breast.

A second model has been systematically developed to represent an alternate version of the contour map of the pressure distribution. As before, at this stage of the algorithm, there are three major variables known, including the function representing the chest wall, the function representing the edge of the breast, and the CNPA threshold, TCNP. However, in the previous method, the homotopy continuation algorithm relies on the difference of the two functions (chest wall and edge of breast). This reliance causes the endpoints to be bounded at 0, where the difference between fchest and fedge will always subtract to be 0. A method has been developed that is not hindered by this bounding effect. The idea lies behind the notion that when converting the fchest and fedge functions into polar coordinates, there will be no bounding effect. The following equation are used to convert from Cartesian coordinates to polar coordinates

(2)

(3)
where rchest is the polar coordinate equivalent of fchest, redge is the polar coordinate equivalent of fedge and θ represents the angles associated to rchest and redge. rchest will always be associated to a vector of 0’s because of the inherent shape when represented in the Cartesian coordinates. There is no need to illustrate rchest because it is implicit in nature. However a better understanding of redge is acquired when interpreting the illustrations as seen in Figure ().

A family of curves is desired between the chest wall and the edge of the breast tissue. This family of curves is now generated by performing mathematical operations in the polar coordinate system. As in the parametric model for tissue location using Cartesian coordinates, within each of the family of curves, the threshold can be modified dependent upon how close the curve is to the chest wall (or away from the edge of the breast). This modification of the threshold can account for artifacts introduced by tissue compression. The homotopy continuation algorithm is proven to be a valid source when interpolating between any two arbitrary functions. In this case, the two arbitrary functions are the chest wall, rchest, and the edge of the breast, redge.

(a) (b)

(c) (d)

Fig. 5 (a) fedge of Image 11051702 with (b) redge of Image 11051702 and (c) fedge of Image 11599502 with (d) redge of Image 11599502.

Mathematically, the homotopy continuation algorithm is represented as the following in the polar coordinate system

. (5)

where rt represents the family of curves generated in the polar coordinate system, rchest represents the chest wall, redge represents the edge of the breast tissue, N is the number of curves desired, and t is a scalar that ranges from 0 to N. At t = 0, and at t = N, . As t increases from 0 to N, rt is represented as a incremental fraction of redge. Application of Equation 5 develops a family of curves and an example is shown in Fig. 6 as represented in the polar coordinate system.

Fig. 6. Family of curves generated between the chest wall and the edge of the breast tissue as represented in the polar coordinate system.

After the family of curves is generated, the next step in this process is to convert all information back to into the Cartesian coordinate system so the information can be applied on digitized mammograms. The equations to convert back to Cartesian coordinates are as follows

Applying Equations 6 and 7 will yield the family of curves generated by the parametric model for tissue location using Polar coordinates and is illustrated in Fig. 7 As assumed, there were no bounding effects as in the parametric model for tissue location using Cartesian coordinates.

Fig. 7. Family of curves between chest wall and the edge of the breast.

C. Parametric Model for Tissue Compression

The global threshold, TCNP, as determined using the CNPA, is used as the baseline for the threshold to distinguish between radiolucent and radiodense tissue indications. Two methods are described above to account for the physical location of any pixel intensity, f(x,y), with dependence on how close that pixel is to the chest wall. The primary assumption made for the effect of tissue compression is that pixels closer to the chest are affected to a great extent by tissue compression because that is where tissue compression is the greatest. This is illustrated in Fig. 8 and Fig. 9.

Fig. 8. Placement of the breast for a CC view in between compression plates before tissue compression is applied.

Fig. 9 Placement of the breast for a CC view in between compression plates after tissue compression is applied.

There is a greater amount of breast tissue at the chest wall and ychestbefore is always greater than yedgebefore. Therefore, upon tissue compression, the difference between ychestbefore and ychestafter is always going to be greater than yedgebefore and yedgeafter. This implies that the threshold must be relatively higher at the chest wall due to the higher amount of tissue compression.

The global CNPA threshold, TCNP, can be modified dependent on which curve the pixel being analyzed lies on or between. The amount of modification must hold true to the assumption that the threshold will be higher at the chest wall and the threshold will decrease towards the edge of the breast tissue region. A function must be applied through the midpoint of the family of curves to indicate a particular value of how much TCNP should be modified. This function is dependent on the amount of tissue compression that has been enforced in the mammography procedure. The threshold shall be higher towards the chest wall where tissue compression is the highest and lower at the edge of the breast where tissue compression is at a minimum. Any function can be represented as a sum of Gaussians; therefore, the function incorporated is the Gaussian. This decision is based on the assumption that the pressure distribution will be modeled by a sum of Gaussians. The Gaussian function also follows the assumptions set forth due to the effect of tissue compression in that it starts off at a higher amount and decreases outward. The parametric model for tissue compression that will modify the threshold is represented as the following function,

. (8)

where x is the distance from the chest wall to the pixel being analyzed (measured in pixels), Tv is the threshold at position x, k is the scaling factor for the Gaussian, TCNP is the constrained Neyman-Pearson global threshold, and σ2 is the variance of the Gaussian. This value decays with dependence on the variance as x approaches the edge of the breast, xe. At the chest wall, where x = 0, the threshold being applied is

. (9)

At the edge of the breast tissue, where x = xe, the threshold being applied is

. (10)

Choice of the two parameters, A and B, will determine the necessary variance for the Gaussian being applied. This is shown below and proves that the selection of the A and B values will allow for an automatic calculation of the trend of the Gaussian function.

. (11)

After the selection of the A and B values and the calculation of the variance, the appropriate Gaussian function is determined. An example of a Gaussian with xe = 508 (pixels), TCNP = 0.563, A = 1.6 (TCNP) and B = 0.9 (TCNP) is illustrated in Fig. 9. The family of curves generated for this particular image was developed with N = 25, stating that there will be 25 curves between the chest wall and the edge of the breast tissue. The Gaussian, as seen in Fig. 9, now acts as a look-up table. There are 25 points captured from the Gaussian. Each point, as denoted in Fig. 9 with an asterisk, represents the threshold applied to each of the respective 25 curves shown on the x-axis labeled as the curve indices. For example, the first and second curves, those closest to the chest wall, are assigned

Fig. 9. Threshold modeled as a Gaussian function.

the highest two gray-level threshold values, being approximately 0.9 and 0.89, respectively, while the last curve (at N = 25), is assigned the lowest of the gray-level threshold values, being approximately 0.5. These values of the Gaussian, once applied to the entire family of curves generated using the spatially varying threshold model, appear as in Fig. 10. Per visualization restraints, only a sub-sample of 5 of the N = 25 curves are shown. Fig. 10 demonstrates that each of the curves is assigned with the appropriate value of the Gaussian generated in Fig. 10. However, in between the curves, the gray-level threshold is not being assigned and is therefore set to a value of 0. This is not yet acceptable to be applied as the threshold model to the digitized mammogram. This threshold model must be interpolated in between each of the curves to generate a non-zero gray-level threshold in between the curves. One option would be to allow for an

Fig. 10. Threshold profile after assigning values of the Gaussian function to the parametric model for tissue location using Cartesian coordinates.

infinite amount of curves to be generated between the chest wall and the edge of the breast tissue, but the computation time would be unacceptable. The fluctuation between curves, from the gray-level threshold value assigned by the Gaussian to the zero value can be seen as high-frequency noise. A common method to reduce high-frequency noise is with local averaging. This is implemented by convolving the signal with the rectangular pulse. This is called a moving-average filter. The gray-level at each pixel is replaced with the average of the gray-levels in a square or rectangular neighborhood. Applying

a moving average filter to the image in Fig. 10 will interpolate between curves and the output is shown in Fig. 11. Investigation of Fig. 12 indicates that the pixels locations in between curves have been adequately modified to follow the Gaussian function distribution.

Fig .11 An example of a spatially varying threshold model.

D. Density Estimation

The image shown in Fig. 11 represents the spatially varying threshold model that will be applied to that particular digitized mammogram. The gray-level value of the pixel at any location (x,y) where x can range from 1 to m and y can range from 1 to n, where m is the total number of columns in the digitized mammogram image and n is the total number of rows, is compared with the gray-level value in the final spatially varying threshold model at the exact same location, (x,y). If the gray-level value of in the digitized mammogram image, f(x,y), is greater in intensity than the gray-level value in the threshold model, Tv(x,y), then that pixel is assigned a value of 1 (white) and is considered to be radiodense tissue. However, if the intensity of the gray-level is less than in the threshold model, that pixel is assigned a value of 0 (black) and is considered to be radiolucent. This essentially creates a binary matrix with pixels at a value of 1 representing radiodense tissue and pixels at a value of 0 representing radiolucent tissue. The percentage of radiodense tissue can be determined using