Mammogram Classification
Project Report
Mammography is one of the most widely used tools for early detection of breast cancer. Early
diagnosis plays a key role in reducing mortality and improving prognosis.
Mammography is currently considered the standard examination for detecting breast
cancer. However, identifying breast abnormalities and classifying masses on
mammographic images are not trivial tasks, particularly for dense breasts.
Introduction
A mammogram is an X-ray of the breast tissue designed to identify abnormalities. A
mammogram can often detect breast cancer early, when it is small and even before a lump
can be felt. This is when it is easiest to treat. Mammograms are used for two purposes:
A screening mammogram looks for signs of breast cancer in women who do not
have any breast symptoms or problems. X-ray pictures of each breast are taken, typically
from two different angles.
A diagnostic mammogram examines a woman's breast if she has breast symptoms or
if a change is seen on a screening mammogram. It may include extra views (images) of the
breast that are not part of screening mammograms. Diagnostic mammograms are sometimes
used to screen women who were treated for breast cancer in the past.
What information can be extracted from these mammograms?
The radiologist looks for different types of breast changes, such as small white spots called
calcifications, larger abnormal areas called masses, and other suspicious areas that could be
signs of cancer.
Calcification
Cysts are fluid-filled sacs. Simple cysts (fluid-filled sacs with thin walls) are not cancer and
don’t need to be checked with a biopsy. If a mass is not a simple cyst, it’s of more concern, so a
biopsy might be needed to be sure it isn’t cancer.
Solid masses can be more concerning, but most breast masses are not cancer.
A cyst and a solid mass can feel the same. They can also look the same on a mammogram. The
size, shape, and margins (edges) of the mass can help to decide how likely it is to be cancer.
Challenges in finding Cancers on Mammograms
Breast density describes how much fibrous and glandular tissue the breast contains
compared with fatty tissue. Dense breasts are not abnormal,
but they are linked to a higher risk of breast cancer. Dense breast tissue can also make it harder
to find cancers on a mammogram.
Tumor
A tumor is an abnormal lump or growth of cells. When the cells in the tumor are normal but
have overgrown and produced a lump, the tumor is benign and harmless. When the cells are
abnormal and can grow uncontrollably, they are cancerous, and the tumor is malignant.
Objective
The classification of benign and malignant patterns in digital mammograms is one of the most
important steps in the diagnosis of breast cancer, as it helps detect the
disease at an early stage, which saves many lives. The main aim of this project is to propose a
solution for tumor classification based on learning models. The solution classifies a
mammogram as Normal, Malignant, or Benign.
Design
Our proposed design consists of four main stages: Preprocessing, Segmentation of the ROI,
Feature Extraction, and Classification. The classification indicates whether the
cancer cells are likely to spread (malignant) or not (benign). For training and testing, we took
mammogram images from the MIAS dataset. When an image is given as input, it goes
through pre-processing and segmentation and is then assigned to a class. The
mammogram image can be viewed and analyzed at every stage. The basic design of our
system is shown diagrammatically below:
Image Database -> Pre-Processing -> Segmentation -> Feature Extraction -> Classification
Implementation
We implemented the system in MATLAB 2015. Each step is described in detail
below.
Data Set
Our dataset consists of mammograms taken from the Mammographic Image
Analysis Society (MIAS), an association of UK research groups interested in the
understanding of mammograms, which has produced a database of digitized
mammograms. The database holds 322 digitized mammograms that have been reduced to a
200-micron pixel edge, and each image is 1024x1024 pixels. The collection consists of a total of 321
mammographic images: 206 normal, 51 malignant, and 62 benign.
The images in the MIAS database are stored in the Portable Gray Map (.pgm) format. Each
is an 8-bit gray-level image with 256 gray levels (0-255). In this analysis we
used 40 benign, 40 normal, and 40 malignant mammograms covering dense, fatty, and fatty-glandular
tissue, with abnormalities such as asymmetry, circumscribed masses, and ill-defined masses.
All the mammograms were converted to .JPG format.
Steps in Implementation
We implemented the steps shown in the design phase using the following approaches:
Pre-processing is done with an Adaptive Median Filter.
Segmentation is done with a novel approach, a Gaussian-based Hidden Markov
Random Field with Expectation Maximization (HMRF-EM).
The difference between normal tissue and cancerous tissue is very small in some cases,
so the features of the tumor area in the image are of key importance for automatic
classification. Using only one or a few features leads to poor classification
results because of the small difference between the textures. Feature extraction is
therefore done with the GLCM algorithm.
In the training phase, the extracted features are passed to a Probabilistic
Neural Network (PNN) classifier for classification.
A workflow of approaches used for classification
1. Pre-Processing
Pre-processing is a very important step that adjusts and corrects the mammogram image for further study and
analysis. Its goal is to enhance the signal-to-noise ratio between masses and normal
breast tissue in mammograms by using techniques such as filtering. Different filters are available for
image enhancement and noise reduction.
similar, is labeled as a noisy pixel. These noisy pixels are then replaced by the median
value of the pixels in the neighborhood that have passed the noise labeling test.
The adaptive median filter changes the size of the neighborhood window during operation.
In the classic median filter, by contrast, the neighborhood window stays constant.
For this reason, the standard median filter does not perform well when the impulse noise density is high,
while the adaptive median filter handles such noise better. The adaptive median filter also
preserves mammogram image details such as edges and smooths non-impulsive noise, while the
standard median filter does not.
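The growing-window behavior described above can be sketched in a few lines. This is a minimal Python sketch of the common textbook form of the adaptive median filter, not the report's MATLAB code; the function name `adaptive_median_filter`, the `max_window` parameter, and the use of plain nested lists are assumptions made for illustration.

```python
from statistics import median_low

def adaptive_median_filter(img, max_window=7):
    """Textbook adaptive median filter on a 2-D list of gray levels."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]  # work on a copy, leave the input intact
    for r in range(rows):
        for c in range(cols):
            window = 3
            while True:
                half = window // 2
                # Gather the neighborhood, clipped at the image border.
                block = [img[i][j]
                         for i in range(max(0, r - half), min(rows, r + half + 1))
                         for j in range(max(0, c - half), min(cols, c + half + 1))]
                z_min, z_med, z_max = min(block), median_low(block), max(block)
                if z_min < z_med < z_max:
                    # The median is not an impulse. Replace the pixel only
                    # if the pixel itself is an extreme value in the window.
                    if not z_min < img[r][c] < z_max:
                        out[r][c] = z_med
                    break
                window += 2  # median looks like an impulse: grow the window
                if window > max_window:
                    out[r][c] = z_med  # size limit reached, fall back to the median
                    break
    return out
```

Because the window only grows where the local median itself looks like an impulse, flat and lightly corrupted regions are processed with the smallest window, which is what helps preserve edges.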
Implementation
We supplied an input image, labeled as the selected image, and applied an adaptive median filter,
which produced a better de-noised image for further processing steps.
2. Segmentation
Image segmentation divides an image into regions such that the pixels within a region are
homogeneous, with similar properties according to some predefined condition. Mammogram image
segmentation is a critical and challenging task: the tumor areas to be segmented have a
non-rigid anatomical structure and
complex shapes that vary in size and position among images.
Challenges in Mammogram image segmentation
A tumor is a pathology occupying a certain area, ranging from medium-grey to white shades in
the mammogram. The smallest tumors visible in mammograms are approximately 0.5 cm in diameter.
The most significant features indicating whether a tumor is malignant or benign are its shape
and the nature of its margins. Digital mammograms frequently contain strong noise, which is why
pre-processing is done first to improve segmentation, and cancerous
tumors have varied shapes and appearances. Furthermore, the contrast of suspicious-looking
regions of mammograms is frequently low and heterogeneous, and the margins between masses
are fuzzy and difficult to identify. All of this means that the segmentation of lesions is an
important and frequently very difficult task.
A Gaussian-based HMRF-EM Approach to Mammogram Image Segmentation
The Gaussian Mixture Model (GMM) has been widely applied in image segmentation. However,
it treats pixels as independent of each other, which makes the segmentation
result sensitive to noise. To overcome this problem, the segmentation process uses a
mixture model based on a Markov Random Field (MRF), which incorporates the
spatial relationship among neighboring pixels into the GMM. The proposed model has a
simplified structure that allows the Expectation Maximization (EM) algorithm to be applied
directly to the log-likelihood function to compute the optimum parameters of the mixture
model.
Tumors with a regular shape, such as round or oval, frequently indicate a benign change, whereas
irregularly shaped masses usually indicate malignancy.
In general, algorithms used for the detection and segmentation of masses, as well as their
possible further classification, can be divided into two approaches: supervised segmentation and
unsupervised segmentation.
Supervised segmentation mainly includes model-based methods. Model-based methods
use previously acquired (e.g., defined or learned) knowledge of the objects and background
regions that are being segmented. This prior knowledge is used to determine whether
specific regions occur in the image. Supervised segmentation methods also
include template matching approaches, in which the training set contains templates or
patterns of objects that can be detected. The main limitation of model-based and
template-based methods is their reduced effectiveness for irregular masses with
spiculated margins that are difficult to distinguish.
Unsupervised segmentation methods work by dividing the image into areas that are
different or uniform with respect to defined features, such as grey levels, texture, or colour.
Working
The following steps were followed to segment the ROI from the image:
A GMM is used to model the intensity distribution of tissue and tumor in mammogram
images; its parameters are the mean and covariance of each Gaussian
component.
Each Gaussian component represents an individual object. The estimated GMM parameters
are then given to the HMRF-EM framework to predict class labels.
The parameters of the HMRF model are the mean and standard deviation of each GMM. The MAP and
EM algorithms are used alternately to learn the HMRF parameters and the class labels,
because each depends on the other. Parameters are learned by maximizing the probability of
the class labels and by minimizing the total posterior energy.
The basic idea of the EM algorithm is to begin with an initial model and use it to estimate a new
model. The new model then becomes the initial model for the next iteration, and the
process is repeated until some convergence condition is satisfied. In our experiment, the
algorithm is considered converged when no significant change in total energy is observed or
the maximum number of EM iterations is reached. The EM algorithm is used to estimate the
parameter set of the HMRF model.
Our approach also optimizes the initial conditions used by HMRF-EM. When
the initial conditions, mainly the initial segmentation, are poor, EM may
not give proper results. Hence, k-means clustering is used to
generate the initial clusters, and a Gaussian Mixture Model, rather than a single Gaussian
component, is used to approximate the intensity distribution in each segment.
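For reference, the total posterior energy minimized in the MAP step is commonly written as below in the HMRF-EM literature. The notation is assumed here rather than taken from the report: $y_i$ is the intensity of pixel $i$, $x_i$ its class label, $\mu_l$ and $\sigma_l$ the mean and standard deviation of class $l$, $\mathcal{N}_i$ the neighborhood of pixel $i$, and $\beta$ the smoothness weight. The form shown is the simpler single-Gaussian-per-class case, with constant factors absorbed into $\beta$.

```latex
U(x \mid y) \;=\; \sum_{i} \left[ \frac{(y_i - \mu_{x_i})^2}{2\sigma_{x_i}^2} + \ln \sigma_{x_i} \right]
\;+\; \beta \sum_{i} \sum_{j \in \mathcal{N}_i} \bigl( 1 - \delta(x_i, x_j) \bigr),
\qquad
\hat{x} \;=\; \arg\min_{x} \, U(x \mid y)
```

The first sum is the data term (how well each pixel fits its class), and the second penalizes neighboring pixels that carry different labels, which is what makes the result less sensitive to isolated noise than a plain GMM.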
The steps of the Gaussian HMRF-EM framework described above can be summarized as:
1. Perform initial intensity-based clustering using k-means with the specified number of
clusters.
2. Estimate the intensity distribution of the mammogram image using a Gaussian Mixture Model with
its parameter set.
3. Estimate the class labels by HMRF-MAP estimation.
4. Update the parameter set using the EM algorithm.
5. Perform MAP estimation so that the total posterior energy is minimized.
6. Repeat from step 4 until the maximum number of EM iterations is reached or there is no
significant change in the total energy.
We then binarize the images to show the dense tumor area against the background.
The values for the number of Gaussian components, the number of iterations, beta, and the
number of k-means clusters are set according to [1].
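The steps listed above can be illustrated with a deliberately simplified Python sketch. It is not the report's MATLAB implementation and not the full method: it assumes a single Gaussian per class instead of a GMM, uses one sweep of iterated conditional modes (ICM) for the MAP step, and re-estimates each class mean and standard deviation from the current hard labels rather than from soft EM posteriors. All function names and default values (`k`, `beta`, `em_iters`, `tol`) are hypothetical.

```python
import math

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connected neighborhood

def kmeans_1d(values, k, iters=20):
    """Cluster gray levels into k groups; returns centers in ascending order."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda c: abs(v - centers[c]))].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

def class_stats(img, labels, k, old):
    """Re-estimate (mean, std) of every class from the current hard labels."""
    stats = []
    for cls in range(k):
        vals = [v for row, lab_row in zip(img, labels)
                for v, lab in zip(row, lab_row) if lab == cls]
        if not vals:
            stats.append(old[cls])  # empty class: keep previous estimate
            continue
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        stats.append((mu, max(math.sqrt(var), 1.0)))  # floor std to avoid /0
    return stats

def pixel_energy(img, labels, stats, r, c, label, beta):
    """Data term plus Potts smoothness term for one pixel and one label."""
    mu, sd = stats[label]
    data = (img[r][c] - mu) ** 2 / (2 * sd ** 2) + math.log(sd)
    differing = sum(1 for dr, dc in NEIGHBORS
                    if 0 <= r + dr < len(img) and 0 <= c + dc < len(img[0])
                    and labels[r + dr][c + dc] != label)
    return data + beta * differing

def segment(img, k=2, beta=1.0, em_iters=10, tol=1e-3):
    """Return a binary mask: 1 for the brightest class, 0 elsewhere."""
    rows, cols = len(img), len(img[0])
    centers = kmeans_1d([v for row in img for v in row], k)       # step 1
    labels = [[min(range(k), key=lambda c: abs(v - centers[c])) for v in row]
              for row in img]
    stats = class_stats(img, labels, k, [(m, 1.0) for m in centers])  # step 2
    previous = float("inf")
    for _ in range(em_iters):
        for r in range(rows):                                     # MAP step (ICM)
            for c in range(cols):
                labels[r][c] = min(range(k), key=lambda lab: pixel_energy(
                    img, labels, stats, r, c, lab, beta))
        stats = class_stats(img, labels, k, stats)                # parameter update
        total = sum(pixel_energy(img, labels, stats, r, c, labels[r][c], beta)
                    for r in range(rows) for c in range(cols))
        if abs(previous - total) < tol:                           # convergence test
            break
        previous = total
    brightest = max(range(k), key=lambda cls: stats[cls][0])
    return [[1 if lab == brightest else 0 for lab in row] for row in labels]
```

Calling `segment(image, k=2, beta=1.0)` on a 2-D list of gray levels returns the binarized mask. A larger `beta` weights label agreement between neighbors more heavily and so yields smoother regions.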
Implementation
3. Feature Extraction
Feature extraction and selection play a critical role in the performance of any
classifier. Features are characteristics of the objects of interest; if selected carefully, they
represent the most relevant information that the image offers for a complete
characterization of a lesion. Feature extraction methods analyze objects and images to extract
the most prominent features that represent the various classes of objects. The features are
used as inputs to classifiers, which assign them to the class they represent.
What kinds of features are helpful in mammogram images?
Because digital mammography images are specific, not all visual features can be used to
correctly describe the relevant image patch. The classes of suspected tissue differ in
shape and tissue composition. Therefore, the most suitable visual feature descriptors for this kind
of image are based on shape and texture. Textural features remain the best type of feature to
extract from gray-level images such as mammograms, because texture is made up of exactly the
properties such images carry: differences in gray-level values, coarseness (the scale of gray-level
differences), and directionality, that is, the presence or absence of a regular pattern. Different feature
extraction methods can be tested with a variety of classifiers; we use the GLCM feature
extractor, which is widely used in texture analysis applications.
Texture Features
Texture is a repeated pattern of information or an arrangement of structure at regular
intervals. In a general sense, texture refers to the surface characteristics and appearance of an object
given by the size, shape, density, arrangement, and proportion of its elementary parts. Texture
classification produces a classified output of the input image in which each texture region is
identified with the texture class it belongs to. Texture can be defined in two ways:
Structural texture is a set of primitive texels in some regular or repeated relationship.
Statistical texture is a quantitative measure of the arrangement of intensities in a region.
This set of measurements is called a feature vector.
Gray Level Co-Occurrence Matrix (GLCM)
The GLCM is a tabulation of how often different combinations of gray levels co-occur in an
image or image section. The GLCM is a square matrix of size N × N, where N is the
number of different gray levels in the image. An element p(i, j, d, θ) of the GLCM gives the
relative frequency with which a pixel of gray level i at location (x, y) occurs together with a
pixel of gray level j located at distance d from it in the orientation θ. The texture information is used to classify
the lump as either benign or malignant.
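To make the definition concrete, here is a minimal Python sketch that tabulates a normalized GLCM for one pixel offset. The function name `glcm` and its parameters are illustrative assumptions, not the report's MATLAB code. The offset (dx, dy) encodes the pair (d, θ): for example, d = 1 and θ = 0° corresponds to dx = 1, dy = 0.

```python
def glcm(img, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy).

    img is a 2-D list of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # also count the pair in reverse order
    total = sum(map(sum, counts)) or 1  # guard against an empty tabulation
    return [[n / total for n in row] for row in counts]
```

With `symmetric=True` each pair is counted in both directions, so the resulting matrix is symmetric. In practice, 8-bit images are often quantized to fewer gray levels first to keep the matrix small.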
The following 22 features are extracted from the GLCM:
1. Energy
2. Entropy
3. Dissimilarity
4. Contrast
5. Inverse difference
6. Correlation
7. Homogeneity
8. Autocorrelation
9. Cluster shade
10. Cluster prominence
11. Maximum probability
12. Sum of squares
13. Sum average
14. Sum variance
15. Sum entropy
16. Difference variance
17. Information variance of correlation
18. Difference entropy
19. Information measure of correlation
20. Maximal correlation coefficient
21. Inverse difference normalized
22. Inverse difference moment normalized.
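A few of the listed features can be computed directly from a normalized GLCM, as in the hedged Python sketch below. It covers only six of the 22 features, and the formulas follow common textbook definitions, which differ slightly between toolboxes (for example, some define energy as the square root of the sum used here, and some use |i - j| rather than (i - j)^2 in homogeneity). The function name `glcm_features` is an assumption.

```python
import math

def glcm_features(p):
    """Compute a handful of texture features from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        # Sum of squared entries (angular second moment).
        "energy": sum(v * v for _, _, v in cells),
        # Shannon entropy of the co-occurrence distribution (natural log).
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        # Weighted by squared gray-level difference.
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Weighted by absolute gray-level difference.
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        # Large when mass is concentrated near the diagonal.
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "max_probability": max(v for _, _, v in cells),
    }
```

The returned dictionary can be flattened into a feature vector in a fixed key order before being passed to a classifier.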
Implementation
After applying GLCM, we obtained the following features:
4. Classification
The basis of our proposed solution is to classify a tumor segmented from a mammogram image as
Normal, Malignant, or Benign. The extracted features are used as input to the classifier.
Implementation: Training and Testing
Classification is further divided into:
Training phase
Testing phase
In the training phase, the database of 120 images, including Normal, Malignant, and Benign cases, is
loaded and used to train the PNN classifier.
In the testing phase, the PNN is tested with the rest of the database and showed good accuracy. A new
input image can then be classified into one of the classes.
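The decision rule of a probabilistic neural network is compact enough to sketch. The Python below is an illustrative sketch, not the report's MATLAB classifier: the function name `pnn_classify`, the smoothing parameter `sigma`, and the toy two-dimensional feature vectors in the usage example are all assumptions. "Training" a PNN amounts to storing the labeled feature vectors, which is why it is fast.

```python
import math

def pnn_classify(train_x, train_y, x, sigma=1.0):
    """Classify feature vector x with a probabilistic neural network.

    Pattern layer: one Gaussian kernel per stored training vector.
    Summation layer: average kernel response per class.
    Decision layer: pick the class with the highest average.
    """
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        dist2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-dist2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(sums, key=lambda cls: sums[cls] / counts[cls])

# Toy usage with made-up 2-D feature vectors (real inputs would be GLCM features).
features = [[0, 0], [0, 1], [5, 5], [6, 5], [10, 0], [10, 1]]
classes = ["Normal", "Normal", "Benign", "Benign", "Malignant", "Malignant"]
prediction = pnn_classify(features, classes, [5.4, 5.1])
```

The choice of `sigma` matters in practice: too small and the classifier behaves like nearest neighbor, too large and class boundaries blur. Features are usually normalized first so that no single feature dominates the distance.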
Design
(a)
(b) Input image in (a) classified as Malignant
In the design above, the Load Database option loads the database of mammogram images and
processes them up to segmentation.
The Train Classifier option then trains the PNN classifier on the loaded database.
The remaining options are used for testing with the rest of the dataset: an image can be supplied,
passed through filtering and segmentation, and then classified as
Normal, Benign, or Malignant.
Conclusion
We have proposed a solution for the classification of tumors in mammogram images, using
images from the MIAS dataset. One limitation is that the size of the images is small.
The chosen PNN classifier was useful because it has one big advantage over other classifiers: it
trains faster and gives better results when a new image is given as input. The pre-processing,
segmentation, and feature extraction techniques proved helpful because their output is the
input to the classifier, and the classifier performed better as a result.
As discussed, breast cancer needs to be detected early and treated with state-of-the-art treatment,
and the mammogram is a useful tool for early detection. Breast cancer that is found early, when
it is small and has not spread, is easier to treat successfully. However, better strategies still need to be
designed and developed to prevent deaths from breast cancer.
References