Tooth Detection From Panoramic Radiographs Using Deep Learning
Shweta Shirsat and Siby Abraham
University Department of Computer Science, University of Mumbai, India,
Pin Code:-400098. shwetadshirsat@gmail.com
Center of Excellence in Analytics and Data Science, NMIMS Deemed to be University,
Mumbai, India, Pin Code:-400056. siby.abraham@nmims.edu

Abstract. The proposed work implements a Deep Convolutional Neural Network
algorithm specialized in object detection. It was trained to perform tooth
detection, segmentation, classification and labelling on panoramic dental radiographs.
A dataset of dental panoramic radiographs was annotated according to the FDI tooth
numbering system. The Mask R-CNN Inception ResNet V2 object detection algorithm
gave excellent results in terms of tooth segmentation and numbering. The
experimental results were validated using standard performance metrics. The method
not only gave results comparable to those of similar works but could also detect
missing teeth, unlike similar works.

Keywords: Tooth Detection, Deep Learning, Neural Network, Transfer Learning, Radiographs

1. Introduction
Deep Learning has become a buzzword in the technological community in recent
years. It is a branch of Machine Learning inspired by the way the human brain
recognizes patterns and processes data for decision making. Deep neural networks
are suitable for learning from unlabelled or unstructured data. Some of the key
advantages of using deep neural networks are their ability to deliver high-quality
results, the elimination of the need for feature engineering and the optimum
utilization of unstructured data[1]. These benefits of deep learning have given a huge
boost to the rapidly developing field of computer vision. Some of the new
applications of deep learning in computer vision are image classification, object
detection, face recognition and image segmentation. The area that has achieved the
most progress is object detection.
The goal of object detection is to determine which category each object belongs to
and where these objects are located. The four main tasks in object detection are
classification, labelling, detection and segmentation. Region-based Convolutional
Neural Networks (R-CNN), Fast R-CNN and Faster R-CNN are the most widely used
deep learning-based object detection algorithms in computer vision. Mask R-CNN, an
extension of Faster R-CNN[2], is far superior to the others in terms of detecting
objects and generating high-quality masks. The Mask R-CNN architecture is shown in
Fig 1.

Fig 1. Mask R-CNN architecture[3]


Mask R-CNN is also used in medical image diagnosis for locating tumours, measuring
tissue volumes, studying anatomical structures, lesion detection, surgical planning, etc.
Mask R-CNN uses the concept of transfer learning to drastically reduce the training
time of a model and lower its generalization error. Transfer learning is a method where
a neural network model is first trained on a problem that is similar to the problem
being solved. Layers of the trained model are then reused in a new model that is
trained on the problem of interest. Several high-performing models can be used for
image recognition and other similar tasks in computer vision. Some of the pre-trained
models used for transfer learning include VGG, Inception and MobileNet[4].
Deep Learning is steadily finding its way to offer innovative solutions in the
radiographic analysis of dental X-ray images. In dental radiology, the tooth
numbering system is the format used by the dentist for recognizing and specifying
information linked with a particular tooth. A tooth numbering system helps dental
radiologists identify and classify the condition associated with a concerned tooth. The
most frequently used tooth numbering methods are the Universal Numbering System,
Zsigmondy-Palmer system, and the FDI numbering system[4].

Fig 2. FDI tooth numbering system on panoramic dental radiograph[10]

The annotation method used in this research work follows the FDI (Fédération
Dentaire Internationale) notation system (ISO 3950)[9]. The FDI tooth numbering
system is an internationally recognized two-digit system in which the mouth is divided
into 4 quadrants. The maxillary right quadrant is quadrant 1, the maxillary left
quadrant is quadrant 2, the mandibular left quadrant is quadrant 3 and the mandibular
right quadrant is quadrant 4. Within each quadrant, teeth are numbered from 1 to 8,
counting from the midline. For example, 21 indicates the first tooth of the maxillary
left quadrant (quadrant 2), known as a central incisor. Deep neural networks can be
used with these panoramic radiographs for tooth detection and segmentation using
different variations of convolutional neural networks. To date, most research has
tackled the problem of tooth segmentation on panoramic radiographs using a Fully
Convolutional Neural Network, Faster R-CNN or the Mask R-CNN Inception V2 COCO
model. The annotations required for the dental radiographs were done manually with
the help of a dental specialist. Along with tooth segmentation, the proposed research
can perform tooth detection, tooth numbering and missing tooth detection on a dental
radiograph. To the best of our knowledge, this is the first work to use Mask R-CNN
Inception ResNet V2 trained on COCO 2017 and capable of working with the
TensorFlow version 2 object detection API. It is intended to give better results in
terms of the performance metrics used to assess the Deep Convolutional Network
algorithm.
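As an illustration of the FDI scheme described above, the following minimal Python sketch decodes a two-digit FDI code into a quadrant and a tooth name and lists the 32 permanent-tooth codes. The function names and dictionaries are our own illustrative choices and are not part of the proposed system.

```python
# Tooth names by position within a quadrant, counted from the midline outwards.
TOOTH_NAMES = {
    1: "central incisor",
    2: "lateral incisor",
    3: "canine",
    4: "first premolar",
    5: "second premolar",
    6: "first molar",
    7: "second molar",
    8: "third molar",
}

# Quadrant numbers as defined by the FDI system for permanent teeth.
QUADRANTS = {
    1: "maxillary right",
    2: "maxillary left",
    3: "mandibular left",
    4: "mandibular right",
}


def decode_fdi(code):
    """Split a two-digit FDI code into quadrant and position and name the tooth."""
    quadrant, position = divmod(code, 10)
    if quadrant not in QUADRANTS or position not in TOOTH_NAMES:
        raise ValueError(f"{code} is not a permanent-tooth FDI code")
    return f"{QUADRANTS[quadrant]} {TOOTH_NAMES[position]}"


def all_fdi_codes():
    """Return the 32 permanent-tooth codes: 11-18, 21-28, 31-38, 41-48."""
    return [q * 10 + p for q in QUADRANTS for p in TOOTH_NAMES]


print(decode_fdi(21))        # maxillary left central incisor
print(len(all_fdi_codes()))  # 32
```

The 32 codes returned by all_fdi_codes() correspond to the 32 object classes that the detection model has to distinguish.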

2. Related Works
There have been a few attempts to apply deep learning techniques to teeth detection
and segmentation.
Thorbjorn Louring Koch et al.[1] implemented a Deep Convolutional Neural
Network (CNN) based on the U-Net architecture for segmentation of individual
teeth from dental panoramic X-rays. A single neural network trained on 1201
radiographic images reached a Dice score of 0.934; forming an ensemble
increased the score to 0.936.

Minyoung Chung et al.[4] demonstrated a CNN-based individual tooth
identification and detection algorithm using direct regression of object points. The
proposed method was able to recognize each tooth by labelling all 32 possible regions
of the teeth, including missing ones. The experimental results showed that the
proposed algorithm outperformed state-of-the-art approaches by 15.71% in the
precision of teeth detection.
Mircea Paul Muresan et al.[5] suggested the semantic segmentation of
X-ray radiographs using the Efficient Residual Factorized Convolutional Network.
The implementation was evaluated using intersection over union for the semantic
segmentation and accuracy, precision, recall and F1-score for the generated
bounding box detections. As future work, they suggested increasing the accuracy
of the proposed solution and including more semantic classes.
Dmitry V Tuzoff et al.[2] used the state-of-the-art Faster R-CNN
architecture. The FDI tooth numbering system was used for teeth detection and
localization. A classical VGG-16 CNN along with a heuristic algorithm was used to
improve results according to the rules for the spatial arrangement of teeth.
Hu Chen et al.[3] proposed a Faster R-CNN architecture that performed well
in teeth detection, locating the position of teeth precisely with a high IOU with
the ground truth boxes. Results demonstrated that both precision and recall
exceeded 90% and the mean IOU between detected boxes and ground truths
reached 91%.
Shuxu Zhao et al.[6] proposed Mask R-CNN with classification and
segmentation branches. The results showed that the method achieved more than
90% accuracy in both tasks.
Gil Jader et al.[8] applied the Mask Region-based Convolutional Neural
Network to demonstrate the power of a deep learning method to segment dental
radiographs from a dataset of 1500 images. Various methods such as region
growing, splitting and merging, basic global threshold, ink-black method,
fuzzy c-means, Canny, Sobel, active contour without edges, level set method and
watershed method were also evaluated.
Gil Jader et al.[9] proposed Mask R-CNN for teeth instance segmentation.
By training the system with only 193 dental panoramic images containing 32 teeth
on average, they achieved an accuracy of 98%, F1-score of 88%, precision of 94%,
recall of 84% and specificity of 99%.
Bernardo Silva et al.[10] performed a study of tooth segmentation and numbering
on panoramic radiographs using an end-to-end deep neural network. The proposed
work used Mask R-CNN for instance segmentation of teeth in X-ray images. The
reported results using M-RCNN were an accuracy of 98%, F1-score of 88%,
precision of 94%, recall of 84% and specificity of 99% over 1224 radiographs.
Gil Jader et al.[11] also proposed a segmentation system based on Mask
R-CNN and transfer learning to perform instance segmentation on dental
radiographs. The system was trained with 193 dental radiographs having a maximum
of 32 teeth. The accuracy achieved was 98%.

3. Methodology
Figure 3 represents Mask R-CNN applied on a set of dental radiographs to perform
tooth identification and numbering.

Fig 3: Mask R-CNN architecture for tooth segmentation and numbering


3.1 Data Collection
To train a robust model we needed a large number of images that vary as much as
possible. The dataset of panoramic dental radiographs was collected from Ivison
dental labs (UFBA_UESC_DENTAL_IMAGES_DEEP)[10]. In the dataset, the height
and width of the panoramic X-ray images ranged from 1012 to 1504 pixels and from
2093 to 3432 pixels respectively. The x and y-axis spacing in the images ranged from
0.07 to 0.10 mm. These radiographs were then resized to a fixed resolution of
800*600 pixels and stored in JPG format.
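Resizing changes the coordinate system of any annotation drawn on the original image. The sketch below shows, under the assumption that a bounding box was drawn on the original radiograph, how its coordinates could be mapped to the 800*600 resolution; the function name and the example sizes are hypothetical.

```python
def scale_box(box, original_size, target_size=(800, 600)):
    """Map a box (x_min, y_min, x_max, y_max) from the original image to the resized one.

    Sizes are given as (width, height). The two axes are scaled independently,
    so the aspect ratio of the box changes whenever the image aspect ratio does.
    """
    sx = target_size[0] / original_size[0]
    sy = target_size[1] / original_size[1]
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# Hypothetical 3200 x 1200 radiograph (within the size range reported above).
print(scale_box((1600, 600, 2400, 900), original_size=(3200, 1200)))
# (400.0, 300.0, 600.0, 450.0)
```

If the radiographs are annotated only after resizing, this step is not needed.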
3.2 Data Annotation
After collecting the required data, the radiographs had to be annotated as per the
FDI tooth numbering system. Instead of annotating only the existing teeth in a
radiograph, we annotated all 32 teeth including missing teeth. A JSON/XML file was
created for each radiograph, containing the manually defined bounding boxes and a
ground truth label for each bounding box. A variety of annotation tools are
available, such as the VGG Image annotation tool, labelme and Pixel Annotation
Tool[8]. The proposed work uses the labelme software because of its efficiency and
simplicity. All annotations were verified by a clinical expert in the field.
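To make the annotation output concrete, here is a small Python sketch that reads a labelme-style record and derives one bounding box per labelled tooth. The keys "shapes", "label", "points" and "imagePath" follow the JSON layout written by labelme; the file name and coordinates are invented for illustration.

```python
import json

# A shortened, made-up annotation record in the layout written by labelme.
SAMPLE = json.loads("""
{
  "imagePath": "radiograph_001.jpg",
  "imageWidth": 800,
  "imageHeight": 600,
  "shapes": [
    {"label": "11", "shape_type": "polygon",
     "points": [[380.0, 200.0], [400.0, 195.0], [405.0, 260.0], [385.0, 265.0]]},
    {"label": "21", "shape_type": "polygon",
     "points": [[410.0, 198.0], [432.0, 202.0], [428.0, 266.0], [412.0, 262.0]]}
  ]
}
""")


def bounding_boxes(record):
    """Return {FDI label: (x_min, y_min, x_max, y_max)} for every annotated shape."""
    boxes = {}
    for shape in record["shapes"]:
        xs = [point[0] for point in shape["points"]]
        ys = [point[1] for point in shape["points"]]
        boxes[shape["label"]] = (min(xs), min(ys), max(xs), max(ys))
    return boxes


print(bounding_boxes(SAMPLE))
```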
3.3 Data Preprocessing
In addition to the labelled radiographs, a TFRecord file needs to be created that can
be used as input for training the model. Before creating TFRecord files, we have to
convert the labelme labels into COCO format, because the pre-trained model was
trained on data in that format. Once the data is in COCO format it is easy to create
TFRecord files[9].
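The conversion from labelme records to COCO format can be sketched as follows. This is a simplified illustration, not the conversion script used in this work: it assumes polygon annotations whose labels are FDI codes, and the function names are hypothetical. The output keys ("images", "annotations", "categories", "bbox", "segmentation", "iscrowd") are those defined by the COCO annotation format.

```python
def polygon_area(points):
    """Area of a simple polygon [(x, y), ...] using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def to_coco(records, fdi_codes):
    """Build a COCO-style dictionary from a list of labelme-style records."""
    categories = [{"id": i + 1, "name": str(code)} for i, code in enumerate(fdi_codes)]
    name_to_id = {category["name"]: category["id"] for category in categories}
    images, annotations = [], []
    for image_id, record in enumerate(records, start=1):
        images.append({
            "id": image_id,
            "file_name": record["imagePath"],
            "width": record["imageWidth"],
            "height": record["imageHeight"],
        })
        for shape in record["shapes"]:
            points = [tuple(point) for point in shape["points"]]
            xs = [x for x, _ in points]
            ys = [y for _, y in points]
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[shape["label"]],
                # COCO boxes are [x, y, width, height], not corner pairs.
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
                "area": polygon_area(points),
                # Flattened polygon: [x1, y1, x2, y2, ...].
                "segmentation": [[coord for point in points for coord in point]],
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}
```

The resulting dictionary can be written out with json.dump and then passed to a TFRecord creation script.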

3.4 Object Detection Model


The steps below illustrate how the Mask R-CNN object detection model works:
a. A set of radiographs is passed to a Convolutional Neural Network.
b. The output of the Convolutional Neural Network is passed to a
Region Proposal Network (RPN), which uses anchor boxes to produce
Regions of Interest (ROIs) for each occurrence of a tooth object being
detected.
c. The ROIs are then passed to the ROI Align stage, which converts them
to a fixed size for further processing.
d. A set of fully connected layers receives this output, predicts the class
of the object in each region and defines the coordinates of the bounding
box for the object.
e. The output of the ROI Align stage is simultaneously sent to convolutional
layers that create a mask according to the pixels of the object.
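Step c above can be illustrated with a deliberately simplified ROI Align in plain Python. The sketch assumes a single-channel feature map stored as a list of rows, takes one bilinear sample at the centre of each output bin (practical implementations usually average several samples per bin), and uses function names of our own choosing.

```python
def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at a real-valued point."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size):
    """Resample the region roi = (x_min, y_min, x_max, y_max) to out_size x out_size.

    The ROI is split into equal bins without rounding its coordinates,
    and each bin is sampled once at its centre.
    """
    x_min, y_min, x_max, y_max = roi
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [
        [bilinear(feature, y_min + (i + 0.5) * bin_h, x_min + (j + 0.5) * bin_w)
         for j in range(out_size)]
        for i in range(out_size)
    ]


# A feature map whose value equals its column index.
feature_map = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(roi_align(feature_map, (0, 0, 2, 2), out_size=2))  # [[0.5, 1.5], [0.5, 1.5]]
```

Because no coordinate is rounded to the pixel grid, ROIs of any size are mapped to the same fixed output size without the misalignment introduced by quantization.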
● Hyper Parameter Tuning
The training was performed on dental radiographic images having 32 different objects
that were identified and localized. The hyperparameter values of the object detection
model are: number of classes = 32; image_resolution = 512*512;
mask_height*width = 33*33; standard_deviation = 0.01; IOU_threshold = 0.5;
score_converter = Softmax; batch_size = 8; number of steps = 50,000;
learning_rate_base = 0.008. Parameters such as standard deviation, score_converter,
batch_size, number of epochs and fine_tune_checkpoint_type were optimised.
3.5 Performance Analysis
To measure the performance accuracy of object detection models some predefined
metrics such as Precision, Recall and Intersection Over Union(IoU) are required[8].
● Precision
Precision is the capability of a model to identify only the relevant objects. It is the
percentage of correct positive predictions and is given by: Precision = TP/(TP+FP).
● Recall
Recall is the capability of a model to find all the relevant cases (all ground truth
bounding boxes). It is the percentage of true positives detected among all relevant
ground truths and is given by: Recall = TP/(TP+FN).
Where,
TP = true positive: a prediction-target mask pair has an IoU score that exceeds
some predefined threshold;
FP = false positive: a predicted object mask has no associated ground truth
object mask;
FN = false negative: a ground truth object mask has no associated predicted
object mask.
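● IoU
Intersection over Union is an evaluation metric used to measure the accuracy of an
object detector on a specific dataset. It is the ratio of the area shared by the
predicted and ground truth regions to the total area they cover together:
IoU = Area of Overlap / Area of Union.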
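The three metrics defined above can be computed as in the following sketch. For brevity it works on axis-aligned bounding boxes rather than masks and uses a simple greedy matching in which each ground truth is matched at most once; the function names and the matching strategy are illustrative assumptions, not the evaluation code used in the experiments.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, threshold=0.5):
    """Greedy matching: a prediction is a TP if its best unmatched ground truth exceeds the IoU threshold."""
    unmatched = list(ground_truths)
    tp = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) > threshold:
            tp += 1
            unmatched.remove(best)
    fp = len(predictions) - tp   # predictions with no associated ground truth
    fn = len(unmatched)          # ground truths with no associated prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truths = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(0, 0, 10, 9), (50, 50, 60, 60)]
print(precision_recall(preds, truths))  # (0.5, 0.5)
```

In the example, one prediction overlaps a ground truth with IoU 0.9 (a true positive), one prediction overlaps nothing (a false positive) and one ground truth is left unmatched (a false negative).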

4. Experimental Results
The experiment was executed on a GPU (1 x Tesla K80) with 16GB memory and
1664 CUDA cores. The algorithm ran on TensorFlow version 2.4.1 with Python
version 3.7.3. When the training process was completed, a precision of 0.98 and
a recall of 0.97 were obtained.
Table 1: Evaluation Metrics

Evaluation Metric | Value
Precision         | 0.98
Recall            | 0.97
IoU               | 0.5
Along with evaluation metrics, a graphical representation of various loss functions
presented on the tensorboard is given in fig 4.

Fig 4: Graphical Representation


4.1 Classification Loss
Our tooth detection model solves a multi-class classification problem where the
detected tooth belongs to one of the 32 classes. At the start of training, the
classification loss weight was set to 1.0 and the loss function was softmax
cross-entropy, also referred to as logarithmic loss. Loss functions for classification
represent the inaccuracy of predictions in classification problems, as shown in fig(4.1).
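For a single detection, softmax cross-entropy can be written in a few lines of Python. The sketch below is a generic illustration of the loss, not the TensorFlow implementation; the logit values are made up.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_class])


# With 32 equally likely classes the loss equals log(32), about 3.47.
uniform = [0.0] * 32
print(cross_entropy(uniform, true_class=5))

# A confident, correct prediction gives a much smaller loss.
confident = [0.0] * 32
confident[5] = 10.0
print(cross_entropy(confident, true_class=5))
```

The value log(32) is a useful reference point: a classification loss near it at the start of training means the model is effectively guessing among the 32 tooth classes.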
4.2 Localization loss
Localization loss is used to demonstrate loss between a predicted bounding box and
ground truth. As the training progresses, the localization loss decreases gradually and
then remains stable as illustrated in fig(4.2).
4.3 Mask Loss
The mask loss measures how well the predicted masks match the annotated masks,
and its gradient is propagated back through the network layer by layer. The data
annotations must therefore be done precisely for all the images. To increase the
confidence score of instance segmentation the mask loss has to decrease gradually,
as shown in fig(4.3).
4.4 Total Loss
The total loss is a summation of the individual loss terms, including the localization
loss and the classification loss. The optimiser reduces these loss values until the
loss sum reaches a point where the network can be considered fully trained.
4.5 Learning Rate
The learning rate, shown in fig(4.5), is the most important hyperparameter for this
model. There is a gradual increase in the learning rate after each batch, with the
loss recorded at every increment. On entering the optimal learning rate zone, a
quick drop in the loss function is observed.
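A learning rate that rises gradually and later decays is commonly obtained with a warm-up phase followed by cosine decay. The sketch below shows such a schedule as an illustration only: the paper does not state which schedule was used, the base rate (0.008) and step count (50,000) are taken from the hyperparameter list, and the warm-up settings are hypothetical.

```python
import math


def learning_rate(step, base_lr=0.008, total_steps=50_000,
                  warmup_lr=0.0, warmup_steps=1_000):
    """Linear warm-up to base_lr, then cosine decay to zero.

    warmup_lr and warmup_steps are assumed values chosen for illustration.
    """
    if step < warmup_steps:
        return warmup_lr + (base_lr - warmup_lr) * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))


for step in (0, 500, 1_000, 25_500, 50_000):
    print(step, round(learning_rate(step), 6))
```

During warm-up the rate grows with every step, which matches the gradual increase seen at the start of the curve; after the peak it falls smoothly towards zero.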
4.6 Steps_per_sec
Steps_per_sec reports how many training steps (batches of samples) are processed
per second, which is useful to monitor when there is a huge dataset with
considerable batches of samples to train. In our tooth detection model, the total
number of training steps was 50,000, logged at an interval of 100 as represented in
fig(4.6).
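Steps and epochs are easy to confuse, so the relationship between them is sketched below. The step count and batch size come from the hyperparameter list; the dataset size of 1,000 images is a hypothetical value used only to make the arithmetic concrete.

```python
import math


def steps_per_epoch(batch_size, dataset_size):
    """Number of batches needed to show every training image once."""
    return math.ceil(dataset_size / batch_size)


def epochs_completed(num_steps, batch_size, dataset_size):
    """Number of passes over the dataset made in num_steps training steps."""
    return num_steps * batch_size / dataset_size


# 50,000 steps with batch size 8 on a hypothetical 1,000-image dataset.
print(steps_per_epoch(8, 1_000))            # 125
print(epochs_completed(50_000, 8, 1_000))   # 400.0
```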

5. Comparative Study
5.1 Comparison with Clinical experts
Some radiographs used in our research were of lower quality and had a lot of
distortions. In addition, there were several complicated cases, such as missing
teeth, orthodontically treated teeth with premolars extracted, retained deciduous
teeth, embedded teeth, root canal treated teeth, implant restored teeth and teeth
with crowns and bridges, which presented challenges. The poor quality of these
images resulted in wrong detection of the teeth.
Table 2: Analysis of inaccurate detection

Inaccurate Detections                        | Prediction from system | Prediction from experts
Undetected bounding boxes                    | 5                      | 4
Confusion with similar teeth                 | 8                      | 6
Teeth not recognized                         | 3                      | 0
Confusion with missing teeth                 | 3                      | 3
Missing teeth not recognized                 | 4                      | 0
Failure in complicated cases                 | 5                      | 3
Objects detected more than grounding boxes   | 0                      | 0

The results provided by the proposed model were verified by a clinical expert for
detected and undetected bounding boxes, confusion with similar teeth or missing
tooth labels, failure in complicated cases and objects detected in excess of the
ground truth. Table 2 lists the types of inaccurate detection (column 1), the number
of such cases predicted by the method (column 2) and the number of such cases
identified by the experts (column 3). The table shows that the inaccuracies, though
small in number, can be attributed to a large extent to the poor quality of the data
and not to a performance deficiency of the model.
5.2 Comparison with other works
Table 3 provides a comparative study of the effectiveness of the proposed model with
similar works. The comparison uses eight criteria, listed as the titles of columns 2
to 9 of the table. It shows that the researchers given in column 1 were successful in
implementing tooth segmentation and numbering using either Fast R-CNN, Faster
R-CNN or Mask R-CNN.
The proposed work is different from the others as it demonstrates the implementation
of the Mask R-CNN Inception ResNet V2 object detection model. Only one Mask
R-CNN model is supported by the TensorFlow 2 object detection API at the time of
writing of this paper[10]. The transfer learning technique was successfully
implemented using a model pre-trained on the COCO dataset. Along with tooth
detection, tooth segmentation and tooth numbering, we were also able to predict
missing teeth. The gaps left by missing teeth are known as edentulous spaces in the
maxilla or mandible. M-RCNN was able to correctly recognize edentulous spaces on a
radiograph. This study demonstrates that the proposed method is far superior to the
other state-of-the-art models pertaining to tooth detection and localization. Also,
there is a clear understanding of poor detection wherever it occurred, though small
in number, and it was verified by a clinical expert.
Table 3. Comparative Study

Author | Type of Radiograph | Deep Learning method | Transfer learning | Tooth Detection | Tooth Segmentation | Tooth Numbering | Missing tooth detection | Comparison with experts
Minyoung Chung, Jusang Lee[4] | Panoramic | Fast R-CNN | No | Yes | Yes | Yes | No | No
Shuxu Zhao, Qing Luo[6] | Panoramic | Mask RCNN, ResNet-101 | Yes | Yes | Yes | Yes | Yes | No
Guohua Zhu[7] | Digital X-rays | Mask RCNN, ResNet-101 | Yes | Yes | Yes | Yes | No | No
Gil Jader[9] | Panoramic | Mask R-CNN, PANet, HTC, and ResNeSt | Yes | Yes | Yes | Yes | No | Yes
Proposed work | Panoramic | Mask R-CNN Inception V2 | Yes | Yes | Yes | Yes | Yes | Yes

6. Conclusion
In this study, the Mask R-CNN Inception ResNet V2 object detection model was
trained on a dataset of dental radiographs for tooth detection and numbering. It was
observed that the training results were exceptionally good, especially in locating the
position of teeth with high IOU as well as good precision and recall. The visualization
results were considerably better than those of Fast R-CNN. The improvements in the
neural network architecture and in the object localization results were significant.
Finally, the performance of our proposed model was very close to the level of the
human dentist who was selected as a referee in this study. In future studies, we will
consider working with more advanced object detection models to detect potential
caries, periodontal bone loss and different periapical diseases.

References
1. Thorbjorn Louring Koch, Mathias Perslev, Christian Igel, Sami Sebastian Brandt,
"Accurate Segmentation of Dental Panoramic Radiographs with U-NETS", Published
in Proceedings of 16th IEEE International Symposium on Biomedical Imaging,
Venice, Italy, April 2019.
2. Dmitry V Tuzoff, Lyudmila N Tuzova, Michael M Bornstein, Alexey S Krasnov,
Max A Kharchenko, Sergey I Nikolenko, Mikhail M Sveshnikov, Georgiy B
Bednenko, "Tooth detection and numbering in panoramic radiographs using
convolutional neural networks", Published in Conference of Dental Maxillofacial
Radiology in medicine, Russia, March 2019.
3. Hu Chen, Kailai Zhang, Peijun Lyu, Hong Li, Ludan Zhang, Ji Wu, Chin-Hui Lee,
"A deep learning approach to automatic teeth detection and numbering based on
object detection in dental periapical films", National Library of Medicine, March 2019.
4. Minyoung Chung, Jusang Lee, Sanguk Park, Minkyung Lee, ChaeEun Lee, Jeongjin
Lee, Yeong-Gi Shin, "Individual Tooth Detection and Identification from Dental
Panoramic X-Ray Images via Point-wise Localization and Distance Regularization",
Elsevier, Artificial Intelligence in medicine, South Korea, January 2020.
5. Mircea Paul Muresan, Andrei Razvan Barbura, Sergiu Nedevschi, "Teeth Detection
and Dental Problem Classification in Panoramic X-Ray Images using Deep Learning
and Image Processing Techniques", IEEE 16th International Conference on
Intelligent Computer Communication and Processing, March 2020.
6. Shuxu Zhao, Qing Luo, Changrong Liu, "Automatic Tooth Segmentation and
Classification in Dental Panoramic X-ray Images", IEEE International Conference on
Pattern Analysis and Machine Intelligence, China, February 2020.
7. Guohua Zhu, Zewen Piao, Suk Chan Kim, "Tooth Detection and Segmentation with
Mask R-CNN", Published in International Conference on Artificial Intelligence in
Information and Communication, April 2020.
8. Gil Jader, Luciano Oliveira, Matheus Pithon, "Automatic segmentation teeth in X-ray
images: Trends, a novel data set, benchmarking and future perspectives", Elsevier,
Expert systems with applications volume 10, March 2018.
9. Gil Jader, Jefferson Fontineli, Marco Ruiz, Kalyf Abdalla, Matheus Pithon, Luciano
Oliveira, "Deep instance segmentation of teeth in panoramic x-ray images",
SIBGRAPI 31st International Conference on Graphics, Patterns and Images
(SIBGRAPI), Brazil, May 2018.
10. Bernardo Silva, Laís Pinheiro, Luciano Oliveira, Matheus Pithon, "A study on tooth
segmentation and numbering using end-to-end deep neural networks", Published in
33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Brazil,
April 2020.
11. Jader, G. et al., "Deep instance segmentation of teeth in panoramic X-ray images",
http://sibgrapi.sid.inpe.br/col/sid.inpe.br/sibgrapi/2018/08.29.19.07/doc/tooth_segmentation.pdf,
2018.
