Study and Implementation of Object Detection and Visual Tracking
The period of internship at the Indian Institute of Technology Tirupati was a valuable one, rich in experiences and opportunities to learn. Many of these new experiences will influence my professional as well as personal life.
I have put a lot of effort into this project. However, it would not have been possible for me to complete it without the help of the people at IIT Tirupati. I sincerely thank my research guide Dr. Rama Krishna Sai Gorthi, Associate Professor, Electrical Engineering Department, IIT Tirupati, for giving me the golden opportunity to intern under him. He has helped me a lot with his suggestions, comments and appreciation. I
express my deep gratitude to Mr. Mohana Murali, PhD scholar of IIT Tirupati, who has given
complete assistance during the internship. I express my thanks to Mr. Naveen, MS scholar of IIT
Tirupati, for his help and tips regarding the project.
I express my special gratitude to my co-intern, Mr. Dheeraj Varma, for his help and coordination during our stay in Tirupati. I express special thanks to all the PhD and MS scholars of IIT Tirupati who helped me during the internship, and I thank all the staff of IIT Tirupati for their help.
Study and Implementation of Object Detectors and Visual Trackers
Bharat Giddwani, Mohan Murali, Naveen Palaru, Dr. Gorthi R. K. Sai Subrahmanyam,
Department of Electrical Engineering, Indian Institute of Technology, Tirupati, A.P.-517506, India
Abstract
Efficient and accurate object detection has been an important topic in the advancement of computer
vision systems. With the advent of deep learning techniques, the accuracy for object detection has
increased drastically. The project aims to incorporate state-of-the-art techniques for object detection with the goal of achieving high accuracy with real-time performance. A major challenge in many of the
object detection systems is the dependency on other computer vision techniques for helping the deep
learning based approach, which leads to slow and non-optimal performance. In this project, we use a
completely deep learning based approach to solve the problem of object detection and visual tracking in
an end-to-end fashion. We used two different networks pre-trained on the most challenging publicly available datasets (PASCAL VOC and MS-COCO), on which object detection challenges are conducted annually. Our main focus is on YOLO, a unified state-of-the-art object detector.
In this report, we also examined the problem of tracking objects in video streams by using Deep
Learning. We use spatially supervised recurrent convolutional neural networks for visual object tracking.
In this method, the recurrent convolutional network uses both the history of locations and the visual
features from the deep neural networks. This method is used for tracking, based on the detection results.
We concatenate the location of detected bounding boxes with high-level visual features produced by
convolutional networks and then predict the tracking bounding box for the next frames. Because a video contains continuous frames, we adopt a method that uses information from the history of frames to achieve robust tracking in visually challenging cases such as occlusion, motion blur and fast movement. Long Short-Term Memory (LSTM) is a kind of recurrent neural network well suited to this purpose. We used the OTB100 dataset to train our tracking network. Instead of
using binary classification which is commonly used in deep learning based tracking methods, we use a
regression for direct prediction of the tracking locations. The resulting system is fast and accurate, thus
aiding those applications which require object detection and visual tracking.
Key words: Convolutional Neural Networks, Recurrent Neural Networks, YOLO Object Detector,
Visual Tracking.
Chapter 1
Introduction
1.1 Background
In the past two decades, the problem of object detection, localization and tracking has received significant attention in different research areas. This coincides with the rising demand for information about objects' location and identity, which stems from applications in various fields such as manufacturing, the military, business management, surveillance and security, transport and logistics, medical care, traffic management, childcare, and performance analysis in sports and sports medicine. Human detection and tracking can also be widely used in many applications, including people counting and security surveillance
in public scenes.
Different methods have been used for this purpose. In some research, a combination of Kalman filter prediction and mean-shift tracking is used; in other work, a tree-structured probabilistic model is used for human tracking. Recently, neural networks such as the radial basis function (RBF) network and the CNN
have become more popular for image processing purposes. CNNs have recently been applied to various
computer vision tasks such as image classification, semantic segmentation, object detection, and many
others. This great success has led CNNs to be used widely, with distinguished performance, in visual applications. Using CNNs for tracking has a limitation related to training data: tracking requires enough data to cover sufficient variety, and it is difficult to collect a large enough amount of training data for video processing applications and training algorithms. Several recent tracking algorithms have addressed this data issue by transferring CNNs pre-trained on a large-scale dataset such as ImageNet.
A tracker can also be trained entirely from scratch online at test time, using the test video itself to learn to handle complex challenges such as rotations, viewpoint changes and lighting changes with no offline training; however, such tracking methods are too slow. These trackers also have lower performance compared with offline-trained methods, because they cannot take advantage of a large number of videos to improve their performance.
1.3 Why Deep Learning
Many problems in computer vision had saturated in accuracy a decade ago. However, with the rise of deep learning techniques, the accuracy on these problems improved drastically. One of the major problems was image classification, which is defined as predicting the class of the image. A slightly more complicated problem is image localization, where the image contains a single object and the system should predict the class and the location of the object in the image (a bounding box around the object). The still more complicated problem addressed in this project, object detection, involves both classification and localization. In this case, the input to the system is an image, and the output is a bounding box for each object in the image, along with the class of the object in each box. An overview
of all these problems is depicted in Fig. 1.
Chapter 2
Theory and Background
2.1 Convolutional Neural Networks - CNN
Convolutional Neural Networks (ConvNets or CNNs) are a category of neural networks that have recently found great application in visual analysis and machine learning. ConvNets have been successful in classification, segmentation, detection and tracking problems.
A CNN has four main steps: convolution, activation, subsampling and full connectedness.
The first step in a CNN is convolution. The main idea of using convolution in the first layers is to extract features from the input image: a set of filters acts as feature detectors on the original input image. In other words, convolution is a process in which the input signal is labelled by the network based on what it has learned in the past. If the network decides that the input signal looks like the cat images it has learned previously, the "cat" reference signal is convolved with the input signal. The resulting output signal is then passed on to the next layer.
The second step is activation. The activation layer controls how the signal flows from one layer to the next. In different CNN architectures, a wide variety of activation functions can be chosen to model signal propagation. One of the most popular is the rectified linear unit (ReLU), known for its fast training speed. ReLU has the mathematical form:
f(x) = max(0, x)
The third step is subsampling. To reduce the sensitivity of the filters to noise and variations, the inputs from the convolution layer are smoothed. Subsampling also reduces the dimensionality of each feature map while retaining the most important information. This smoothing process is called subsampling, downsampling or pooling, and it can be performed by different methods such as max, average or sum pooling.
The fourth step is full connectedness. The last layers in most convolutional networks are fully connected, meaning that the neurons of previous layers are connected to every neuron in the next layers. The output from the convolutional and pooling layers contains high-level features of the input image. These features are used by a softmax layer to classify the input image into different classes based on the training data. Fully connected layers also help to learn non-linear combinations of these features; such combinations may be better suited for classification or other applications of CNNs.
• Formulas for height, width and depth:
After a convolutional layer with K filters of size F × F, stride S and padding P applied to a W × H × D input: W_out = (W − F + 2P)/S + 1, H_out = (H − F + 2P)/S + 1, D_out = K.
After an F × F max/average pooling layer with stride S: W_out = (W − F)/S + 1, H_out = (H − F)/S + 1, D_out = D.
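The following small Python sketch applies these standard formulas; the layer sizes in the example are illustrative assumptions, not taken from any particular network.

```python
def conv_output_dims(w_in, h_in, f, s, p, k):
    """Spatial size after a convolution with k filters of size f x f,
    stride s and padding p; depth becomes the number of filters."""
    w_out = (w_in - f + 2 * p) // s + 1
    h_out = (h_in - f + 2 * p) // s + 1
    return w_out, h_out, k

def pool_output_dims(w_in, h_in, d_in, f, s):
    """Spatial size after f x f max/average pooling with stride s;
    pooling leaves the depth unchanged."""
    return (w_in - f) // s + 1, (h_in - f) // s + 1, d_in

# Illustrative example: a 224x224x3 image through 64 7x7 filters
# (stride 2, padding 3), then 2x2 max pooling with stride 2.
print(conv_output_dims(224, 224, f=7, s=2, p=3, k=64))  # (112, 112, 64)
print(pool_output_dims(112, 112, 64, f=2, s=2))         # (56, 56, 64)
```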
A) LeNet (1998):
The first popular implementation of a CNN was LeNet, introduced by Yann LeCun in 1998. Figure 2.1 illustrates the LeNet structure.
B) VGG-16 (2014):
VGG-16, introduced by Simonyan and Zisserman in 2014, stacks small 3 × 3 convolutions to a depth of 16 weight layers and showed that network depth is critical to accuracy.
Chapter 3
Object Detection in Real Images
3.1 Challenges
The major challenge in this problem is the variable dimension of the output, caused by the variable number of objects that can be present in any given input image. Any general machine learning task requires a fixed dimension of input and output for the model to be trained. Another important obstacle to widespread adoption of object detection systems is the requirement of real-time speed (>30 fps) while remaining accurate in detection. The more complex the model, the more time it requires for inference; the less complex the model, the lower the accuracy. This trade-off between accuracy and performance needs to be chosen as per the application. The problem involves classification as well as regression, requiring the model to learn both simultaneously. This adds to the complexity of the problem.
3.2.1) Bounding Box
The bounding box is a rectangle drawn on the image which tightly fits the object in the image. A
bounding box exists for every instance of every object in the image. For the box, 4 numbers (center x,
center y, width, height) are predicted. This can be trained using a distance measure between predicted
and the ground truth bounding box. The distance measure is the Jaccard distance, which computes the intersection over union (IoU) between the predicted and ground truth boxes, as shown in Fig. 3.
Figure 3.1: Intersection over Union calculation using Jaccard Method (From Andrew Ng)
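A minimal Python sketch of this IoU computation, assuming the (center x, center y, width, height) box format described above:

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (cx, cy, w, h)."""
    # Convert to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

print(iou((5, 5, 4, 4), (6, 6, 4, 4)))  # overlapping boxes -> 9/23 ~ 0.391
```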
Types of Detectors:
1. R-CNN
2. Fast R-CNN
3. R-FCN
4. Faster R-CNN
Chapter 4
Approach for object detection
The network used in this project is based on YOLO [] and YOLOv3 [].
Figure 4.1 – Visualizing the YOLO Method
To evaluate on PASCAL VOC, YOLO uses a 7 × 7 grid (S × S), 2 boundary boxes per cell (B) and 20 classes (C).
Figure 4.2 – The model. YOLO frames detection as a regression problem. It divides the image into an S × S grid and for each grid cell predicts B bounding boxes, confidence scores for those boxes, and C class probabilities. These predictions are encoded as an S × S × (B ∗ 5 + C) tensor. (From the YOLO research paper)
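As a small illustration, the output tensor size for this PASCAL VOC configuration can be computed directly:

```python
# Output tensor size for the PASCAL VOC setting above: an S x S grid,
# B boxes per cell (each box carries x, y, w, h, confidence), C classes.
S, B, C = 7, 2, 20
depth = B * 5 + C       # 30 values per grid cell
print((S, S, depth))    # (7, 7, 30)
print(S * S * depth)    # 1470 numbers in total
```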
4.1.3) Characteristics:
• Tool? – A single neural network with a unified architecture (24 convolutional layers, 4 max pooling layers and 2 fully connected layers).
• Framework? – Darknet; the original implementation is in C and CUDA.
• Technology background? – Related methods are slow, not real-time, and lack generalization ability.
4.1.4) Network Architecture:
Figure 4.3: The Architecture – The detection network has 24 convolutional layers and 2 fully connected layers, and its output is converted into a feature map of size S × S × (B ∗ 5 + C).
4.1.5) Total Loss Function: Localization Loss + Confidence Loss + Classification Loss,
where 1_i^obj denotes whether an object appears in cell i and 1_ij^obj denotes that the jth bounding box predictor in cell i is "responsible" for that prediction.
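For reference, the full sum-squared-error loss from the YOLO paper, which combines these three terms (the paper uses λ_coord = 5 and λ_noobj = 0.5):

$$\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]
+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2
+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2
+ \sum_{i=0}^{S^2} \mathbb{1}_i^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}$$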
4.1.6) Non-Max Suppression:
During prediction, non-maximum suppression is used to filter the multiple boxes per object that may be matched, as shown in Fig. 3.6.
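A minimal Python sketch of greedy non-max suppression; the corner-format boxes, scores and threshold here are illustrative assumptions:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression; boxes as (x1, y1, x2, y2)."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    # Visit boxes in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the second box is suppressed
```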
Advantages:
• Simpler network structure.
• Much faster, with real-time capability: 45 fps for YOLO and 150 fps for the Fast version, able to process streaming video in real time with less than 25 milliseconds of latency.
• Maintains a reasonable accuracy range.
4.1.7) Network:
The entire network architecture is shown in Fig. 4.6. The model consists of the base network derived from Darknet-19, with modified convolutional layers in place of the last fully connected layers for fine-tuning, followed by the classifier and localizer networks.
False Detections:
False or varied detection images: Table 2
Problems Observed:
• Larger objects dominate when present alongside small objects, as seen in Fig. a.
• Occlusion creates a problem for detection: as shown in Fig. b, the occluded birds are not detected correctly.
• The image resolution must be high, otherwise the bounding box may be displaced from its position, as shown in Fig. c.
To address these problems, several unified detection methods followed YOLO:
1.) YOLO9000: Better, Faster, Stronger
2.) SSD: Single Shot MultiBox Detector
3.) DSSD: Deconvolutional Single Shot MultiBox Detector
4.) YOLOv3: An Incremental Improvement
Improvement:
In this section I describe some major improvements in object detection methods, finally adopting YOLOv3 as the base for my project.
4.2.4) Working:
Consider the case as shown in Fig. 10, where the cat has two anchors matched and the dog has one
anchor matched. Note that both have been matched on different feature maps.
Figure 4.8 – Working overview
During training, SSD:
➢ Needs an input image and ground truth boxes for each object.
➢ Evaluates a small set (e.g., 4) of default boxes of different aspect ratios at each location in several feature maps with different scales.
➢ Matches these default boxes to the ground truth boxes using IoU (say, for IoU > 0.5) and predicts both the shape offsets and the confidences for all object categories for each default box.
➢ Uses a feed-forward convolutional network to produce a fixed-size collection of bounding boxes and scores for the presence of each object class in those boxes.
➢ During prediction, uses non-maximum suppression to filter the multiple boxes per object that may be matched.
• The default boxes use aspect ratios ar ∈ {1, 2, 3, 1/2, 1/3}.
• Instead of using all the negative examples, SSD sorts them using the highest confidence for each default box and picks the top ones so that the ratio between negatives and positives is at most 3:1, leading to faster optimization and more stable training.
• Total Loss Function: The loss function used is the multi-box classification and regression loss.
The classification loss used is the softmax cross entropy and, for regression the smooth L1 loss is used.
L(x, c, l, g) = (1/N) (Lconf(x, c) + Lloc(x, l, g))
After the success of SSD, many researchers moved towards this architecture and tried to improve it, since SSD is unable to accurately detect small or occluded objects in an image.
Some modified SSD-based object detectors, with their architectures, are:
A.) DSSD: Deconvolutional Single Shot Detector.
B.) ESSD: Extend the shallow part of Single Shot MultiBox Detector via CNN.
Figure 4.11 – Darknet-53. (From "What's new with YOLOv3" on towardsdatascience.com, by Ayoosh Kathuria.)
4.3.3) Characteristics:
• An end-to-end fully convolutional neural network (FCN).
• YOLOv3 makes use of only convolutional layers, making it a fully convolutional network. It has 75 convolutional layers, with skip connections and upsampling layers. No form of pooling is used; a convolutional layer with stride 2 downsamples the feature maps. This helps prevent the loss of low-level features often attributed to pooling.
• In YOLOv3, the prediction is done using a convolutional layer (it is, after all, a fully convolutional network) with a kernel of size 1 × 1 × (B × (5 + C)).
4.3.4) Working :
Let us consider the example below, where the input image is 416 × 416 and the stride is 32. As pointed out earlier, the dimensions of the feature map will be 13 × 13, so we divide the input image into 13 × 13 cells.
Consider, for example, the grid cells at the corners in Figure 4.12. The objectness score is also passed through a sigmoid, as it is to be interpreted as a probability.
4.) Center Coordinates (bx, by):
Notice we are running our center coordinates prediction through a sigmoid function. This forces the
value of the output to be between 0 and 1.
5.) Dimensions of the bounding box (bw, bh):
The dimensions of the bounding box are predicted by applying a log-space transform to the output and then multiplying by an anchor (see the decoding equations after this list).
6.) Class Predictions (c1, c2, ...):
Each box predicts the classes the bounding box may contain using multilabel classification (a sigmoid activation and a binary cross-entropy loss are used).
7.) Predictions across different scales:
YOLOv3 makes predictions across 3 different scales. The detection layer makes detections at feature maps of three different sizes, with strides 32, 16 and 8 respectively. This means that with an input of 416 × 416, we make detections on scales of 13 × 13, 26 × 26 and 52 × 52 (as seen in the figure). At each scale, each cell predicts 3 bounding boxes using 3 anchors, making the total number of anchors used nine (the anchors are different for different scales).
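For reference, the box decoding used by YOLOv2/v3 turns the predicted offsets (tx, ty, tw, th) into a box, where (cx, cy) is the top-left offset of the grid cell and (pw, ph) are the anchor dimensions:

$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w\, e^{t_w}, \qquad b_h = p_h\, e^{t_h}$$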
Output Processing:
For an image of size 416 × 416, YOLO predicts ((52 × 52) + (26 × 26) + (13 × 13)) × 3 = 10,647 bounding boxes. However, in the case of our image there is only one object, a dog.
How do we reduce the detections from 10,647 to 1?
1.) Thresholding by object confidence score, and 2.) non-max suppression.
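A small Python sketch of the first step, confidence thresholding; the scores below are synthetic, purely for illustration:

```python
import random

random.seed(0)
# One objectness score per predicted box:
# ((13*13) + (26*26) + (52*52)) * 3 = 10647 boxes for a 416x416 input.
num_boxes = (13 * 13 + 26 * 26 + 52 * 52) * 3
scores = [random.random() ** 8 for _ in range(num_boxes)]  # most boxes score low

conf_thresh = 0.5
survivors = [i for i, s in enumerate(scores) if s >= conf_thresh]
print(num_boxes, "boxes ->", len(survivors), "after confidence thresholding")
# Non-max suppression (sketched in the YOLO section above) then reduces
# the survivors to one box per object.
```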
4.3.5) Network:
The entire network architecture is shown in Fig. 4.11 above. The model consists of the base network derived from Darknet-53, with the last fully connected layers replaced by 1 × 1 × nc convolutional layers for fine-tuning, followed by the classifier and localizer networks.
Images with some detection errors, such as occlusion and small/far objects, are shown.
4.3.6) Quantitative Analysis
The evaluation metric used is mean average precision (mAP). For a given class, a precision-recall curve is computed. Recall is defined as the proportion of all positive examples ranked above a given rank.
Precision is the proportion of all examples above that rank which are from the positive class. The AP
summarizes the shape of the precision-recall curve, and is defined as the mean precision at a set of eleven
equally spaced recall levels [0, 0.1, ... 1]. Thus to obtain a high score, high precision is desired at all
levels of recall. This measure is better than area under curve (AUC) because it gives importance to the
sensitivity. The detections were assigned to ground truth objects and judged to be true/false positives by
measuring bounding box overlap. To be considered a correct detection, the area of overlap between the
predicted bounding box and ground truth bounding box must exceed a threshold. The output of the
detections assigned to ground truth objects satisfying the overlap criterion were ranked in order of
(decreasing) confidence output. Multiple detections of the same object in an image were considered false
detections, i.e. 5 detections of a single object counted as 1 true positive and 4 false positives. If no
prediction is made for an image then it is considered a false negative.
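A minimal Python sketch of the 11-point interpolated AP described above; the precision/recall values are toy numbers for illustration:

```python
def average_precision_11pt(precisions, recalls):
    """11-point interpolated AP: mean of the maximum precision at
    recall >= r for r in {0.0, 0.1, ..., 1.0}."""
    ap = 0.0
    for r in [i / 10 for i in range(11)]:
        candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
        ap += max(candidates) if candidates else 0.0
    return ap / 11

# Toy precision/recall pairs taken down the ranked list of detections.
precisions = [1.0, 1.0, 0.67, 0.75, 0.6]
recalls = [0.2, 0.4, 0.4, 0.6, 0.6]
print(round(average_precision_11pt(precisions, recalls), 3))
```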
4.4) Conclusion
An accurate and efficient object detection system has been studied and developed, achieving comparable metrics with the existing state of the art. This project uses recent techniques in the fields of computer vision and deep learning. A custom dataset was created by labelling images, and the evaluation was consistent. The system can be used in real-time applications which require object detection for pre-processing in their pipeline. An important extension would be to train the system on video sequences for use in tracking applications. Adding a temporally consistent network would enable smoother detection, more optimal than per-frame detection. We will also look at some visual trackers in the next section of this report.
Chapter 5
Approach for Visual Tracking
5.1) Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are popular models that have shown great promise in many tasks that need historical information, such as language processing or video processing. The main idea behind RNNs is to use sequential information. In other neural networks, all inputs and outputs are assumed independent of each other, but for many tasks this assumption does not hold. For example, if you want to predict the next word in a sentence, it is better to know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. We can also say that RNNs have a "memory" which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps. Here is what a typical RNN looks like:
Figure 5.1: A recurrent neural network and the unfolding in time of the computation involved in its forward computation.
By unrolling we simply mean that we write out the network for the full sequence. For example, if we want to use 5 frames of a video, the network would be unrolled into a 5-layer neural network, one layer for each frame. More details about the parameters in the figure and the formulas of the RNN are as follows: xt is the input at time step t; for example, x1 could be a vector corresponding to the second frame of a video. st is the hidden state at time step t: it is the "memory" of the network, calculated based on the previous hidden state and the input at the current step:
st = f(U xt + W st−1)
The function f is usually a nonlinearity such as tanh or ReLU. s−1, which is required to calculate the first hidden state, is usually initialized to zero. ot is the output at step t; for example, if we wanted to predict the position of a human in the next time step of a video, it would be a vector of probabilities:
ot = softmax(V st)
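A minimal NumPy sketch of these two formulas; all dimensions here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 4               # illustrative sizes
U = rng.normal(0, 0.1, (d_hidden, d_in))       # input-to-hidden weights
W = rng.normal(0, 0.1, (d_hidden, d_hidden))   # hidden-to-hidden weights
V = rng.normal(0, 0.1, (d_out, d_hidden))      # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

s = np.zeros(d_hidden)                          # s_{-1} initialized to zero
for x_t in rng.normal(size=(5, d_in)):          # 5 unrolled time steps
    s = np.tanh(U @ x_t + W @ s)                # s_t = f(U x_t + W s_{t-1})
    o_t = softmax(V @ s)                        # o_t = softmax(V s_t)
print(o_t.shape, round(o_t.sum(), 6))           # (4,) 1.0
```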
The most commonly used type of RNN is the LSTM, which is much better at capturing long-term dependencies than vanilla RNNs. The LSTM network is explained in the next section.
The limitations of vanilla RNNs mentioned above are the main motivation for designing the LSTM model, which has a memory cell, shown in Figure 2.4. A memory cell has four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. The weight of the self-recurrent connection is 1.0, which ensures that the state of a memory cell can remain unchanged from one timestep to another. The input gate can allow an incoming signal to change the state of the memory cell or block it. Likewise, the output gate can allow the state of the memory cell to influence other neurons or prevent it from doing so. The forget gate can let the cell remember or forget its previous state, as needed.
Gradient information is preserved by the LSTM; Figure 5.4 illustrates this. As in Figure 2.3, the shading of the nodes shows their sensitivity to the inputs at time one; in the LSTM, the black nodes are maximally sensitive and the white nodes are completely insensitive. The input, forget and output gates are illustrated below, to the left of, and above the hidden layer respectively. All gates are either entirely open ('O') or closed ('—'). The memory cell "remembers" the first input as long as the forget gate is open and the input gate is closed. The sensitivity of the output layer can be switched on and off by the output gate without affecting the cell.
Figure 5.4: Preservation of gradient information by LSTM. (From Alex Graves, 2012)
All recurrent neural networks have the form of a chain of repeating modules of a neural network. In traditional RNNs, this repeating module has a very simple structure, such as a single tanh layer.
Figure 5.5: The repeating module in a standard RNN contains a single layer. (From Christopher Olah)
LSTMs also have a similar chain structure, but the repeating module is a bit different: instead of a single neural network layer, there are four, interacting in a very special way.
Figure 5.6: The repeating module in an LSTM contains four interacting layers. (From Christopher Olah)
The input gate it, forget gate ft, output gate ot, cell state ct and final state ht are defined as follows:
it = σ(Wxi xt + Whi ht−1 + bi)
ft = σ(Wxf xt + Whf ht−1 + bf)
ot = σ(Wxo xt + Who ht−1 + bo)
ct = ft ⊙ ct−1 + it ⊙ tanh(Wxc xt + Whc ht−1 + bc)
ht = ot ⊙ tanh(ct)
The main difference between the LSTM and classical RNNs is the use of these gating functions it, ft and ot, explained previously, which indicate the input, forget and output gate at time t respectively. The weight parameters Wxi, Whi, Whf, Who, Wxf, Whc, Wxo and Wxc connect the different inputs and gates with the memory cells and outputs, together with the biases bi, bf, bc and bo. The cell state ct is updated with a fraction of the previous cell state ct−1 that is controlled by ft.
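A minimal NumPy sketch of one LSTM step implementing the gate equations above; the dimensions and random initialization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d_x, d_h = 8, 16                                     # illustrative sizes

def mats():
    """One (input weight, recurrent weight, bias) triple per gate."""
    return (rng.normal(0, 0.1, (d_h, d_x)),
            rng.normal(0, 0.1, (d_h, d_h)),
            np.zeros(d_h))

(W_xi, W_hi, b_i), (W_xf, W_hf, b_f) = mats(), mats()
(W_xo, W_ho, b_o), (W_xc, W_hc, b_c) = mats(), mats()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    i_t = sigmoid(W_xi @ x_t + W_hi @ h_prev + b_i)  # input gate
    f_t = sigmoid(W_xf @ x_t + W_hf @ h_prev + b_f)  # forget gate
    o_t = sigmoid(W_xo @ x_t + W_ho @ h_prev + b_o)  # output gate
    c_t = f_t * c_prev + i_t * np.tanh(W_xc @ x_t + W_hc @ h_prev + b_c)
    h_t = o_t * np.tanh(c_t)                         # final state
    return h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_x)):                # 5 time steps
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)                              # (16,) (16,)
```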
5.3) Visual Tracking:
Visual object tracking is the process of localizing a single target in a video or sequence of images, given the target's position in the first frame. Visual tracking is a challenging task in computer vision due to target deformations, illumination variations, scale changes, fast and abrupt motion, partial occlusions, motion blur and background clutter. The methods of visual tracking can be divided into 3 categories: 1.) fast tracking, 2.) robust tracking, 3.) fast and robust tracking.
Figure 5.7: Architecture of MDNet. (From Hyeonseob Nam and Bohyung Han, 2016)
The original implementation of MDNet is in MATLAB 2014 (with the MatConvNet library) with GPU support. In order to check the robustness of the tracker, I made small changes to the code with the help of PhD scholar Mr. Mohan Murali, for the research of former MS student Miss Pallavi Venagopal under Prof. R. K. Gorthi, tested video samples from the VOT2017, OTB100 and ALOV300++ datasets, and visualized the errors and drawbacks of the tracker.
Purely CNN-based trackers are fast enough but not robust enough in challenging environments such as motion blur. In order to combine robustness and speed in one tracker model, we used an approach combining the features of a CNN with an RNN to track a video sequence, and we call it a recurrent-convolutional neural network based object tracker.
Details of our approach are shown below (taken from https://arxiv.org/pdf/1607.05781.pdf).
5.3.3) General overview of the proposed method (CNN–RNN based visual tracker):
YOLO + LSTM = ROLO (Recurrent YOLO)
(https://arxiv.org/pdf/1607.05781.pdf)
The proposed model contains a deep neural network whose input is raw video frames and which returns the coordinates of a bounding box of the object being tracked in each frame.
The tracking probability is calculated as p(Bt | X≤t, B<t), where Bt and Xt are the location of the human and the input frame, respectively, at time t, X≤t is the history of input frames up to time t, and B<t is the history of previous locations of the human before time t.
• YOLO is used to collect rich and robust visual features, as well as preliminary location inferences; and
• an LSTM is used in the next stage, as it is spatially deep and appropriate for sequence processing.
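As a minimal sketch of this two-stage idea, the step below concatenates a frame's CNN features with the detected box to form one LSTM input; the 4096-dimensional feature size, the normalized (x, y, w, h) box format and the function name are illustrative assumptions, not the exact ROLO layout:

```python
import numpy as np

def rolo_step_input(visual_features, detected_box):
    """Build one LSTM input by concatenating the frame's high-level CNN
    features with the detected box location, as described above."""
    return np.concatenate([visual_features, detected_box])

feat = np.random.default_rng(2).normal(size=4096)  # CNN features for one frame
box = np.array([0.45, 0.52, 0.20, 0.35])           # normalized (x, y, w, h)
x_t = rolo_step_input(feat, box)
print(x_t.shape)  # (4100,) -> fed to the LSTM, which regresses the next box
```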
Training of ROLO Model.
For detection in the first frame, we make a decision based on the IoU distance between the detection boxes and the ground truth; a minimum IoU (IoUmin) is also defined to reject detections whose IoU is less than IoUmin. The training objective is the mean squared error between predicted and target boxes,
L = (1/n) Σi ||Bi,prediction − Bi,target||²,
where n indicates the number of training samples in a batch, Bprediction is the tracking prediction and Btarget is the target ground truth value. They use the Adam method for stochastic optimization.
• OCC: Occlusion
• DEF: Deformation
• OV: Out-of-View
• BC: Background Clutters
Chapter 6
6.1) Applications
A well-known application of object detection is face detection, used in almost all mobile cameras. A more generalized (multi-class) application is autonomous driving, where a variety of objects need to be detected. Object detection also has an important role to play in surveillance systems. These systems can be integrated with other tasks such as pose estimation, where the first stage in the pipeline is to detect the object and the second stage is to estimate the pose in the detected region. It can be used for tracking objects and thus finds use in robotics and medical applications. This problem therefore serves a multitude of applications.
6.2) Conclusion:
During the internship I learnt a lot of new things by doing intensive research on deep learning, mainly in the field of object detection and tracking, and by finding and implementing effective, accurate and efficient state-of-the-art CNN- and RNN-based object detectors and trackers. This internship taught me how to approach current research problems and contribute something new to the computer vision and deep learning community.
I faced many challenges while implementing the state-of-the-art visual tracker, such as learning new libraries for efficient and fast object tracking. Different approaches were tried for different tasks during the development and implementation of the proposed algorithms, as discussed in the previous chapters.
On completing the internship I understood that the computer vision and deep learning community is very large and open source, receiving contributions day to day from many parts of the world. I can therefore say that I have satisfactorily completed the internship at IIT Tirupati; but there is much more to work on in this field, and I want to be a part of this large, developing field.
CERTIFICATE
This is to certify that this internship report entitled "Study and Implementation of Object Detection and Visual Tracking", submitted to the Indian Institute of Technology, Tirupati, is a record of work done by "Bharat Giddwani" under my supervision from ___ May 2018 to __ June 2018.
Place:
Date: