Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation
Abstract—Object trackers based on Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on recent tracking benchmarks, while they suffer from slow computational speed. The high computational load arises from the extraction of the feature maps of the candidate and training patches in every video frame. The candidate and training patches are typically placed randomly around the previous target location and the estimated target location respectively. In this paper, we propose novel schemes to speed up the processing of CNN-based trackers. We input the whole region of interest once to the CNN to eliminate the redundant computations of the random candidate patches. In addition to classifying each candidate patch as object or background, we adapt the CNN to classify the target location inside the object patches as a coarse localization step, and we employ bilinear interpolation of the CNN feature maps as a fine localization step. Moreover, bilinear interpolation is exploited to generate the CNN feature maps of the training patches without actually forwarding them through the network, which achieves a significant reduction of the required computations. Our tracker does not rely on offline video training. It achieves competitive performance on the OTB benchmark with an 8x speed improvement compared to the equivalent tracker.

Keywords—object tracking, CNN, computer vision, video processing, bilinear interpolation, classification-based trackers

I. INTRODUCTION

Visual object tracking is a classical problem in the computer vision domain where the location of the target is estimated in every video frame. The tracking research field has remained active for a long period because of the several variations imposed on the tracking process, such as occlusion, appearance changes, illumination changes and cluttered background. It is challenging for a tracker to handle all these variations in a single framework. Therefore, numerous algorithms and schemes exist in the literature aiming to tackle the tracking challenges and improve the overall tracking performance [1]-[3].

A typical tracking system consists of two main models, a motion model and an appearance model. The motion model is employed to predict the target location in the next frame, for example using a Kalman filter [4] or a particle filter [5] to model the target motion. The motion model can also be as simple as constraining the search space to a small search window around the previous target location and assuming the target motion is small. On the other hand, the appearance model is used to represent the target and verify the predicted location of the target in every frame [6]. Appearance models can be classified into generative and discriminative methods. In generative methods, tracking is performed by searching for the region most similar to the object [6]. In discriminative methods, a classifier is used to distinguish the object from the background. In general, the appearance model can be updated online to account for target appearance variations during tracking.

Traditionally, tracking algorithms employed hand-crafted features such as pixel intensity, color and Histogram of Oriented Gradients (HOG) [7] to represent the target in either generative or discriminative appearance models. Although hand-crafted features achieve satisfactory performance in constrained environments, they are not robust to severe appearance changes [8]. Deep learning using Convolutional Neural Networks (CNNs) has recently brought a significant performance boost to various computer vision applications. Visual object tracking has been affected by this popular trend in order to overcome the tracking challenges and obtain better performance than that obtained with hand-crafted features. In pure CNN-based trackers, the appearance model is learned by a CNN and a classifier is used to label each image patch as object or background. CNN-based trackers [8]-[10] achieved state-of-the-art performance in the latest benchmarks [11], [12] even with simple motion models and no offline training. However, CNN-based trackers typically suffer from high computational loads because of the large number of candidate patches and training patches required in the tracking phase and the training phase respectively.

In this paper, we address the speed limitations of CNN-based trackers. We adapt the CNN not only as a two-label classifier (object and background), but also as a five-position classifier for the object position inside the candidate patch. This scheme achieves coarse object localization with a smaller number of candidate patches. In addition, we exploit a bilinear interpolation scheme on the CNN feature maps already extracted in the coarse localization step for two purposes: first, the fine object localization, and second, the CNN feature extraction of the training patches. The computation of the bilinear interpolation is significantly less than that of extracting a new feature map, which speeds up the required processing time. Moreover, we did not perform offline training on any tracking dataset for our tracker.
This paper is organized as follows: Section II gives an overview of CNN-based trackers and the speed bottlenecks in these systems. Our proposed schemes are presented in Section III. Section IV demonstrates the experimental results on the OTB benchmark, and finally, Section V concludes our work.

II. OVERVIEW OF CNN-BASED TRACKERS

Following the huge success of deep CNNs in image classification [13], [14] and object detection applications [15], [16], many recent works in the object tracking domain have adopted deep CNNs and achieved state-of-the-art performance. There exist different use cases of CNNs in the tracking field. References [17]-[19] employed CNNs with Discriminative Correlation Filters (DCF), where the regression models of these DCF-based trackers are trained on the feature maps extracted by the deep CNNs. References [20]-[22] adopted a Siamese structure, where two identical CNN branches are used to generate feature maps for two patches simultaneously, either from the same frame or from successive frames. The outputs of both branches are then correlated to localize the target. References [8]-[10], [23], [24] are pure CNN-based trackers, where fully-connected layers are added after generating the feature maps to classify the input patches as object or background. A softmax layer is typically used at the end to score the candidate patches, and the candidate with the highest object score is taken as the new target location. These pure CNN-based trackers achieved state-of-the-art performance in the latest benchmarks, and we focus on this type of tracker in the rest of the paper.

Fig. 1 shows a typical CNN-based tracker. In each frame, candidate patches are generated with different translations and scales sampled from a Gaussian distribution. The mean of the Gaussian distribution is the previous location and scale of the target. The deep features are extracted by the convolution layers for each patch and scored by the fully connected (fc) layers.

Fig. 1. A typical CNN-based tracker

For the training of CNN-based trackers, transfer learning is typically exploited, where the network parameters are initialized from another network pre-trained on a large-scale classification dataset such as ImageNet [25]. References [8]-[10] adopted offline training models to update the network parameters before tracking. It is difficult, however, to collect a large training dataset for visual tracking. Therefore, recent works [23], [24] dispensed with offline training steps and still achieved state-of-the-art performance. These techniques depend on increasing the number of training iterations in the initial frame, where the target location is known and accurate. On the other hand, online training is necessary to cope with the potential appearance changes of the target. It is typical to update the parameters of the fully-connected layers only and keep those of the convolution layers fixed throughout the whole tracking process, because the convolutional layers hold generic tracking information while the fully-connected layers hold target-background specific information. The short-term and long-term model updates proposed by [8] have been employed in other CNN-based trackers as well [9], [10], [23], [24]. Long-term updates are carried out at regular intervals, while short-term updates are carried out when the object score drops severely during tracking. The training data required for the online training is collected every frame, where deep features for positive and negative patches are generated and stored. The positive and negative patches have an Intersection over Union (IoU) overlap with the estimated target location larger and smaller than certain thresholds respectively. When a model update is required, the stored positive and negative feature maps are sampled randomly to update the parameters.

The main computation steps in CNN-based trackers can be categorized into candidate evaluation, collecting training data and model update. The model update is typically performed at fixed intervals and has less effect on the computation time than the candidate and training data processing. CNN-based trackers mainly suffer from slow speed because of the computation in the convolutional layers to obtain the deep features of the candidate and training patches in every frame. However, many of these computations are redundant, because the candidate and training patches are generated randomly with large potential overlaps. Hence, we propose novel schemes in the next section to mitigate the redundant computations and speed up the required processing time of CNN-based trackers.

III. PROPOSED CNN-BASED TRACKER

A. Target localization

Although CNNs typically have local max-pooling layers to make them spatially invariant to the input data, the intermediate feature maps are not actually invariant to large transformations of the input data [26]. Hence, we exploit this typical behavior of the network such that we not only classify each patch into object or background, but also classify the location of the object inside the patch. Having four classes, up, down, right and left, to represent the target location inside the patch, we can localize the target with a smaller number of candidates. In addition, we do not generate random candidate patches to cover the Region of Interest (ROI) as in previous works; instead, we generate fixed-spacing patches to cover the whole ROI, as shown in Fig. 2. This scheme prevents the potential redundant computations of random patch generation and reduces the risk of missing the target.
We also propose to forward the whole ROI through the convolution layers, instead of forwarding each patch separately, to save redundant computations. This idea is similar to what was proposed in [16], [27] in the object detection field, where the whole image is forwarded through the network instead of the proposal regions.

Fig. 2. Random patches and fixed-spacing patches

It is common in CNN-based trackers that the target localization is carried out by taking the mean location of the candidate patches with the top object scores. In our scheme, instead, the patches which are classified as objects are first moved based on the localization network. The patch with the highest overlap with the other object patches is then selected as input to a fine localization step, where we utilize bilinear interpolation of the feature maps. Bilinear interpolation was first proposed by [26] for the implementation of a spatial transformer network, and it was later employed by [28] in the ROI align scheme for object detection applications. Let us assume the target is represented by a 3×3×d feature map, as shown in Fig. 3 (a), where d is the feature depth, and that we extract feature maps for a region larger than the target size such that we get a 5×5×d feature map, as shown in Fig. 3 (b). We would then have nine 3×3 grids in total, each displaced by dx and/or dy from its neighbors, where the values of dx and dy depend on the network structure. Accordingly, we can obtain the feature maps of all image patches with displacements ranging from 0 to dx or dy, measured from the center, by bilinear interpolation, without forwarding these image patches through the convolution layers. Any point value is calculated by bilinear interpolation from the four nearby points in the feature maps, such as point * in Fig. 3 (c).

Fig. 3. Interpolation of feature maps
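To make this concrete, the following NumPy sketch reproduces the interpolation for a 5×5×d map (our implementation is in Matlab/MatConvNet; the function name and the cell-displacement interface here are illustrative):

import numpy as np

def interp_window(F, dy, dx):
    # F: (d, 5, 5) feature maps of the enlarged region.
    # (dy, dx): requested displacement in feature-map cells, 0 <= dy, dx < 2.
    # Returns the (d, 3, 3) feature map of the patch displaced by (dy, dx)
    # from the top-left 3x3 grid; every point is bilinearly interpolated
    # from its four integer-grid neighbors (cf. point * in Fig. 3 (c)).
    iy, ix = int(np.floor(dy)), int(np.floor(dx))
    fy, fx = dy - iy, dx - ix
    w00 = F[:, iy:iy+3, ix:ix+3]        # four neighboring 3x3 grids
    w01 = F[:, iy:iy+3, ix+1:ix+4]
    w10 = F[:, iy+1:iy+4, ix:ix+3]
    w11 = F[:, iy+1:iy+4, ix+1:ix+4]
    return ((1 - fy) * (1 - fx) * w00 + (1 - fy) * fx * w01
            + fy * (1 - fx) * w10 + fy * fx * w11)

For example, interp_window(F, 0.25, 1.6) approximates the features of a patch shifted by 0.25 and 1.6 cells vertically and horizontally, at the cost of a few multiply-adds per feature value instead of a full forward pass.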
B. Network training

We reuse the feature maps obtained in the localization phase to extract the feature maps of the positive and negative training patches by applying bilinear interpolation. The positive patches are actually sub-divided into localization patches. Although we add more classification classes to the network for localization, the required computation does not increase much, because the localization patches are not forwarded through the whole stack of convolutional layers; bilinear interpolation is exploited instead.

C. Scale variation

Reference [8] handled scale variation by generating training and candidate patches with random scales drawn from a Gaussian distribution and forwarding these patches through the whole network to obtain the feature maps. In our proposed scheme, however, we extract feature maps at three fixed scales only: {1, max_scale_up, max_scale_down}. We then obtain the feature map of any required scale in that range, for either a candidate or a training patch, by applying linear interpolation on two scales. Hence, instead of forwarding image patches generated randomly in the spatial and scale domains through the convolution layers, we extract feature maps for a larger image patch at three fixed scales, then perform bilinear interpolation to obtain the feature map at the required displacement and linear interpolation to obtain the feature map at the required scale. Fig. 4 illustrates our scheme of obtaining feature maps of image patches at different scales.

Fig. 4. Interpolation of fixed-scale feature maps
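A corresponding sketch for the scale domain, assuming the two fixed-scale maps bracket the requested scale; the log-domain weighting is our illustrative choice, since the scheme only requires a linear interpolation on two scales:

import numpy as np

def interp_scale(F_lo, F_hi, s_lo, s_hi, s):
    # F_lo, F_hi: feature maps of equal spatial size extracted at the two
    # fixed scales s_lo < s_hi that bracket the requested scale s.
    t = (np.log(s) - np.log(s_lo)) / (np.log(s_hi) - np.log(s_lo))
    return (1.0 - t) * F_lo + t * F_hi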
IV. IMPLEMENTATION DETAILS

A. Network structure

We start with the MDNet_N implementation as a baseline for our work. MDNet_N is the same as MDNet [8] but without offline training and bounding box regression. The parameters of the convolutional layers (conv1-3) are initialized from the VGG-M [29] model, and the fully connected layers are initialized with random values. In [8], the object patch (of size h×w) is cropped and padded to the network input size, 107×107, such that this fixed size corresponds to an image patch of (107÷75)×(h×w). The spatial size of the feature maps generated by conv3 is 3×3 for a network input of 107×107. Our network, shown in Fig. 5, is similar to MDNet, but we add fc7-9 as a localization network and allow different input sizes to get feature maps of sizes 3×3, 5×5, 7×7, etc. when needed.
The localization layer classifies the positive patches into five classes based on the position of the object inside the patch (up, down, right, left and middle).

Fig. 5. Proposed network structure
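As an illustration only, a rule of the kind that could assign these five labels from the object's displacement inside a patch is sketched below; the threshold and the sign convention are assumptions, not the paper's specification (the actual training samples are drawn from five class-specific Gaussian distributions, see Section IV.B):

def loc_label(dx, dy, thr):
    # (dx, dy): object displacement from the patch center; thr: assumed
    # margin below which the object counts as centered.
    if abs(dx) <= thr and abs(dy) <= thr:
        return 'middle'
    if abs(dx) >= abs(dy):
        return 'right' if dx > 0 else 'left'
    return 'down' if dy > 0 else 'up'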
B. Initial frame training

In order to generate training data for the object and localization layers in the initial frame, we generate feature maps for an input of size 139×139 at three fixed scales: 1, 1.2 and 1.2⁻¹. The output of conv3 would then be 5×5 at the three fixed scales. The initial object is actually represented by the inner 3×3 feature maps at scale 1. Accordingly, we can exploit bilinear interpolation and generate feature maps for any patch with a displacement ranging from 0 to (16÷75)×w and (16÷75)×h in the x and y directions respectively, and with different scales ranging from 1.2⁻¹ to 1.2. The object training samples are generated from a Gaussian distribution in the same way as in MDNet_N, such that the IoU with the initial target location is larger than 0.7. The localization training samples are generated from five Gaussian distributions, one per localization class, and their IoU should be larger than 0.7 as well. To generate training data for the background in the initial frame, we divide the background training data into two types, close and far samples. The close samples are those close to the initial target location; hence, we can apply the same interpolation scheme used for the object and localization training samples. For the far background samples, we generate feature maps as normal by forwarding the samples through all the convolutional layers. All background training samples should have an IoU less than 0.5 with the initial target. Our network is trained by Stochastic Gradient Descent (SGD) with mini-batch sizes of 128 and 65 for fc4-6 and fc7-9 respectively, and 90 iterations in the initial frame.
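The IoU test that accepts object and localization samples (IoU > 0.7) and bounds the background samples (IoU < 0.5) is the standard rectangle overlap, sketched here for completeness:

def iou(a, b):
    # Intersection over Union of two boxes given as (x, y, w, h).
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0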
C. Object tracking

We forward the whole ROI of size (4w×4h), centered on the previous target location, through the convolution layers. We crop this ROI to 299×299 before entering the network and accordingly obtain a 15×15 conv3 feature map. As the object is represented by a 3×3 conv3 feature map, we obtain 169 feature maps of spatial size 3×3. These 169 feature maps represent image patches displaced from the center by [k × (16÷75) × w] and [k × (16÷75) × h] in the x and y directions respectively, where k is an integer in [-5:5]. The object score of each 3×3 feature map is checked, and if it is larger than 0.5, the new location of the equivalent patch is obtained based on the localization network. The patch with the highest overlap with the other object patches is chosen for the next fine localization step.
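The coarse step thus reduces to slicing and scoring 169 overlapping 3×3 windows of the single ROI feature map. A NumPy sketch under the geometry above, where the scoring callable stands in for the fc4-6 object head (an assumed interface):

import numpy as np

def coarse_candidates(roi_feat, object_score, thr=0.5):
    # roi_feat: (d, 15, 15) conv3 feature map of the whole cropped ROI.
    # object_score: maps a (d, 3, 3) window to an object probability.
    # Returns the cell offsets (i, j) of the windows scored as objects;
    # one cell step corresponds to (16/75)*h and (16/75)*w in the image.
    d, H, W = roi_feat.shape
    hits = []
    for i in range(H - 2):              # 13 x 13 = 169 windows
        for j in range(W - 2):
            if object_score(roi_feat[:, i:i+3, j:j+3]) > thr:
                hits.append((i, j))
    return hits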
In the fine localization step, we need to find a finer location and an updated scale for the object. We calculate new 5×5 feature maps centered on the coarse location at two scales, 1.05 and 1.05⁻¹. We generate 100 fine samples displaced by fixed values in the x and y directions and with different scales drawn from a Gaussian distribution. The feature maps of these fine samples are calculated by bilinear interpolation in the spatial and scale domains. Then, we check the object scores of the fine samples and average the three samples with the highest object scores.

D. Online network update

We adopt the long-term and short-term network update schemes proposed in [8]. A long-term update is carried out every 10 frames using the collected training samples, while a short-term update is carried out when the object score drops below 0.5. We generate training samples for the object, background and localization layers in each frame whose object score is larger than 0.5, similar to [8]. However, we reuse the feature maps generated in the tracking stage to obtain the feature maps of the training samples by bilinear interpolation in the spatial and scale domains. In addition, we employ hard minibatch mining for the negative training samples, similar to [8], where the 96 negative samples with the highest positive scores are selected out of 1024 negative samples. The numbers of training samples for the object, localization and background layers are 30, 30 and 100 per frame respectively, and 10 iterations are used for the online update.
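A minimal sketch of this hard minibatch mining step (NumPy; names are illustrative):

import numpy as np

def hard_negative_indices(pos_scores, k=96):
    # pos_scores: positive-class scores of the 1024 candidate negatives.
    # The k negatives scored most confidently as 'object' are the hardest
    # examples and are kept for the update minibatch.
    return np.argsort(np.asarray(pos_scores))[::-1][:k]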
E. Experimental results

We evaluate our tracker on the Object Tracking Benchmark (OTB-100) [11], which contains 100 fully annotated videos. Our tracker is implemented in Matlab using MatConvNet and runs on an Intel i7-3520M CPU system. We ran MDNET_N on the same system as a reference. The tracking performance is measured by performing a One-Pass Evaluation (OPE) on two metrics: the center location error, and the IoU between the estimated target location and the ground truth. Fig. 6 shows the precision plot and the success plot of our tracker on the 100 videos of [11] compared with MDNET_N. The precision plot shows the percentage of frames whose estimated target location is within the error threshold (x-axis) of the ground truth, while the success plot shows the percentage of frames whose IoU is larger than the overlap threshold (x-axis). The legend values in the precision and success plots are the precision score at an error threshold of 20 pixels and the area under the curve (AUC) of the success plot respectively. It can be seen from Fig. 6 that our tracker, which is based on an Interpolation and Localization Network (ILNET), has the same AUC as MDNET_N and slightly lower precision.
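Both OPE curves follow directly from the per-frame center errors and IoU values; a NumPy sketch with commonly used threshold ranges (the exact ranges are an assumption), where precision[20] is the legend's precision score:

import numpy as np

def ope_curves(center_err, ious):
    # center_err: per-frame center location error in pixels.
    # ious: per-frame IoU between estimated and ground-truth boxes.
    center_err, ious = np.asarray(center_err), np.asarray(ious)
    err_thr = np.arange(0, 51)                       # precision x-axis
    precision = np.array([(center_err <= t).mean() for t in err_thr])
    iou_thr = np.linspace(0.0, 1.0, 21)              # success x-axis
    success = np.array([(ious > t).mean() for t in iou_thr])
    auc = np.trapz(success, iou_thr)                 # legend AUC value
    return precision, success, auc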
Fig. 7 demonstrates the effectiveness of our tracker in handling all kinds of tracking challenges. It can be seen that our tracker achieves almost the same or better performance compared to the baseline tracker, MDNET_N. Table I shows the breakdown of the processing time savings achieved by our tracker. Both the tracking and the training speeds have increased despite adding a localization network and increasing the number of training iterations. This speed-up is due to the bilinear interpolation on the feature maps and the use of fixed-spacing candidates.

TABLE I. AVERAGE COMPUTATION TIME IN SECONDS PER FRAME

                                             MDNET_N* [8]   Our work (ILNET)   Speed-up factor
Candidate processing                         3.4            0.36               9.4x
Training processing                          3.3            0.21               15.7x
Network update (@10th frame for long-term)   2.3            2.3                1x
First frame training                         90             52                 1.72x
Frame processing without first frame         7              0.8                8.8x

* MDNET_N: MDNET without offline training and bounding box regression