Kinect Is Awesome
tracking system. Figure 1 shows the main components of the Kinect sensor, which was used as the main device in this research. A standard CMOS image sensor receives the projected structured infrared (IR) light pattern, processes the IR image, and produces an accurate per-frame depth image of the scene (PrimeSense, 2011).
Figure 1. Main components of the Kinect sensor: the RGB camera, the 3D depth sensors, and the Kinect 3D coordinate system with origin (0, 0, 0) and X, Y, Z axes in mm.
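Each depth pixel can be back-projected into the Kinect 3D coordinate frame with a standard pinhole camera model. The following is an illustrative Python sketch; the focal length and principal point values are placeholders, not calibrated Kinect intrinsics:

```python
# Back-project a depth pixel (u, v, depth) into Kinect 3D coordinates (mm).
# The intrinsics below (FX, FY, CX, CY) are illustrative placeholders,
# not the calibrated values used in the study.
FX = FY = 585.0          # assumed focal lengths in pixels
CX, CY = 320.0, 240.0    # assumed principal point for a 640x480 image

def depth_to_xyz(u, v, depth_mm):
    """Pinhole back-projection: pixel (u, v) with depth Z -> (X, Y, Z) in mm."""
    x = (u - CX) * depth_mm / FX
    y = (v - CY) * depth_mm / FY
    return (x, y, depth_mm)

# A pixel at the principal point maps straight onto the Z axis.
print(depth_to_xyz(320, 240, 2000.0))  # -> (0.0, 0.0, 2000.0)
```

With the true intrinsics from calibration, this mapping yields the per-target local 3D coordinates that the tracking system consumes.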
BACKGROUND
In the construction field, progress monitoring is an essential part of a project,
which assists project managers in formulating strategies and making decisions in
resource allocation in order to keep the project on track. There have been several
techniques used to achieve the task of construction progress monitoring. Image
processing based systems, 3D laser scanning methods, radio-frequency identification
(RFID) tags, bar codes and embedded sensor systems are the leading technologies.
The necessity of adding new tasks that need to be performed before, during or after
the utilization of such technologies at a construction site is the main drawback in the
application of laser scanners, RFID tags and embedded sensors (El-Omar & Moselhi,
2008; Kiziltas et al. 2008). RFID performance degrades in the proximity of metals, and tag size increases with increasing transmitting power. RFID sensors and bar codes
need additional infrastructure to detect items and are time-consuming to set up. They
are often costly and cannot be attached to many types of components. Laser scanners
are also very costly, require operation expertise and may generate erroneous results
within dynamic scenes. Peddi et al. (2009) developed a human pose recognition system to measure worker performance; however, it applies only to selected construction activities that exhibit unique working poses. Infrastructure cost, recurring operational cost and range limitations are the common drawbacks of these methods.
SYSTEM INFORMATION
The tracking system is developed to detect construction worker locations and movements within a given work area. The software used to develop all image processing algorithms is MATLAB R2010a. All real-time RGB images and depth information are obtained from the Kinect sensor. The worker tracking algorithm is based on skeletonized figures and on characteristics of the hardhat (i.e., its unique shape and colour). Furthermore, template matching is used as the pattern recognition technique to detect hardhat shapes in the image. To reduce the complexity of the research study, two basic assumptions have been made about the site-end conditions, as follows:
1.
2.
Specification
Field of view: 58° H, 45° V, 70° D
Depth image size: VGA (640x480)
Spatial x/y resolution (at 2 m distance): 3 mm
Depth z resolution (at 2 m distance): 1 cm
Maximum frame rate: 60 fps
Operation range: 0.8 m - 3.5 m
Audio: Two microphones
Power consumption: 2.25 W
Operation environment: Indoor
TRACKING PROCEDURE
According to the assumptions mentioned above, all workers are expected to
wear hardhats when they are in the site. Hence, hardhat can be used as the key
tracking object to represent the worker in the field. In brief, the system recognizes
human figures in the video by detecting the hardhat shape. Then the tracked human is
differentiated based on their job title (ex. labour, supervisor, etc.) based on the color
of the hardhat. Following diagram (Figure 3) illustrates the main tracking stages of
this proposed study.
Figure 3. Main tracking stages: skeleton image tracking → colour filtration → pattern recognition → object localization.
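The colour filtration stage can be sketched as a simple hue classifier. The following is an illustrative Python sketch using only the standard library; the hue ranges per job category are hypothetical, not values from this study:

```python
import colorsys

# Hypothetical hue ranges (fractions of a full hue circle) per job category;
# the study does not specify the exact ranges, so these are illustrative.
CATEGORY_HUES = {
    "labourer":   (0.10, 0.20),   # yellow hardhats
    "supervisor": (0.55, 0.70),   # blue hardhats
}

def classify_hardhat(r, g, b):
    """Map the mean RGB colour (0-255) of a hardhat blob to a job category."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    for category, (lo, hi) in CATEGORY_HUES.items():
        if lo <= h <= hi:
            return category
    return "unknown"

print(classify_hardhat(250, 220, 30))   # yellowish blob -> labourer
print(classify_hardhat(30, 60, 230))    # bluish blob -> supervisor
```

Classifying on hue rather than raw RGB makes the category decision less sensitive to scene brightness, which varies strongly across a construction site.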
(Jain et al. 2000). Template matching can be classified into two categories: feature-based and template-based matching. In this study, template-based matching is used as the pattern recognition method, since the template does not contain strong features and the whole image constitutes the matching target. Two matching algorithms are commonly used to measure the similarity between two images: minimum-distance measurement and correlation measurement. Although the minimum-distance technique calculates the distance rapidly, it is easily disturbed by noise (Chang et al. 2009). Hence, correlation measurement is used. In intensity-based correlation, the algorithm uses a similarity measure to compare the gray values of corresponding pixels in the template and the image. The maximum-correlation measurement can match the target exactly even if there is noise in the image. In order to recognize hardhat shapes in the image, a comprehensive database of hardhat images and their characteristic features is developed (Weerasinghe and Ruwanpura 2010). The characteristic features include both low-level image features and shape features. Each image in the database contains a hardhat that is typically used on construction sites. The reference images are captured on a spherical grid: the angle between two adjacent grid points at the centre is kept at 20 degrees, and all reference images are captured from a distance of 45 cm from the camera. Figure 6 shows six raw images and the grid of the viewing sphere.
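The size of the reference database follows directly from the 20-degree grid. Below is a Python sketch enumerating the viewpoints; restricting elevations to ±80 degrees is an assumption, since the text does not state how the poles of the viewing sphere are handled:

```python
# Enumerate reference-image viewpoints on a viewing sphere with a
# 20-degree angular step, as (azimuth, elevation) pairs in degrees.
# Limiting elevation to -80..+80 is an assumption about pole handling.
STEP = 20

def viewing_grid(step=STEP):
    return [(az, el)
            for el in range(-80, 81, step)      # 9 elevation rings
            for az in range(0, 360, step)]      # 18 azimuths per ring

grid = viewing_grid()
print(len(grid))  # 9 rings x 18 azimuths = 162 viewpoints
```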
The similarity between the reference image and a candidate blob is measured with the normalized correlation coefficient

r = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(R_{ij}-\bar{R}\right)\left(B_{ij}-\bar{B}\right)}{\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(R_{ij}-\bar{R}\right)^{2}\,\sum_{i=1}^{m}\sum_{j=1}^{n}\left(B_{ij}-\bar{B}\right)^{2}}}

where R_{ij} and B_{ij} are the intensities of the reference image and the filtered color blob (grayscale format) with (m, n) dimensions, and i, j represent the row and column indices. If the correlation value exceeds the threshold level, the color blob is considered a hardhat.
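The correlation measurement described above can be sketched in a few lines of Python. This is an illustrative implementation of the normalized correlation coefficient between a grayscale reference image and a same-size blob; the threshold value is hypothetical, as the text does not report it:

```python
import math

def normalized_correlation(template, blob):
    """Normalized correlation between two equal-size grayscale images,
    given as 2-D lists of intensities; returns a value in [-1, 1]."""
    t = [p for row in template for p in row]
    b = [p for row in blob for p in row]
    t_mean = sum(t) / len(t)
    b_mean = sum(b) / len(b)
    num = sum((ti - t_mean) * (bi - b_mean) for ti, bi in zip(t, b))
    den = math.sqrt(sum((ti - t_mean) ** 2 for ti in t)
                    * sum((bi - b_mean) ** 2 for bi in b))
    return num / den if den else 0.0

THRESHOLD = 0.8  # hypothetical; the study does not report its threshold

def is_hardhat(template, blob):
    """Accept the blob as a hardhat when correlation exceeds the threshold."""
    return normalized_correlation(template, blob) > THRESHOLD

template = [[10, 200], [200, 10]]
print(normalized_correlation(template, template))  # identical images -> 1.0
```

Because the coefficient is normalized by both images' variances, it tolerates uniform brightness and contrast changes, which is why it is preferred over the noise-sensitive minimum-distance measure.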
Object localization
The MATLAB executable files (Kroon, 2011) used to extract Kinect sensory information are able to generate local 3D coordinates of a target with respect to the Kinect device. For better utilization, these local coordinates need to be transformed into the building coordinate system. First, a camera calibration procedure is followed to determine the interior and exterior parameters of the Kinect camera (Brown, 1971). These parameters are used to apply the 3D transformation from the camera coordinate system to the building coordinate system. A set of ground control points on-site is used to calibrate the location and the rotation angles with respect to the building coordinate system. The mathematical model for the 3D transformation is illustrated below (Zeng, 2010):
X_G = T + \lambda \, R \, X_I

where X_G is the ground coordinate vector, T is the translational (shift) vector, \lambda is the scale factor, R is the rotation matrix expressing the rotational relationship between the image and the ground coordinate systems, and X_I is the image coordinate vector. Once the above equation is applied, the ground coordinates of targets tracked by the Kinect device can be determined.
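The transformation amounts to a scaled rotation plus a shift. Below is a minimal Python sketch with illustrative parameters (a 90-degree rotation about the Z axis, unit scale, and an assumed shift); the real scale, rotation and shift come from the ground-control-point calibration:

```python
import math

def transform_point(p, shift, scale, rot):
    """Apply X_ground = shift + scale * R @ X_image to one 3-D point."""
    x = sum(rot[0][k] * p[k] for k in range(3))
    y = sum(rot[1][k] * p[k] for k in range(3))
    z = sum(rot[2][k] * p[k] for k in range(3))
    return (shift[0] + scale * x,
            shift[1] + scale * y,
            shift[2] + scale * z)

# Illustrative parameters: 90-degree rotation about Z, unit scale, and a
# shift of (1000, 500, 0) mm; real values come from camera calibration.
theta = math.pi / 2
R = [[math.cos(theta), -math.sin(theta), 0],
     [math.sin(theta),  math.cos(theta), 0],
     [0, 0, 1]]
print(transform_point((100.0, 0.0, 0.0), (1000.0, 500.0, 0.0), 1.0, R))
```

A point 100 mm along the camera X axis lands at roughly (1000, 600, 0) in building coordinates under these example parameters.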
RESULTS
Figure 8 shows the results of the worker tracking system when multiple workers from different disciplines are on-site. The system recognizes people on-site and differentiates them according to their coloured hardhats. The worker identity number and category are then displayed over the middle of each person. In addition, the transformed building coordinates of tracked workers are displayed in a separate table, which also includes the block/area number they are currently working in.
activities in future research. This study may lead to the determination of worker performance, and may assist project managers and planners as a planning tool in developing strategies for improving labour productivity and labour allocation.
ACKNOWLEDGEMENT
The authors wish to acknowledge the support and funding for this research
project provided by Canadian Foundation for Innovation and CANA, PCL, Ellis Don,
Graham, Ledcor, Revay & Associates, Stuart Olson, Canadian Construction Research
Board, Calgary Construction Association, and the Natural Sciences Engineering
Research Council under their Collaborative Research and Development Grant CRDPJ
341047 06.
REFERENCES
Brown, D. (1971). Close range camera calibration. Journal of Photogrammetric
Engineering & Remote Sensing, 37(8), 855-866.
Chang, F., Chen, Z., Wang, W., & Wang, L. (2009). The Hausdorff Distance
Template Matching Algorithm Based On Kalman Filter for Target Tracking.
Proceedings of the IEEE International Conference on Automation and
Logistics (pp. 836-840). Shenyang: IEEE.
El-Omar, S., & Moselhi, O. (2008). Integrating 3D laser scanning and
photogrammetry for progress measurement of construction work. Automation
in Construction, 18(1), 1-9.
Jain, A. K., Duin, R. P., & Jianchang, M. (2000). Statistical pattern recognition: a
review. IEEE Transactions on Pattern Analysis and Machine Intelligence,
22(1), 4-37.
Kiziltas, S., Akinci, B., Ergen, E., Pingbo, T., & Gordon, C. (2008). Technological
assessment and process implications of field data capture technologies for
construction and facility/infrastructure management. Electronic Journal of
Information Technology in Construction, 13, 134-154.
Kroon, D.-J. (2011). Matlab Central. Retrieved November 20, 2011, from Kinect
Matlab: http://www.mathworks.com/matlabcentral/fileexchange/30242
Peddi, A., Huan, L., Bai, Y., & Kim, S. (2009). Development of Human Pose
Analyzing Algorithms for the Determination of Construction Productivity in
Real-time. Building a Sustainable Future - Proceedings of the 2009
Construction Research Congress (pp. 11-20). Seattle, WA, United states:
American Society of Civil Engineers.
PrimeSense. (2011). Kinect Technical Overview. Retrieved December 4, 2011, from
PrimeSensor Reference Design (Kinect IR laser):
http://primesense.360.co.il/files/FMF_2.PDF
Weerasinghe, I. P., & Ruwanpura, J. (2010). Automated Multiple Objects Tracking
System (AMOTS). Proceedings of the 2010 Construction Research Congress
(pp. 11-20). Banff, AB, Canada: ASCE.
Zeng, H. (2010). A 3D coordinate transformation algorithm. 2nd Conference on
Environmental Science and Information Application Technology, ESIAT 2010
(pp. 195-198). Wuhan, China: IEEE.