$S = U \Sigma D^T$ into a set of matrices using singular value decomposition, where $D$ is a $2mn \times 2mn$ matrix of right singular vectors, $\Sigma$ is a $60 \times 2mn$ diagonal matrix of singular values, and $U$ is a $60 \times 60$ matrix of left singular vectors. Since most of the singular values are very small or zero, only the 10 largest singular values are retained. The reduced space is therefore represented by $\tilde{\Sigma}$, which has dimension $10 \times 10$, $\tilde{D}$, which has dimension $2mn \times 10$, and $\tilde{U}$, which has dimension $60 \times 10$. The reduced-dimension row vectors in $\tilde{D}$ are then used to train a multi-class SVM, and because the data is not linearly separable, a non-linear Gaussian radial basis function (RBF) kernel learns the hyperplanes used in classification.
Given a test Kinect video $V$, the reduced dimension matrices $\tilde{\Sigma}$ and $\tilde{U}$ are used to insert each frame $\{F_k\}_{k=1}^{n}$ into the space spanned by the row vectors in $\tilde{D}$ using $\tilde{d}_k = s_k^T \tilde{U} \tilde{\Sigma}^{-1}$, where vector $s_k$ contains the origin-translated 3D locations of the 20 skeletal joints in frame $F_k$. Once each frame in $V$ is inserted into the reduced space, it is labeled as a walk or run action by the trained SVM. The action is then recognized using a majority vote algorithm that selects the action appearing most often among all $n$ frames.
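As a concrete illustration, the following Python sketch mirrors the action classifier described above: the truncated SVD, the fold-in projection $\tilde{d}_k = s_k^T \tilde{U} \tilde{\Sigma}^{-1}$, and the per-frame SVM vote. The variable names, the use of scikit-learn's `SVC`, and the per-frame label layout are our assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

K = 10  # number of retained singular values (from the text)

def reduce_space(S):
    """Truncated SVD of the 60 x 2mn frame matrix S = U Sigma D^T."""
    U, sigma, Dt = np.linalg.svd(S, full_matrices=False)
    return U[:, :K], np.diag(sigma[:K]), Dt[:K, :].T  # U~, Sigma~, D~

def project_frame(s_k, U_k, Sigma_k):
    """Fold-in projection d_k = s_k^T U~ Sigma~^{-1} of one 60-dim frame vector."""
    return s_k @ U_k @ np.linalg.inv(Sigma_k)

def train_action_svm(S, frame_labels):
    """Train the multi-class SVM on the reduced row vectors of D~."""
    U_k, Sigma_k, D_k = reduce_space(S)
    svm = SVC(kernel='rbf')     # Gaussian RBF kernel, as stated in the text
    svm.fit(D_k, frame_labels)  # one walk/run label per training frame
    return svm, U_k, Sigma_k

def classify_action(frames, svm, U_k, Sigma_k):
    """Label every frame of a test video, then take a majority vote."""
    preds = np.array([svm.predict(project_frame(s, U_k, Sigma_k)[None, :])[0]
                      for s in frames])
    return np.bincount(preds).argmax()
```

Here `S` is the $60 \times 2mn$ training matrix and `frames` is the list of 60-dimensional origin-translated joint vectors from the test video.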
2.2 Identity Classification
Given a test Kinect video of an unknown person $X$ performing a recognized action, the identity cost

$$\min_P \{\, \delta_M(X, P)\, \delta_A(X, P) \,\}$$

is calculated for each person $P$ in the training data set, where $\delta_M(X, P)$ is the motion difference and $\delta_A(X, P)$ is the anthropometric difference. The identity of person $X$ is recognized as the training person $P$ with the least cost.
The motion biometric is trained as follows. Let $V^w = (F_1^w, F_2^w, \ldots, F_n^w)$ be a training Kinect video that captures a person performing the walk action. As in Section 2.1, $S^w = [\,s_1 \cdots s_k \cdots s_n\,]$ is constructed using each frame in the video; however, column vector $s_k$ now defines the radius $r$, azimuth $\theta$, and elevation $\phi$ values of the 20 skeletal joints in the $k$th frame. $S^w$ is then separated into three different $20 \times n$ matrices, namely $M_r$ the radius matrix, $M_\theta$ the azimuth matrix, and $M_\phi$ the elevation matrix, and the $k_f$ frequencies with the largest magnitude in each joint's motion are used to construct the motion histograms $h_r$, $h_\theta$, and $h_\phi$.
The motion difference between the unknown person $X$ and a training person $P$ is

$$\delta_M(X, P) = 1 - \frac{R(h_r^x, h_r^p) + R(h_\theta^x, h_\theta^p) + R(h_\phi^x, h_\phi^p)}{3}$$

where $R(\cdot)$ is the correlation coefficient, and $(h_r^x, h_\theta^x, h_\phi^x)$ and $(h_r^p, h_\theta^p, h_\phi^p)$ are the motion histograms of persons $X$ and $P$, respectively.
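A hedged sketch of the motion biometric follows: the conversion to spherical coordinates and the use of the $k_f$ largest-magnitude frequencies come from the text, but the exact histogram construction is not fully specified in this section, so `motion_histogram` below illustrates the general idea rather than the authors' exact procedure.

```python
import numpy as np

def to_spherical(joints_xyz):
    """Origin-translated (x, y, z) joint locations -> (r, azimuth, elevation)."""
    x, y, z = joints_xyz[:, 0], joints_xyz[:, 1], joints_xyz[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-12), -1.0, 1.0))
    return r, azimuth, elevation

def motion_histogram(M, k_f=10):
    """M is a 20 x n matrix (one row per joint). Keep the k_f largest
    frequency magnitudes per joint and pool them into one descriptor."""
    spectrum = np.abs(np.fft.rfft(M, axis=1))
    return np.sort(spectrum, axis=1)[:, -k_f:].ravel()

def motion_difference(h_x, h_p):
    """delta_M(X,P) = 1 - mean correlation over the r/azimuth/elevation pairs."""
    corrs = [np.corrcoef(a, b)[0, 1] for a, b in zip(h_x, h_p)]
    return 1.0 - sum(corrs) / len(corrs)
```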
The anthropometric difference is

$$\delta_A(X, P) = \frac{1}{2} \left( \log \frac{|D_p|}{|D_x|} + \operatorname{Tr}\!\left(D_p^{-1} D_x\right) + (\mu_x - \mu_p)^T D_p^{-1} (\mu_x - \mu_p) - d \right)$$
where $N(\mu_x, D_x)$ is the anthropometric statistical model for the unknown person $X$, $N(\mu_p, D_p)$ is the learned anthropometric statistical model for person $P$, and $d = 20$ is the dimension of the covariance matrix. In general, a small anthropometric difference value indicates that the two people have very similar joint proportions for the recognized action. For example, one element of the joint proportion vector is $\mu_{18} = \big(d(1,2) + d(2,3) + d(3,5) + d(5,6) + d(6,7) + d(7,8)\big)/d_{\text{total}}$, where $d(i,j)$ denotes the distance between joints $i$ and $j$.
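Both differences translate directly into code. The sketch below implements $\delta_A$ as the Kullback-Leibler divergence between the two Gaussian models and combines it with $\delta_M$ by the product used in the identity cost above; the data layout (dictionaries holding `mu`, `D`, and the motion histograms) is an illustrative assumption.

```python
import numpy as np

def anthropometric_difference(mu_x, D_x, mu_p, D_p, d=20):
    """KL divergence between N(mu_x, D_x) and N(mu_p, D_p), as in the text."""
    _, logdet_p = np.linalg.slogdet(D_p)  # slogdet avoids under/overflow
    _, logdet_x = np.linalg.slogdet(D_x)
    D_p_inv = np.linalg.inv(D_p)
    diff = mu_x - mu_p
    return 0.5 * (logdet_p - logdet_x + np.trace(D_p_inv @ D_x)
                  + diff @ D_p_inv @ diff - d)

def identify(unknown, persons, motion_difference):
    """Return the training person minimizing delta_M(X,P) * delta_A(X,P)."""
    def cost(p):
        d_m = motion_difference(unknown['hists'], p['hists'])
        d_a = anthropometric_difference(unknown['mu'], unknown['D'],
                                        p['mu'], p['D'])
        return d_m * d_a
    return min(persons, key=cost)
```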
3 Experiments
In this section experiments are performed that evaluate the proposed systems
ability to correctly classify unknown people and their action in Kinect videos.
The performance of the proposed method is compared to the Gait Energy Vol-
ume (GEV) method [16]. In general, GEV is the 3D extension of the 2D Gait
Energy Image (GEI) [19]. Using the depth images, the tracked human silhouettes
are segmented and the segmentation results are used to isolate each gait cycle
in the video sequence. For each isolated gait cycle the results are aligned and
averaged to form the GEV. Principal Component Analysis (PCA) and Multiple
Discriminant Analysis (MDA) are used to nd a reduced dimension feature vec-
tor that well describes the GEV. This unknown feature vector is compared to
known feature vectors using a distance based measurement to recognize the iden-
tity of the tracked person in the Kinect video. In these experiments we manually
identied the gait cycle and then used the recommended settings to perform
PCA and MDA dimensionality reduction.
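For orientation, a rough sketch of the baseline's reduction and matching steps is shown below; MDA is approximated with scikit-learn's `LinearDiscriminantAnalysis`, and the component count is a placeholder rather than the recommended setting from [16].

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def gev_reduce(gevs, person_ids, n_pca=50):
    """Fit PCA, then MDA (here: sklearn's LDA), on flattened GEVs."""
    pca = PCA(n_components=n_pca).fit(gevs)
    mda = LinearDiscriminantAnalysis().fit(pca.transform(gevs), person_ids)
    return pca, mda

def match_identity(pca, mda, gallery_feats, gallery_ids, probe_gev):
    """Distance-based match of an unknown GEV against the gallery features."""
    feat = mda.transform(pca.transform(probe_gev[None, :]))[0]
    dists = np.linalg.norm(gallery_feats - feat, axis=1)
    return gallery_ids[int(np.argmin(dists))]
```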
In Section 3.1 we describe the Kinect data sets used to train the action and
identity classifiers, and in Section 3.2 we describe the Kinect data sets used to
test the accuracy of the system and for performance comparison. Both data sets
were collected using a Kinect sensor mounted on a movable cart that faced the
person performing the action. During data collection the distance between the
apparatus and the subject was roughly 1.5 to 3 meters. Lastly, we evaluate the
performance of the action and person identity classifiers in Section 3.3 using the
well known Receiver Operating Characteristic (ROC) curve and the Cumulative
Match Curve (CMC) [20]. The ROC is also used to evaluate the sensitivity of the method when: 1. only one biometric is used for identity classification, and 2. the number of frequencies $k_f$ (see Section 2.2) used to construct the $(r, \theta, \phi)$ motion histograms is varied over a range of values. (Note: this is the only free parameter in the proposed person identification system.)
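For reference, both metrics can be computed from match scores as in the minimal sketch below; the score convention (higher means a better match) and the function names are our assumptions, not the authors' evaluation code.

```python
import numpy as np

def eer(genuine, impostor):
    """Equal Error Rate: threshold where false-reject ~= false-accept rate."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    frr = np.array([(genuine < t).mean() for t in thresholds])
    far = np.array([(impostor >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(frr - far)))
    return (frr[i] + far[i]) / 2.0

def cmc(score_matrix, true_cols, max_rank=6):
    """score_matrix[i, j]: similarity of probe i to gallery identity j;
    true_cols[i]: column of the correct identity for probe i."""
    ranks = np.array([(np.argsort(-row) == truth).nonzero()[0][0] + 1
                      for row, truth in zip(score_matrix, true_cols)])
    return [(ranks <= r).mean() for r in range(1, max_rank + 1)]
```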
3.1 Training Data
The training data set included 10 people, 6 males and 4 females, where each person executed each of the 2 basic actions 2 times, i.e. each person has 4 Kinect videos: 2 walking and 2 running. In total, the training data set has 40 videos. Example 3D skeletons found by the Kinect sensor that illustrate the 2 basic actions are shown in Fig. 2. The activity classifier was trained using all 40 videos. For each walk identity classifier, the motion and anthropometric biometrics were trained using the two walk collections for that person. Likewise, for each run identity classifier, the motion and anthropometric biometrics were trained using the two run collections for that person. For the 6 male subjects the age range was 25 to 40 years old, and the height range was roughly 1.73 to 1.8 meters. For the 4 female subjects the age range was 25 to 37 years old, and the height range was roughly 1.55 to 1.6 meters.
Fig. 2. Example training data. Top row: example skeletons of a person performing a normal running action. Bottom row: example skeletons of a person performing a normal walking action.
3.2 Test Data
Using the same 10 people as in the training data set, each person in the test data set executed each of the 2 basic actions 3 times, i.e. each person has 6 Kinect videos: 3 walking and 3 running. In total, the test data set has 60 videos. To make the test data set challenging, each person was asked to perform under additional conditions: First Collection (least challenging), wear a backpack that contained 20 lbs of books; Second Collection (moderately challenging), wear the same 20 lb backpack and carry an object in the right hand; and Third Collection (most challenging), perform the slow moving S motion shown in Fig. 3. In general, these collections simulate real scenarios that may be found in public gathering areas such as airports, train stations, or shopping malls.
Fig. 3. Example Third Collection testing data. Top row: example skeletons of a person performing the S motion while running. Bottom row: example skeletons of a person performing the S motion while walking.
3.3 Results
Action classification performance using majority vote was 100% for both actions, while per-video action classification performance ranges from 50.30% to 100% with an average of 93.47%. The ROC curves in Fig. 4(a) show the Verification Rate (VR) and Equal Error Rate (EER) performance for our method and GEV. This figure also shows the CMC Rank-1 through 6 performance for our method and GEV. For both actions, the EER and CMC Rank-1 performance of our method is better than GEV. For the walk and run actions, our method shows an 11% and 4% EER improvement, respectively. For the walk action our method is 90% accurate by Rank-3, whereas GEV is still hovering around 88% by Rank-6; for the run action the Rank-1 performance of our method is 90%, while GEV does not achieve 90% until Rank-3. The ROC curves in this figure also show the motion biometric is not overly sensitive to $k_f$, the number of frequencies used to construct the motion histograms. In fact, the ROC curves show roughly the same performance when $k_f$ is three or five times larger than 10 (i.e., $k_f = 30$ or $k_f = 50$). This suggests the joint motion patterns may be adequately described by the top 10 frequencies with the largest magnitude.
[Figure 4: WALK and RUN ROC curves (Verification Rate vs. False Alarm Rate) and WALK and RUN CMC curves (Cumulative Accuracy (%) vs. Rank 1-6); the annotated EERs range from 11% to 26%.]
Fig. 4. For both actions. (a) Top row: VR and EER performance comparison between our method ($k_f$ = 10, 30, 50) and GEV [16]. Bottom row: CMC Rank-1 through 6 performance comparison between our method ($k_f$ = 10) and GEV. (b) Top row: VR and EER performance using only the motion biometric ($k_f$ = 10). Bottom row: VR and EER performance using only the anthropometric biometric ($k_f$ = 10).
Figure 4(b) shows the VR and EER performance when only the motion biometric or only the anthropometric biometric is used by the identity classifier. As seen in these ROC curves, person identification is more accurate when both biometrics are used by the identity classifier. For the walk action, the anthropometric EER performance is slightly better than the motion biometric, which suggests the anthropometric biometric guides the motion biometric. However, for the run action the discriminative power of the motion biometric is high, requiring less help from the anthropometric biometric.
Since the computational complexities of the SVD and SVM algorithms are $O(pq^2 + p^2q + q^3)$ [21] and $O(q^2)$ [22], respectively, the computational complexity of the action classifier is $O(q^3)$, where $q = 2mn$. The space complexity of the action classifier is $O(q^2)$, i.e. the size of the right singular vector matrix $D$. An analysis of Algorithm-1 shows the computational complexity of the identity classifier is $O(nN \log N)$, and the space complexity of the identity classifier is $O(n)$, i.e. the column dimension of the radius, azimuth, and elevation matrices. On a 2.4 GHz Intel Core 2 Quad CPU, the total time needed to train the action classifier was 32 min, and the time needed to train one identity classifier was 30 ms.
4 Conclusion
In conclusion, a novel person identification method that uses full-body motion and anthropometric biometrics derived from Kinect videos was presented. Different from traditional gait-based methods that attempt to isolate and examine the gait cycle in the video sequence, our method considers the entire tracked sequence and examines the periodic motion of the upper and lower extremity joints found by the Kinect SDK that have the largest contribution to the action being performed. Challenging test data sets were constructed that have a variety of basic actions with varying levels of complexity. Experiments showed that the proposed method has an average ROC EER of 13% and an average CMC Rank-1 identification rate of 90%. Performance comparisons were conducted against a gait-based method that uses depth images produced by the Kinect sensor, and the results showed our method to have better performance. Experiments were also conducted to assess the individual sensitivities of the two biometrics, and the results suggest both biometrics are needed for person identification. We also showed that the motion biometric is not overly sensitive to the number of frequencies used to build the motion histograms.
References
1. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 586-591 (1991)
2. Jain, A., Hong, L., Bolle, R.: Online fingerprint verification. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 302-314 (1997)
3. Ma, L., Tan, T., Wang, Y., Zhang, D.: Personal identification based on iris texture analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 1519-1533 (2003)
4. Ross, A., Dass, S., Jain, A.: Fingerprint warping using ridge curve correspondences. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 19-30 (2006)
5. Lu, X., Jain, A.: Deformation modeling for robust 3D face matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 1346-1357 (2008)
6. Pillai, J., Patel, V., Chellappa, R., Ratha, N.: Secure and robust iris recognition using random projections and sparse representations. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1877-1893 (2011)
7. Hong, L., Jain, A.: Integrating faces and fingerprints for personal identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 1295-1307 (1998)
8. Chang, K., Bowyer, K., Sarkar, S., Victor, B.: Comparison and combination of ear and face images in appearance-based biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 1160-1165 (2003)
9. Murase, H., Sakai, R.: Moving object recognition in eigenspace representation: gait analysis and lip reading. Pattern Recognition Letters 17, 155-162 (1996)
10. Cutler, R., Davis, L.S.: Robust real-time periodic motion detection, analysis, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 781-796 (2000)
11. Boyd, J.E., Little, J.J.: Biometric Gait Recognition. In: Tistarelli, M., Bigun, J., Grosso, E. (eds.) Advanced Studies in Biometrics. LNCS, vol. 3161, pp. 19-42. Springer, Heidelberg (2005)
12. Gafurov, D., Helkala, K., Søndrol, T.: Biometric gait authentication using accelerometer sensor. JCP 1, 51-59 (2006)
13. Abdelkader, C.B., Davis, L., Cutler, R.: Stride and cadence as a biometric in automatic person identification and verification. In: IEEE International Conference on Automatic Face and Gesture Recognition, pp. 372-377 (2002)
14. Campbell, L., Bobick, A.: Recognition of human body motion using phase space constraints. In: International Conference on Computer Vision, pp. 624-630 (1995)
15. Little, J., Boyd, J.E.: Recognizing people by their gait: The shape of motion. Videre 1, 1-32 (1996)
16. Sivapalan, S., Chen, D., Denman, S., Sridharan, S., Fookes, C.B.: Gait energy volumes and frontal gait recognition using depth images. In: International Joint Conference on Biometrics (2011)
17. Gu, J., Ding, X., Wang, S., Wu, Y.: Action and gait recognition from recovered 3-D human joints. IEEE Transactions on Systems, Man, and Cybernetics, Part B 40, 1021-1033 (2010)
18. Green, R.D., Guan, L.: Quantifying and recognizing human movement patterns from monocular video images - part II: Applications to biometrics. IEEE Transactions on Circuits and Systems for Video Technology 14, 179-190 (2003)
19. Han, J., Bhanu, B.: Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 316-322 (2006)
20. Bolle, R.M., Connell, J.H., Pankanti, S., Ratha, N.K., Senior, A.W.: The relation between the ROC curve and the CMC. In: Proceedings of the Fourth IEEE Workshop on Automatic Identification Advanced Technologies, pp. 15-20 (2005)
21. Brand, M.: Incremental Singular Value Decomposition of Uncertain Data with Missing Values. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part I. LNCS, vol. 2350, pp. 707-720. Springer, Heidelberg (2002)
22. Fan, R.E., Chen, P.H., Lin, C.J.: Working set selection using second order information for training support vector machines. Journal of Machine Learning Research 6, 1889-1918 (2005)