
Human fall detection on embedded platform using depth maps and wireless accelerometer

Bogdan Kwolek1* and Michal Kepski2

Abstract
Since falls are a major public health problem in an ageing society, there is considerable demand for low-cost fall detection systems. One of the main reasons for non-acceptance of the currently available solutions by seniors is that fall detectors using only inertial sensors generate too many false alarms. This means that some daily activities are erroneously signaled as falls, which in turn leads to frustration of the users. In this paper we present how to design and implement a low-cost system for reliable fall detection with a very low false alarm rate. The detection of the fall is done on the basis of accelerometric data and depth maps. A tri-axial accelerometer is used to indicate a potential fall as well as to indicate whether the person is in motion. If the measured acceleration is higher than an assumed threshold value, the algorithm extracts the person, calculates the features and then executes an SVM-based classifier to authenticate the fall alarm. It is an embedded system permitting unobtrusive, around-the-clock fall detection as well as preserving the privacy of the user.
Keywords: Fall detection; Depth image analysis; Assistive technology; Sensor technology for smart homes

1 Introduction
Assistive technology or adaptive technology is an umbrella term that encompasses assistive and adaptive devices for people with special needs [1, 2]. Special needs and daily living assistance are often associated with seniors, the disabled, the overweight and obese, etc. Assistive technology for ageing-at-home has become a hot research topic since it has great social and commercial value. One important aim of assistive technology is to allow elderly people to stay as long as possible in their home without changing their living style.

Wearable sensor-based systems for health monitoring are an emerging trend, and in the near future they are expected to make possible proactive personal health monitoring along with better medical treatment. Inertial measurement units (IMUs) are low-cost, low-power devices with many potential applications. Current miniature inertial sensors can be integrated into clothes or shoes [3]. Inertial tracking technologies are becoming widely accepted for the assessment of human movement in health monitoring applications [4]. Wearable sensors offer several advantages over other sensors in terms of cost, weight, size, power consumption, ease of use and, most importantly, portability. Therefore, in the last decade, many different methods based on inertial sensors were developed to detect human falls. Falls are a major cause of injury for older people and a significant obstacle to independent living of seniors. They are one of the top causes of injury-related hospital admissions in people aged 65 years and over. Statistical results demonstrate that at least one-third of people aged 65 years and over fall one or more times a year [5]. An injured elderly person may be lying on the ground for several hours or even days after a fall incident has occurred. Therefore, significant attention has been devoted to developing an efficient wearable system for human fall detection [6, 7, 8, 9].

1.1 IMU based approaches to fall detection
The most common method for wearable-device based fall detection consists in the use of a tri-axial accelerometer and a threshold-based algorithm for triggering an alarm. Such algorithms raise the alarm when the acceleration is larger than a threshold value [10]. A variety of accelerometer-based methods and tools have been proposed for fall detection [11]. Typically, such algorithms require a relatively high sampling rate.

* Correspondence: bkw@agh.edu.pl
1 AGH University of Science and Technology, 30 Mickiewicza Av., 30-059 Kraków, Poland
Full list of author information is available at the end of the article
Kwolek and Kepski Page 2 of 14

However, most of them discriminate poorly between activities of daily living (ADLs) and falls, and none of them is universally accepted by the elderly. One of the main reasons for non-acceptance of the currently available solutions by seniors is that fall detectors using only accelerometers generate too many false alarms. This means that some daily activities are erroneously signaled as falls, which in turn leads to frustration of the users.

The main reason for the high false alarm rate of accelerometer-based systems is the lack of adaptability together with insufficient capabilities of context understanding. In order to reduce the number of false alarms, many attempts were undertaken to combine both an accelerometer and a gyroscope [6, 12]. However, several ADLs like quick sitting have kinematic motion patterns similar to real falls, and in consequence such methods might trigger many false alarms. As a result, it is not easy to distinguish real falls from fall-like activities using only accelerometers and gyroscopes. Another drawback of the approaches based on wearable sensors, from the user's perspective, is the need to wear and carry various uncomfortable devices during normal daily life activities. In particular, the elderly may forget to wear such devices. Moreover, in [13] it is pointed out that the common fall detectors, which are usually attached to a belt around the hip, are inadequate to be worn during sleep, and this results in the inability of such detectors to monitor the critical phase of getting up from the bed.

In general, the solutions mentioned above are somewhat intrusive for people, as they require continuously wearing at least one device or smart sensor. On the other hand, these systems, comprising various kinds of small sensors, transmission modules and processing capabilities, promise to change personal care by supplying low-cost, wearable, unobtrusive solutions for continuous all-day and any-place health and activity status monitoring. Examples of such solutions with great potential are smart watches and smartphone-based technologies. For instance, in the iFall application [14], data from the accelerometer is evaluated using several threshold-based algorithms and position data to determine whether the person has fallen. If a fall is inferred, a notification is raised requiring the user's response. If the user does not respond, the system sends an alert message via SMS.

Despite several shortcomings of the currently available wearable devices, the discussed technology has great potential, particularly in the context of the growing capabilities of signal processors and embedded systems. Moreover, owing to progress in this technology, data collection is no longer constrained to laboratory environments. In fact, it is the only technology that was successfully used in large-scale collection of people motion data.

1.2 Camera based approaches to fall detection
Video cameras have largely been used for detecting falls on the basis of a single CCD camera [15, 16], multiple cameras [17], specialized omni-directional ones [18] and stereo-pair cameras [19]. Video-based solutions offer several advantages over others, including the capability of detecting various activities. A further benefit is low intrusiveness and the possibility of remote verification of fall events. However, the currently available solutions require time for installation and camera calibration, and they are not cheap. As a rule, CCD-camera based systems require a PC or a notebook for image processing. While these techniques might work well in controlled environments, in order to be practically applied they must be adapted to non-controlled environments in which neither the lighting nor the subject tracking is fully controlled. Typically, the existing video-based devices for fall detection cannot work in low-light conditions. Additionally, the lack of depth information can lead to many false alarms. What is more, their poor adherence to real-life applications is particularly related to privacy preservation. Nevertheless, these solutions are becoming more accessible thanks to the emergence of low-cost cameras, wireless transmission devices, and the possibility of embedding the algorithms. The major problem is acceptance of this technology by seniors, as it requires the placement of video cameras in private living quarters, especially in the bedroom and the bathroom. Thus, although the existing technology permits reaching quite high fall detection performance, as mentioned above it does not meet the requirements of users with special needs.

Recently, the Kinect sensor has been proposed for fall detection [20, 21, 22]. The Kinect is a revolutionary motion-sensing technology that allows tracking a person in real time without having to carry sensors. It is

the world's first low-cost device that combines an RGB camera and a depth sensor. Thus, if only depth images are used, it preserves the person's privacy. Unlike 2D cameras, it allows tracking body movements in 3D. Since the depth inference is done using an active light source, the depth maps are independent of external light conditions. Owing to the use of infrared light, the Kinect sensor is capable of extracting depth maps in dark rooms. In the context of reliable fall detection systems, which should work 24 hours a day and 7 days a week, this is a very important capability, as we already demonstrated in [21].

1.3 Overview of the method
In order to achieve reliable and unobtrusive fall detection, our system employs both the Kinect sensor and a wearable motion-sensing device. When both devices are used, our system can reliably distinguish between falls and activities of daily living. In such a configuration of the system the number of false alarms is diminished. The smaller number of false alarms is achieved owing to visual validation of the fall alert generated on the basis of motion data only. The authentication of the alert is done on the basis of depth data and analysis of the features extracted from depth maps. Owing to the parameters describing the floor, which are determined in advance, the system analyses not only the shape of the extracted person but also the distance between the person's center of gravity and the floor. In situations in which the use of the wearable sensor might not be comfortable, for instance during changing clothes, bathing, washing oneself, etc., the system can detect falls using depth data only. In the areas of the room outside the Kinect field of view, the system can operate using only data from the motion-sensing device, consisting of an accelerometer and a gyroscope. Thanks to automatic extraction of the floor, no calibration of the system is needed, and the Kinect can be placed according to the user's preferences at a height of about 0.8−1.2 m. Owing to the use of depth maps only, our system preserves the privacy of people undergoing monitoring, and it can work at nighttime. The price of the system along with its operating costs are low thanks to the use of the low-cost Kinect sensor and the low-cost PandaBoard ES, which is a low-power, single-board computer development platform. The algorithms were developed with respect to both computational demands and real-time processing requirements.

The rest of the paper is organized as follows. Section 2 gives an overview of the main ingredients of the system, together with the main motivations for choosing the embedded platform. Section 3 is devoted to a short overview of the algorithm. A threshold-based detection of the person's fall is described in Section 4. In Section 5 we give details about extraction of the features representing the person in depth images. The classifier responsible for detecting human falls is presented in Section 6. The experimental results are discussed in Section 7. Section 8 provides some concluding remarks.

2 The embedded system for human fall detection
This Section is devoted to presentation of the main ingredients of the embedded system for human fall detection. At the beginning, the architecture of the embedded system for fall detection is outlined. Next, the PandaBoard is drafted briefly. Following that, the wearable device is presented in detail. Then, the Kinect sensor and its usefulness for fall detection are discussed shortly. Finally, data processing, feature extraction and classification modules are discussed briefly in the context of the limited computational resources of the utilized embedded platform.

2.1 Main ingredients of the embedded system
Our fall detection system uses both data from the Kinect and motion data from a wearable smart device containing accelerometer and gyroscope sensors. On the basis of data from the inertial sensor the algorithm extracts motion features, which are then used to decide if a fall took place. In the case of a fall, the features representing the person in the depth images are dispatched to a classifier, see Fig. 1.

2.2 Embedded platform
The computer used to execute depth image analysis and signal processing is the PandaBoard ES, which is a mobile development platform enabling software developers access to an open OMAP4460 processor-based development platform. It features a dual-core 1 GHz ARM Cortex-A9 MPCore processor with Symmetric Multiprocessing (SMP), a 304 MHz PowerVR SGX540 integrated 3D graphics accelerator, a programmable C64x DSP, and 1 GB of DDR2 SDRAM. The board contains wired 10/100 Ethernet along with wireless Ethernet and Bluetooth connectivity. The PandaBoard ES can support various Linux-based operating systems such as Android, Chrome and Linux Ubuntu. The operating system boots from an SD memory card. Linux is a well-suited operating system for real-time embedded platforms since it provides various flexible inter-process communication methods,

Figure 1 The architecture of the embedded system for fall detection.
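The multi-process pipeline shown in Fig. 1 can be approximated in a few lines. The sketch below is illustrative only: it uses Python threads and `queue.Queue` in place of the Linux processes and kernel message queues of the actual system, collapses the five stages to four, and a bare acceleration threshold stands in for the full feature extraction and SVM classification.

```python
import queue
import threading

def run_pipeline(imu_samples, depth_frames):
    """Toy analogue of the Fig. 1 dataflow: two acquisition stages
    feed a processing stage, which feeds a classification stage."""
    motion_q = queue.Queue()   # IMU acquisition -> processing
    depth_q = queue.Queue()    # depth acquisition -> processing
    feature_q = queue.Queue()  # processing -> classification
    alarms = []

    def acquire_imu():
        for s in imu_samples:
            motion_q.put(s)
        motion_q.put(None)  # end-of-stream marker

    def acquire_depth():
        for f in depth_frames:
            depth_q.put(f)
        depth_q.put(None)

    def process():
        while True:
            s = motion_q.get()
            if s is None:
                feature_q.put(None)
                break
            f = depth_q.get()
            # toy "feature": acceleration magnitude paired with its frame
            feature_q.put((s, f))

    def classify():
        while True:
            item = feature_q.get()
            if item is None:
                break
            accel, _frame = item
            if accel > 3.0:  # threshold in [g], as in the paper's Section 4
                alarms.append(item)

    threads = [threading.Thread(target=t)
               for t in (acquire_imu, acquire_depth, process, classify)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return alarms
```

The queues give the same asynchronous, kernel-managed decoupling between producers and consumers that the paper attributes to Linux message queues.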

among others message queues. Another advantage of using Linux in an embedded device is the rich availability of tools, and therefore it has been chosen for managing the hardware and software of the selected embedded platform.

The data acquired by the x-IMU inertial device at 256 Hz are transmitted wirelessly via Bluetooth to the processing device, whereas the Kinect sensor is connected to the device via USB, see Fig. 1. The fall detection system runs under the Linux operating system. The application consists of five main concurrent processes that communicate via message queues, see Fig. 1. Message queues are an appropriate choice for well-structured data and therefore they were selected as the communication mechanism between the concurrent processes. They provide asynchronous communication that is managed by the Linux kernel. The first process is responsible for acquiring data from the wearable device, the second one acquires depth data from the Kinect, the third process continuously updates the depth reference image, the fourth one is responsible for data processing and feature extraction, whereas the fifth process is accountable for data classification and triggering the fall alarm. The extraction of the person on the basis of the depth reference maps has been chosen since the segmentation can be done with relatively low computational cost. The dual-core processor allows parallel execution of the processes responsible for data acquisition and processing.

2.3 The inertial device
The person's movement is sensed by an x-IMU [23], which is a versatile motion sensing platform. Its host of on-board sensors, algorithms, configurable auxiliary port and real-time communication via USB, Bluetooth or UART make it a powerful smart motion sensor. The on-board SD card, USB-based battery charger, real-time clock and motion-trigger wake-up also allow on-board storage of data for later analysis. The x-IMU consists of a triple-axis 16-bit gyroscope and a triple-axis 12-bit accelerometer. The accelerometer measures acceleration, the rate of change of velocity over time, in units of [g], whereas the gyroscope delivers the rate of change of the angular position over time (angular velocity) in units of [deg/s].

The measured acceleration components were median filtered with a window length of three samples to suppress the sensor noise. The accelerometric data were utilized to calculate the length of the acceleration vector. Figure 2 shows a sample plot of acceleration vector length vs. time for a person walking up and down the stairs, and after that sitting down. The plot depicts motion data of a person older than 65 years of age. As illustrated in the discussed plot, for typical daily activities of an elderly person the acceleration assumes quite considerable values. As we can observe, during rapid sitting down an acceleration value of 3.5 g has been exceeded. Such a value is very often assumed as a decision threshold in simple threshold-based algorithms for fall detection [10, 8]. Therefore, in order to reduce the number of false alarms, in addition to the measurements from the inertial sensor we employ the Kinect sensor whenever possible. The depicted plots were obtained for the IMU device worn near the pelvis region. It is worth noting that attachment of the wearable sensor near the pelvis region or lower back is recommended in the literature [11], because this body part represents the major component of body mass and undergoes movement in most activities.

2.4 Kinect sensor
The Kinect sensor simultaneously captures depth and color images at a frame rate of about 30 fps. The device consists of an infrared laser-based IR emitter,

an infrared camera and an RGB camera. The depth sensor consists of an infrared laser emitter combined with a monochrome CMOS sensor, which captures 3D data streams under any ambient light conditions. The CMOS sensor and the IR projector form a stereo pair with a baseline of approximately 75 mm. The sensor has an angular field of view of fifty-seven degrees horizontally and forty-three degrees vertically. The minimum range of the Kinect is about 0.6 m and the maximum range is somewhere between 4 and 5 m. The device projects a speckle pattern onto the scene and infers the depth from the deformation of that pattern. In order to determine the depth it combines such a structured light technique with two classic computer vision techniques, namely depth from focus and depth from stereo. Pixels in the provided depth images indicate the calibrated depth in the scene. The depth resolution is about 1 cm at 2 m distance. The depth map is supplied in VGA resolution (640 × 480 pixels) on 11 bits (2048 levels of sensitivity). Figure 3 depicts sample color images and the corresponding depth maps, which were shot by the Kinect in various lighting conditions, ranging from day lighting to late evening lighting. As we can observe, owing to the Kinect's ability to extract depth images in unlit rooms, the system is able to detect falls in the late evening or even at nighttime.

Figure 3 Color images (top row) and the corresponding depth images (bottom row) shot by Kinect in various lighting conditions, ranging from day lighting to late evening lighting.

Figure 2 Acceleration over time for typical ADLs performed by an elderly person.

2.5 The system for fall detection
The system detects falls using a Support Vector Machine (SVM), which has been trained off-line, see Fig. 4. The system acquires depth images using the OpenNI (Open Natural Interaction) library. The OpenNI framework supplies an application programming interface (API) as well as an interface for physical devices and for middleware components. The acceleration components were median filtered with a window length of three samples. The size of the window has been determined experimentally with regard to noise suppression as well as the computing power of the utilized platform. A nearest neighbor-based interpolation was executed on the depth maps in order to fill the holes in the depth map and to get a map with meaningful values for all pixels. A median filter with a 5 × 5 window has been executed on the depth array to smooth the data. Afterwards, features are extracted both from the motion data and the depth maps, see the Feature Extraction block in Fig. 4. The depth features are then forwarded to the SVM classifier responsible for distinguishing between ADLs and falls.

3 Overview of the algorithm
At the beginning, motion data from the IMU along with depth data from the Kinect sensor are acquired. The data is then median filtered to suppress noise. After such preprocessing the depth maps are stored in a circular buffer, see Fig. 5. The storage of the data in a circular buffer is needed for the extraction of the depth reference image, which in turn allows us to extract the person. In the next step the algorithm verifies whether the person is in motion. This operation is carried out on the basis of the accelerometric data and thus it is realized at low computational cost. When the person is at rest, the algorithm acquires new data. In particular, no update of the depth reference map takes place if no movement of the person has been detected in a one-second period. If a movement of the person takes place, the algorithm extracts the foreground. The foreground is determined through subtraction of the current depth map from the depth reference image.

Given the extracted foreground, the algorithm determines the connected components. In the case of the

Figure 4 An overview of fall detection process.
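The depth preprocessing described in Section 2.5 (filling holes with a nearest-neighbor strategy, then median smoothing with a 5 × 5 window) might be sketched as follows. This is a simplified NumPy illustration, not the authors' implementation; in particular, the hole filling here searches for the nearest valid pixel along each row only.

```python
import numpy as np

def fill_holes_nearest(depth):
    """Fill zero-valued (unknown) pixels with the nearest valid value
    in the same row; a crude stand-in for true nearest-neighbor
    interpolation on the depth map."""
    out = depth.astype(float).copy()
    for r in range(out.shape[0]):
        row = out[r]
        valid = np.flatnonzero(row > 0)
        if valid.size == 0:
            continue
        for c in np.flatnonzero(row == 0):
            nearest = valid[np.argmin(np.abs(valid - c))]
            row[c] = row[nearest]
    return out

def median_smooth(depth, k=5):
    """k x k median filter; windows shrink at the image borders."""
    h, w = depth.shape
    out = np.empty((h, w), dtype=float)
    r = k // 2
    for i in range(h):
        for j in range(w):
            win = depth[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = np.median(win)
    return out
```

A production version would use vectorized or hardware-assisted filtering; the loops above only make the two steps explicit.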

scene change, for example when a new object appears in the scene, the algorithm updates the depth reference image. We assume that a scene change takes place when two or more blobs of sufficient area appear in the foreground image. Subsequently, given the accelerometric data, the algorithm examines whether a fall took place. In the case of a possible fall, the algorithm extracts the person along with his/her features in the depth map. The extraction of the foreground is done through differencing the current depth map from the depth reference map. Next, the algorithm removes from the binary image all connected components (objects) that consist of fewer pixels than an assumed number. After that, the person is segmented through extracting the largest connected component in the thresholded difference map. Finally, the classifier is executed to acknowledge the occurrence of the fall.
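A minimal sketch of the foreground extraction and person segmentation steps just described: differencing against the depth reference map, discarding small blobs, and keeping the largest connected component. The threshold values below are placeholders for illustration, not the ones used by the authors.

```python
import numpy as np

def segment_person(depth, reference, diff_thresh=200, min_pixels=5):
    """Difference the current depth map against the reference map,
    then keep the largest 4-connected foreground component that has
    at least `min_pixels` pixels."""
    fg = np.abs(depth.astype(int) - reference.astype(int)) > diff_thresh
    h, w = fg.shape
    labels = np.zeros((h, w), dtype=int)
    sizes = {}
    current = 0
    for i in range(h):
        for j in range(w):
            if fg[i, j] and labels[i, j] == 0:
                current += 1
                stack = [(i, j)]
                labels[i, j] = current
                count = 0
                while stack:  # flood fill one component
                    y, x = stack.pop()
                    count += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and fg[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            stack.append((ny, nx))
                sizes[current] = count
    candidates = [l for l in sizes if sizes[l] >= min_pixels]
    if not candidates:
        return np.zeros((h, w), dtype=bool)
    best = max(candidates, key=lambda l: sizes[l])
    return labels == best
```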

4 Threshold-based fall detection


On the basis of the data acquired by the IMU device
the algorithm indicates a potential fall. In the flow
chart of the algorithm, see Fig. 5, a block Potential
fall represents the recognition of the fall using data
from the inertial device. Figure 6 represents sample
plots of the acceleration and angular velocities for
falling along with daily activities like going down the
stairs, picking up an object, and sitting down – stand-
ing up.
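The threshold-based trigger of this section, i.e. median filtering of the acceleration components followed by a test of the total acceleration magnitude against a threshold, can be sketched as below. The 3 g default mirrors the value given later in this section; the function names are ours.

```python
import math

def median3(samples):
    """Median filter with a window of three samples (edge samples
    are passed through unchanged)."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = sorted(samples[i - 1:i + 2])[1]
    return out

def sv_total(ax, ay, az):
    """Total sum vector of Eq. (1), in units of g."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def potential_fall(accel_stream, threshold_g=3.0):
    """Flag samples whose filtered magnitude exceeds the threshold.

    `accel_stream` is a sequence of (ax, ay, az) tuples in g."""
    mags = [sv_total(*s) for s in accel_stream]
    return [m > threshold_g for m in median3(mags)]
```

At rest the magnitude stays near 1 g (gravity only), so the filter and threshold together reject isolated spikes while letting a sustained fall-like burst through.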
The x-IMU inertial device consists of a triple-axis 12-bit accelerometer and a triple-axis 16-bit gyroscope. The sampled acceleration components were used to calculate the total sum vector SV_Total(t) as follows:

$SV_{Total}(t) = \sqrt{A_x^2(t) + A_y^2(t) + A_z^2(t)}$   (1)

where $A_x(t)$, $A_y(t)$, $A_z(t)$ are the accelerations in the x-, y-, and z-axes at time t, respectively. The SV_Total

Figure 5 Flow chart of the algorithm for fall detection.

Figure 6 Acceleration (top row) and angular velocity (bottom row) over time for walking downstairs and upstairs, picking up an
object, sitting down - standing up and falling.

contains both the dynamic and static acceleration components, and thus it is equal to 1 g for standing, see the plots of the acceleration change curves in the upper row of Fig. 6. As we can observe in the discussed plots, during the process of falling the acceleration attained a value of 6 g, whereas during walking downstairs and upstairs it attained a value of 2.7 g. It is worth noting that the data were acquired by the x-IMU, which was worn by a middle-aged person (60+). The plots shown in the bottom row illustrate the corresponding change of angular velocities. As we can see, the change of the angular velocities during the process of falling is the most significant in comparison to non-fall activities. However, in practice it is not easy to construct a reliable fall detector with an almost null false alarm ratio using inertial data only. Thus, our system employs a simple threshold-based detection of falls, which are then verified on the basis of analysis of the depth images. If the value of SV_Total is greater than 3 g, the system starts the extraction of the person and then executes the classifier responsible for the final decision about the fall, see also Fig. 5.

5 Extraction of the features representing the person in depth images
In this Section we demonstrate how the features representing the person undergoing monitoring are extracted. At the beginning we discuss the algorithm for person delineation in the depth images. Then, we explain how to automatically estimate the parameters of the equation describing the floor. Finally, we discuss the features representing the lying person, given the extracted equation of the floor.

5.1 Extraction of the object of interest in depth maps
In order to make the system applicable in a wide range of scenarios, we elaborated a fast method for updating the depth reference image. The person is detected on the basis of a scene reference image, which is extracted in advance and then updated on-line. In the depth reference image each pixel assumes the median value of several pixel values from the past images, see Fig. 7. In the set-up stage we collect a number of depth images, and for each pixel we assemble a list of the pixel values from the former images, which is then sorted in order to extract the median. Given the sorted lists of pixels, the depth reference image can be updated quickly by removing the oldest pixels, updating the sorted lists with the pixels from the current depth image, and then extracting the median value. We found that for typical human motions, satisfactory results can be obtained using 13 depth images. For the Kinect acquiring images at 30 Hz, we take every fifteenth image.

The images shown in the third row of Figure 8 are the binary images with the foreground objects, which were obtained using the discussed technique. In the middle row are the raw depth images, whereas in the upper one are the corresponding RGB images. The RGB images are not processed by our system and are only depicted for illustrative purposes. In image #410 the person closed the door, which then appears in the binary image, being a difference map between the current depth image and the depth reference image.

Figure 7 Extraction of depth reference image.
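The per-pixel median background model of Section 5.1 can be sketched as follows. For simplicity this version recomputes the median with NumPy instead of maintaining incrementally sorted per-pixel lists as the authors do; the history of 13 frames and the every-fifteenth-frame subsampling follow the text.

```python
import numpy as np
from collections import deque

class DepthReference:
    """Per-pixel median background model (Section 5.1 sketch).

    Keeps the last `history` depth frames, sampling every `step`-th
    frame of the 30 Hz stream, and exposes their per-pixel median as
    the depth reference image."""

    def __init__(self, history=13, step=15):
        self.step = step
        self.frames = deque(maxlen=history)
        self.count = 0

    def update(self, depth):
        # Only every step-th frame enters the history buffer.
        if self.count % self.step == 0:
            self.frames.append(np.asarray(depth, dtype=float))
        self.count += 1

    def reference(self):
        return np.median(np.stack(list(self.frames)), axis=0)
```

Recomputing the median over 13 frames per query is wasteful on an embedded board, which is why the paper keeps the per-pixel lists sorted and only inserts/removes one value per update.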

As we can see, in frame 610, owing to adaptation of the depth reference image, the door disappears from the binary image and the person undergoing monitoring is properly separated from the background. Considering that the images are acquired at 25 frames per second, as well as the number of frames needed to update the depth reference image, the time required for removing moved or moving objects from the scene is about six seconds. In the binary image corresponding to frame 810 we can see a chair, which has been previously moved, and which disappears in the binary image corresponding to frame 1010. Once again, the update of the depth reference image has been achieved in about six seconds. As we can observe, the updated depth reference image allows us to extract the person's silhouette in the depth images. In order to eliminate small objects, the depth connected components were extracted and small artifacts were removed. Alternatively, the depth images can be cleaned using morphological erosion.

In the detection mode the foreground objects are extracted through differencing the current image from such a reference depth map. Subsequently, the foreground object is determined through extracting the largest connected component in the thresholded difference map.

The images shown in the middle row of Fig. 8 are the raw depth images. As we already mentioned, the nearest neighbor-based interpolation is executed on the depth maps in order to fill the holes in the maps and to get maps with meaningful values for all pixels. Thanks to such an interpolation, the delineated persons contain a smaller amount of artefacts.

Figure 8 Delineation of the person using the depth reference image, frames #210, 410, 610, 810 and 1010: RGB images (upper row), depth (middle row) and binary images depicting the delineated person (bottom row).

5.2 V-disparity based ground plane extraction
In [24] a method based on v-disparity maps between two stereo images has been proposed to achieve reliable obstacle detection.
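The v-disparity representation used in Section 5.2 can be illustrated as a row-wise histogram of disparities, with depth converted to disparity through d = b·f/z (b = 7.5 cm, f = 580 px, as given in the text). An illustrative sketch, not the paper's code:

```python
import numpy as np

# Kinect-like stereo parameters from the text: baseline b = 7.5 cm,
# focal length f = 580 px.
B, F = 0.075, 580.0

def depth_to_disparity(z_m):
    """Eq. (2): d = b * f / z, with the depth z in meters."""
    return B * F / z_m

def v_disparity(disparity_img, d_max):
    """Row-wise disparity histogram: cell (row, d) counts how many
    pixels in that image row have (integer) disparity d."""
    h = disparity_img.shape[0]
    vmap = np.zeros((h, d_max + 1), dtype=int)
    for row in range(h):
        for d in disparity_img[row].astype(int):
            if 0 <= d <= d_max:
                vmap[row, d] += 1
    return vmap
```

In such a map a flat floor seen by a tilted camera produces a slanted line, which is why a Hough transform over the map can recover the floor, as described in the text.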

Kinect sensor, the disparity d can be determined in the noting that ordinary HT operating on thresholded v-
following manner: disparity images often gives incorrect results. For visu-
d = b·f / z    (2)

where z is the depth (in meters), b stands for the horizontal baseline between the cameras (in meters), whereas f stands for the (common) focal length of the cameras (in pixels). The IR camera and the IR projector form a stereo pair with a baseline of approximately b = 7.5 cm, whereas the focal length f is equal to 580 pixels.

Let H be a function of the disparities d such that H(d) = Id. The Id is the v-disparity image, and H accumulates the pixels with the same disparity from a given line of the disparity image. Thus, in the v-disparity image each point in the line i represents the number of points with the same disparity occurring in the i-th line of the disparity image. Figure 9c illustrates the v-disparity image that corresponds to the depth image acquired by the Kinect sensor and depicted on Fig. 9b.

Figure 9 V-disparity map extracted on depth images from the Kinect sensor: RGB image a), corresponding depth image b), v-disparity map c).

The line corresponding to the floor pixels in the v-disparity map was extracted using the Hough transform (HT). The Hough transform finds lines by a voting procedure that is carried out in a parameter space, from which line candidates are obtained as local maxima in a so-called accumulator space. Assuming that the Kinect is placed at a height of about 1 m from the floor, the line representing the floor should begin at disparities ranging from 15 to 25, depending on the tilt angle of the Kinect sensor. As we can observe on Fig. 9c, the line corresponding to the floor begins at a disparity equal to twenty-four.

The line corresponding to the floor was extracted using the HT operating on v-disparity values and a predefined range of the parameters. Figure 10 depicts the accumulator of the HT that has been extracted on the v-disparity image shown on Fig. 9c. The accumulator was incremented by the v-disparity values. It is worth noting that for visualization purposes the accumulator values were divided by 1000. As we can see on Fig. 10, the highest peak of the accumulator is for a line with Θ approximately equal to zero degrees. This means that it corresponds to a vertical line, i.e. a line corresponding to the room walls, see Fig. 9c. In order to simplify the extraction of the peak corresponding to the floor, only the bottom half of the v-disparity map is subjected to processing by the HT, see also Fig. 9c. Thanks to such an approach, as well as to executing the HT on a predefined range of Θ and ρ, the line corresponding to the floor can be estimated reliably.

Figure 10 Accumulator of the Hough transform operating on v-disparity values from the image shown on Fig. 9c.

Given the line extracted in such a way, the pixels belonging to the floor areas were determined. Due to the measurement inaccuracies, pixels falling into some disparity extent dt were also considered as belonging to the ground. Assuming that dy is a disparity in the line y which represents the pixels belonging to the ground plane, we take into account the disparities from the range d ∈ (dy − dt, dy + dt) as a representation of the ground plane. Given the line extracted by the Hough transform, the points on the v-disparity image with the corresponding depth pixels were selected, and then transformed to the point cloud.

After the transformation of the pixels representing the floor to the 3D point cloud, the plane described by the equation ax + by + cz + d = 0 was recovered. The parameters a, b, c and d were estimated using the RANdom SAmple Consensus (RANSAC) algorithm. RANSAC is an iterative algorithm for estimating the parameters of a mathematical model from a set of observed data which contains outliers [25].
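As an illustration of the v-disparity computation introduced above, the accumulation H(d) = Id and the depth-to-disparity conversion of Eq. (2) can be sketched as follows; the image size and the disparity range are illustrative assumptions, not values from the authors' implementation:

```python
import numpy as np

def v_disparity(disparity, d_max=64):
    """Build a v-disparity image: row i counts, for each disparity value d,
    how many pixels in row i of the disparity map have disparity d."""
    rows = disparity.shape[0]
    v_disp = np.zeros((rows, d_max), dtype=np.int32)
    for i in range(rows):
        d = disparity[i].astype(int)
        d = d[(d >= 0) & (d < d_max)]
        np.add.at(v_disp[i], d, 1)   # accumulate the per-row histogram
    return v_disp

# Eq. (2): disparity d = b*f / z, with b = 7.5 cm and f = 580 px (from the text)
b, f = 0.075, 580.0
depth = np.full((240, 320), 2.0)    # synthetic flat scene, 2 m away everywhere
disp = np.round(b * f / depth)      # ~21.75 px, rounded to 22 px
vd = v_disparity(disp)
```

For this synthetic flat scene every image row votes for the same disparity bin, so the v-disparity image contains a single vertical line.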
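The RANSAC estimation of the floor-plane coefficients (a, b, c, d) described above can be sketched as below; this is a minimal didactic version, not the authors' implementation, and the iteration count, inlier threshold and synthetic data are assumptions:

```python
import numpy as np

def ransac_plane(points, iters=200, tol=0.02, seed=0):
    """Estimate (a, b, c, d) of ax + by + cz + d = 0 for a 3D point
    cloud by RANSAC; tol is the inlier distance threshold in metres."""
    rng = np.random.default_rng(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)           # candidate plane normal
        norm = np.linalg.norm(n)
        if norm < 1e-9:                          # degenerate (collinear) sample
            continue
        n /= norm
        d = -n.dot(p0)
        inliers = int((np.abs(points @ n + d) < tol).sum())
        if inliers > best_inliers:
            best, best_inliers = (n[0], n[1], n[2], d), inliers
    return best                                  # unit normal: a² + b² + c² = 1

# synthetic floor y = 0 with a handful of outlier points well above it
rng = np.random.default_rng(1)
pts = np.column_stack([rng.random(500) * 4, np.zeros(500), rng.random(500) * 4])
pts[:10, 1] = 1.5
a, b, c, d = ransac_plane(pts)
```

Because the model is re-sampled many times and scored by its inlier count, the outlier points never dominate the recovered plane.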

The distance to the ground plane from the 3D centroid of the point cloud corresponding to the segmented person was determined on the basis of the following equation:

D = |a·Xc + b·Yc + c·Zc + d| / √(a² + b² + c²)    (3)

where Xc, Yc, Zc stand for the coordinates of the person's centroid. The parameters should be re-estimated subsequent to each change of the Kinect location or orientation. A relevant method for estimating 3D camera extrinsic parameters has been proposed in [26]. It operates on three sets of points which are known to be orthogonal. These sets can either be identified using a user interface or by a semi-automatic plane-fitting method.

5.3 Depth features for person detection

The following features were extracted from a collection of the depth images in order to acknowledge the fall hypothesis, which is signaled by the threshold-based procedure:
• h/w - the ratio of the height to the width of the person's bounding box
• h/hmax - the ratio of the height of the person's bounding box in the current frame to the physical height of the person
• D - the distance of the person's centroid to the floor
• max(σx, σz) - the larger of the standard deviations from the centroid for the abscissa and the applicate, respectively.
Given the delineated person in the depth image along with the automatically extracted parameters of the equation describing the floor, the aforementioned features are easy to calculate.

6 The classifier for fall detection

At the beginning of this Section we discuss the dataset that was recorded in order to extract the features for training as well as evaluating the classifier. After that, we overview the SVM-based classifier.

6.1 The training dataset

A dataset consisting of images with normal activities like walking, sitting down, crouching down and lying has been composed in order to train the classifier responsible for examining whether a person is lying on the floor, and to evaluate its detection performance. In total 612 images were selected from the UR Fall Detection Dataset (URFD)[1] and other image sequences, which were recorded in typical rooms, like an office, a classroom, etc. The selected image set consists of 402 images with typical ADLs, whereas 210 images depict a person lying on the floor. The aforementioned depth images were utilized to extract the features discussed in Subsection 5.3. The whole UR Fall Detection dataset consists of 30 image sequences with 30 falls. Two types of falls were performed by five persons, namely from a standing position and from sitting on a chair.

[1] http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html
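Assuming the person has already been segmented into a 3D point cloud and the floor-plane coefficients are known, the features of Subsection 5.3, including the distance D of Eq. (3), can be sketched as follows; the function name and the synthetic test data are illustrative, not taken from the authors' code:

```python
import numpy as np

def person_features(points, plane, h_max):
    """points: (N, 3) person point cloud (x, y, z); plane: (a, b, c, d) of the
    floor, ax + by + cz + d = 0; h_max: the person's physical height in metres."""
    a, b, c, d = plane
    mins, maxs = points.min(axis=0), points.max(axis=0)
    w, h = maxs[0] - mins[0], maxs[1] - mins[1]       # bounding-box extents
    xc, yc, zc = points.mean(axis=0)                  # person's centroid
    # Eq. (3): distance of the centroid to the floor plane
    D = abs(a * xc + b * yc + c * zc + d) / np.sqrt(a**2 + b**2 + c**2)
    sx, sz = points[:, 0].std(), points[:, 2].std()   # spread along x and z
    return h / w, h / h_max, D, max(sx, sz)

# a synthetic standing person: 0.5 m wide, 1.8 m tall, above the floor y = 0
xs, ys, zs = np.linspace(0, 0.5, 5), np.linspace(0, 1.8, 10), np.linspace(0, 0.3, 3)
pts = np.array([[x, y, z] for x in xs for y in ys for z in zs])
hw, hh, D, s = person_features(pts, plane=(0.0, 1.0, 0.0, 0.0), h_max=1.8)
```

For this upright synthetic person h/w is large and the centroid sits about 0.9 m above the floor; after a fall both values would drop sharply, which is what the classifier exploits.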

Figure 11 Multivariate classification scatter plot for features utilized for training of the fall classifier.

All RGB and depth images are synchronized with motion data, which were acquired by the x-IMU inertial device.

Figure 11 depicts the scatter plot, in which a collection of scatter plots is organized in a two-dimensional matrix in order to provide correlation information among the attributes. As we can observe, the overlaps in the attribute space are not too significant. Thus, a linear SVM was utilized for classifying lying poses and typical ADLs. Although the non-linear SVM usually has better effectiveness in the classification of non-linear data than its linear counterpart, it has much higher computational demands for prediction.

6.2 Support Vector Machines (SVM)

The basic idea of the SVM classification is to find a separating hyperplane that corresponds to the largest possible margin between the points of different classes [27]. The optimal hyperplane for an SVM is the one with the largest margin between the two classes, so that the distance to the nearest data point of both classes is maximized. Such a largest margin means the maximal width of the tile parallel to the hyperplane that contains no interior data points, thus incorporating robustness into the decision-making process. Given a set of data points D = {(xi, yi) | xi ∈ R^p, yi ∈ {−1, 1}}, i = 1, ..., n, where each example xi is a point in p-dimensional space and yi is the corresponding class label, we search for a vector ω ∈ R^p and a bias b ∈ R forming the hyperplane H: ω^T x + b = 0 that separates both classes so that yi(ω^T xi + b) ≥ 1. The optimization problem that needs to be solved is: min over ω, b of (1/2)·ω^T ω, subject to yi(ω^T xi + b) ≥ 1. The problem consists in optimizing a quadratic function subject to linear constraints, and can be solved with an off-the-shelf Quadratic Programming (QP) solver. The linear SVM can perform prediction with p summations and multiplications, and the classification time is independent of the number of support vectors. We executed the LIBSVM software [28] on a PC computer to train the fall detector.

7 Experimental results

We evaluated the SVM-based classifier and compared it with a k-NN classifier (5 neighbors). The classifiers were evaluated in 10-fold cross-validation. To examine the classification performances we calculated the sensitivity, specificity, precision and classification accuracy. The sensitivity is the number of true positive (TP) responses divided by the number of actual positive cases (the number of true positives plus the number of false negatives). It is the probability of fall, given that a fall occurred, and thus it is the classifier's ability to identify a condition correctly. The specificity is the number of true negative (TN) decisions divided by the number of actual negative cases (the number of true negatives plus the number of false positives). It is the probability of non-fall, given that a non-fall ADL took place, and thus it shows how good a classifier is at avoiding false alarms. The accuracy is the number of correct decisions divided by the total number of cases, i.e. the sum of true positives plus the sum of true negatives divided by the total instances in the population. That is, the accuracy is the proportion of true results (both true positives and true negatives) in the population. The precision, or positive predictive value (PPV), is equal to the true positives divided by the sum of true positives and false positives. Thus, it shows how many of the positively classified falls were relevant.

In Tab. 1 are shown the results that were obtained in 10-fold cross-validation by the classifier responsible for the lying pose detection on the aforementioned dataset. As we can see, both specificity and precision are equal to 100%, i.e. the ability of the classifier to avoid false alarms and its exactness assume perfect values.

Table 2 shows the results of the experimental evaluation of the system for fall detection, which were obtained on depth image sequences from the URFD dataset. They were obtained on thirty image/acceleration sequences with falls and thirty image/acceleration sequences with typical ADLs like sitting down, crouching down, picking up an object from the floor and lying on a sofa. The number of images in the sequences with falls is equal to 3000, whereas the number of images in the sequences with ADLs is equal to 9000. All images have corresponding motion data. In the case of an incorrect response of the system, the remaining part of the sequence has been omitted. This means that the detection scores were determined on the basis of the number of correctly/incorrectly classified sequences. As we can observe, the Threshold UFT method [10] achieves good results. The results obtained by the SVM classifier operating on depth features only are slightly worse than the results of the Threshold UFT method. The reason is that the update of the depth reference image was realized without the support of the motion information. This means that a simplified system has been built using only the blocks which are indicated in Fig. 5 as numerals in circles. In particular, in such a configuration of the system all images are processed in order to extract the depth reference image. The algorithm using both motion data from the accelerometer and depth maps for verification of IMU-based alarms achieves the best performance.
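Returning to the classifier of Subsection 6.2: the authors trained the linear SVM with LIBSVM; as a didactic stand-in (not the paper's setup), the primal problem can be approximated with a simple subgradient (Pegasos-style) solver, shown here on synthetic data — the solver variant, the data and all parameters are illustrative assumptions:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient solver for the (bias-free) linear SVM
    primal min (lam/2)||w||^2 + mean hinge loss -- a didactic stand-in
    for the QP solver / LIBSVM used in the paper."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:            # margin violated: hinge subgradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w          # regularization shrinkage only
    return w

def predict(w, X):
    # prediction costs p multiply-adds per sample, independent of the
    # number of support vectors (cf. Subsection 6.2)
    return np.where(X @ w >= 0, 1, -1)

# toy stand-in for the four lying-pose features: two well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.1, (50, 4)), rng.normal(+1, 0.1, (50, 4))])
y = np.array([-1] * 50 + [1] * 50)
w = train_linear_svm(X, y)
acc = (predict(w, X) == y).mean()
```

On data whose class overlap is small, as Fig. 11 indicates for the recorded features, such a linear separator already classifies cleanly, which motivates the choice of the cheaper linear kernel.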

Table 1 Performance of lying pose classification.

                              True
                          Fall   No Fall
  SVM   Estimated Fall     208         0   Accuracy=99.67%
        Estimated No fall    2       402   Precision=100%
        Sens.=99.05%   Spec.=100%

  k-NN  Estimated Fall     208         0   Accuracy=99.67%
        Estimated No fall    2       402   Precision=100%
        Sens.=99.05%   Spec.=100%
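The four measures defined in Section 7 can be checked directly against the confusion matrix reported in Table 1 (TP = 208, FN = 2, TN = 402, FP = 0):

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, precision and accuracy from a confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, precision, accuracy

# values reported in Table 1 for the lying-pose classifier
sens, spec, prec, acc = classification_metrics(tp=208, fn=2, tn=402, fp=0)
print(f"Sens.={sens:.2%}  Spec.={spec:.2%}  Precision={prec:.2%}  Accuracy={acc:.2%}")
```

This reproduces the values in the table: Sens.=99.05%, Spec.=100%, Precision=100%, Accuracy=99.67%.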

Moreover, owing to the use of the IMU device, the computational efforts associated with the detection of the person in the depth maps are much smaller.

Five volunteers aged over 26 years participated in the evaluation of the developed algorithm and the embedded system for fall detection. Intentional falls were performed in an office by five persons onto a carpet with a thickness of about 2 cm. The x-IMU device was worn near the pelvis. Each individual performed three types of falls, namely forward, backward and lateral, at least three times. Each individual also performed ADLs like walking, sitting, crouching down, leaning down/picking up objects from the floor, as well as lying on a settee. All intentional falls have been detected correctly. In particular, quick sitting down, which is an ADL not easily distinguishable from an intentional fall when only an accelerometer, or even both an accelerometer and a gyroscope, are used, has been classified as an ADL.

It is well known that the precision of Kinect measurements decreases in strong sunlight. In order to investigate the influence of sunlight on the performance of fall detection, we analyzed the person extraction in depth maps acquired in strong sunlight, see sample images on Fig. 12. As noted in [29], the in-home depth measurements on a person being in sunlight, i.e. in sunlight that passes through a closed window, can be made with limited complications. Some body parts of a person in sunlight may not return measurements, see images d)-e) in Fig. 12. As we can see, the measurements in image f) are better in comparison to the measurements shown in image e) due to the smaller sunlight intensity. In order to assess the influence of such strong sunlight on the performance of fall detection, we calculated the person's depth features in a collection of such depth maps with missing measurements. As expected, the change of the features in comparison to the features extracted on manually segmented persons is not significant, i.e. within several percent. Such a change of the values of the features does not degrade the high performance of fall detection given the algorithms used here.

Figure 12 Color images (top row) and the corresponding depth images (bottom row) acquired by Kinect in sunlight.

The system that was evaluated in such a way has been implemented on the PandaBoard-ES platform. In particular, we trained the SVM classifier off-line, and then used the parameters obtained in such a way in a fall predictor executed on the PandaBoard. Prior to the implementation of the system on the PandaBoard, we compared processor performances using the Dhrystone 2 and Double-Precision Whetstone benchmarks. Our experimental results show that the Dhrystone 2 score on an Intel i7-3610QM 2.30 GHz and on the PandaBoard ES OMAP4460 is equal to 37423845 and 4214871 [lps], respectively, whereas the Double-Precision Whetstone score is equal to 4373 and 836 [MWIPS], respectively. This means that the PandaBoard offers considerable computational power. Finally, the whole system was implemented and evaluated on the PandaBoard. The code profiler reported about 50% CPU usage by the module responsible for the update of the depth reference map.

8 Conclusions

In this paper we demonstrated an embedded system for reliable fall detection with a very low false alarm ratio. The detection of the fall is done on the basis of accelerometric data and depth maps. A tri-axial accelerometer is used to indicate the potential fall as well as to indicate if the person is in motion.

Table 2 Performance of fall detection.

               SVM - depth only   SVM - depth + acc.   Threshold UFT [10]   Threshold LFT [10]
  Accuracy          90.00%             98.33%               95.00%               86.67%
  Precision         83.30%             96.77%               90.91%               82.35%
  Sensitivity      100.00%            100.00%              100.00%               93.33%
  Specificity       80.00%             96.67%               90.00%               80.00%

If the measured acceleration is higher than an assumed threshold value, the algorithm extracts the person, calculates the features and then executes the SVM-based classifier to authenticate the fall alarm. We demonstrate that the person-surrounding features, together with the distance between the person's center of gravity and the floor, lead to reliable fall detection. The parameters of the floor equation are determined automatically. The extraction of the person is only executed if the accelerometer indicates that he/she is in motion. The person is extracted through differencing the current depth map from the on-line updated depth reference map. The system permits unobtrusive fall detection and preserves the privacy of the user. However, a limitation of the Kinect sensor is that sunlight interferes with the pattern-projecting laser, so the proposed fall detection system is most suitable for indoor use.

Acknowledgements
This work has been supported by the National Science Centre (NCN) within the project N N516 483240.

Author details
1 AGH University of Science and Technology, 30 Mickiewicza Av., 30-059 Kraków, Poland. 2 University of Rzeszów, 16c Rejtana Av., 35-959 Rzeszów, Poland.

References
1. Chan, M., Esteve, D., Escriba, C., Campo, E.: A review of smart homes - present state and future challenges. Computer Methods and Programs in Biomedicine 91(1), 55–81 (2008)
2. Cook, A.M.: The future of assistive technologies: A time of promise and apprehension. In: Proc. of the 12th Int. ACM SIGACCESS Conf. on Comp. and Accessibility, pp. 1–2. ACM, New York, USA (2010)
3. Hoflinger, F., Muller, J., Zhang, R., Reindl, L.M., Burgard, W.: A wireless micro inertial measurement unit (IMU). IEEE Transactions on Instrumentation and Measurement 62(9), 2583–2595 (2013)
4. Buesching, F., Kulau, U., Gietzelt, M., Wolf, L.: Comparison and validation of capacitive accelerometers for health care applications. Comp. Methods and Programs in Biomedicine 106(2), 79–88 (2012)
5. Heinrich, S., Rapp, K., Rissmann, U., Becker, C., König, H.-H.: Cost of falls in old age: a systematic review. Osteoporosis Int. 21, 891–902 (2010)
6. Noury, N., Fleury, A., Rumeau, P., Bourke, A.K., Laighin, G.O., Rialle, V., Lundy, J.E.: Fall detection - principles and methods. In: IEEE Int. Conf. on Eng. in Medicine and Biology Society, pp. 1663–1666 (2007)
7. Yu, X.: Approaches and principles of fall detection for elderly and patient. In: 10th Int. Conf. on E-health Networking, Applications and Services, pp. 42–47 (2008)
8. Igual, R., Medrano, C., Plaza, I.: Challenges, issues and trends in fall detection systems. BioMedical Engineering OnLine 12(1), 1–24 (2013)
9. Mubashir, M., Shao, L., Seed, L.: A survey on fall detection: Principles and approaches. Neurocomputing 100, 144–152 (2013)
10. Bourke, A.K., O'Brien, J.V., Lyons, G.M.: Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm. Gait & Posture 26(2), 194–199 (2007)
11. Kangas, M., Konttila, A., Lindgren, P., Winblad, I., Jamsa, T.: Comparison of low-complexity fall detection algorithms for body attached accelerometers. Gait & Posture 28(2), 285–291 (2008)
12. Bourke, A.K., Lyons, G.M.: A threshold-based fall-detection algorithm using a bi-axial gyroscope sensor. Medical Engineering & Physics 30(1), 84–90 (2008)
13. Degen, T., Jaeckel, H., Rufer, M., Wyss, S.: SPEEDY: A fall detector in a wrist watch. In: Proc. of the 7th IEEE Int. Symp. on Wearable Comp., p. 184. IEEE Computer Society, Washington, DC, USA (2003)
14. Sposaro, F., Tyson, G.: iFall: An Android application for fall monitoring and response. In: IEEE Int. Conf. on Engineering in Medicine and Biology Society, pp. 6119–6122 (2009)
15. Anderson, D., Keller, J.M., Skubic, M., Chen, X., He, Z.: Recognizing falls from silhouettes. In: Annual Int. Conf. of the Engineering in Medicine and Biology Society, pp. 6388–6391 (2006)
16. Rougier, C., Meunier, J., St-Arnaud, A., Rousseau, J.: Monocular 3D head tracking to detect falls of elderly people. In: Annual Int. Conf. of the IEEE Eng. in Medicine and Biology Society, pp. 6384–6387 (2006)
17. Cucchiara, R., Prati, A., Vezzani, R.: A multi-camera vision system for fall detection and alarm generation. Expert Syst. 24(5), 334–345 (2007)
18. Miaou, S.-G., Sung, P.-H., Huang, C.-Y.: A customized human fall detection system using omni-camera images and personal information. Distributed Diagnosis and Home Healthcare, 39–42 (2006)
19. Jansen, B., Deklerck, R.: Context aware inactivity recognition for visual fall detection. In: Proc. IEEE Pervasive Health Conference and Workshops, pp. 1–4 (2006)
20. Kepski, M., Kwolek, B., Austvoll, I.: Fuzzy inference-based reliable fall detection using Kinect and accelerometer. In: The 11th Int. Conf. on Artificial Intelligence and Soft Computing. LNCS, vol. 7267, Springer, pp. 266–273 (2012)
21. Kepski, M., Kwolek, B.: Fall detection on embedded platform using Kinect and wireless accelerometer. In: 13th Int. Conf. on Computers Helping People with Special Needs. LNCS, vol. 7383, Springer, pp. 407–414 (2012)
22. Mastorakis, G., Makris, D.: Fall detection system using Kinect's infrared sensor. J. of Real-Time Image Processing, 1–12 (2012)
23. 3D Orientation Sensor IMU. http://www.test.org/doe/
24. Labayrade, R., Aubert, D., Tarel, J.-P.: Real time obstacle detection in stereovision on non flat road geometry through "v-disparity" representation. In: Intelligent Vehicle Symposium, 2002. IEEE, vol. 2, pp. 646–651 (2002)
25. Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
26. Deklerck, R., Jansen, B., Yao, X.L., Cornelis, J.: Automated estimation of 3D camera extrinsic parameters for the monitoring of physical activity of elderly patients. In: XII Mediterranean Conference on Medical and Biological Engineering and Computing. IFMBE Proceedings, vol. 29, pp. 699–702 (2010)
27. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
28. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)

29. Stone, E.E., Skubic, M.: Unobtrusive, continuous, in-home gait measurement using the Microsoft Kinect. IEEE Trans. on Biomedical Engineering 60(10), 2925–2932 (2013)
