ARTICLE INFO

Article history: Available online 15 April 2023

Keywords: Human-robot collaboration; Manufacturing system; Collaborative intelligence

ABSTRACT

Human-Robot Collaboration (HRC) has played a pivotal role in today's human-centric smart manufacturing scenarios. Nevertheless, limited concerns have been given to HRC uncertainties. By integrating both human and artificial intelligence, this paper proposes a Collaborative Intelligence (CI)-based approach for handling three major types of HRC uncertainties (i.e., human, robot and task uncertainties). A fine-grained human digital twin modelling method is introduced to address human uncertainties with better robotic assistance. Meanwhile, a learning from demonstration approach is offered to handle robotic task uncertainties with human intelligence. Lastly, the feasibility of the proposed CI has been demonstrated in an illustrative HRC assembly task.

© 2023 The Authors. Published by Elsevier Ltd on behalf of CIRP. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
https://doi.org/10.1016/j.cirp.2023.04.057
P. Zheng et al. / CIRP Annals - Manufacturing Technology 72 (2023) 1-4
Fig. 1. The proposed CI-based approach for handling HRC uncertainties.

2.1. Dynamic graph embedding-based adaptive HRC task planning

Adaptive task planning algorithms free existing HRC systems from predefined instructions, for enhanced adaptability in various manufacturing scenarios. Our previous works have introduced the scene graph (SG) and KG approaches [4] to generate task planning strategies for HRC systems. Meanwhile, Raatz et al. [9] utilised a genetic algorithm to optimise task scheduling based on capabilities and time assumptions in HRC. However, these methods focus on distilling the temporal knowledge representation of HRC configurations, while ignoring updating the information to the next-stage task arrangement.

With the perceptual results, a DynGraphGAN model [10] is leveraged to temporally construct and update the HRC KG for on-demand task allocation, as shown in Fig. 2. Considering the input of human behaviours, detected objects, and task structures, there are numerous possible configurations of their relations. Firstly, a generator generates adjacency matrices to represent the relation evolution between Human-Robot-Task-Workpiece-Environment (HRTWE) nodes over time. The updated HRTWE connections may introduce fake edges. Thus, a continuous discriminator is leveraged to distinguish the authentic and fake links between HRTWE nodes via a Gated Recurrent Unit (GRU) algorithm. Optimised operation

2.2. Fine-grained HDT modelling

Based on the adaptive HRC task planning result, the ability to constantly perceive and model the human body is also essential, which can provide the necessary information for the collaborative robot to further cope with human-related uncertainties during HRC. Previous endeavours in the HRC literature have been devoted to perceiving the human body skeleton for active collision avoidance [11] or recognising human action intention [12] for robotic decision-making. However, they can only model the human operator in a relatively coarse manner, with insufficient representation fidelity or limited recognition accuracy. To this end, a vision-based fine-grained HDT modelling scheme of a human operator is proposed, as depicted in Fig. 3, which mainly consists of two parts: 1) fine-grained human pose reconstruction, and 2) spatial-temporal human behaviour intention recognition.

Fine-grained human pose reconstruction. In the first part of the HDT, a deep learning model that can simultaneously reconstruct the fine-grained 3D dense mesh and the skeleton joints of the human body is proposed. Concretely, the RGB-D images of the human operator are processed by a ResNet-50 backbone network to extract geometric features, which are then utilised to regress the pose parameters $\theta_{body} \in \mathbb{R}^{3K}$, shape parameters $\beta_{body} \in \mathbb{R}^{M}$, 3D rotation $R_{body} \in \mathbb{R}^{3\times 3}$, and 3D translation $T_{body} \in \mathbb{R}^{3}$. The pose and shape parameters are subsequently sent to the SMPL (Skinned Multi-Person Linear) human body model, a differentiable function that outputs a triangulated mesh $M(\theta, \beta) \in \mathbb{R}^{3N}$. The adoption of the SMPL model can largely simplify the reconstruction process to achieve real-time performance, as it relies on a template human body mesh as a prior, which is bent and stretched to the target human pose according to the estimated pose and shape parameters. The predicted global 3D rotation and translation are then applied to obtain the correctly posed human body mesh. The 3D skeleton points $X(\theta, \beta) \in \mathbb{R}^{3K}$ are further obtained by linear regression from the final mesh vertices. To further refine the hand pose reconstruction of human operators, the ResNet-50 backbone also regresses hand parameters, including pose parameters $\theta_{hand} \in \mathbb{R}^{3K}$, shape parameters $\beta_{hand} \in \mathbb{R}^{M}$, and 3D translation $T_{hand} \in \mathbb{R}^{3}$, which are processed by the MANO (hand Model with Articulated and Non-rigid defOrmations) model to enhance the SMPL model and to facilitate the human-assisted robot LfD. Once the fine-grained human mesh is reconstructed, it can represent the precise geometric occupancy of the human body, which can substantially reduce the perception error of the human body during HRC.
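The pipeline above (shape/pose parameters → deformed template mesh → skeleton joints by linear regression) can be sketched with a toy body model. The template mesh, blend directions, regressor values, and the tiny dimensions N, K, M below are illustrative assumptions, not the actual SMPL/MANO data, and only the shape deformation and global pose are modelled.

```python
import numpy as np

# Toy stand-in for an SMPL-style body model: a template mesh is deformed by
# shape parameters, posed by a global rotation/translation, and K skeleton
# joints are recovered by LINEAR REGRESSION from the final mesh vertices.
# All values are illustrative, not the real SMPL template or joint regressor.

N, K, M = 6, 3, 2                        # mesh vertices, joints, shape dims (toy sizes)
rng = np.random.default_rng(0)

template = rng.normal(size=(N, 3))       # template mesh used as a prior, (N, 3)
shape_dirs = rng.normal(size=(M, N, 3))  # per-vertex shape blend directions
J_regressor = np.abs(rng.normal(size=(K, N)))
J_regressor /= J_regressor.sum(axis=1, keepdims=True)  # each joint: convex combination of vertices

def body_model(beta, R, T):
    """Return (mesh, joints): deform the template by beta, then apply global R, T."""
    mesh = template + np.einsum("m,mnd->nd", beta, shape_dirs)  # "bend/stretch" the template
    mesh = mesh @ R.T + T                                       # global 3D rotation + translation
    joints = J_regressor @ mesh                                 # (K, 3) skeleton via linear regression
    return mesh, joints

beta = np.array([0.5, -0.2])             # shape parameters in R^M
R = np.array([[0., -1., 0.],             # 90-degree rotation about z
              [1.,  0., 0.],
              [0.,  0., 1.]])
T = np.array([0.0, 0.0, 1.0])            # translation in R^3

mesh, joints = body_model(beta, R, T)
print(mesh.shape, joints.shape)          # (6, 3) (3, 3)
```

The key design point mirrored here is that the joints are a fixed linear function of the posed mesh, so any refinement of the mesh (e.g., by the MANO hand parameters) automatically propagates to the skeleton estimate.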
Spatial-temporal human behaviour intention recognition. The spatial-temporal human behaviour intention estimation amounts to a higher semantic level, which is essential to complete a holistic HDT in HRC scenarios. In this module, the RGB video stream and the associated skeleton stream are regarded as the input data sources. For the RGB stream, a 3D ResNet-50 is used to process the spatial-temporal features in a unified convolution structure and extract the image feature. Meanwhile, the skeleton stream is split into 4 branches (head, trunk, arm, and leg), each of which is processed by an ST-GCN (Spatial Temporal Graph Convolutional Network) model. After the spatial-temporal feature extraction of the local body parts, an aggregator network is utilised to fuse them into a global skeleton feature, which is then fused with the extracted image feature to discriminate the type of the current human behaviour and whether it is abnormal. Since the model is trained exclusively on normal behaviour data, it can only provide a random guess with an extremely low confidence score for an unseen abnormal behaviour sequence, so setting a tolerance on the confidence scores can eliminate wrong detections. For normal human behaviours, the recognised action intention is passed to the robot for action and motion planning ahead of time, while for abnormal behaviours that cannot be properly parsed by the system, a warning is signalled to notify the human for behaviour correction, or optionally to trigger the LfD protocol.

Fig. 2. Dynamic graph embedding-based adaptive HRC task planning.

2.3. Human-assisting-robot via LfD

Considering robot and task uncertainties, a human operator can potentially transfer one's experience to robot manipulation skills for flexible and adaptive task execution via the LfD approach.

Human-in-the-loop robot control. To better transfer human experts' engineering practice, the hand pose extracted from the HDT is introduced to implement a seamless hand gesture-enabled robot control system. As shown in Fig. 4, the method extracts the worker's hand poses and maps them to the robot end-effector poses, which intuitively enables the robot to mimic the hand movement. The worker's left-hand pose can be accurately extracted via the palm location/orientation and transformed into the end-effector pose of the robot accordingly.

Fig. 4. Examples of mapping between human hand and robot end-effector.

Robot learn from demonstration & dataset aggregation. In addition to explicit imitation, the robot movement trajectories demonstrated by the experts are recorded as datasets and fed into the LfD algorithm, a promising learning approach for empowering the skills of robots. With the dataset, the workers' patterns can be implicitly extracted to drive robots to learn an uncertainty-oriented control policy for subsequent adaptive robot programs.

In this work, the LfD approach adopted to approximate the control policy function $\pi_\theta(a|s)$ is the Behavioural Cloning (BC) algorithm. To fit the LfD algorithm settings, the data trajectories, i.e., $\tau_1, \tau_2, \ldots, \tau_m$, consist of the environment observations $s_1^i$ and robot motion actions $a_1^i$. The elements of $s$ include the task information, working environment, task conditions, human information, etc., and $a$ is the robot movement demonstrated by the human expert for the corresponding cases. Each demonstration in the dataset is represented as $\tau_i = \langle s_1^i, a_1^i, s_2^i, a_2^i, \ldots, s_{n+1}^i \rangle$ and the whole dataset is denoted as $D = \{(S_1, A_1), (S_2, A_2), (S_3, A_3), \ldots\}$. Essentially, BC learning approximates the maximum likelihood estimation of the policy function, which minimises the difference between the model-generated state-action trajectory probability distribution (robot control policy) and the input trajectory probability distribution (human-expert policy):

$$\max_\theta \; \mathbb{E}_{(s,a)\sim D}\left[\pi_\theta(a|s)\right] \qquad (1)$$

$$\text{s.t.} \quad \sum_{a\in A} \hat{\pi}_\theta(a|s) = 1, \quad \forall\, s \in S$$

During the parameter optimisation process of the maximum likelihood estimation, a policy $\pi_\theta(s)$ is trained to minimise the difference between the robot behaviour patterns and the human demonstrations with the objective function $\mathbb{E}_{(s,a)\sim D}\, \|\pi_\theta(s) - a\|^2$. In practice, $\pi_\theta(s)$ approximates the expert policy using a deep neural network, which is optimised via gradient descent to obtain the optimal robot control policy function.

However, owing to the diversity of samples, the effectiveness of LfD is limited by the number of, and the variance amongst, expert demonstrations. Therefore, the robot policy trained by the BC algorithm still lacks flexibility and adaptability towards newly occurring task or robot uncertainties. To address this, an online learning approach with a dataset aggregation (Dagger) mechanism is introduced in the LfD process, whose workflow is shown in Algorithm 1. With the Dagger mechanism, not only existing uncertainties but also similar new ones can be addressed. Thereby, the robot can resolve dynamic manufacturing tasks more efficiently with the help of expert intelligence.

Algorithm 1
Pseudo-code of the Dagger algorithm.
Input: original dataset $D$
Output: optimal updated policy $\hat{\pi}_{update}$, aggregated dataset $D_{agg}$
Initialisation: original policy $\pi_0$
1: $\pi_1 \leftarrow \pi_0$
2: for episode $i = 1, 2, \ldots, T$ do
3:   Sample $T$-step trajectories with the trained policy $\pi_i$
4:   Generate/demonstrate the dataset $D_i = \{(s, \pi^*(s))\}$ via the expert $\pi^*$ for unsolved case trials (determined by the expert) encountered by $\pi_i$
5:   Aggregate the dataset: $D \leftarrow D \cup D_i$
6:   Re-train the control policy function $\pi_{i+1}$ via behavioural cloning
7: end for
8: Return the optimal policy $\pi_i$, determined by evaluating the task success rate

2.4. Robot-assisting-human via DRL

Considering abnormal behaviours caused by human uncertainties, a robot may dynamically re-plan its motion to complete the tasks assigned from the HRC KG, and also to ensure human safety. A DRL-based approach is introduced in this work to achieve human uncertainty-aware robot control for safe and adaptive HRC.

The human uncertainty factors detected from the HDT are embedded in the full-body skeleton position, abnormal behaviour warnings, and intentions. In the implementation, the DRL approach leverages the uncertainty factors, the motion planning success rate, and the safety constraints in HRC scenarios as the optimisation indexes. Following that, the robot motion planning process can be formulated as a Markov Decision Process to optimise the control policy $\pi$ and reinforce the robot to choose an action $a_t \in A$ in the state $s_t \in S$ with the largest cumulated reward. The DRL settings are as follows:

Observation state (O) is a state representation of the human-robot working scene, which consists of the human data extracted from the abovementioned HDT, including the whole-body skeleton position (P), abnormal behaviour warnings (B), human intentions (I), and also the robot's own state (M). The information can be concatenated in a state vector, $O = (P, B, I, M)$.

Action space (A) refers to the reachability of the robot. In the experiment, inverse kinematics is combined to transform the robot's joint space into the three-dimensional spatial coordinates of the end effector, $A = (X, Y, Z) \in \mathbb{R}^3$, enabling the DRL algorithm to explore feasible trajectories.

Reward (R) combines multiple safe motion planning tolerance settings, including safety (e.g., human-robot distance ≥ 10 mm), task completion index evaluation (e.g., execution time ≤ 30 s), and task completion progress and success rate (e.g., target reaching deviation ≤ 1 mm), which is denoted as $R = (R_s, R_t)$.

Meanwhile, an actor-critic method is employed to learn and control the corresponding actions, which maximises the expected return $J(\theta)$ and optimises the moving path in terms of safety:

$$J(\theta) = \mathbb{E}_{\tau \sim p_\theta(\tau)}\left[\sum_{t=0}^{T} \gamma^t r_t\right] \qquad (2)$$

where $p_\theta(\tau) = p(s_0)\prod_{t=0}^{T-1}\left[p(s_{t+1}|s_t, a_t)\,\pi_\theta(a_t|s_t)\right]$ is the probability distribution over all possible state-action trajectories $\tau = (s_0, a_0, s_1, \ldots, a_{T-1}, s_T)$; $\gamma^t \in [0, 1]$ is the discount factor at time $t$; and $d_\theta(s_t)$ is the state distribution under the policy $\pi_\theta$.

3. Case study

To illustrate the performance of the HRC system when tackling various uncertainties, comparative experiments are conducted on some HRC assembly tasks in our lab environment, containing visual
sensors (Azure Kinect and Intel D435), a GPU server (RTX 3080), one human operator, and a collaborative robot (UR5). Firstly, the effectiveness of the proposed HDT model is evaluated against other baseline methods for finding human abnormal behaviours and assisting robots. Then, the LfD and DRL control policies are demonstrated and assessed in several human-assisting-robot and robot-assisting-human uncertain tasks, while robot uncertainties are handled manually by emergency stop and human inspection.

Table 3
Experimental results of the DRL-based safe motion planning.

Overlapped distance    Success rate
No overlapping         98.7%
0-15 cm (Blue)         91.7%
15-30 cm (Red)         85.8%
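The composite reward $R = (R_s, R_t)$ behind the DRL safe-motion-planning evaluation is described in Section 2.4 only at the level of tolerance settings; a minimal sketch of such a reward is given below. The thresholds echo the tolerances quoted in the text (10 mm distance, 30 s execution time, 1 mm reaching deviation), while the weights and the binary penalty/bonus shaping are illustrative assumptions, not the authors' actual reward function.

```python
# Minimal sketch of a composite safety/task reward in the spirit of Section 2.4's
# R = (Rs, Rt). Thresholds follow the tolerances quoted in the text; the weights
# and penalty/bonus shaping are made-up illustrative choices.

SAFE_DISTANCE_M = 0.010   # human-robot distance tolerance (10 mm)
MAX_EXEC_TIME_S = 30.0    # task execution time tolerance (30 s)
TARGET_TOL_M = 0.001      # target reaching deviation tolerance (1 mm)

def safety_reward(human_robot_distance_m: float) -> float:
    """Rs: heavy penalty when the robot violates the safety distance."""
    return 0.0 if human_robot_distance_m >= SAFE_DISTANCE_M else -10.0

def task_reward(exec_time_s: float, target_deviation_m: float) -> float:
    """Rt: bonus for reaching the target within tolerance, penalty for overtime."""
    reward = 1.0 if target_deviation_m <= TARGET_TOL_M else -target_deviation_m
    if exec_time_s > MAX_EXEC_TIME_S:
        reward -= 1.0
    return reward

def step_reward(distance_m: float, exec_time_s: float, deviation_m: float) -> float:
    """Total per-step reward combining the safety and task terms."""
    return safety_reward(distance_m) + task_reward(exec_time_s, deviation_m)

# A safe, on-target, on-time step vs. a safety-violating one:
print(step_reward(0.25, 12.0, 0.0005))   # 1.0
print(step_reward(0.004, 12.0, 0.0005))  # -9.0
```

Making the safety penalty dominate the task bonus, as sketched here, is one common way to bias the learned policy towards conservative motion when the human-robot distance shrinks.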
3.1. Fine-grained HDT modelling for accurate robot assistance

The performance evaluation of the proposed HDT model mainly consists of two parts: (1) human behaviour recognition accuracy and (2) 3D human pose reconstruction error. For the former, RGB-D data was collected via an Azure Kinect camera capturing the HRC activities, including 5 types of human-centric HRC subtasks: (1) dismantling, (2) part picking, (3) robot handover, (4) robot guiding, and (5) robot stopping. After trimming and cleaning the raw data, a total of 939 valid action clips remained for evaluation purposes, of which 751 were used for training and 188 for testing. As for the evaluation of the fine-grained human pose reconstruction, we utilise the MPJPE (Mean Per Joint Position Error) metric to evaluate the body and hand posture reconstruction errors, respectively. The comparative results are listed in Table 1, which clearly shows that our proposed HDT modelling scheme performs better on both behaviour recognition and human pose reconstruction compared with previous approaches, which only targeted a sole recognition task. The evaluation results demonstrate that the proposed HDT model is capable of supporting subsequent robot assistance action planning based on human activity recognition.

Table 1
Comparative experimental results on the collected HRC data.

4. Conclusions and future work

To ensure the successful implementation of HRC activities, this research proposed a systematic CI-based approach to handling human, task, and robot uncertainties integrally. The main scientific contributions of this work include: 1) the proposed dynamic graph embedding-based adaptive HRC task planning approach, 2) a novel vision-based HDT modelling method for handling human uncertainties, and 3) the introduced LfD and DRL approaches for handling task/robot uncertainties and human ones, respectively. The performance of the proposed CI has been reported in handling several HRC assembly task uncertainties with preliminary experimental results. In the future, both multimodal intelligence-based HDT and advanced robot learning mechanisms can be explored to tackle multiple HRC scenarios with uncertainties.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References