Feedback MPC For Torque-Controlled Legged Robots

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Macau, China, November 4-8, 2019

Ruben Grandia¹, Farbod Farshidian¹, René Ranftl², Marco Hutter¹

¹ First, second, and last authors are with Robotic Systems Lab, ETH Zurich, Switzerland. rgrandia@ethz.ch
² Third author is with Intel Labs, Munich, Germany.

This research was supported by Intel Network on Intelligent Systems, the Swiss National Science Foundation through the National Centre of Competence in Research Robotics (NCCR Robotics), and the European Union's Horizon 2020 research and innovation programme under grant agreement No 780883. This work has been conducted as part of ANYmal Research, a community to advance legged robotics.

Abstract— The computational power of mobile robots is currently insufficient to achieve torque-level whole-body Model Predictive Control (MPC) at the update rates required for complex dynamic systems such as legged robots. This problem is commonly circumvented by using a fast tracking controller to compensate for model errors between updates. In this work, we show that the feedback policy from a Differential Dynamic Programming (DDP) based MPC algorithm is a viable alternative to bridge the gap between the low MPC update rate and the actuation command rate. We propose to augment the DDP approach with a relaxed barrier function to address inequality constraints arising from the friction cone. A frequency-dependent cost function is used to reduce the sensitivity to high-frequency model errors and actuator bandwidth limits. We demonstrate that our approach can find stable locomotion policies for the torque-controlled quadruped, ANYmal, both in simulation and on hardware.

Fig. 1. ANYmal, the torque-controlled legged robot used in this work.

I. INTRODUCTION

Model Predictive Control (MPC) has gained broad interest in the robotics community as a tool for motion control of complex and dynamic systems. The ability to deal with nonlinearities and constraints has popularized the technique for many robotic applications, such as quadrotor control [1], autonomous racing [2], and legged locomotion [3], [4], [5].

MPC strategies typically optimize an open-loop control sequence for a given cost function over a fixed time horizon. The control sequence is then executed until a new control update is calculated based on the current state estimate. While this strategy assumes that the model is exact and that there are no external disturbances, the repeated optimization provides a feedback mechanism that can correct for modeling errors provided that the control loop can be executed at a sufficiently high rate. However, for high-dimensional systems such as legged robots and due to the computational restrictions of mobile platforms, the achievable update rate of the MPC loop is insufficient to effectively deal with model uncertainty and external disturbances.

As a remedy, a separately designed, lightweight motion tracker is often used in practice [6]. The motion tracker runs at a higher rate than the MPC loop and provides feedback correction to the control sequence that was designed by the MPC. For complex systems such as legged robots it is a challenging task to design a controller that tracks arbitrary motions while satisfying the many constraints arising from the locomotion task. The fundamental problem is that such motion trackers do not look ahead in the horizon and therefore cannot anticipate changes in contact configuration. As an alternative, projected (time-varying) Linear Quadratic Regulators (LQR) have been proposed as a framework to automatically design feedback controllers around a given reference trajectory [7], [8]. However, the stabilizing feedback policy is always designed in a secondary stage and often with a different objective function than the one used for computing the optimal trajectories, which leads to inconsistency between the feedback policy and the MPC trajectories.

In this work, we propose a feedback MPC approach for motion control of a legged system and show that the optimized feedback policy can directly be deployed on hardware. We achieve stable locomotion under a very low update rate (15 Hz), and the optimized feedback policy removes the need for a separate motion controller. Furthermore, the modification of the control inputs is consistent with the MPC and thus produces a continuous signal across update instances.

To be able to directly apply the feedback strategy on a legged system, the optimized policy needs to respect all the constraints of the locomotion task such as friction and unilateral constraints of contact forces. To achieve this, we propose to extend the SLQ (Sequential Linear Quadratic) algorithm [9]. We use SLQ in a real-time iteration MPC scheme [10] where the algorithm optimizes a constrained feedback policy, $\pi(x, t) : \mathcal{X} \times \mathbb{R}^+ \to \mathcal{U}$,

$$ \pi(x, t) = u^*(t) + K(t)\,(x - x^*(t)), \tag{1} $$

where $u^*(t) \in \mathcal{U}$ and $x^*(t) \in \mathcal{X}$ are locally optimal input and state trajectories. $K(t)$ is a time-varying LQR gain matrix which maps the state deviation from $x^*$ to an admissible control correction. We extend the algorithm to problems
with inequality constraints using a barrier function method to accurately handle the constraints arising from the friction cone. We further use the frequency-aware MPC approach introduced in [11] to render the resulting feedback policy robust to the bandwidth limitations imposed by real actuators. We perform experiments in simulation and on a real legged system (Fig. 1) and demonstrate that our approach is able to find robust and stable locomotion policies at MPC update rates as low as 15 Hz, which facilitates onboard execution on mobile platforms with limited computational power.

A. Related Work

Methods to incorporate robustness explicitly into the MPC methodology have been previously studied in the literature [12]. Min-max MPC [13], for example, optimizes an open-loop control sequence for the worst-case disturbance inside a predefined set. While this formulation appears attractive, it can be overly conservative due to its inability to include the notion of feedback that is inherently present in the receding-horizon implementation of the control [14]. Min-max Feedback MPC was proposed to address this shortcoming by planning over a state-dependent control policy instead of an open-loop feedforward sequence [15]. Unfortunately, optimizing the feedback policy for all possible disturbance realizations does not yet scale to the problem dimensions encountered in legged robotics. However, even without considering disturbances, optimizing over the feedback policy has an additional advantage. When the update rate of the MPC loop is low, the feedback policy can provide local correction to the deviation of the real platform from the optimal trajectories. Exploiting this additional aspect of feedback MPC has not yet been fully explored in robotic applications that are subject to path constraints.

The feedback policy that minimizes a cost function for a given dynamical system and path constraints can be computed using the Hamilton-Jacobi-Bellman (HJB) equation [16]. While directly solving this equation for high-dimensional systems is prohibitively complex, a variant of the dynamic programming approach known as Differential Dynamic Programming (DDP) [17] has proven to be a powerful tool in many practical applications. The SLQ method that we use in this work is a DDP-based approach which uses a Gauss-Newton approximation. Consequently, it only considers the linearized dynamics instead of a second-order approximation.

Although using the LQR gains derived from a DDP-based approach directly for motion tracking generates promising results in simulation, it dramatically fails on real hardware. This phenomenon has been reported before in other real-world applications of LQR on torque-controlled robots [8], [18]. Focchi et al. [19] have shown that instability can occur if the limitations of the low-level torque controller are neglected in the high-level control design. They have argued that the bandwidth of the low-level controller inversely relates to the achievable impedance of the high-level controller. To this end, to apply the SLQ feedback policy on hardware, we need to encode these bandwidth limitations in our optimization problem. In this work, we use the frequency-aware MPC approach introduced in [11]. This MPC formulation penalizes control actions in the frequency domain and automatically finds a trade-off between the bandwidth limitation of actuators and the stiffness of the high-level feedback policy.

B. Contributions

We propose a whole-body MPC approach for legged robots, where the actuation commands are computed directly based on the MPC feedback policy. Specifically, we present the following contributions which we empirically validate on the ANYmal platform (Fig. 1) in simulation and on real hardware:
• We propose to apply feedback MPC for whole-body control of a legged system. To the best of our knowledge, this is the first time that such a control scheme is applied on hardware for motion control of legged robots.
• The SLQ algorithm is extended to include inequality constraints through a barrier function method, which allows us to formulate friction cone constraints.
• We show that our feedback MPC algorithm directly designs constraint-satisfying LQR gains without additional computational cost.
• A frequency domain design approach is used to incorporate actuation bandwidth limits in the MPC formulation to avoid rendering stiff gains. Thus, the feedback gains can be directly applied to the robot.
• We show that the feedback MPC algorithm is capable of bridging the gap between low update-rate MPC and high-rate execution of torque commands using only an onboard computer with moderate computational power.

II. METHOD

A. Problem Definition

Consider the following nonlinear optimal control problem with cost functional

$$ \min_{u(\cdot)} \; \Phi(x(T)) + \int_0^T L(x(t), u(t), t)\, dt, \tag{2} $$

where $x(t)$ is the state and $u(t)$ is the input at time $t$. $L(\cdot)$ is a time-varying running cost, and $\Phi(\cdot)$ is the cost at the terminal state $x(T)$. Our goal is to find an input trajectory $u(\cdot)$ that minimizes this cost subject to the following system dynamics, initial condition, and general equality and inequality constraints:

$$ \dot{x} = f(x, u, t) \tag{3} $$
$$ x(0) = x_0 \tag{4} $$
$$ g_1(x, u, t) = 0 \tag{5} $$
$$ g_2(x, t) = 0 \tag{6} $$
$$ h(x, u, t) \geq 0. \tag{7} $$

The feedback policy which minimizes this problem can be calculated using a DDP-based method. A variant of this method for continuous-time systems known as SLQ is introduced in [9], where it solves the above optimization problem in the absence of the inequality constraints in equation (7). This method computes a time-varying, state-affine control policy based on a quadratic approximation of the optimal value function in an iterative process. The SLQ approach uses a Lagrangian method to enforce the state-input equality constraints in (5). The pure state constraints in (6) are handled by a penalty method.
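As a rough illustration of how a time-varying, state-affine policy of the form (1) can follow from a quadratic value function, the sketch below runs a Riccati backward pass on a toy double integrator and then rolls the policy out. It is a simplified, unconstrained, discrete-time stand-in, not the continuous-time constrained SLQ algorithm of [9]. The system matrices, cost weights, horizon, and the names `riccati_backward_pass` and `affine_policy` are illustrative assumptions.

```python
import numpy as np

def riccati_backward_pass(A, B, Q, R, Qf, N):
    """Discrete-time LQR backward pass returning gains K_0 .. K_{N-1}.

    Assumed sign convention: delta_u = K_k @ delta_x, as in the policy (1).
    """
    S = Qf.copy()                       # quadratic value-function weight at k = N
    gains = [None] * N
    for k in reversed(range(N)):
        H = R + B.T @ S @ B             # input Hessian of the Q-function
        K = -np.linalg.solve(H, B.T @ S @ A)
        S = Q + A.T @ S @ (A + B @ K)   # Riccati recursion for the optimal gain
        S = 0.5 * (S + S.T)             # keep S symmetric numerically
        gains[k] = K
    return gains

def affine_policy(x, k, x_ref, u_ref, gains):
    """State-affine policy u = u_ref[k] + K_k (x - x_ref[k])."""
    return u_ref[k] + gains[k] @ (x - x_ref[k])

# Toy double integrator (hypothetical example system) with time step dt.
dt, N = 0.05, 100
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, Qf = np.diag([10.0, 1.0]), np.array([[0.1]]), np.diag([100.0, 10.0])

gains = riccati_backward_pass(A, B, Q, R, Qf, N)
x_ref = np.zeros((N + 1, 2))            # reference: rest at the origin
u_ref = np.zeros((N, 1))

x = np.array([1.0, 0.0])                # start away from the reference
for k in range(N):                      # forward rollout under the policy
    x = A @ x + B @ affine_policy(x, k, x_ref, u_ref, gains)
x_final = x
```

In this toy setting the reference is trivially zero; in an SLQ-style iteration the reference trajectories and the linearization would be refreshed after every rollout.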
In SLQ, the simulation (forward pass) and the optimization (backward pass) iterations alternate. Once the backward pass is completed, a forward pass computes a new trajectory based on the improved feedback policy. The local Linear Quadratic (LQ) approximation of the nonlinear optimal control problem is constructed after each forward pass. The LQ model permits an efficient solution of the approximate problem by solving the Riccati differential equation. The feedback policy is then updated with an appropriate linesearch procedure in the direction of the LQ problem's solution.

We follow the same SLQ approach and extend the method with inequality constraints through a relaxed barrier function approach.

B. Relaxed Barrier Functions

Using a barrier function is a well-known technique to absorb inequality constraints into the cost function. For each constraint in a given set of $N_{in}$ inequality constraints, a barrier term $B(h)$ is added to the cost

$$ \hat{L}(x, u, t) = L(x, u, t) + \mu \sum_{i=1}^{N_{in}} B(h_i(x, u, t)). \tag{8} $$

A widely used barrier function is the logarithmic barrier used in interior-point methods. The optimal solution is approached by letting $\mu \to 0$ over successive iterations. However, a downside of the log-barrier is that it is only defined over the feasible space, and evaluates to infinity outside. Due to the rollout mechanism in the SLQ approach, one cannot ensure that successive iterations remain inside the feasible region at all times. Furthermore, the Hessian of the log-barrier goes to infinity as one approaches the constraint boundary, which results in an ill-conditioned LQ approximation.

The relaxed barrier function previously proposed for MPC problems addresses both of these issues [20] and is therefore particularly suitable for the SLQ approach. This barrier function is defined as a log-barrier function on the interior of the feasible space, and switches to a different function at a distance $\delta$ from the constraint boundary.

$$ B(h) = \begin{cases} -\ln(h), & h \geq \delta, \\ \beta(h; \delta), & h < \delta. \end{cases} \tag{9} $$

We use the quadratic extension proposed in [21]:

$$ \beta(h; \delta) = \frac{1}{2}\left[\left(\frac{h - 2\delta}{\delta}\right)^2 - 1\right] - \ln(\delta). \tag{10} $$

The relaxed barrier function, which is continuous and twice differentiable, is plotted as a function of the constraint value in Fig. 2. The quadratic extension puts an upper bound on the curvature of the barrier function, which prevents ill-conditioning of the LQ approximation. Note that by letting $\delta \to 0$, the standard logarithmic barrier is retrieved. Furthermore, it has been shown that the optimal solution can be obtained for a nonzero value of $\delta$ [22], when the gradient of the penalty term is larger than the Lagrange multiplier of the associated constraint. Optimization with the relaxed barrier function can thus be interpreted as an augmented Lagrangian approach when $h < \delta$ and as a log-barrier method for $h \geq \delta$.

Fig. 2. Comparison of the log-barrier function $B_{log} = -\ln(h)$ and relaxed barrier function $B_{rel}$ as defined in (9) with $\delta = 5$.

C. LQ Approximation

With the inequality constraints embedded in the cost function, we obtain the following linearization of the system dynamics in (3) and state-input constraints in (5) for a given state trajectory $x_{k-1}(t)$ and input trajectory $u_{k-1}(t)$:

$$ \delta\dot{x} = A\,\delta x + B\,\delta u, \tag{11} $$
$$ C\,\delta x + D\,\delta u + e = 0, \tag{12} $$

where $\delta x = x(t) - x_{k-1}(t)$ and $\delta u = u(t) - u_{k-1}(t)$ are deviations from the previous iteration, around which the LQ approximation is made. Note that the time-dependency of the matrices was dropped to shorten the notation. The quadratic approximation of the cost in (8) is given by

$$ \Phi(x(T)) \approx q_f + q_f^T \delta x + \frac{1}{2}\, \delta x^T Q_f\, \delta x, \tag{13} $$
$$ \hat{L}(x, u, t) \approx q_L(t) + q_L^T \delta x + r^T \delta u + \frac{1}{2}\, \delta x^T Q_L\, \delta x + \frac{1}{2}\, \delta u^T R\, \delta u + \delta u^T P\, \delta x, \tag{14} $$

which requires access to the second-order approximation of the barrier term and inequality constraints.

With the optimal control problem reduced to an equality constrained LQ approximation, the constrained Riccati backward pass in [9] yields the quadratic value function $V(x, t) = \frac{1}{2} x^T S(t)\, x + s^T(t)\, x + s(t)$. This value function induces the optimal feedback policy in equation (1) with feedback gains computed as

$$ K(t) = \left(I - D^\dagger D\right) R^{-1} \left(P^T + B^T S\right) + D^\dagger C, \tag{15} $$

where $I$ is the identity matrix and $D^\dagger$ is the right pseudo-inverse of the full row rank matrix $D$. Notice how the feedback gains ensure that the equality constraints are satisfied by projecting the first term to the nullspace of the constraints, and by adding the term $D^\dagger C$ to satisfy the constraint when the state deviates from the plan.
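The two ingredients of Sections II-B and II-C can be sketched numerically: the derivatives of the relaxed barrier (9)-(10) that enter the quadratic cost approximation, and a gain matrix with the nullspace-projection structure of (15). This is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, the cross term is taken with shape (n_u, n_x) so that the sum with $B^T S$ is well defined, the unweighted right pseudo-inverse $D^T (D D^T)^{-1}$ is used, and the correction is assumed to be applied as $\delta u = -K\,\delta x$ so that (12) holds with $e = 0$. Sign conventions in the source may differ.

```python
import numpy as np

def relaxed_barrier(h, delta):
    """Value, first and second derivative of the relaxed barrier (9)-(10)."""
    if h >= delta:
        return -np.log(h), -1.0 / h, 1.0 / h**2
    z = (h - 2.0 * delta) / delta
    value = 0.5 * (z**2 - 1.0) - np.log(delta)
    # Quadratic extension: the curvature is constant and bounded by 1/delta^2.
    return value, z / delta, 1.0 / delta**2

def constrained_gain(R, cross, B, S, C, D):
    """Gain matrix with the structure of (15).

    Assumptions: `cross` is the input-state cross term with shape (n_u, n_x),
    D has full row rank, and the correction is delta_u = -K @ delta_x.
    """
    n_u = R.shape[0]
    D_pinv = D.T @ np.linalg.inv(D @ D.T)        # unweighted right pseudo-inverse
    null_proj = np.eye(n_u) - D_pinv @ D         # projector onto the nullspace of D
    K_free = np.linalg.solve(R, cross + B.T @ S)
    return null_proj @ K_free + D_pinv @ C

# Random toy problem (hypothetical dimensions and data).
rng = np.random.default_rng(0)
n_x, n_u, n_c = 4, 3, 1
B = rng.normal(size=(n_x, n_u))
M = rng.normal(size=(n_x, n_x))
S = M @ M.T                                      # symmetric positive semidefinite
R = 2.0 * np.eye(n_u)
cross = rng.normal(size=(n_u, n_x))
C = rng.normal(size=(n_c, n_x))
D = rng.normal(size=(n_c, n_u))

K = constrained_gain(R, cross, B, S, C, D)
```

Because $D D^\dagger = I$ and $D (I - D^\dagger D) = 0$, the product $D K$ reduces to $C$, so the linearized constraint is preserved for any state deviation under the assumed sign convention.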
D. Frequency Shaping

As discussed in Section I-A, it has proven difficult to use feedback gains from an LQR design on a torque-controlled robot. We propose to use the frequency-dependent cost function introduced in our previous work [11], which was used to render the feedforward solution robust to high-frequency disturbances. In this work, we show that it has a similar effect on the feedback structure. We briefly summarize how the problem is adapted and refer to [11] for further details.

A frequency-dependent cost on the inputs can be introduced by evaluating the cost function on auxiliary inputs $\nu$. The auxiliary inputs $\nu$ are defined by frequency-dependent shaping functions $r_i(\omega)$ applied to the system inputs such that

$$ \hat{\nu}_i(\omega) = r_i(\omega)\, \hat{u}_i(\omega), \tag{16} $$

where $i$ denotes elements associated with individual inputs, $\omega$ is the signal frequency in rad s$^{-1}$, and $\hat{\nu}_i(\omega)$ and $\hat{u}_i(\omega)$ are the Fourier transforms of the auxiliary input and system input respectively. Following our previous work, we use high-pass filters to achieve increased costs at higher input frequencies:

$$ r_i(\omega) = \frac{1 + \beta_i\, j\omega}{1 + \alpha_i\, j\omega}, \quad \beta_i > \alpha_i. \tag{17} $$

The transfer function $s(\omega) = r^{-1}(\omega)$, with state space realization $(A_s, B_s, C_s, D_s)$, is constructed such that $\hat{u}_i(\omega) = s_i(\omega)\, \hat{\nu}_i(\omega)$. The original system is augmented with an additional filter state, $x_s$, such that $\tilde{x} = [x^T, x_s^T]^T$, and optimization is performed w.r.t. the auxiliary inputs $\nu$. The augmented system dynamics and state-input constraints are defined as

$$ \begin{bmatrix} f(x, C_s x_s + D_s \nu) \\ A_s x_s + B_s \nu \end{bmatrix}, \quad \begin{bmatrix} g_1(x, C_s x_s + D_s \nu, t) = 0 \\ h(x, C_s x_s + D_s \nu, t) \geq 0 \end{bmatrix}. \tag{18} $$

The feedback policy obtained from this augmented system is of the form

$$ \nu(x, t) = \nu^*(t) + \begin{bmatrix} K_{\nu,x}(t) & K_{\nu,x_s}(t) \end{bmatrix} \begin{bmatrix} x - x^*(t) \\ x_s - x_s^*(t) \end{bmatrix}. $$

After optimization, the original input is retrieved by substituting this policy into the output function of the filter, $u = C_s x_s + D_s \nu$, resulting in the complete feedback policy

$$ \begin{bmatrix} u(\tilde{x}, t) \\ \nu(\tilde{x}, t) \end{bmatrix} = \begin{bmatrix} C_s x_s^*(t) + D_s \nu^*(t) \\ \nu^*(t) \end{bmatrix} + \begin{bmatrix} D_s K_{\nu,x}(t) & D_s K_{\nu,x_s}(t) + C_s \\ K_{\nu,x}(t) & K_{\nu,x_s}(t) \end{bmatrix} \begin{bmatrix} x - x^*(t) \\ x_s - x_s^*(t) \end{bmatrix}. \tag{19} $$

III. IMPLEMENTATION

We apply our approach to the kinodynamic model of a quadruped robot, which describes the dynamics of a single free-floating body along with the kinematics for each leg. The Equations of Motion (EoM) are given by

$$ \begin{cases} \dot{\theta} = T(\theta)\, \omega \\ \dot{p} = {}_W R_B(\theta)\, v \\ \dot{\omega} = I^{-1}\left(-\omega \times I\omega + \sum_{j=1}^{4} r_{EE_j}(q) \times \lambda_{EE_j}\right) \\ \dot{v} = g(\theta) + \frac{1}{m} \sum_{j=1}^{4} \lambda_{EE_j} \\ \dot{q} = u_J, \end{cases} $$

where ${}_W R_B$ and $T$ are, respectively, the rotation matrix of the base with respect to the global frame and the transformation matrix from angular velocities in the base frame to the Euler angle derivatives in the global frame. $g$ is the gravitational acceleration in body frame, $I$ and $m$ are the moment of inertia about the CoM and the total mass respectively. The inertia is assumed to be constant and taken at the default configuration of the robot. $r_{EE_j}$ is the position of the foot $j$ with respect to the CoM. $\theta$ is the orientation of the base in Euler angles, $p$ is the position of the CoM in world frame, $\omega$ is the angular rate, and $v$ is the linear velocity of the CoM. $q$ is the vector of twelve joint positions. The inputs of the model are the joint velocity commands $u_J$ and end-effector contact forces $\lambda_{EE_j}$ in the body frame.

A. Equality Constraints

The equality constraints depend on the mode of each leg at a certain point in time. We assume that the mode sequence is a predefined function of time. The resulting mode-dependent constraints are

$$ \begin{aligned} v_{EE_j} &= 0, && \text{if } j \text{ is a stance leg}, \\ v_{EE_j} \cdot \hat{n} &= c(t), \quad \lambda_{EE_j} = 0, && \text{if } j \text{ is a swing leg}, \end{aligned} $$

where $v_{EE_j}$ is the end-effector velocity in world frame. These constraints ensure that a stance leg remains on the ground and a swing leg follows the predefined curve $c(t)$ in the direction of the local surface normal $\hat{n}$ to avoid foot scuffing. Furthermore, the constraints enforce zero contact force at swing legs.

B. Inequality Constraints

Our proposed relaxed barrier method allows us to model the friction cone without the commonly used polytope approximation. The cone constraint for each end-effector,

$$ \lambda_{EE_j} \in \mathcal{C}(\hat{n}, \mu_c), \tag{20} $$

is defined by the surface normal and friction coefficient $\mu_c = 0.7$. After projecting the contact forces to the local frame of the surface, a canonical second-order cone constraint is found in terms of local contact forces $F = [F_x, F_y, F_z]$. An effective cone constraint used in conjunction with barrier methods [23] is given by

$$ h_{cone} = \mu_c F_z - \sqrt{F_x^2 + F_y^2} \geq 0. \tag{21} $$

However, the gradient of this constraint is not defined at $F = 0$, which causes numerical issues close to the origin. While for interior point methods this problem can be solved by using the squared constraint, $\mu_c^2 F_z^2 - (F_x^2 + F_y^2) \geq 0$, this strategy does not work well together with the relaxed
barrier function due to the saddle point it introduces at the origin. Since the relaxed barrier function allows infeasible iterates, the solutions can cross the origin and end up in the negative reflection of the cone, which becomes a feasible region through the squaring operation. We therefore use the perturbed cone

$$ h_{cone,\epsilon} = \mu_c F_z - \sqrt{F_x^2 + F_y^2 + \epsilon^2} \geq 0, \tag{22} $$

which is differentiable at the origin, remains infeasible for any negative $F_z$, and is a conservative lower bound for the original cone (21). It therefore holds that

$$ h_{cone,\epsilon} \geq 0 \implies h_{cone} \geq 0. \tag{23} $$

In Fig. 3 the level sets of this constraint are compared to the original cone. It can be seen that the constraint is convex and the zero crossing of $h_{cone,\epsilon}$ is strictly inside the feasible region.

Fig. 3. Comparison of the friction cone constraint $h_{cone}$ and the perturbed cone $h_{cone,\epsilon}$ for $\epsilon = 5$, $\mu_c = 0.7$. The contour of $h_{cone,\epsilon}$ is shown together with the zero crossing of $h_{cone}$.

C. Torque Computation

The control inputs $u$ consist of contact forces and joint velocities. These commands have to be translated to torques. When using only the feedforward trajectories, desired accelerations and contact forces are extracted and tracked by a hierarchical inverse dynamics controller [24]. When using the feedback policy, we forward simulate the system under the feedback policy for a short time and extract desired accelerations from this rollout. The inverse dynamics is then only used to convert the desired accelerations into torques, without adding additional feedback. This inverse dynamics controller is evaluated at 400 Hz, while the SLQ-MPC algorithm runs asynchronously on a second onboard Intel i7-4600U@2.1GHz dual core processor.

Finally, each individual motor has a local, embedded control loop. For the stance legs a torque controller is used, and for the swing legs the motors take the commanded torque as a feedforward term and close the loop over the desired position and joint velocities.

IV. RESULTS

We first show the qualitative differences between a feedback policy and a feedforward policy when operating under low update rates and disturbances. Then, the influence of the relaxed barrier function cone constraint on the planned feedback gains is shown. We also examine the structure in the obtained feedback matrices and compare the difference between those obtained with frequency shaping and those without. Finally, we show that the proposed elements, when taken together, lead to a method that can be successfully executed on the onboard hardware of a torque-controlled robot.

We use a diagonal cost on the state and control inputs for all experiments. When frequency shaping is used, we set $\alpha_i = 0.01$, $\beta_i = 0.2$ for the contact force inputs and $\alpha_i = 0.01$, $\beta_i = 0.1$ for the joint velocity inputs in (17).

A. Feedback MPC

We first investigate the effect of low update frequencies and the use of feedback policies from the SLQ-MPC in simulation. We introduce model errors to show the performance of the different strategies. The mass of the control model is increased by 10 % with respect to the simulation model. Each MPC controller is brought to a stable trot gait and commanded to move 1 m forward.

First, feedforward MPC is used with an update rate equal to the control frequency of 400 Hz. Every control loop thus has access to the optimal solution from the current state. The resulting desired linear accelerations are shown in the top of Fig. 4. Discontinuities only arise around a contact switch; the desired accelerations are continuous otherwise. In the middle plot, the desired accelerations are shown when the update rate is restricted to 20 Hz. The updates are clearly visible because every time the feedforward trajectory is updated, the accumulated deviation from the feedforward plan is reset and a new open-loop trajectory is tracked. In the bottom plot, we show the performance when the SLQ feedback policy is used. The discontinuities at updates are significantly reduced and smooth trajectories comparable to MPC with high update rate are retrieved.

These experiments show that using the feedback from the MPC can recover some of the performance lost due to lower update rates. By using a policy that is consistent with future MPC updates, the discontinuities at the updates are reduced up to the validity of the linear quadratic approximation.

B. Feedback Gains Near Inequality Constraints

In the following experiment we prescribe a task where we require the robot to lift the left front leg and simultaneously set the desired body location towards the front left, outside of the support polygon. This task requires the algorithm to coordinate the step with the body movement. The experiment is performed without the frequency-shaped cost to allow us to focus on the effect of using the inequality constraints.

On the left side of Fig. 5 we show the optimized solution without inequality constraints. Without the cone constraint, the optimal strategy is to produce negative contact forces in the right hind leg, such that the desired body position can be reached as early as possible. Furthermore, we plot the maximum gain in the row associated with each vertical contact force, which shows that the zero force feedback terms for the swing leg are compliant with the corresponding equality constraint.
Fig. 4. Desired linear acceleration for the center of mass, as commanded by the MPC. In the top two plots a feedforward MPC strategy is used with update rates of 400 Hz and 20 Hz respectively. In the bottom plot, feedback MPC is used with an update rate of 20 Hz. The trajectories are for a trotting gait with a command to move 1 m ahead sent at 0.3 s.

Fig. 5. Optimal vertical contact force trajectory and maximum feedback gain associated with that contact force when planning a simultaneous step and reach task without (left), and with (right) inequality constraints. The time where the left front (LF) is in the air is marked in gray.

The resulting solution when adding inequality constraints is shown on the right of Fig. 5. Here, we used fixed barrier parameters $\mu = 0.5$, $\delta = 0.1$, under which all constraints are strictly satisfied. With the inequality constraints, the contact forces remain positive on the right hind leg. Additionally, whereas before the feedback gains were about equal for the three remaining stance legs, the feedback gains for the right hind leg are now significantly reduced. When the contact force approaches zero, the feedback gains go to zero as well, which ensures that the inequality constraint is not violated after applying the feedback policy at a disturbed state. This interaction between the inequality constraints and the feedback gains is a result of the barrier function. As the constraint boundary is approached, the Hessian of the barrier function w.r.t. the contact forces increases. This increases the input costs $R$ in (15), and thus reduces the feedback gains.

This gradual decrease in feedback gains cannot be obtained with a clamping strategy, where the gains are unaffected, or with an active set method, where the gains instantaneously decrease to zero when the constraint becomes active [25].

As a secondary effect, we see that the feedback gains on the left front leg increase before lifting the leg. Because the right hind leg is forced to have low feedback gains in the upcoming phase, the body position before lifting the foot is of high importance, which is reflected in the gains.

In addition, we note that the distance to the constraint boundary can be regulated by choosing a different barrier scaling $\mu$. This means that using a finite $\mu$, in contrast to decreasing it to zero as done in an interior point method, can be used to trade some optimality for a larger stability margin.

C. Feedback Structure

We visualize the feedback matrix for ANYmal in a full stance configuration. The gains obtained without using the frequency-dependent cost function are shown in Fig. 6a. The state acting on each column of the matrix is shown above the figure, and the control input affected by each row is shown on the left. The color intensity shows the magnitude of each entry in the feedback matrix, with zero shown as white and the highest gain shown in black.

In the joint velocity part of the feedback matrix, one can see how the equality constraints that require zero velocity at the end-effectors are reflected: the joint velocity commands are highly dependent on the linear and angular velocity of the base to achieve this constraint. This empirically verifies that (15) indeed produces feedback matrices that are consistent with the constraints.

The feedback matrix in Fig. 6a can be compared to the feedback matrix obtained when using the frequency-shaped cost function, shown in Fig. 6b. We split the feedback matrix in Fig. 6b into four parts, with the vertical split between system and filter state, and horizontal split between system and auxiliary input, corresponding to the partitioning in (19). First, the left side is inspected. Here we recognize a feedback pattern similar to that in Fig. 6a, obtained without the frequency-dependent cost function. Indeed, since the shaping functions in (17) have unit DC-gain, the feedback matrix $K_{\nu,x}$ is expected to be approximately equal to the matrix obtained without frequency shaping. Furthermore, we can recognize that the direct gains from state to contact forces are an order of magnitude smaller than before.

For the frequency-shaped feedback policy, the main feedback flows from system state to auxiliary inputs. The auxiliary inputs then drive the filter states, $x_s$, which in turn provide the adaptation of the system contact force inputs. For the joint velocities, however, the feedback from filter states to joint velocities is zero, as expected. The fact that end-effector velocities are constrained to be zero is still reflected in the feedback matrix in exactly the same way as before. Feedback terms therefore appear in the bottom right partition of Fig. 6b, to satisfy the equality constraint, $u = C_s x_s + D_s \nu$, for the rows associated with joint velocities.

This analysis shows that using the frequency-dependent
10 3 10 3
10 2
10 1 10 2
10 0
10 -1
10 1
(a)
Fig. 6. The feedback matrix in stance configuration
without (a) and with (b) frequency shaped cost. The
color of each state-input entry represents the magnitude 10 0
of the feedback. In (b), the matrix is split into four parts
with input and states associated with each block marked
at the left and bottom. The top left for example shows
the gains between system states, x, and system inputs.
10 -1
(b)
cost function introduces smoothness and reduced direct gains
where possible, but at the same time still respects hard
equality constraints on the original system inputs.

D. Hardware Experiments: Disturbance Rejection
As seen in the accompanying video¹, when LQR gains
obtained from the SLQ-MPC are used on hardware, the
system becomes unstable even in full stance phase. As
demonstrated in Section IV-C, the frequency shaped
formulation reduces the direct gains between state and input,
which enables successful deployment on hardware. All hardware
experiments are therefore performed with both frequency
shaping and inequality constraints active.

¹ A video of the experiments is available at
https://youtu.be/KrTrLGDA6FQ

We first perform a simple experiment to verify the
qualitative difference observed in simulation between using
only a feedforward policy with a conventional tracking
controller and using a feedback policy. The robot is put in a
standing configuration with the desired position set to the
initial position with zero velocity. Afterward, we place a
5.7 kg mass (≈ 15% of the total mass) on top of the robot
to induce a constant disturbance. Fig. 7 shows the resulting
desired and measured torque trajectories in the left front knee
after the system reaches an equilibrium.

Fig. 7. Desired and measured torque signals in the left front
knee under a constant disturbance of 5.7 kg. The desired torque
is the result of both feedforward and feedback signals. In the
top plot the feedback is provided by the conventional tracking
controller, whereas in the bottom plot the feedback from the
frequency shaped SLQ algorithm is used.

In the top plot, the points at which the feedforward policy
is updated are clearly visible. At each update, the state
reference is reset to the measured state, which effectively
nullifies the feedback of the tracking controller. Since the
feedforward control signal does not account for the additional
disturbance, the system deviates from the desired trajectory
and builds up feedback in the tracking controller until the
next update arrives. In the bottom plot, where the feedback
policy updates arrive at the same rate as in the feedforward
case, there are no discontinuous jumps in desired torque.
This verifies our earlier observation in simulation.

E. Hardware Experiments: Dynamic Walking
Finally, we demonstrate that the proposed method achieves
stable walking with all computations running on the onboard
computers. We use a gait known as dynamic walk. The
gait pattern is shown in Fig. 8. It consists of a mixture of
underactuated and overactuated contact configurations when
two and three feet are on the ground, respectively. The
proposed friction cone constraint ensures that the trajectory
does not require negative contact forces and thus successfully
navigates the intriguing pattern of support polygons.
A receding horizon of 1.0 s was used, for which the MPC
reaches an update frequency of approximately 15 Hz. The
resulting desired and measured torque and joint velocity
trajectories are shown in Fig. 9 for the left front leg. The
desired and measured signals are close to each other at all
times, showing that the applied feedback policy respects the
bandwidth limits of the actuators. More dynamic gaits such
as pace and trot motions are feasible as well. In the linked
video a continuous transition between the two gaits is shown.
The feedback MPC is able to skilfully coordinate leg and
body motions in these unstable and underactuated situations.

V. CONCLUSION
In this work, we proposed to use feedback MPC as an
effective way to handle the slow update rate associated
with the computational restrictions of mobile platforms. The
sensitivity of CoM control to uncertainties and sampling
period has been recently analyzed in [26], which provides
a theoretical basis for our observation that stable walking is
possible with low update rates.
We proposed a relaxed barrier function method to extend
the SLQ algorithm to optimization problems with inequality
constraints. In particular, the friction cone is implemented
through a perturbed second-order cone constraint. This
formulation adds a convex penalty to the cost function and
avoids numerical ill-conditioning at the origin of the cone.
A frequency-aware MPC approach was used to systematically
include the bandwidth limit of the actuators in the
feedback policy design. This was a key factor in achieving
closed-loop stability on hardware without any detuning of
the low-level actuator controllers as suggested in [8]. The
frequency-aware approach effectively allows setting high gains
in the low-frequency spectrum while attenuating gains at high
frequencies. It thus increases the robustness of the feedback
policy in the presence of high-frequency disturbances.
We showed that the feedback policy is consistent with the
constraints of the locomotion task. We empirically confirmed
that the MPC policy reduces the feedback gains near the
boundaries of the friction cone to respect the inequality
constraints. We also demonstrated that the optimized policy
sets zero gains on the contact force of the swing legs and
encodes the zero end-effector velocity constraint for stance
legs to satisfy state-input equality constraints.

Fig. 8. Gait pattern for the dynamic walk used in the hardware
experiments. Colored areas represent that a leg is in contact.

Fig. 9. Torque and joint velocity trajectories during three
cycles of a dynamic walk with ANYmal. The desired (dashed line)
and measured (full line) are shown for the HAA (Hip Abduction
Adduction), HFE (Hip Flexion Extension), and KFE (Knee Flexion
Extension).

REFERENCES
[1] K. Alexis, C. Papachristos, G. Nikolakopoulos, and A. Tzes, “Model
predictive quadrotor indoor position control,” in 2011 19th
Mediterranean Conference on Control Automation (MED), June 2011, pp.
1247–1252.
[2] A. Liniger, A. Domahidi, and M. Morari, “Optimization-based
autonomous racing of 1:43 scale rc cars,” Optimal Control
Applications and Methods, vol. 36, no. 5, pp. 628–647, 2015.
[3] Y. Tassa, T. Erez, and E. Todorov, “Synthesis and stabilization of
complex behaviors through online trajectory optimization,” in IROS,
2012, pp. 4906–4913.
[4] J. Koenemann, A. D. Prete, Y. Tassa, E. Todorov, O. Stasse,
M. Bennewitz, and N. Mansard, “Whole-body model-predictive control
applied to the hrp-2 humanoid,” in IROS, 2015, pp. 3346–3351.
[5] F. Farshidian, E. Jelavic, A. Satapathy, M. Giftthaler, and J. Buchli,
“Real-time motion planning of legged robots: A model predictive
control approach,” in Humanoids, 2017, pp. 577–584.
[6] R. M. Murray, Optimization-based control. California Institute of
Technology, CA, 2009.
[7] M. Posa, S. Kuindersma, and R. Tedrake, “Optimization and
stabilization of trajectories for constrained dynamical systems,” in ICRA.
IEEE, 2016, pp. 1366–1373.
[8] S. Mason, N. Rotella, S. Schaal, and L. Righetti, “Balancing and
walking using full dynamics lqr control with contact constraints,” in
2016 IEEE-RAS 16th International Conference on Humanoid Robots
(Humanoids). IEEE, 2016, pp. 63–68.
[9] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli,
“An efficient optimal planning and control framework for quadrupedal
locomotion,” in ICRA. IEEE, 2017, pp. 93–100.
[10] M. Diehl, H. G. Bock, and J. P. Schlöder, “A real-time iteration scheme
for nonlinear optimization in optimal feedback control,” SIAM Journal
on control and optimization, vol. 43, no. 5, pp. 1714–1736, 2005.
[11] R. Grandia, F. Farshidian, A. Dosovitskiy, R. Ranftl, and M. Hutter,
“Frequency-aware model predictive control,” IEEE Robotics and
Automation Letters, vol. 4, no. 2, pp. 1517–1524, 2019.
[12] D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Constrained model
predictive control: Stability and optimality,” Automatica, vol. 36, no. 6,
pp. 789–814, 2000.
[13] A. Bemporad and M. Morari, “Robust model predictive control: A
survey,” in Robustness in identification and control, A. Garulli and
A. Tesi, Eds. London: Springer London, 1999, pp. 207–226.
[14] J. Lee and Z. Yu, “Worst-case formulations of model predictive control
for systems with bounded parameters,” Automatica, vol. 33, no. 5, pp.
763–781, 1997.
[15] P. O. M. Scokaert and D. Q. Mayne, “Min-max feedback model
predictive control for constrained linear systems,” IEEE Transactions
on Automatic Control, vol. 43, no. 8, pp. 1136–1142, 1998.
[16] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena
Scientific, 1995.
[17] D. Mayne, “A second-order gradient method for determining optimal
trajectories of non-linear discrete-time systems,” International Journal
of Control, vol. 3, no. 1, pp. 85–95, 1966.
[18] S. Mason, L. Righetti, and S. Schaal, “Full dynamics lqr control of a
humanoid robot: An experimental study on balancing and squatting,”
in Humanoids. IEEE, 2014, pp. 374–379.
[19] M. Focchi, G. A. Medrano-Cerda, T. Boaventura, M. Frigerio,
C. Semini, J. Buchli, and D. G. Caldwell, “Robot impedance control and
passivity analysis with inner torque and velocity feedback loops,”
Control Theory and Technology, vol. 14, no. 2, pp. 97–112, 2016.
[20] C. Feller and C. Ebenbauer, “Relaxed logarithmic barrier function
based model predictive control of linear systems,” IEEE Transactions
on Automatic Control, vol. 62, no. 3, pp. 1223–1238, 2017.
[21] J. Hauser and A. Saccon, “A barrier function method for the
optimization of trajectory functionals with constraints,” in Proceedings of the
45th IEEE Conference on Decision and Control, 2006, pp. 864–869.
[22] A. P. Aguiar, F. A. Bayer, J. Hauser, A. J. Häusler, G. Notarstefano,
A. M. Pascoal, A. Rucco, and A. Saccon, Constrained Optimal Motion
Planning for Autonomous Vehicles Using PRONTO. Cham: Springer
International Publishing, 2017, pp. 207–226.
[23] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret, “Applications
of second-order cone programming,” Linear algebra and its
applications, vol. 284, no. 1-3, pp. 193–228, 1998.
[24] C. D. Bellicoso, C. Gehring, J. Hwangbo, P. Fankhauser, and
M. Hutter, “Perception-less terrain adaptation through whole body control and
hierarchical optimization,” in Humanoids, 2016, pp. 558–564.
[25] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential
dynamic programming,” in ICRA. IEEE, 2014, pp. 1168–1175.
[26] N. A. Villa, J. Englsberger, and P. Wieber, “Sensitivity of legged
balance control to uncertainties and sampling period,” IEEE Robotics
and Automation Letters, vol. 4, no. 4, pp. 3665–3670, Oct 2019.
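
As a supplementary illustration of the relaxed barrier treatment of the
friction cone summarized in the conclusion, the following is a minimal
sketch and not the implementation used on ANYmal. It assumes a perturbed
cone margin h(F) = mu_c * F_z - sqrt(F_x^2 + F_y^2 + eps^2) and a
logarithmic barrier with a quadratic extension below a threshold delta, in
the spirit of [20]. All function names and parameter values (mu_c, eps, mu,
delta) are hypothetical choices made for this example.

```python
import math


def friction_cone_margin(force, mu_c=0.7, eps=5.0):
    """Perturbed second-order cone margin; h >= 0 means the force is admissible.

    eps > 0 is the assumed perturbation that keeps the square root smooth
    when the tangential force is zero.
    """
    fx, fy, fz = force
    return mu_c * fz - math.sqrt(fx * fx + fy * fy + eps * eps)


def margin_gradient(force, mu_c=0.7, eps=5.0):
    """Gradient of the margin with respect to (fx, fy, fz); defined everywhere for eps > 0."""
    fx, fy, _ = force
    r = math.sqrt(fx * fx + fy * fy + eps * eps)
    return (-fx / r, -fy / r, mu_c)


def relaxed_barrier(h, mu=0.1, delta=1.0):
    """Log barrier for h >= delta, quadratic extension for h < delta.

    The two branches agree in value and slope at h = delta, so the penalty
    stays finite and differentiable even when the constraint is violated.
    """
    if h >= delta:
        return -mu * math.log(h)
    return mu * (0.5 * (((h - 2.0 * delta) / delta) ** 2 - 1.0) - math.log(delta))


def friction_penalty(force, mu_c=0.7, eps=5.0, mu=0.1, delta=1.0):
    """Penalty that would be added to the stage cost for one contact force."""
    return relaxed_barrier(friction_cone_margin(force, mu_c, eps), mu, delta)


if __name__ == "__main__":
    for f in [(0.0, 0.0, 100.0), (50.0, 0.0, 100.0), (0.0, 0.0, 0.0)]:
        print(f, friction_penalty(f), margin_gradient(f))
```

Under these assumptions the margin is concave in the force and the barrier
is convex and decreasing in the margin, so the composed penalty is convex
in the contact force. With eps = 0 the gradient of the margin would be
undefined whenever the tangential force vanishes, which is the
ill-conditioning that the perturbation is meant to avoid.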
