Model Predictive Neural Control for Aggressive
Helicopter Maneuvers
Eric A. Wan, Alexander A. Bogdanov, Richard Kieburtz
Antonio Baptista, Magnus Carlsson, Yinglong Zhang, and Mike Zulauf
OGI School of Science and Engineering, OHSU
20000 NW Walker Rd, Beaverton, Oregon 97006
Editor’s Summary
This chapter shares with Chapter 9 the adoption of a model predictive control (MPC) framework for
flight control applications, but the details differ substantially. In particular, the control feedback in this
case is a superposition of a neural-network-based nonlinear mapping and a nonlinear state-dependent
Riccati equation (SDRE) controller. The neural network is optimized (trained) online for high performance using a high-fidelity dynamic simulation model of the vehicle. The SDRE controller design,
repeated at every sample time, provides initial local asymptotic stability. The relative contributions of
each controller vary depending on the training error of the neural network.
The application considered is maneuver control of autonomous helicopters. The controller is multivariable with five actuator command outputs. Simulation results for a variety of maneuvers are presented,
including rapid take-off and landing and a difficult "elliptic" maneuver. The authors also incorporate
wind effects in the optimization. Instead of the conventional tactic of treating environmental conditions
as a random disturbance, this allows control commands to be optimized for localized wind flows. This
approach relies on wind predictions over the optimization horizon; sensors and models to support such
predictions at the appropriate scale and resolution are the object of intense research today.
Both the SDRE design and the neural network training are computationally demanding. The authors consider possible tradeoffs between computational effort and controller performance in order to
approach real-time feasibility. Various approximations to some of the especially complex calculations
are considered. Computing time and control performance data are presented, comparing different neural
network complexities, optimization horizons, and update intervals. Substantial CPU time speedup can
be achieved with minor loss of performance.
The autonomous helicopter control theme is continued in Chapter 12, where neural networks also
feature in the solution approach.
10.1. INTRODUCTION
Advances in technology and modeling in commercially available software make possible highly accurate simulations of aircraft and their environmental interactions. With increased on-board computational
resources, this allows for the design of sophisticated nonlinear controllers that exploit these simulations
on-line, in order to achieve high-performance autonomous control of vehicles capable of rapid adaptation
and aggressive maneuvering. In this chapter, we describe a method for helicopter control through the
use of the FlightLab simulator [1] coupled with nonlinear control techniques. FlightLab is a commercial
software product developed by Advanced Rotorcraft Technologies. Details on the modeling capabilities
of FlightLab are given in Section 10.2.2. A challenge in using FlightLab and similar flight simulators
to design controllers is that the governing dynamic equations are not readily available (i.e., the aircraft
represents a ‘black-box’ model). This precludes the use of most traditional nonlinear control approaches
that require an analytic model. Our methodology is based on the model predictive control (MPC) approach [18, 15]. MPC is an optimization based framework for learning a stabilizing control sequence
that minimizes a specified cost function. For general nonlinear systems, this requires a numerical optimization procedure involving iterative forward (and backward) simulations of the model dynamics. The
resulting control sequence represents an 'open-loop' control law, which can then be re-optimized on-line
at periodic update intervals to improve robustness.
Our approach to MPC, referred to as model predictive neural control (MPNC), utilizes a combination
of a state-dependent Riccati equation (SDRE) controller and an optimal neural controller. In contrast to
traditional MPC, the architecture implements an explicit feedback controller. The SDRE technique [5, 6]
is an improvement over traditional linearization based Linear Quadratic (LQ) controllers (SDRE control
will be elaborated on in a later section). SDRE design, however, requires an analytic representation of an
aircraft model. To provide an analytic representation, the numeric simulator model is approximated by a
six-degree of freedom (6DOF) rigid body dynamic model, providing a set of governing equations at each
time instant necessary to design the SDRE. In our framework, the SDRE controller provides an initial
stabilizing controller and it is then augmented by a neural network (NN) controller. The NN controller
is optimized on-line using a calculus of variations approach to minimize the MPC cost function. Note
that this differs from either the use of a NN for system identification of the nonlinear plant as part
of a traditional MPC approach (see [17]), or the use of a NN for error feedback to account for model
uncertainty (see [4, 12]). We also make use of the SDRE solution to provide a control Lyapunov function
(CLF) for use in a receding horizon approach to MPC. In addition, we explore a number of numeric
approximations in order to improve computational performance of the approach. The basic framework
of our approach has been described in [22, 3].
10.2. MPC CONTROL
The general MPC optimization problem involves minimizing a cost function
$$J_t = \sum_{k=t}^{t_{final}} L_k(x_k, u_k),$$
which represents the accumulated cost of the sequence of states $x_k$ and controls $u_k$ from the current
discrete time $t$ to the final time, $t_{final}$. For regulation problems $t_{final} = \infty$. Optimization is done with
respect to the control sequence subject to constraints of the system dynamics,
$$x_{k+1} = f(x_k, u_k). \qquad (10.1)$$
As an example, choosing $L_k(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k$ corresponds to the standard Linear Quadratic
cost. For linear systems, this leads to linear state-feedback control, which is found by solving a Riccati
equation [7]. More general costs allow for inequality constraints on state and control, minimum time,
minimum control (fuel), etc. In this paper we consider general multi-input-multi-output (MIMO) nonlinear systems with tracking error costs of the form
$$L_k(x_k, u_k) = e_k^T Q e_k + u_k^T R u_k + (u_k^{over})^T R_{sat}\, u_k^{over}, \qquad (10.2)$$
where $e_k = x_k - x_k^{des}$, with $x_k^{des}$ corresponding to a desired reference state trajectory. The last term assesses
a penalty for exceeding control saturation level $u^{sat}$, where each element ($j = 1, \ldots, m$) of the vector
$u_k^{over}$ is defined as
$$u_{k,j}^{over} = \begin{cases} 0, & \text{if } |u_{k,j}| \le u_j^{sat} \\ u_{k,j} - u_j^{sat}\,\mathrm{sign}(u_{k,j}), & \text{otherwise.} \end{cases}$$
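The tracking cost of Equation 10.2 and the saturation excess defined above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the helper names (`u_over`, `quad`, `stage_cost`) and the numeric values are hypothetical, and matrices are plain nested lists rather than anything used by the authors.

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def u_over(u, u_sat):
    """Per-element excess of the control over its saturation level (zero inside the limits)."""
    return [0.0 if abs(uj) <= sj else uj - sj * sign(uj) for uj, sj in zip(u, u_sat)]

def quad(v, M):
    """Quadratic form v^T M v for a list v and a list-of-lists matrix M."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def stage_cost(x, x_des, u, Q, R, R_sat, u_sat):
    """L_k = e^T Q e + u^T R u + (u_over)^T R_sat u_over, with e = x - x_des."""
    e = [xi - di for xi, di in zip(x, x_des)]
    return quad(e, Q) + quad(u, R) + quad(u_over(u, u_sat), R_sat)
```

Only the second control in a call such as `stage_cost([1.0, 0.0], [0.0, 0.0], [0.5, -3.0], Q, R, R_sat, [1.0, 2.0])` exceeds its limit, so only that element contributes to the saturation penalty.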
In general, a numerical optimization approach is used to solve for the sequence of controls, $\{u_k\}_t^{t_{final}} = \arg\min J_t$,
corresponding to an open-loop control law, which can then be re-optimized on-line at periodic update intervals. The complexity of the approach is a function of the final time $t_{final}$, which
determines the length of the optimal control sequence. In practice, we can reduce the number of computations by taking a Receding Horizon (RH) approach, in which optimization is performed over a shorter
fixed length time interval. This is accomplished by rewriting the cost function as
$$J_t = \sum_{k=t}^{t+N-1} L_k(x_k, u_k) + V(x_{t+N}), \qquad (10.4)$$
where the last term $V(x_{t+N})$ denotes the cost-to-go from time $t+N$ to time $t_{final}$. Note that for
trajectory following, the cost-to-go is also implicitly a function of the given desired trajectory $x^{des}_{t+N}$
through $x^{des}_{t_{final}}$ (and resulting error). The advantage is that this yields an optimization problem of fixed
length $N$. In practice, the true value of $V(x_{t+N})$ is unknown, and must be approximated. Most common
is to simply set $V(x_{t+N}) = 0$; however, this may lead to reduced stability and poor performance for
short horizon lengths [10]. Alternatively, we may include a control Lyapunov function (CLF), which
guarantees stability if the CLF is an upper bound on the cost-to-go, and results in a region of attraction
for the MPC of at least that of the CLF [10].
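The role of the terminal term in Equation 10.4 can be seen on a toy problem. The sketch below assumes a hypothetical scalar linear plant $x_{k+1} = a x_k + b u_k$ with quadratic cost (none of these numbers come from the chapter) and uses the scalar Riccati solution $p x^2$ as the terminal cost. For this linear case the receding-horizon cost equals the infinite-horizon cost for any horizon $N$, whereas setting the terminal cost to zero underestimates it.

```python
def scalar_dare(a, b, q, r, iters=500):
    """Iterate the scalar discrete Riccati recursion until it has (numerically) converged."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

def horizon_cost(x0, N, a, b, q, r, gain, terminal):
    """J_t = sum of N stage costs under u = gain * x, plus the terminal cost V(x_{t+N})."""
    x, cost = x0, 0.0
    for _ in range(N):
        u = gain * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + terminal(x)

# Hypothetical plant and weights, chosen only for illustration.
a, b, q, r = 1.1, 0.5, 1.0, 0.2
p = scalar_dare(a, b, q, r)
gain = -(a * b * p) / (r + b * b * p)   # standard discrete LQ gain

def clf(x):
    """Terminal cost taken from the Riccati solution."""
    return p * x * x

def zero(x):
    """The common but cruder choice V = 0."""
    return 0.0
```

Calling `horizon_cost(2.0, 3, a, b, q, r, gain, clf)` and the same with `N = 30` gives the same value up to rounding, which is the sense in which a good terminal cost makes a short horizon behave like a long one.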
10.2.1. HELICOPTER MODELING AND DESIGN CONSIDERATIONS
10.2.2. FlightLab Helicopter Model
The FlightLab package provides a simulation tool for multidisciplinary concurrent engineering. Consisting of a number of modular and reconfigurable modeling primitives (called "components"), it uses a
high-level interpretive language (Scope) for rapid prototyping. With these reconfigurable components,
the user is able to construct a model of selective fidelity, as well as create or link custom components.
Numerical integration of dynamics is accomplished through iterative techniques (e.g., Newton-Raphson)
to solve systems of equations, using a selective compartmental approach for integrating subsystem components. FlightLab also has a number of GUI tools, such as editors for developing models and control
systems, and a dynamic system analyzer. It allows real-time interfacing with external hardware or software components. One of the outstanding features of FlightLab is its ability to accurately model the
main rotor and the interaction between it and other components, such as the tail rotor (through the use
of empirical formulas), external wind (e.g., through an external wind server), ground effects (including
both in- and out-of-ground effects with an image method, a free-wake model, and a horse-shoe ground
vortex model), etc. It also includes the modeling of dynamic stall, transonic flow, aeroelastic response,
vortex wake, blade element aerodynamics, and can also provide finite element structure analysis.
For our research purposes, we generated a high-fidelity helicopter model having a rigid fuselage with
empirical airloads, and elastic blades with quasi-unsteady airloads^1, 3-state inflow^2, and direct control
of the swashplate angles. The model is presented as a numerical discrete-time nonlinear system with 92
internal state variables.
10.2.3. Control Specifications
For helicopter control, we define the state vector $x_k$ to correspond to the standard states of a 6DOF rigid
body model. This 12-dimensional state vector consists of Cartesian coordinates in the inertial frame
$x, y, z$, Euler angles $\psi, \phi, \theta$ (yaw, roll, pitch), linear velocities $u, v, w$ and angular velocities $p, q, r$ in
the body coordinate frame. Technically, this represents a reduced state-space, as our FlightLab model
utilizes a total of 92 internal state variables (e.g., main and tail rotor states). However, we treat these
as both unobservable and uncontrollable for purposes of deriving the controller. There are 4 control
inputs, $\theta_0, \theta_{1C}, \theta_{1S}, \theta_{T0}$, corresponding to the main collective, lateral cyclic, longitudinal cyclic, and
tail collective controls (incident angles of rotor blades)^3.
^1 Quasi-unsteady airloads incorporate both theory and table look-up for the calculation of airloads for the effective angle
of attack, side-slip angle and Mach number; certain dynamics (e.g., dynamic stall) are simplified or neglected.
^2 The general finite state inflow model allows for the variation of the rotor induced flow with arbitrary harmonics azimuthally and an arbitrary order of polynomial radially. The 3-state model is truncated from the higher order finite state
inflow model. Its first state is for uniform inflow (both azimuthally and radially), the 2nd state is the 1st cosine harmonics
of azimuthal variation with linear radial variation, the 3rd state is for the 1st sine harmonics of azimuthal variation with the
similar linear radial variation [16].
^3 We set control constraints as follows: Max. $\theta_0$ = 20 deg., Max. $\theta_{1C}$ = 30 deg., Max. $\theta_{1S}$ = 32 deg., Max.
$\theta_{T0}$ = 39 deg.
The tracking error for the helicopter, $e_k = x_k - x_k^{des}$, is determined by the trajectory of a reference
target (provided by a mission planner for higher-level control coordination). The target specifies desired
coordinates, velocities, and attitude in the inertial frame. The reference state is then projected into the
body frame to produce the desired state,
$$x_k^{des} = T(x_k)\, x_k^{tar}, \qquad (10.5)$$
where $T(x_k)$ is an appropriate projection matrix consisting of necessary rotation operators:
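$$T(x) = \begin{pmatrix} T_R(\phi, \theta, \psi) & O_{5,7} \\ O_{7,5} & I_{7,7} \end{pmatrix},$$
with $s_{(\cdot)}$, $c_{(\cdot)}$ denoting sin and cos, and $x_k = (u, w, q, \theta, v, p, \phi, r, \psi, x, y, z)^T$. Here $T_R$ denotes the
upper-left $5 \times 5$ block, whose nonzero entries are products of $s_{(\cdot)}$ and $c_{(\cdot)}$ of the Euler angles acting on the
velocity components $u$, $w$, $v$; $O$ and $I$ are zero and identity blocks. Minimization of the tracking
error causes the helicopter to move in the direction of the target motion.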
10.3. MPNC
Difficulties with application of traditional MPC include a) the need for a 'good' initial sequence of
controls that is capable of stabilizing the model, and b) the need to re-optimize at short intervals to
avoid problems associated with open-loop control (e.g., lack of robustness to model uncertainty or disturbances). We address these issues by directly implementing a feedback controller as a combination of
an SDRE stabilizing controller and a neural controller,
$$u_k = \alpha\, N_{net}(x_k, e_k, \mathbf{w}) + (1 - \alpha)\, K(x_k)\, e_k, \qquad (10.6)$$
where $0 < \alpha < 1$ is a relative weighting constant. A standard multi-layer feedforward neural network
is used for its universal mapping capabilities (not for its biological motivation). The SDRE controller
$K(x_k)e_k$ provides a robust stabilizing control, while the weights of the neural network, $\mathbf{w}$, are optimized
to minimize the overall receding horizon MPC cost. Note that this control structure is a function of
the state $x_k$ (and tracking error $e_k$), providing explicit feedback. As stated earlier, the SDRE controller
requires a set of governing equations, which are not available in FlightLab. Thus we derive a 6DOF rigid
body model as an analytical approximation. The NN controller, however, is designed using the full flight
simulator model. The NN is trained on-line, and is updated for each horizon interval. For this, the SDRE
solution and the 6DOF model are again utilized in a number of different ways in order to implement
terms used for minimizing the MPC cost. Design of the MPNC controller, which includes the SDRE and
neural controller, is detailed in the following subsections.
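The composite feedback law of Equation 10.6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the single-hidden-layer `mlp`, the weight layout `(W1, b1, W2, b2)`, and the choice of concatenating state and error as network input are all assumptions made here for brevity.

```python
import math

def mlp(inputs, W1, b1, W2, b2):
    """One tanh hidden layer followed by a linear output layer."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]

def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def mpnc_control(x, e, K, weights, alpha=0.7):
    """u = alpha * Nnet(x, e, w) + (1 - alpha) * K e, with K the current SDRE gain."""
    u_nn = mlp(x + e, *weights)      # neural contribution
    u_sd = matvec(K, e)              # SDRE contribution
    return [alpha * n + (1.0 - alpha) * s for n, s in zip(u_nn, u_sd)]
```

In the full scheme the gain `K` would be recomputed from the SDRE at every time step, while the network weights would change only when the controller is re-optimized at an update interval.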
10.3.1. SDRE CONTROLLER
Referring to the system state-space Equation 10.1, an SDRE controller [5] is designed by reformulating
$f(x_k, u_k)$ as
$$f(x_k, u_k) = \Phi(x_k)\, x_k + \Gamma(x_k)\, u_k.$$
This representation is not a linearization. To illustrate the principle, consider a simple scalar example,
$x_{k+1} = \sin x_k + x_k \cos x_k\, u_k$. A valid state-space representation is then $\Phi(x_k) = \frac{\sin x_k}{x_k}$ and $\Gamma(x_k) = x_k \cos x_k$.
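A short numerical check makes the distinction from linearization concrete. The sketch below evaluates the scalar example and its state-dependent factorization; the value assigned to $\sin x / x$ at $x = 0$ (its limit, 1) is an assumption added here so that the function is defined everywhere.

```python
import math

def f(x, u):
    """Scalar example: x_{k+1} = sin(x_k) + x_k cos(x_k) u_k."""
    return math.sin(x) + x * math.cos(x) * u

def phi(x):
    """State-dependent coefficient sin(x)/x, extended by its limit 1 at x = 0."""
    return math.sin(x) / x if x != 0.0 else 1.0

def gamma(x):
    """State-dependent input coefficient x cos(x)."""
    return x * math.cos(x)

def f_factored(x, u):
    """The same map written in state-dependent form Phi(x) x + Gamma(x) u."""
    return phi(x) * x + gamma(x) * u

def f_linearized(x, u):
    """Jacobian linearization about (x, u) = (0, 0): df/dx = 1 and df/du = 0 there."""
    return 1.0 * x + 0.0 * u
```

The factored form reproduces `f` at every state, while the linearization about the origin loses the control term entirely and drifts away from `f` as `x` grows.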
Based on the new state-space representation, we design an optimal LQ controller to track the desired
state $x_k^{des}$. This leads to the nonlinear controller^4,
$$u_k^{sd} = -R^{-1}\,\Gamma^T(x_k)\, P(x_k)\,(x_k - x_k^{des}) \equiv K(x_k)\, e_k,$$
where $P(x_k)$ is a solution of the standard Riccati Equations using state-dependent matrices $\Phi(x_k)$ and
$\Gamma(x_k)$, which are treated as being constant. The procedure is repeated at every time step at the current
state $x_k$ and provides local asymptotic stability of the plant [5]. In practice, the approach has been
found to be far more robust than LQ controllers based on standard linearization techniques. See [5] for
a discussion on the class of dynamic equations that can be presented in a state-dependent form.
^4 In the case where the SDRE is used as a stand-alone controller, we formulate $u_k^{sd} = K(x_k)e_k + u_k^{tr}$, where $u_k^{tr}$ is
scheduled to compensate for steady state errors related to various trim conditions. Alternatively, we may also include an
integral control in the SDRE framework. Note that when the full MPNC control is used, the trim control is accounted for
automatically through the optimization of the NN.
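The "re-solve at every step" character of SDRE control can be sketched on a scalar regulation problem. Everything specific here is hypothetical: the plant $x_{k+1} = (1 + 0.1 x_k^2) x_k + 0.5 u_k$, the weights, and the use of the standard discrete-time LQ gain $-(r + \Gamma^2 p)^{-1} \Gamma p \Phi$ obtained from a fixed-point iteration of the scalar Riccati recursion. It is meant only to show the structure of the loop.

```python
def scalar_dare(phi, gam, q, r, iters=200):
    """Fixed-point iteration of the scalar discrete Riccati recursion."""
    p = q
    for _ in range(iters):
        p = q + phi * phi * p - (phi * gam * p) ** 2 / (r + gam * gam * p)
    return p

def sdre_gain(x, q=1.0, r=1.0):
    """Freeze Phi(x) and Gamma at the current state, then solve the LQ problem."""
    phi = 1.0 + 0.1 * x * x    # hypothetical state-dependent coefficient
    gam = 0.5                  # hypothetical constant input coefficient
    p = scalar_dare(phi, gam, q, r)
    return -(gam * p * phi) / (r + gam * gam * p)

def plant(x, u):
    """Hypothetical nonlinear plant, open-loop unstable."""
    return (1.0 + 0.1 * x * x) * x + 0.5 * u

def simulate(x0, steps):
    """Closed loop: recompute the gain from the current state at every step."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(plant(x, sdre_gain(x) * x))
    return xs
```

Running `simulate(2.0, 30)` drives the state toward zero even though the open-loop plant diverges from that initial condition.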
Dynamic equations for the FlightLab helicopter model are not available. Thus we use the simplified
dynamics given by a 6DOF rigid body model,
$$\begin{aligned}
\dot{u} &= -(wq - vr) - g\sin\theta + F_x/M_a \\
\dot{v} &= -(ur - wp) + g\cos\theta\sin\phi + F_y/M_a \\
\dot{w} &= -(vp - uq) + g\cos\theta\cos\phi + F_z/M_a \\
I_{xx}\dot{p} &= (I_{yy} - I_{zz})\,qr + I_{xz}(\dot{r} + pq) + L \\
I_{yy}\dot{q} &= (I_{zz} - I_{xx})\,rp + I_{xz}(r^2 - p^2) + M \\
I_{zz}\dot{r} &= (I_{xx} - I_{yy})\,pq + I_{xz}(\dot{p} - qr) + N \\
\dot{\phi} &= p + q\sin\phi\tan\theta + r\cos\phi\tan\theta \\
\dot{\theta} &= q\cos\phi - r\sin\phi \\
\dot{\psi} &= q\sin\phi\sec\theta + r\cos\phi\sec\theta \\
(\dot{x}\ \ \dot{y}\ \ \dot{z})^T &= Rot^{-1}(\phi, \theta, \psi)\,(u\ \ v\ \ w)^T
\end{aligned}$$
where $Rot^{-1}(\phi, \theta, \psi)$ is a rotation matrix (coordinate transformation) from body frame to inertial frame,
$M_a$ is the aircraft mass, $I_{xx}$, $I_{yy}$, $I_{zz}$ are moments of inertia, $I_{xz}$ is the product inertia, and $F_x, F_y, F_z, L, M, N$
are rotor-induced forces and moments. The forces and moments are nonlinear functions of helicopter
states and control inputs. We then rewrite this into a state-dependent continuous canonical representation
$\dot{x} = A(x)x + B(x)u$, with $x = (u, w, q, \theta, v, p, \phi, r, \psi, x, y, z)^T$. The $12 \times 12$ matrix $A(x)$ is sparse. Its nonzero entries are:
in the rows of $\dot{u}$, $\dot{w}$, $\dot{v}$, the velocity-coupling terms $\pm q/2$, $\pm r/2$, $\pm p/2$, $\pm u/2$, $\pm v/2$, $\pm w/2$ (each product such as $wq$ is split evenly between its two factors) together with gravity terms in $g$, $s_{(\cdot)}$ and $c_{(\cdot)}$;
in the row of $\dot{q}$, the terms $a_0 r - a_1 p$ and $a_0 p + a_1 r$;
in the row of $\dot{p}$, the terms $a_2 q$, $a_2 p + a_3 r$ and $a_3 q$;
in the row of $\dot{r}$, the terms $a_4 q$, $a_4 p + a_5 r$ and $a_5 q$;
in the rows of $\dot{\phi}$, $\dot{\theta}$, $\dot{\psi}$, the terms $1$, $s_\phi \tan\theta$, $c_\phi \tan\theta$, $c_\phi$, $-s_\phi$, $s_\phi / c_\theta$ and $c_\phi / c_\theta$;
and in the rows of $\dot{x}$, $\dot{y}$, $\dot{z}$, the entries of $Rot^{-1}(\phi, \theta, \psi)$.
Here $s_{(\cdot)}$, $c_{(\cdot)}$ denote sin and cos, and
$$a_0 = \frac{I_{zz} - I_{xx}}{2 I_{yy}}, \quad a_1 = \frac{I_{xz}}{I_{yy}}, \quad a_2 = \frac{I_{xz}(I_{xx} - I_{yy} + I_{zz})}{2(I_{xx} I_{zz} - I_{xz}^2)}, \quad a_3 = \frac{I_{yy} I_{zz} - I_{zz}^2 - I_{xz}^2}{2(I_{xx} I_{zz} - I_{xz}^2)},$$
$$a_4 = \frac{I_{xx}^2 - I_{yy} I_{xx} + I_{xz}^2}{2(I_{xx} I_{zz} - I_{xz}^2)}, \quad a_5 = \frac{I_{xz}(-I_{xx} + I_{yy} - I_{zz})}{2(I_{xx} I_{zz} - I_{xz}^2)}.$$
$\Phi(x_k)$ is then obtained from $A(x)$ by discretization at each time step (i.e., $\Phi(x_k) = e^{A(x_k)\Delta t}$)^5.
Explicit analytic equations are unavailable for the rotor-induced forces and moments, which would
relate the 6DOF model to the FlightLab model. These terms are highly nonlinear and include vehicle-specific aerodynamic look-up tables. Thus $B(x)$ cannot be specified in closed-form. Therefore, we
approximate $\Gamma(x_k)$ with a constant $\Gamma$ by linearizing^6 the full FlightLab model with respect to the control
inputs $u_k$ around the hover trim state^7.
^5 Parameter settings to match FlightLab are $M_a$ = 16308 lbs, $I_{xx}$ = 9969 lbs ft^2, $I_{yy}$ = 44493 lbs ft^2, $I_{zz}$ =
44265 lbs ft^2, $I_{xz}$ = 1478 lbs ft^2.
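As a consistency check on the state-dependent form, the angular-rate rows of $A(x)x$ can be compared with the rates obtained by solving the coupled rigid-body moment equations directly (with the external moments set to zero). The sketch below assumes the coefficient expressions $a_0, \ldots, a_5$ written in the code, which are the ones that follow from the moment equations, and uses the inertia values quoted for the FlightLab model; the function names are hypothetical.

```python
def inertia_coeffs(Ixx, Iyy, Izz, Ixz):
    """Coefficients a0..a5 (assumed forms, derived from the moment equations)."""
    D = 2.0 * (Ixx * Izz - Ixz ** 2)
    a0 = (Izz - Ixx) / (2.0 * Iyy)
    a1 = Ixz / Iyy
    a2 = Ixz * (Ixx - Iyy + Izz) / D
    a3 = (Iyy * Izz - Izz ** 2 - Ixz ** 2) / D
    a4 = (Ixx ** 2 - Iyy * Ixx + Ixz ** 2) / D
    a5 = Ixz * (-Ixx + Iyy - Izz) / D
    return a0, a1, a2, a3, a4, a5

def rates_from_sdc(p, q, r, coeffs):
    """Angular accelerations from the A(x) x rows (each matrix entry times its state)."""
    a0, a1, a2, a3, a4, a5 = coeffs
    p_dot = (a2 * q) * p + (a2 * p + a3 * r) * q + (a3 * q) * r
    q_dot = (a0 * r - a1 * p) * p + (a0 * p + a1 * r) * r
    r_dot = (a4 * q) * p + (a4 * p + a5 * r) * q + (a5 * q) * r
    return p_dot, q_dot, r_dot

def rates_direct(p, q, r, Ixx, Iyy, Izz, Ixz):
    """Solve the coupled p_dot / r_dot equations by Cramer's rule, with L = M = N = 0."""
    c1 = (Iyy - Izz) * q * r + Ixz * p * q
    c2 = (Ixx - Iyy) * p * q - Ixz * q * r
    det = Ixx * Izz - Ixz ** 2
    p_dot = (Izz * c1 + Ixz * c2) / det
    r_dot = (Ixz * c1 + Ixx * c2) / det
    q_dot = ((Izz - Ixx) * r * p + Ixz * (r * r - p * p)) / Iyy
    return p_dot, q_dot, r_dot

INERTIA = dict(Ixx=9969.0, Iyy=44493.0, Izz=44265.0, Ixz=1478.0)  # values quoted in the text
```

Both routes give the same angular accelerations for any body rates, which is what a valid state-dependent factorization requires.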
Finally, given $\Phi(x_k)$ and $\Gamma$, we can design the SDRE control gain $K(x_k)$ at each time step. Note
that while $\Phi(\cdot)$ is based on the 6DOF model, the state argument $x_k$ comes directly from the FlightLab
model. We have found that this mixed approach using the approximate model plus the linearized control
matrix, $\Gamma$, is far more robust than simply using a standard LQ approach based on linearization of all
system matrices (see Experimental Results Section).
Figure 10.1: MPNC signal flow diagram
10.3.2. NEURAL NETWORK CONTROLLER
The neural controller is designed using the full FlightLab helicopter model and augments the SDRE
controller. The overall flowgraph of the system is shown in Figure 10.1. The neural controller is specified
as follows:
$$u^{nn} = N_{net}(u, v, w, p, q, r, z, s_\psi, s_\theta, s_\phi, c_\psi, c_\theta, c_\phi, e, \mathbf{w}),$$
where we have included sines and cosines of the yaw, pitch, and roll angles. This is motivated since
the simplified helicopter dynamics depend on such trigonometric functions of the Euler angles. The
coordinates x and y of the aircraft in the inertial frame do not influence dynamics, and are excluded as
inputs (the altitude, z, is included due to modeling of ground effects and air density).
^6 Linearization is performed over multiple rotor revolutions with averaging.
^7 Alternatively, we may explicitly schedule $\Gamma(x_k)$ using combinations of different trim states, as is often done with Linear
Parameter Varying (LPV) approaches [23, 19, 2].
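The input layout described above can be made concrete with a small helper. This is a sketch only: the dictionary keys, the function name and the ordering of the trigonometric features are assumptions chosen to mirror the argument list of the neural controller.

```python
import math

def nn_inputs(state, error):
    """Assemble the network input: body rates and velocities, altitude,
    sines and cosines of yaw, pitch and roll, followed by the tracking error.
    The inertial x and y positions are deliberately left out."""
    base = [state[name] for name in ("u", "v", "w", "p", "q", "r", "z")]
    angles = (state["psi"], state["theta"], state["phi"])   # yaw, pitch, roll
    sines = [math.sin(a) for a in angles]
    cosines = [math.cos(a) for a in angles]
    return base + sines + cosines + list(error)
```

With a 12-element tracking error this yields 25 inputs; feeding sines and cosines rather than raw angles also avoids a discontinuity when an angle wraps around.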
The NN represents an optimal feedback controller. Optimization is performed by learning the
weights, $\mathbf{w}$, of the NN in order to minimize the receding horizon MPC cost function (Equation 10.4)
subject to the system dynamics and the composite form of overall feedback controller (Equation 10.6).
The problem is solved by taking a standard calculus of variations approach, where $\lambda_k$ and $\mu_k$ are vectors
of Lagrange multipliers in the augmented cost function
$$J_t = \sum_{k=t}^{t+N-1} \Big\{ L_k(x_k, u_k) + \lambda_{k+1}^T \big( f(x_k, u_k) - x_{k+1} \big) + \mu_k^T \big( \alpha\, u_k^{nn} + (1 - \alpha)\, K(x_k)\, e_k - u_k \big) \Big\} + V(x_{t+N}), \qquad (10.8)$$
where $u_k^{nn} = N_{net}(x_k, e_k, \mathbf{w})$. The cost-to-go $V(x_{t+N})$ is approximated using the solution of the SDRE
at time $t + N$,
$$V(x_{t+N}) \approx e_{t+N}^T\, P(x_{t+N})\, e_{t+N} \approx \sum_{k=t+N}^{\infty} \Big\{ (x_k - x_{t+N}^{des})^T Q\, (x_k - x_{t+N}^{des}) + u_k^T R\, u_k \Big\}. \qquad (10.9)$$
This CLF provides the exact cost-to-go for regulation assuming a linear system at the horizon time. A
similar formulation was used for nonlinear regulation control in [21].
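We can now derive the recurrent Euler-Lagrange equations
$$\lambda_k^T = \lambda_{k+1}^T \frac{\partial f(x_k, u_k)}{\partial x_k} + \frac{\partial L_k(x_k, u_k)}{\partial x_k} + \mu_k^T \left( \alpha \frac{\partial N_{net}(x_k, e_k, \mathbf{w})}{\partial x_k} + \alpha \frac{\partial N_{net}(x_k, e_k, \mathbf{w})}{\partial e_k} \frac{\partial e_k}{\partial x_k} + (1 - \alpha) \frac{\partial (K(x_k)\, e_k)}{\partial x_k} \right), \qquad (10.10)$$
$$\mu_{k,i} = \begin{cases} \lambda_{k+1}^T \dfrac{\partial f(x_k, u_k)}{\partial u_{k,i}} + \dfrac{\partial L_k(x_k, u_k)}{\partial u_{k,i}}, & \text{if } |u_{k,i}| \le u_i^{sat} \\[2ex] \dfrac{\partial L_k(x_k, u_k)}{\partial u_{k,i}}, & \text{if } |u_{k,i}| > u_i^{sat} \end{cases}$$
with
$$\lambda_{t+N}^T = \frac{\partial V(x_{t+N})}{\partial x_{t+N}} + \frac{\partial V(x_{t+N})}{\partial e_{t+N}} \frac{\partial e_{t+N}}{\partial x_{t+N}},$$
$k = (t+N)-1, (t+N)-2, \ldots, t$, and $i = 1, \ldots, m$ ($m$ is the dimension of control vector $u$).
Figure 10.2: Adjoint system
From Equations 10.2 and 10.5,
$$\frac{\partial L_k(x_k, u_k)}{\partial x_k} = \frac{\partial L_k(x_k, u_k)}{\partial e_k} \frac{\partial e_k}{\partial x_k} = 2 e_k^T Q \left( I - \frac{\partial\, (T(x_k)\, x_k^{tar})}{\partial x_k} \right), \qquad (10.11)$$
where the partial is evaluated analytically by specifying $T(x_k)$. Each element $i$ of the gradient vector is
$$\frac{\partial L_k(e_k, u_k)}{\partial u_{k,i}} = \begin{cases} 2\, u_k^T R_i, & \text{if } |u_{k,i}| \le u_i^{sat} \\ 2\, \big( u_k^T R_i + (u_k^{over})^T R_{sat,i} \big), & \text{otherwise,} \end{cases}$$
with $R_i$ and $R_{sat,i}$ the $i$-th columns of $R$ and $R_{sat}$.
These equations correspond to an adjoint system (shown graphically in Figure 10.2), with optimality
condition
$$\frac{\partial J_t}{\partial \mathbf{w}} = \sum_{k=t}^{t+N-1} \mu_k^T\, \alpha\, \frac{\partial N_{net}(x_k, e_k, \mathbf{w})}{\partial \mathbf{w}} = 0.$$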
The overall training procedure for the NN can now be summarized as follows:
1. Simulate the system forward in time for N time steps (Figure 10.1). Note that the SDRE controller
is updated at each time step.
2. Run the adjoint system backward in time to accumulate the Lagrange multipliers (Figure 10.2).
Jacobians are evaluated analytically or by perturbation.
3. Update the weights using gradient descent^8, $\Delta \mathbf{w} = -\eta\, \partial J_t / \partial \mathbf{w}$, with learning rate $\eta$.
4. Repeat until an acceptable level of cost reduction is achieved, or simply for a preset number of
iterations.
^8 In practice we use an adaptive learning rate for each weight in the network using a procedure similar to delta-bar-delta
[9].
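The four steps above can be sketched on a deliberately tiny problem. The assumptions are substantial and should be read as such: the plant is a hypothetical scalar linear system standing in for the simulator, the "network" is a single weight `w` multiplying the state, the stabilizing gain `K` is fixed rather than recomputed from an SDRE, the problem is regulation (error equals state), and a fixed learning rate replaces the adaptive one mentioned in the footnote. The backward pass is the scalar version of the adjoint recursion, and its result can be compared against a finite-difference gradient.

```python
def rollout(w, x0, N, a, b, K, alpha):
    """Step 1: forward simulation with u = alpha * w * x + (1 - alpha) * K * x."""
    xs, us = [x0], []
    for _ in range(N):
        u = alpha * w * xs[-1] + (1.0 - alpha) * K * xs[-1]
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return xs, us

def cost(xs, us, q, r, p_term):
    """Horizon cost: sum of q x^2 + r u^2 over the horizon plus terminal cost p_term * x_N^2."""
    return sum(q * x * x + r * u * u for x, u in zip(xs, us)) + p_term * xs[-1] ** 2

def adjoint_gradient(w, xs, us, a, b, K, alpha, q, r, p_term):
    """Step 2: backward sweep accumulating dJ/dw = sum_k mu_k * alpha * x_k."""
    lam = 2.0 * p_term * xs[-1]                       # terminal multiplier dV/dx
    grad = 0.0
    for k in reversed(range(len(us))):
        mu = lam * b + 2.0 * r * us[k]                # lam_{k+1} df/du + dL/du
        grad += mu * alpha * xs[k]                    # dNnet/dw = x_k for this toy network
        lam = lam * a + 2.0 * q * xs[k] + mu * (alpha * w + (1.0 - alpha) * K)
    return grad

def train(w, x0, N, a, b, K, alpha, q, r, p_term, eta=0.005, epochs=200):
    """Steps 3 and 4: plain gradient descent for a preset number of epochs."""
    for _ in range(epochs):
        xs, us = rollout(w, x0, N, a, b, K, alpha)
        w -= eta * adjoint_gradient(w, xs, us, a, b, K, alpha, q, r, p_term)
    return w

# Hypothetical settings for illustration only.
PARAMS = dict(a=1.05, b=0.5, K=-0.8, alpha=0.7)
WEIGHTS = dict(q=1.0, r=0.1, p_term=2.0)
```

Starting from `w = 0.0`, a call such as `train(0.0, 1.0, 10, **PARAMS, **WEIGHTS)` lowers the horizon cost relative to the stabilizing gain alone, which mirrors how the neural term is meant to improve on the SDRE baseline.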
For the first horizon, the NN weights are initialized by pre-training the NN to behave similarly to the
SDRE controller. The SDRE controller provides for stable tracking and good conditions for subsequent
training. As training progresses, the NN decreases the tracking error. This reduces the SDRE control
output, which in turn gives more authority to the neural controller. The training process is repeated at
the update interval, with weights from the previous horizon used as the initialization.
Stability of MPNC is closely related to that of traditional MPC. Ideally, in the case of unconstrained
optimization, stability is guaranteed provided $V(x_{t+N})$ is a CLF and is an (incremental) upper bound on
the cost-to-go [10]. In this case, the minimum region of attraction of the receding horizon optimal control
is determined by the CLF used and horizon length. The guaranteed region of operation contains that of
the CLF controller and may be made as large as desired by increasing the optimization horizon (restricted
to the infinite horizon domain) [11]. In our case, the minimum region of attraction of the receding
horizon MPNC is determined by the SDRE solution used as the CLF to approximate the terminal cost.
In addition, we also restrict the controls to be of the form given by Equation 10.6 and the optimization
is performed with respect to the NN weights $\mathbf{w}$. In theory, the universal mapping capability of neural
networks implies that the stability guarantees are equivalent to those of the traditional MPC framework.
However, in practice stability is affected by the chosen size of the NN (which affects the actual mapping
capabilities), as well as the horizon length and update interval length (how often NN is re-optimized).
This can be clearly traced in Table 10.2 (Section 10.4), where additional hidden neurons result in a
higher overall performance. When the horizon is short, performance is more affected by the chosen CLF.
On the other hand, when the horizon is long, performance is limited by the NN properties. An additional
factor affecting stability is the specific algorithm used for numeric optimization. Gradient descent, which
we use to minimize the cost function (Equation 10.8) is guaranteed to converge to only a local minimum
(the cost function is not guaranteed convex with respect to the NN weights), and thus depends on the
initial conditions. In addition, convergence is assured only if the learning rate is kept sufficiently small.
To summarize these points, stability of the MPNC is guaranteed under certain restricted ideal conditions.
In practice, the designer must select appropriate settings and perform sufficient flight experiments to
assure stability and performance over a desired flight envelope.
10.3.3. ENGINE SPEED CONTROL
Typically, engine speed is maintained by a separate throttle regulator. If this is not implemented correctly,
a stall situation may arise due to an improper coupling with the main flight controller. During aggressive
maneuvers, increased engine load may result in a reduced rotor speed and a loss of lift. To overcome the
altitude loss, the controller reacts by increasing the collective. However, this results in even higher loads
and further slowing down of the rotor, and thus further loss of lift and altitude. To prevent this, one must
decrease the engine load and gain rotor speed back, while momentarily sacrificing tracking performance.
In our implementation, we chose to by-pass a separate throttle regulator, and build this directly into
the MPNC framework. We accomplish this by adding an additional state variable corresponding to the
main rotor speed as well as a direct throttle (rotor speed) control command. For the SDRE controller,
we simply augment $\Phi(x_k)$ with an additional row and column found by linearization with respect to the
rotor speed at hover. The neural network is provided with extra inputs corresponding to the main rotor
speed and its error, as well as an additional throttle (rotor speed) command output. Optimization within
the MPC framework remains the same.
10.3.4. INCORPORATING WIND FLOWS
With the increased sophistication in atmospheric modeling and sensing, it is becoming more feasible to
directly optimize for wind effects. This is in contrast to traditional approaches that often treat wind as
a disturbance to be countered through feedback error correction. Modeling predictions of atmospheric
conditions are well established at global scales and coarse resolution, and the subject of intense research
at regional scales and fine resolution^9. Global and regional predictions, however, are most supportive of
mission-planning level control of autonomous vehicles. For aggressive maneuvers, data from on-board
or land-based radar appears to offer the most promising path to realistically incorporating atmospheric
conditions in on-line control strategies.
^9 The Medium-Range Forecast Model, MRF [13, 14], maintained by the National Oceanic and Atmospheric Administration, is an example of a global model with a long track record. It covers the entire earth surface, and extends in the vertical
to a layer at the pressure of 2 hPa. The horizontal grid has 512x256 cells, each roughly equivalent to 0.7 x 0.7 degree
latitude/longitude. In the vertical, the grid has 42 unequally spaced sigma levels. Recent efforts in regional prediction are
exemplified by the Advanced Regional Prediction System (ARPS), based on a multi-scale non-hydrostatic atmospheric model
[24]. ARPS targets storm-scale and mountain scale predictions, with horizontal and vertical resolutions as fine as 1km and
100m, respectively. Applications at this level of resolution have the potential for simulation of boundary layer eddies and
wind gusts. The impetus for predictions at these scales is credited to the deployment of Doppler radars in the US, and of techniques for retrieving unobserved quantities from Doppler radar data to yield mass and wind fields appropriate for initialization
of storm-scale prediction models [20].
FlightLab has the built-in capability of incorporating some forms of winds (e.g., sinusoidal gusts,
stochastic atmospheric turbulent wind etc.) into the calculation of total airloads on the rotors as well
as fuselage. For our purposes, we replaced the existing atmospheric turbulent wind component (ATMTUR), with one we developed to allow inputs from an external wind server. The new component was
compiled separately and linked to other components that require wind information (e.g., the aerodynamic
components for main rotor etc.). This enables us to study the responses of a helicopter to controlled wind
patterns (e.g., wind shear and Large-eddy Simulations (LES) - Section 10.4.3), as well as future integration with measurements from on-board or land-based radar.
Given the helicopter and wind interaction model, no additional modifications are necessary for incorporation into the MPNC framework. In Section 10.4.3 we report a number of simulation experiments
illustrating performance trade-offs associated with how the wind is approximated.
10.3.5. COMPUTATIONAL SIMPLIFICATIONS
Generally, MPC design implies multiple simulations of the system forward in time and the adjoint system
backward in time. Computations scale with the number of training epochs and the horizon length. The
most computationally demanding operations correspond to solving the Riccati equation for the SDRE
controller in the forward simulation, and the numeric (i.e., by perturbation) computation of FlightLab
Jacobians in the backward simulation. In order to approach real-time feasibility, we consider possible
trade-offs between computational effort and controller performance through a number of successive
simplifications:
1. The SDRE Jacobian is approximated as $\frac{\partial (K(x_k)\, e_k)}{\partial x_k} \approx K(x_k) \frac{\partial e_k}{\partial x_k}$, where $\frac{\partial e_k}{\partial x_k}$ can then be calculated
analytically as in Equation 10.11.
2. Same as (1), with the additional simplification that the matrices $K(x_k)$ are memoized (i.e., stored)
at each time step $k$ during the first epoch, and again used for all subsequent epochs within the
current horizon. This simplification allows us to avoid re-solving the SDRE in the multiple forward
simulations.
3. Same as (2), with the addition that the Jacobians $\frac{\partial f(x_k, u_k)}{\partial x_k}$ and $\frac{\partial f(x_k, u_k)}{\partial u_k}$ (which must be calculated
by perturbation of FlightLab) are memoized during the first epoch and again used for all
subsequent epochs within the current horizon. This assumes the aircraft trajectory and thus the
Jacobians are not changing significantly during training within the same horizon from epoch to
epoch.
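The third simplification, computing plant Jacobians by perturbation once and reusing them across epochs, can be sketched as a finite-difference routine wrapped in a per-time-step cache. The names (`jacobian_fd`, `JacobianCache`, `toy_plant`), the forward-difference scheme and the step size are assumptions for illustration; the toy plant merely stands in for a black-box simulator call.

```python
import math

def jacobian_fd(f, x, u, eps=1e-6):
    """Forward-difference Jacobians (df/dx, df/du) of a black-box map f(x, u)."""
    f0 = f(x, u)

    def partials(vec, evaluate):
        cols = []
        for i in range(len(vec)):
            pert = list(vec)
            pert[i] += eps                       # perturb one input at a time
            f1 = evaluate(pert)
            cols.append([(hi - lo) / eps for lo, hi in zip(f0, f1)])
        return [list(row) for row in zip(*cols)]  # transpose columns into rows

    return partials(x, lambda xp: f(xp, u)), partials(u, lambda up: f(x, up))

class JacobianCache:
    """Store the Jacobians per time step during the first epoch and reuse them afterwards."""
    def __init__(self, f):
        self.f = f
        self.store = {}
        self.evaluations = 0

    def get(self, k, x, u):
        if k not in self.store:
            self.store[k] = jacobian_fd(self.f, x, u)
            self.evaluations += 1
        return self.store[k]

    def new_horizon(self):
        """Discard stored Jacobians when the horizon moves on."""
        self.store.clear()

def toy_plant(x, u):
    """Hypothetical two-state discrete-time pendulum-like map."""
    return [x[0] + 0.1 * x[1], x[1] + 0.1 * (u[0] - math.sin(x[0]))]
```

The cache trades accuracy for speed in the same way as described above: a later epoch that visits a slightly different trajectory still receives the Jacobians computed along the first one.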
In the following section, we will provide experimental results to illustrate the performance of the
MPNC approach, including the computational versus performance trade-offs associated with these simplifications.
10.4. EXPERIMENTAL RESULTS
Figure 10.3a and b shows a test trajectory for the helicopter (vertical rise, forward flight, left turn, u-turn,
forward flight to hover). The figure compares tracking performance at a velocity of 6.0 m/s for the
MPNC system (SDRE+NN) versus a pure SDRE controller (MPNC settings are: horizon = 10, update
interval = 2, training epochs = 10, sampling time = 0.097 sec, = 0.7). (Note that a standard LQ
controller based on linearization exhibits loss of tracking in executing this maneuver and crashes for
velocities above 3.0 m/s.) The smaller tracking error for the MPNC controller is apparent (note the
reduced overshooting and oscillations at mode transitions). The total normalized accumulated cost (footnote 10) for
the MPNC is 49.25 in comparison to 100.00 for the SDRE controller.
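The accumulated cost used for these comparisons can be sketched in code. The sketch assumes the cost is the time-normalized sum of three quadratic terms (tracking error, control effort, and a control-overshoot penalty), which is our reading of footnote 10; the function names, the interpretation of the overshoot vector as the part of the command beyond actuator limits, and all numbers are illustrative assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def quad_form(v: Sequence[float], weight: Matrix) -> float:
    """Return v^T W v for a square weight matrix W."""
    n = len(v)
    return sum(v[i] * weight[i][j] * v[j] for i in range(n) for j in range(n))


def normalized_accumulated_cost(errors, controls, overshoots,
                                Q: Matrix, R: Matrix, R_sat: Matrix,
                                t0: float, t_final: float) -> float:
    """Sum of per-step quadratic costs divided by the maneuver duration."""
    total = 0.0
    for e_k, u_k, u_over_k in zip(errors, controls, overshoots):
        total += quad_form(e_k, Q) + quad_form(u_k, R) + quad_form(u_over_k, R_sat)
    return total / (t_final - t0)


# Toy data: two time steps, two error states, one control input.
errors = [[1.0, 2.0], [1.0, 2.0]]
controls = [[1.0], [1.0]]
overshoots = [[0.0], [0.0]]          # no actuator saturation in this example
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[2.0]]
R_sat = [[10.0]]

cost = normalized_accumulated_cost(errors, controls, overshoots,
                                   Q, R, R_sat, t0=0.0, t_final=2.0)
print(cost)
```

Dividing by the duration makes costs comparable across maneuvers of different length, which is presumably why the text reports normalized values.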
Table 10.1 illustrates the trade-offs between computational effort and control performance (accumulated cost) for the simplifications discussed in the previous section. Clearly, substantial speed-up in CPU
time can be achieved with only a minor loss of performance. The actual CPU times should only be
viewed as indicative of relative requirements. The experiments were performed in MATLAB (with a C
module for the vehicle model generated by FlightLab) and were not optimized for efficient implementation or hardware consideration. Note that with all simplifications the MPNC control still achieves a
substantial performance improvement over the standard SDRE controller. For all subsequent simulations
we use simplification level 3.
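The trade-off can be made concrete with a little arithmetic on the Table 10.1 entries. The sketch below only restates the published costs and CPU times; the function name and the three derived quantities (speed-up, relative cost increase, ratio to real time) are our own choice of summary.

```python
MANEUVER_SECONDS = 40.0  # duration of the maneuver in actual time (Table 10.1)

levels = ["accurate", "1", "2", "3"]
cost = {"accurate": 49.25, "1": 51.66, "2": 51.49, "3": 52.83}
cpu_seconds = {"accurate": 28882.0, "1": 3048.0, "2": 1793.0, "3": 694.0}


def trade_off(level: str):
    """Speed-up and cost increase relative to the accurate computation."""
    speed_up = cpu_seconds["accurate"] / cpu_seconds[level]
    cost_increase_pct = 100.0 * (cost[level] - cost["accurate"]) / cost["accurate"]
    slower_than_real_time = cpu_seconds[level] / MANEUVER_SECONDS
    return speed_up, cost_increase_pct, slower_than_real_time


summary = {level: trade_off(level) for level in levels}
for level in levels:
    speed_up, increase, ratio = summary[level]
    print(f"level {level:>8}: {speed_up:5.1f}x faster, "
          f"cost +{increase:4.1f}%, {ratio:6.1f}x slower than real time")
```

On these numbers, simplification level 3 runs roughly 42 times faster than the accurate computation for about a 7% higher cost, while remaining about 17 times slower than real time on the stated hardware.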
Table 10.2 summarizes comparisons of the accumulated cost with respect to variation of the horizon
length and MPNC update interval, the number of neurons in the hidden layer, and omission of the
cost-to-go, $V(x_{t+N})$. The weights of the NN are trained for 10 epochs for each horizon (number of
simulated trajectories). Missing data in the table corresponds to a case where states and control inputs
exceeded the envelope of the FlightLab model consistency. Results indicate that a sufficient horizon
length is between 10 and 25 time steps (1-2.5 seconds). The importance of the cost-to-go function is
apparent for short horizon lengths. On the other hand, inclusion of the cost-to-go does not appear to help
for longer horizons. Overall, significant performance improvement is clearly achieved with the MPNC
controller relative to the pure SDRE controller.
Footnote 10: For comparisons, we specify the normalized accumulated cost: $\frac{1}{t_{final} - t_0} \sum_{t_k = t_0}^{t_{final}} \left( e_k^T Q e_k + u_k^T R u_k + (u_k^{over})^T R_{sat} u_k^{over} \right)$.
[Figure: 3-D trajectory plots, panels "SDRE test trajectory" and "MPNC test trajectory"; axes X, Y, Z in m.]
Figure 10.3: Test trajectory: a) SDRE trajectory, cost = 100.00. b) MPNC trajectory, cost = 49.25.
Table 10.1: Simplification levels vs. CPU time and performance costs (Pentium-3 750 MHz,
Linux). Note that the maneuver corresponds to 40 seconds in actual time.
Simplification level    Accurate    1        2        3
Cost                    49.25       51.66    51.49    52.83
CPU time (sec)          28882       3048     1793     694
10.4.1. RAPID TAKE-OFF AND LANDING
The next example illustrates performance of a rapid take-off, side slip, and landing. The tracking ability
for this aggressive maneuver is illustrated in Figure 10.4a and b. While in the static figure the SDRE
trajectory appears very accurate, the actual trajectory of the controlled aircraft lags the desired trajectory
in time. In contrast, the MPNC controlled helicopter maintains the desired velocity of approximately 15
m/s throughout the flight. Note that this maneuver involves a number of “mode-transitions” (take-off,
vertical rise to side slip, landing, etc.) that would have required a combination of different controllers
and appropriate gain scheduling using more traditional approaches.
Table 10.2: Performance cost comparisons for various MPNC options (- = missing data)
Neurons in hidden layer       50 neurons                          200 neurons
Horizon / update interval     5/1      10/2      25/5     50/10   5/1      10/2     25/5     50/10
MPNC cost with V(x_{t+N})     69.27    74.55     64.84    70.74   75.84    52.83    54.75    68.15
MPNC cost w/o V(x_{t+N})      -        126.34    69.39    73.54   -        -        62.72    69.60
SDRE cost                     100.00
During the landing, the FlightLab model includes aerodynamic ground effects and provides information on landing gear (LG) forces and compression. To achieve a smooth landing, we simply add a
quadratic cost associated with deviations from a desired smooth force curve. Referring to Figure 10.1, the
NN is provided with two additional inputs corresponding to the LG forces (as output from the helicopter
model) and the LG force deviation vector. Then in Figure 10.2, an additional Lagrange multiplier vector
is propagated backward through the vehicle and NN Jacobians. Note that such additional constraints are
not possible with the SDRE controller using the analytic 6DOF model (which does not include ground
effects, force dynamics, or the loss of degrees of freedom at touchdown, and hence cannot be directly
optimized for landing). Figure 10.4c and d shows the desired and actual LG forces (3 points of
contact), indicating a much smoother landing for the MPNC controller.
[Figure: panels "SDRE vertical take-off and landing" and "MPNC vertical take-off and landing" (axes X, Y, Z in m); panels "SDRE landing" and "MPNC landing" (landing gear force in N and landing gear compression in m versus time in s).]
Figure 10.4: Take-off and landing: a) SDRE, cost = 2254.90; b) MPNC, cost = 158.88 (desired
trajectory is plotted for comparison). Landing trajectories: c) SDRE, d) MPNC. Desired forces
at landing are plotted as downward triangles for the tail LG and upward triangles for the left and
right LG. The dashed line is the actual force reading at the tail LG; solid lines are for the left and
right LG.
10.4.2. ELLIPTIC MANEUVER
In this example, we execute an extremely difficult “elliptic” maneuver (footnote 11), consisting of (hover to) straight
flight at 22.8 m/s while performing a constant yaw rotation of 120 deg/sec. Trajectories are illustrated
for the MPNC and SDRE controllers in Figure 10.5. Note the tight execution performance of the MPNC,
which has a total cost of 14.43 versus 267.28 for the SDRE. The SDRE had a maximum yaw lag error
of up to 84 degrees. In contrast, the MPNC had a maximum yaw error of only 12 degrees.
[Figure: panels "SDRE elliptic maneuver" and "MPNC elliptic maneuver"; axes X, Y, Z in m.]
Figure 10.5: Elliptic maneuver: 22.8 m/s straight flight, 120 deg/sec yaw rate. a) SDRE, b)
MPNC.
10.4.3. WIND DISTURBANCE
Figure 10.6 shows the effect of an artificial wind shear (15 m/s) on the helicopter trajectory (3 m/s
vertical rise). The displacement with the MPNC (< 1.9 m) is noticeably improved over the SDRE (< 7.5 m).
[Figure: panels "SDRE" and "MPNC"; axes X, Y, Z in m.]
Figure 10.6: Vertical lift through wind shear. a) SDRE, cost = 33.24 b) MPNC, cost = 16.40
Footnote 11: The mass of the helicopter was reduced to $M_a = 4999$ lbs.
In addition to the simple wind shear experiment, output from a large-eddy simulation (LES)
model was used as forcing for the external wind server. This model [25] was specifically designed to
examine small-scale atmospheric flows, especially those involving cumulus convection, entrainment, and
turbulence. In this instance, the LES was used to simulate a microburst downdraft of the type associated
with strong convective storms. The modeled domain enclosed an area of approximately 8 km x 8 km
horizontally, and 4.2 km vertically. The grid spacing was 60 m in all directions. In the core downdraft,
vertical velocities reached a magnitude of over 16 m/s. Upon impacting the surface, the downdraft
spread out radially, forming a gust front, an area of strong turbulence which also includes substantial
vertical and horizontal wind shear. In addition to the resolved wind fields, the LES also predicts subgrid-scale
turbulent kinetic energy, which is decomposed into a rapidly evolving small-scale turbulence field.
This small-scale turbulence was defined using continuous functions over a series of length scales ranging
from the resolved grid scale down to viscous scales (i.e., 1 m), and it varied at time scales on the
order of 10 seconds (largest scale subgrid turbulence) down to less than a second (smallest scale subgrid
turbulence). Due to the nature of turbulence within the inertial subrange, the majority of the turbulent
kinetic energy resides in the larger scales. Features of the wind field are displayed in Figure 10.7.
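The statement that most of the turbulent kinetic energy sits in the larger scales can be checked with a short calculation. The sketch assumes the classical Kolmogorov inertial-subrange spectrum, $E(k) \propto k^{-5/3}$ (the text does not state which spectrum the LES decomposition uses), and takes the 60 m grid spacing and a lower cutoff of roughly 1 m as the ends of the range; the function name is ours.

```python
def inertial_range_energy_fraction(l_max: float, l_min: float, l_cut: float) -> float:
    """Fraction of the energy between scales l_max and l_min that lies in
    scales larger than l_cut, assuming E(k) proportional to k**(-5/3).

    Integrating k**(-5/3) between two wavenumbers gives a difference of
    k**(-2/3) terms; with k = 2*pi/l this is a difference of l**(2/3) terms,
    so the constant factors cancel in the ratio.
    """
    if not (l_max > l_cut > l_min > 0.0):
        raise ValueError("require l_max > l_cut > l_min > 0")
    p = 2.0 / 3.0
    return (l_max ** p - l_cut ** p) / (l_max ** p - l_min ** p)


# Energy held in the first decade of scales (60 m down to 6 m) relative to
# the whole assumed subgrid range (60 m down to 1 m).
fraction = inertial_range_energy_fraction(l_max=60.0, l_min=1.0, l_cut=6.0)
print(f"{100.0 * fraction:.0f}% of the subgrid energy lies above 6 m")
```

Under these assumptions, roughly five sixths of the subgrid energy is carried by eddies between 6 m and 60 m, which is consistent with the claim in the text.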
The helicopter flight path (straight trajectory at 24 m/s) took it through the gust front (with winds
exceeding 18 m/s). The relative performance and improvement with the MPNC controller is illustrated
in Figure 10.8. Note that for this simulation, the MPNC uses the resolved wind field for prediction and
optimization, while the smaller-scale turbulence is assumed unknown.
In general, we consider four possible scenarios for how to approximate the wind flow with the MPNC
design:
1. MPNC trained with no information available about wind.
2. MPNC trained using constant wind fields as measured from the start of each horizon.
3. MPNC trained using resolved wind field (turbulence is assumed unknown and neglected for training purposes).
4. MPNC trained using knowledge of both resolved and actual turbulent wind flow (ideal case).
Table 10.3 summarizes the resulting performance for the same experimental flight path as before. It is clearly
seen that with even partial wind flow information the performance is greatly improved. However, perfect
knowledge of the wind flow (ideal case 4) does not provide significant improvement over using just the
constant wind approximation (case 2) or the resolved wind field (case 3). We speculate that advantages
of the full wind prediction might be achieved using a larger NN with more training epochs. In practice,
the most realistic approximation is to assume a constant wind velocity over the horizon length (1 to 2
seconds), which could be measured using on-board sensors.
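The four scenarios differ only in the wind sequence handed to the optimizer over the prediction horizon, which can be sketched as below. The function `wind_forecast`, its arguments, and the toy wind fields are hypothetical; in the actual system the wind would come from the external wind server and on-board measurements.

```python
from typing import Callable, List, Sequence

Wind = List[float]  # [wx, wy, wz] in m/s


def wind_forecast(scenario: int,
                  times: Sequence[float],
                  measured_at_start: Wind,
                  resolved: Callable[[float], Wind],
                  turbulence: Callable[[float], Wind]) -> List[Wind]:
    """Wind assumed by the optimizer at each time in the horizon."""
    if scenario == 1:   # no wind information
        return [[0.0, 0.0, 0.0] for _ in times]
    if scenario == 2:   # hold the wind measured at the start of the horizon
        return [list(measured_at_start) for _ in times]
    if scenario == 3:   # resolved field only, turbulence neglected
        return [resolved(t) for t in times]
    if scenario == 4:   # resolved field plus actual turbulence (ideal case)
        return [[r + s for r, s in zip(resolved(t), turbulence(t))] for t in times]
    raise ValueError("scenario must be 1, 2, 3, or 4")


def toy_resolved(t: float) -> Wind:
    return [10.0 + t, 0.0, -2.0]     # made-up slowly varying field


def toy_turbulence(t: float) -> Wind:
    return [0.5, -0.5, 0.0]          # made-up small-scale component


times = [0.0, 0.5, 1.0]
# An on-board sensor would see the total wind at the start of the horizon.
measured = [r + s for r, s in zip(toy_resolved(0.0), toy_turbulence(0.0))]

forecasts = {s: wind_forecast(s, times, measured, toy_resolved, toy_turbulence)
             for s in (1, 2, 3, 4)}
for s, seq in forecasts.items():
    print(s, seq)
```

Scenario 2 needs nothing beyond a single measurement per horizon, which is why the text singles it out as the most realistic option.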
Table 10.3: Cost comparisons for wind flow incorporation.
MPNC training scenario    Aircraft speed 18 m/s    Aircraft speed 24 m/s
1                         49.46                    178.02
2                         28.32                    57.16
3                         26.51                    38.61
4                         24.23                    48.75
SDRE                      111.49                   162.69
[Figure: three x-z contour panels; axes x (m) and z (m).]
Figure 10.7: x-z cross sections focusing on the gust front formed in conjunction with a microburst
downdraft. The panels display the resolved x and z velocity components in m/s (left and center,
respectively), and subgrid-scale turbulent kinetic energy in $m^2/s^2$ (right). The subgrid-scale
turbulent kinetic energy is decomposed into a high-resolution turbulent velocity field which is
summed with the resolved velocities.
[Figure: panels "SDRE" and "MPNC"; axes X, Y, Z in m.]
Figure 10.8: Trajectory through the gust front. a) SDRE, b) MPNC.
10.5. CONCLUSIONS
In this paper, we have presented a new approach to receding horizon MPC based on a NN feedback controller in combination with an SDRE controller. The approach exploits both a sophisticated numerical
model of the vehicle (FlightLab) and its analytical nonlinear approximation (6DOF model). The NN is
optimized using properties of the full FlightLab simulator to minimize the MPC cost, while the SDRE
controller is designed using the approximate model and provides a baseline stabilizing control trajectory.
In addition, we considered a number of simplifications in order to reduce the computational requirements of the approach. Overall, results verify the superior performance of the approach over traditional
SDRE (and LQ) control. Future work includes determination of optimal settings for the horizon length
and update intervals, influence of training epochs on the NN, and the effects of modeling errors and
disturbances. It should be noted that the processing power necessary to implement the FlightLab model
currently precludes the use of this approach in real time. However, with ever-increasing CPU speeds and
on-board resources, we feel this approach will be viable in the near future. In the interim, our plans are
to investigate an alternative model designed specifically for small helicopters that can be run in real time
[8] (see also Chapter 15 of this text). Using this model, we are working towards a field demonstration
using a model RC helicopter, as well as software integration and algorithm optimization on the Open
Control Platform (OCP) described in this book.
10.6. ACKNOWLEDGEMENTS
This work was sponsored by DARPA under SEC grant F33615-98-C-3516. The authors would also like
to thank Andy Moran for his programming assistance in the early phases of this research.
10.7. REFERENCES
[1] FlightLab release note - version 2.8.4. Advanced Rotorcraft Technology, Inc., 1999.
[2] G. Balas, I. Fialho, A. Packard, J. Renfrow, and C. Mullaney. On the design of LPV controllers for
the F-14 aircraft lateral-directional axis during powered approach. In American Control Conference,
1997.
[3] A. A. Bogdanov, E. A. Wan, M. Carlsson, Y. Zhang, R. Kieburtz, and A. Baptista. Model predictive
neural control of a high fidelity helicopter model. In AIAA Guidance Navigation and Control
Conference, Montreal, Quebec, Canada, August 2001.
[4] A. Calise and R. Rysdyk. Nonlinear adaptive flight control using neural networks. IEEE Control
Systems Magazine, 18(6), December 1998.
[5] J. R. Cloutier, C. N. D’Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity
control via the state-dependent Riccati equation technique: Part 1, Theory. In Proceedings of the
International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL,
May 1996.
[6] J. R. Cloutier, C. N. D’Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity
control via the state-dependent Riccati equation technique: Part 2, Examples. In Proceedings of the
International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL,
May 1996.
[7] G. F. Franklin, J. D. Powell, and M. L. Workman. Digital Control of Dynamic Systems. Addison-Wesley, Reading, MA, second edition, 1990.
[8] V. Gavrilets, B. Mettler, and E. Feron. Nonlinear model for a small-size acrobatic helicopter. In
AIAA Guidance Navigation and Control Conference, Montreal, Quebec, Canada, August 2001.
[9] R. A. Jacobs. Increasing rates of convergence through learning rate adaptation. Neural Networks,
1(4):295–307, 1988.
[10] A. Jadbabaie, J. Yu, and J. Hauser. Stabilizing receding horizon control of nonlinear systems: a
control Lyapunov function approach. In Proceedings of American Control Conference, 1999.
[11] A. Jadbabaie, J. Yu, and J. Hauser. Unconstrained receding horizon control of nonlinear systems.
In Proceedings of IEEE Conference on Decision and Control, 1999.
[12] E. Johnson, A. Calise, R. Rysdyk, and H. El-Shirbiny. Feedback linearization with neural network
augmentation applied to X-33 attitude control. In Proceedings of the AIAA Guidance, Navigation,
and Control Conference, 2000.
[13] M. K. Kalnay and W. Baker. Global numerical weather prediction at the National Meteorological
Center. Bull. Amer. Meteor. Soc., 71:1410–1428, 1990.
[14] M. Kanamitsu, J. Alpert, K. Campana, P. Caplan, D. Deaven, M. Iredell, B. Katz, H.-L. Pan, J. Sela,
and G. White. Recent changes implemented into the global forecast system at NMC. Weather
and Forecasting, 6:425–435, 1991.
[15] E. S. Meadows and J. B. Rawlings. Model predictive control. In Nonlinear Process Control.
Prentice Hall, 1997.
[16] D. Peters and C. He. Finite state induced flow models part II, three dimensional rotor disk. Journal
of Aircraft, 32(2), March-April 1995.
[17] S. W. Piche, B. Sayyar-Rodsari, D. Johnson, and M. Gerules. Nonlinear model predictive control
using neural networks. IEEE Control Systems Magazine, 20(3), 2000.
[18] S. J. Qin and T. A. Badgwell. An overview of industrial model predictive control technology.
Chemical Process Control - AIChE Symposium Series, pages 232–256, 1997.
[19] J. Shamma and J. Cloutier. Gain-scheduled missile autopilot design using linear parameter varying
transformations. AIAA Journal of Guidance, Control, and Dynamics, 16(2):256–263, 1993.
[20] A. Shapiro, L. Zhao, S. Weygandt, K. Brewster, S. Lazarus, and K. Droegemeier. Initial forecast
fields from single-Doppler wind retrieval, thermodynamic retrieval and ADAS. In 11th Conf. on
Numerical Weather Prediction, Amer. Meteor. Soc., pages 119–121, Norfolk, VA, 1996.
[21] M. Sznaier, J. Cloutier, R. Hull, D. Jacques, and C. Mracek. Receding horizon control Lyapunov
function approach to suboptimal regulation of nonlinear systems. The Journal of Guidance, Control, and Dynamics, 23(3):399–405, May-June 2000.
[22] E. A. Wan and A. A. Bogdanov. Model predictive neural control with applications to a 6 DoF
helicopter model. In Proc. of IEEE American Control Conference, Arlington, VA, June 2001.
[23] F. Wu, A. Packard, and G. Balas. LPV control design for pitch-axis missile autopilots. In Proc. 34th
IEEE Conf. on Decision and Control, pages 53–6, New Orleans, LA, 1995.
[24] M. Xue, K. K. Droegemeier, and V. Wong. The Advanced Regional Prediction System (ARPS) - a
multiscale nonhydrostatic atmospheric simulation and prediction tool. Part I: Model dynamics and
verification. Meteor. Atmos. Physics, 75, 2000.
[25] M. A. Zulauf. Modeling the Effects of Boundary Layer Circulations Generated by Cumulus Convection and Leads on Large-Scale Surface Fluxes. Ph.D. thesis, University of Utah, Salt Lake City,
UT 84112, 2001.