Model Predictive Neural Control for Aggressive Helicopter Maneuvers

2002

Eric A. Wan, Alexander A. Bogdanov, Richard Kieburtz, Antonio Baptista, Magnus Carlsson, Yinglong Zhang, and Mike Zulauf
OGI School of Science and Engineering, OHSU
20000 NW Walker Rd, Beaverton, Oregon 97006

Editor's Summary

This chapter shares with Chapter 9 the adoption of a model predictive control (MPC) framework for flight control applications, but the details differ substantially. In particular, the control feedback in this case is a superposition of a neural-network-based nonlinear mapping and a nonlinear state-dependent Riccati equation (SDRE) controller. The neural network is optimized (trained) online for high performance using a high-fidelity dynamic simulation model of the vehicle. The SDRE controller design, repeated at every sample time, provides initial local asymptotic stability. The relative contributions of each controller vary depending on the training error of the neural network.

The application considered is maneuver control of autonomous helicopters. The controller is multivariable with five actuator command outputs. Simulation results for a variety of maneuvers are presented, including rapid take-off and landing and a difficult "elliptic" maneuver.

The authors also incorporate wind effects in the optimization. Instead of the conventional tactic of treating environmental conditions as a random disturbance, this allows control commands to be optimized for localized wind flows. This approach relies on wind predictions over the optimization horizon; sensors and models to support such predictions at the appropriate scale and resolution are the object of intense research today.

Both the SDRE design and the neural network training are computationally demanding. The authors consider possible tradeoffs between computational effort and controller performance in order to approach real-time feasibility. Various approximations to some of the especially complex calculations are considered.
Computing time and control performance data are presented, comparing different neural network complexities, optimization horizons, and update intervals. Substantial CPU time speedup can be achieved with minor loss of performance. The autonomous helicopter control theme is continued in Chapter 12, where neural networks also feature in the solution approach.

10.1. INTRODUCTION

Advances in technology and modeling in commercially available software make possible highly accurate simulations of aircraft and their environmental interactions. With increased on-board computational resources, this allows for the design of sophisticated nonlinear controllers that exploit these simulations on-line, in order to achieve high-performance autonomous control of vehicles capable of rapid adaptation and aggressive maneuvering.

In this chapter, we describe a method for helicopter control through the use of the FlightLab simulator [1] coupled with nonlinear control techniques. FlightLab is a commercial software product developed by Advanced Rotorcraft Technologies. Details on the modeling capabilities of FlightLab are given in Section 10.2.2. A challenge in using FlightLab and similar flight simulators to design controllers is that the governing dynamic equations are not readily available (i.e., the aircraft represents a 'black-box' model). This precludes the use of most traditional nonlinear control approaches that require an analytic model.

Our methodology is based on the model predictive control (MPC) approach [18, 15]. MPC is an optimization-based framework for learning a stabilizing control sequence that minimizes a specified cost function. For general nonlinear systems, this requires a numerical optimization procedure involving iterative forward (and backward) simulations of the model dynamics. The resulting control sequence represents an 'open-loop' control law, which can then be re-optimized on-line at periodic update intervals to improve robustness.
Our approach to MPC, referred to as model predictive neural control (MPNC), utilizes a combination of a state-dependent Riccati equation (SDRE) controller and an optimal neural controller. In contrast to traditional MPC, the architecture implements an explicit feedback controller. The SDRE technique [5, 6] is an improvement over traditional linearization-based Linear Quadratic (LQ) controllers (SDRE control will be elaborated on in a later section). SDRE design, however, requires an analytic representation of an aircraft model. To provide an analytic representation, the numeric simulator model is approximated by a six-degree-of-freedom (6DOF) rigid body dynamic model, providing a set of governing equations at each time instant necessary to design the SDRE.

In our framework, the SDRE controller provides an initial stabilizing controller and it is then augmented by a neural network (NN) controller. The NN controller is optimized on-line using a calculus of variations approach to minimize the MPC cost function. Note that this differs from either the use of a NN for system identification of the nonlinear plant as part of a traditional MPC approach (see [17]), or the use of a NN for error feedback to account for model uncertainty (see [4, 12]). We also make use of the SDRE solution to provide a control Lyapunov function (CLF) for use in a receding horizon approach to MPC. In addition, we explore a number of numeric approximations in order to improve computational performance of the approach. The basic framework of our approach has been described in [22, 3].

10.2. MPC CONTROL

The general MPC optimization problem involves minimizing a cost function

$$J_t = \sum_{k=t}^{t_{final}} L_k(x_k, u_k),$$

which represents the accumulated cost of the sequence of states $x_k$ and controls $u_k$ from the current discrete time $t$ to the final time, $t_{final}$. For regulation problems $t_{final} = \infty$.
Optimization is done with respect to the control sequence subject to the constraints of the system dynamics,

$$x_{k+1} = f(x_k, u_k). \tag{10.1}$$

As an example, choosing $L_k(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k$ corresponds to the standard Linear Quadratic cost. For linear systems, this leads to linear state-feedback control, which is found by solving a Riccati equation [7]. More general costs allow for inequality constraints on state and control, minimum time, minimum control (fuel), etc. In this paper we consider general multi-input-multi-output (MIMO) nonlinear systems with tracking error costs of the form

$$L_k(x_k, u_k) = e_k^T Q e_k + u_k^T R u_k + (u_k^{over})^T R_{sat}\, u_k^{over}, \tag{10.2}$$

where $e_k = x_k - x_k^{des}$, with $x_k^{des}$ corresponding to a desired reference state trajectory. The last term assesses a penalty for exceeding the control saturation level $u^{sat}$, where each element ($j = 1, \ldots, m$) of the vector $u_k^{over}$ is defined as

$$u_{k,j}^{over} = \begin{cases} 0, & \text{if } |u_{k,j}| \le u_j^{sat} \\ u_{k,j} - u_j^{sat}\,\mathrm{sign}(u_{k,j}), & \text{otherwise.} \end{cases}$$

In general, a numerical optimization approach is used to solve for the sequence of controls, $\{u_k\}_t^{t_{final}} = \arg\min J_t$, corresponding to an open-loop control law, which can then be re-optimized on-line at periodic update intervals. The complexity of the approach is a function of the final time $t_{final}$, which determines the length of the optimal control sequence. In practice, we can reduce the number of computations by taking a Receding Horizon (RH) approach, in which optimization is performed over a shorter fixed-length time interval. This is accomplished by rewriting the cost function as

$$J_t = \sum_{k=t}^{t+N-1} L_k(x_k, u_k) + V(x_{t+N}), \tag{10.4}$$

where the last term $V(x_{t+N})$ denotes the cost-to-go from time $t+N$ to time $t_{final}$. Note that for trajectory following, the cost-to-go is also implicitly a function of the given desired trajectory $x_{t+N}^{des}$ through $x_{t_{final}}^{des}$ (and the resulting error). The advantage is that this yields an optimization problem of fixed length $N$.
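The tracking cost (10.2) and the receding horizon cost (10.4) can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: Q, R, and R_sat are taken to be diagonal and passed as lists of diagonal entries, the terminal cost is supplied as a function of the final error rather than of the state, and all function names (`overshoot`, `stage_cost`, `receding_horizon_cost`) are hypothetical.

```python
def overshoot(u, u_sat):
    """Element-wise u_over: 0 inside the saturation limit,
    u_j - u_sat_j * sign(u_j) outside it."""
    out = []
    for uj, sj in zip(u, u_sat):
        if abs(uj) <= sj:
            out.append(0.0)
        else:
            out.append(uj - sj * (1.0 if uj > 0 else -1.0))
    return out


def stage_cost(e, u, q, r, r_sat, u_sat):
    """Tracking cost L_k of Equation 10.2, assuming diagonal Q, R, R_sat
    (q, r, r_sat hold the diagonal entries)."""
    u_over = overshoot(u, u_sat)
    return (sum(qi * ei * ei for qi, ei in zip(q, e))
            + sum(ri * ui * ui for ri, ui in zip(r, u))
            + sum(si * oi * oi for si, oi in zip(r_sat, u_over)))


def receding_horizon_cost(errors, controls, terminal_cost, q, r, r_sat, u_sat):
    """J_t of Equation 10.4: N stage costs plus a terminal cost.
    errors has N+1 entries (e_t .. e_{t+N}); controls has N entries."""
    running = sum(stage_cost(e, u, q, r, r_sat, u_sat)
                  for e, u in zip(errors[:-1], controls))
    return running + terminal_cost(errors[-1])
```

For instance, a command of 3.0 against a limit of 2.0 gives an overshoot of 1.0, which is then weighted by the corresponding entry of `r_sat`; setting `terminal_cost` to a function returning zero corresponds to the choice $V(x_{t+N}) = 0$.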
In practice, the true value of $V(x_{t+N})$ is unknown, and must be approximated. Most common is to simply set $V(x_{t+N}) = 0$; however, this may lead to reduced stability and poor performance for short horizon lengths [10]. Alternatively, we may include a control Lyapunov function (CLF), which guarantees stability if the CLF is an upper bound on the cost-to-go, and results in a region of attraction for the MPC of at least that of the CLF [10].

10.2.1. HELICOPTER MODELING AND DESIGN CONSIDERATIONS

10.2.2. FlightLab Helicopter Model

The FlightLab package provides a simulation tool for multidisciplinary concurrent engineering. Consisting of a number of modular and reconfigurable modeling primitives (called "components"), it uses a high-level interpretive language (Scope) for rapid prototyping. With these reconfigurable components, the user is able to construct a model of selective fidelity, as well as create or link custom components. Numerical integration of dynamics is accomplished through iterative techniques (e.g., Newton-Raphson) to solve systems of equations, using a selective compartmental approach for integrating subsystem components. FlightLab also has a number of GUI tools, such as editors for developing models and control systems, and a dynamic system analyzer. It allows real-time interfacing with external hardware or software components.

One of the outstanding features of FlightLab is its ability to accurately model the main rotor and the interaction between it and other components, such as the tail rotor (through the use of empirical formulas), external wind (e.g., through an external wind server), ground effects (including both in- and out-of-ground effects with an image method, a free-wake model, and a horse-shoe ground vortex model), etc. It also includes the modeling of dynamic stall, transonic flow, aeroelastic response, vortex wake, and blade element aerodynamics, and can also provide finite element structure analysis.
For our research purposes, we generated a high-fidelity helicopter model having a rigid fuselage with empirical airloads, and elastic blades with quasi-unsteady airloads¹, 3-state inflow², and direct control of the swashplate angles. The model is presented as a numerical discrete-time nonlinear system with 92 internal state variables.

¹ Quasi-unsteady airloads incorporate both theory and table look-up for the calculation of airloads for the effective angle of attack, side-slip angle, and Mach number; certain dynamics (e.g., dynamic stall) are simplified or neglected.

² The general finite state inflow model allows for the variation of the rotor induced flow with arbitrary harmonics azimuthally and an arbitrary order of polynomial radially. The 3-state model is truncated from the higher order finite state inflow model. Its first state is for uniform inflow (both azimuthally and radially), the 2nd state is the 1st cosine harmonic of azimuthal variation with linear radial variation, and the 3rd state is the 1st sine harmonic of azimuthal variation with the same linear radial variation [16].

10.2.3. Control Specifications

For helicopter control, we define the state vector $x_k$ to correspond to the standard states of a 6DOF rigid body model. This 12-dimensional state vector consists of Cartesian coordinates in the inertial frame $x, y, z$, Euler angles $\psi, \phi, \theta$ (yaw, roll, pitch), linear velocities $u, v, w$, and angular velocities $p, q, r$ in the body coordinate frame. Technically, this represents a reduced state-space, as our FlightLab model utilizes a total of 92 internal state variables (e.g., main and tail rotor states). However, we treat these as both unobservable and uncontrollable for purposes of deriving the controller. There are 4 control inputs, $\theta_0, \theta_{1C}, \theta_{1S}, \theta_{T0}$, corresponding to the main collective, lateral cyclic, longitudinal cyclic, and tail collective controls (incident angles of rotor blades)³.

³ We set control constraints as follows: Max. $\theta_0$ = 20 deg., Max. $\theta_{1C}$ = 30 deg., Max. $\theta_{1S}$ = 32 deg., Max. $\theta_{T0}$ = 39 deg.

The tracking error for the helicopter, $e_k = x_k - x_k^{des}$, is determined by the trajectory of a reference target (provided by a mission planner for higher-level control coordination). The target specifies desired coordinates, velocities, and attitude in the inertial frame. The reference state is then projected into the body frame to produce the desired state,

$$x_k^{des} = T(x_k)\, x_k^{tar}, \tag{10.5}$$

where $T(x_k)$ is an appropriate projection matrix consisting of necessary rotation operators. It has the block form

$$T(x) = \begin{pmatrix} T_{5\times5}(x) & O_{5\times7} \\ O_{7\times5} & I_{7\times7} \end{pmatrix},$$

where the entries of the upper-left $5\times5$ block are sums of products of $s(\cdot)$ and $c(\cdot)$ of the Euler angles, with $s(\cdot)$, $c(\cdot)$ denoting sin and cos, and $x_k = (u, w, q, \theta, v, p, \phi, r, \psi, x, y, z)^T$. Minimization of the tracking error causes the helicopter to move in the direction of the target motion.

10.3. MPNC

Difficulties with application of traditional MPC include a) the need for a 'good' initial sequence of controls that is capable of stabilizing the model, and b) the need to re-optimize at short intervals to avoid problems associated with open-loop control (e.g., lack of robustness to model uncertainty or disturbances). We address these issues by directly implementing a feedback controller as a combination of an SDRE stabilizing controller and a neural controller,

$$u_k = \alpha \cdot N_{net}(x_k, e_k, \mathbf{w}) + (1 - \alpha) \cdot K(x_k) e_k, \tag{10.6}$$

where $0 < \alpha < 1$ is a relative weighting constant. A standard multi-layer feedforward neural network is used for its universal mapping capabilities (not for its biological motivation). The SDRE controller $K(x_k)e_k$ provides a robust stabilizing control, while the weights of the neural network, $\mathbf{w}$, are optimized to minimize the overall receding horizon MPC cost. Note that this control structure is a function of the state $x_k$ (and tracking error $e_k$), providing explicit feedback.
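As a small illustration of the composite feedback law (10.6), the sketch below blends a neural controller output with an SDRE term $K(x_k)e_k$. It is only a sketch: the function names are hypothetical, the neural output is passed in as an already computed vector, and the gain matrix is assumed to be given as a list of rows.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def mpnc_control(u_nn, gain, error, alpha):
    """Blend of Equation 10.6: u = alpha * u_nn + (1 - alpha) * K e.

    u_nn  : output of the neural controller (assumed precomputed)
    gain  : SDRE gain matrix K(x_k), list of rows
    error : tracking error e_k
    alpha : relative weighting constant, 0 < alpha < 1
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    u_sd = matvec(gain, error)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(u_nn, u_sd)]
```

Because the SDRE term is proportional to the tracking error, its contribution shrinks as the error shrinks, even though the weighting constant itself stays fixed.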
As stated earlier, the SDRE controller requires a set of governing equations, which are not available in FlightLab. Thus we derive a 6DOF rigid body model as an analytical approximation. The NN controller, however, is designed using the full flight simulator model. The NN is trained on-line, and is updated for each horizon interval. For this, the SDRE solution and the 6DOF model are again utilized in a number of different ways in order to implement terms used for minimizing the MPC cost. Design of the MPNC controller, which includes the SDRE and neural controller, is detailed in the following subsections.

10.3.1. SDRE CONTROLLER

Referring to the system state-space Equation 10.1, an SDRE controller [5] is designed by reformulating $f(x_k, u_k)$ as

$$f(x_k, u_k) = \Phi(x_k)\, x_k + \Gamma(x_k)\, u_k.$$

This representation is not a linearization. To illustrate the principle, consider a simple scalar example, $x_{k+1} = \sin x_k + x_k \cos x_k \, u_k$. A valid state-space representation is then $\Phi(x_k) = \sin x_k / x_k$ and $\Gamma(x_k) = x_k \cos x_k$. Based on the new state-space representation, we design an optimal LQ controller to track the desired state $x_k^{des}$. This leads to the nonlinear controller⁴,

$$u_k^{sd} = -R^{-1}\Gamma^T(x_k) P(x_k)(x_k - x_k^{des}) \equiv K(x_k) e_k,$$

where $P(x_k)$ is a solution of the standard Riccati equations using the state-dependent matrices $\Phi(x_k)$ and $\Gamma(x_k)$, which are treated as being constant. The procedure is repeated at every time step at the current state $x_k$ and provides local asymptotic stability of the plant [5]. In practice, the approach has been found to be far more robust than LQ controllers based on standard linearization techniques. See [5] for a discussion on the class of dynamic equations that can be presented in a state-dependent form.

⁴ In the case where the SDRE is used as a stand-alone controller, we formulate $u_k^{sd} = K(x_k) e_k + u_k^{tr}$, where $u_k^{tr}$ is scheduled to compensate for steady state errors related to various trim conditions.
Alternatively, we may also include an integral control in the SDRE framework. Note that when the full MPNC control is used, the trim control is accounted for automatically through the optimization of the NN.

Dynamic equations for the FlightLab helicopter model are not available. Thus we use the simplified dynamics given by a 6DOF rigid body model,

$$\begin{aligned}
\dot{u} &= -(wq - vr) - g\sin\theta + F_x/M_a \\
\dot{v} &= -(ur - wp) + g\cos\theta\sin\phi + F_y/M_a \\
\dot{w} &= -(vp - uq) + g\cos\theta\cos\phi + F_z/M_a \\
I_{xx}\dot{p} &= (I_{yy} - I_{zz})\,qr + I_{xz}(\dot{r} + pq) + L \\
I_{yy}\dot{q} &= (I_{zz} - I_{xx})\,rp + I_{xz}(r^2 - p^2) + M \\
I_{zz}\dot{r} &= (I_{xx} - I_{yy})\,pq + I_{xz}(\dot{p} - qr) + N \\
\dot{\phi} &= p + q\sin\phi\tan\theta + r\cos\phi\tan\theta \\
\dot{\theta} &= q\cos\phi - r\sin\phi \\
\dot{\psi} &= q\sin\phi\sec\theta + r\cos\phi\sec\theta \\
(\dot{x}\;\; \dot{y}\;\; \dot{z})^T &= Rot^{-1}(\phi, \theta, \psi)\,(u\;\; v\;\; w)^T
\end{aligned}$$

where $Rot^{-1}(\phi, \theta, \psi)$ is a rotation matrix (coordinate transformation) from body frame to inertial frame, $M_a$ is the aircraft mass, $I_{xx}$, $I_{yy}$, $I_{zz}$ are moments of inertia, $I_{xz}$ is the product of inertia, and $F_x, F_y, F_z, L, M, N$ are rotor-induced forces and moments. The forces and moments are nonlinear functions of helicopter states and control inputs. We then rewrite this into a state-dependent continuous canonical representation $\dot{x} = A(x)x + B(x)u$, with state ordering $x = (u, w, q, \theta, v, p, \phi, r, \psi, x, y, z)^T$. The matrix $A(x)$ is given explicitly in terms of the state, $s(\cdot)$ and $c(\cdot)$ (denoting sin and cos), the gravitational acceleration $g$, and the inertia coefficients

$$a_0 = \frac{I_{zz} - I_{xx}}{2 I_{yy}}, \quad a_1 = \frac{I_{xz}}{I_{yy}}, \quad a_2 = \frac{I_{xz}(I_{xx} - I_{yy} + I_{zz})}{2(I_{xx}I_{zz} - I_{xz}^2)}, \quad a_3 = \frac{I_{yy}I_{zz} - I_{zz}^2 - I_{xz}^2}{2(I_{xx}I_{zz} - I_{xz}^2)},$$

$$a_4 = \frac{I_{xx}^2 - I_{xx}I_{yy} + I_{xz}^2}{2(I_{xx}I_{zz} - I_{xz}^2)}, \quad a_5 = \frac{I_{xz}(-I_{xx} + I_{yy} - I_{zz})}{2(I_{xx}I_{zz} - I_{xz}^2)}.$$

With these coefficients, the moment-free parts of the rotational equations read $\dot{q} = 2a_0\,pr + a_1(r^2 - p^2)$, $\dot{p} = 2a_2\,pq + 2a_3\,qr$, and $\dot{r} = 2a_4\,pq + 2a_5\,qr$. $\Phi(x_k)$ is then obtained from $A(x)$ by discretization at each time step (i.e., $\Phi(x_k) = e^{A(x_k)\Delta t}$)⁵.

⁵ Parameter settings to match FlightLab are $M_a$ = 16308 lbs, $I_{xx}$ = 9969 lbs·ft², $I_{yy}$ = 44493 lbs·ft², $I_{zz}$ = 44265 lbs·ft², $I_{xz}$ = 1478 lbs·ft².

Explicit analytic equations are unavailable for the rotor-induced forces and moments, which would relate the 6DOF model to the FlightLab model. These terms are highly nonlinear and include vehicle-specific aerodynamic look-up tables. Thus $B(x)$ cannot be specified in closed form. Therefore, we approximate $\Gamma(x_k)$ with a constant $\Gamma$ by linearizing⁶ the full FlightLab model with respect to the control inputs $u_k$ around the hover trim state⁷. Finally, given $\Phi(x_k)$ and $\Gamma$, we can design the SDRE control gain $K(x_k)$ at each time step. Note that while $\Phi(\cdot)$ is based on the 6DOF model, the state argument $x_k$ comes directly from the FlightLab model. We have found that this mixed approach using the approximate model plus the linearized control matrix, $\Gamma$, is far more robust than simply using a standard LQ approach based on linearization of all system matrices (see Experimental Results Section).

Figure 10.1: MPNC signal flow diagram

10.3.2. NEURAL NETWORK CONTROLLER

The neural controller is designed using the full FlightLab helicopter model and augments the SDRE controller. The overall flowgraph of the system is shown in Figure 10.1. The neural controller is specified as follows:

$$u^{nn} = N_{net}(u, v, w, p, q, r, z, s_\psi, s_\theta, s_\phi, c_\psi, c_\theta, c_\phi, e, \mathbf{w}),$$

where we have included sines and cosines of the yaw, pitch, and roll angles.
This is motivated by the fact that the simplified helicopter dynamics depend on such trigonometric functions of the Euler angles. The coordinates $x$ and $y$ of the aircraft in the inertial frame do not influence dynamics, and are excluded as inputs (the altitude, $z$, is included due to modeling of ground effects and air density).

⁶ Linearization is performed over multiple rotor revolutions with averaging.

⁷ Alternatively, we may explicitly schedule $\Gamma(x_k)$ using combinations of different trim states, as is often done with Linear Parameter Varying (LPV) approaches [23, 19, 2].

The NN represents an optimal feedback controller. Optimization is performed by learning the weights, $\mathbf{w}$, of the NN in order to minimize the receding horizon MPC cost function (Equation 10.4) subject to the system dynamics and the composite form of the overall feedback controller (Equation 10.6). The problem is solved by taking a standard calculus of variations approach, where $\lambda_k$ and $\mu_k$ are vectors of Lagrange multipliers in the augmented cost function

$$J_t = \sum_{k=t}^{t+N-1} \Big\{ L_k(x_k, u_k) + \lambda_{k+1}^T \big( f(x_k, u_k) - x_{k+1} \big) + \mu_k^T \big( \alpha\, u_k^{nn} + (1 - \alpha) K(x_k) e_k - u_k \big) \Big\} + V(x_{t+N}), \tag{10.8}$$

where $u_k^{nn} = N_{net}(x_k, e_k, \mathbf{w})$. The cost-to-go $V(x_{t+N})$ is approximated using the solution of the SDRE at time $t+N$,

$$V(x_{t+N}) \approx \sum_{k=t+N}^{\infty} \Big\{ (x_k - x_k^{des})^T Q (x_k - x_k^{des}) + u_k^T R u_k \Big\} \approx e_{t+N}^T P(x_{t+N})\, e_{t+N}. \tag{10.9}$$

This CLF provides the exact cost-to-go for regulation assuming a linear system at the horizon time. A similar formulation was used for nonlinear regulation control in [21].
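To make the SDRE design and the terminal cost (10.9) concrete, the sketch below works through the scalar example from Section 10.3.1, $x_{k+1} = \sin x_k + x_k \cos x_k\, u_k$, with $\Phi(x_k) = \sin x_k / x_k$ and $\Gamma(x_k) = x_k \cos x_k$, regulating to $x^{des} = 0$. It rests on several assumptions that are not taken from the chapter: the gain is computed with the standard discrete-time LQ formula (with the sign folded into the gain so that $u = K e$), the Riccati equation is solved in closed form because the problem is scalar, the weights are $Q = R = 1$, and all function names are hypothetical.

```python
import math


def sdc_factors(x):
    """State-dependent coefficients Phi(x), Gamma(x) of the scalar example."""
    phi = math.sin(x) / x if abs(x) > 1e-12 else 1.0  # limit of sin(x)/x at 0
    gamma = x * math.cos(x)
    return phi, gamma


def scalar_riccati(phi, gamma, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation
    p = q + phi^2 p - (phi*gamma*p)^2 / (r + gamma^2 p)."""
    if abs(gamma) < 1e-12:
        raise ValueError("gamma is ~0: the frozen model is not controllable")
    b = r - q * gamma ** 2 - r * phi ** 2
    return (-b + math.sqrt(b * b + 4.0 * gamma ** 2 * q * r)) / (2.0 * gamma ** 2)


def sdre_gain(x, q=1.0, r=1.0):
    """Gain K(x) and Riccati solution P(x), recomputed at the current state."""
    phi, gamma = sdc_factors(x)
    p = scalar_riccati(phi, gamma, q, r)
    k = -phi * gamma * p / (r + gamma ** 2 * p)  # sign folded in: u = k * e
    return k, p


def simulate(x0, steps, q=1.0, r=1.0):
    """Closed-loop regulation to zero, redesigning the gain at every step."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        k, _ = sdre_gain(x, q, r)
        xs.append(math.sin(x) + x * math.cos(x) * (k * x))
    return xs


def terminal_cost(e, p):
    """CLF-style cost-to-go approximation e^T P e for the scalar case."""
    return p * e * e
```

Calling `simulate(1.0, 50)` drives the state monotonically toward zero, and `terminal_cost(x, sdre_gain(x)[1])` gives the quadratic cost-to-go estimate at a horizon state `x`. Note that the example loses controllability as $x \to 0$ (since $\Gamma(0) = 0$), which is why `scalar_riccati` refuses a vanishing `gamma`.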
We can now derive the recurrent Euler-Lagrange equations

$$\lambda_k = \left( \frac{\partial f(x_k, u_k)}{\partial x_k} \right)^T \lambda_{k+1} + \left( \frac{\partial L_k(x_k, u_k)}{\partial x_k} \right)^T + \left[ \alpha \left( \frac{\partial N_{net}(x_k, e_k, \mathbf{w})}{\partial x_k} + \frac{\partial N_{net}(x_k, e_k, \mathbf{w})}{\partial e_k} \frac{\partial e_k}{\partial x_k} \right) + (1 - \alpha) \frac{\partial (K(x_k) e_k)}{\partial x_k} \right]^T \mu_k \tag{10.10}$$

with

$$\mu_k = \left( \frac{\partial f(x_k, u_k)}{\partial u_k} \right)^T \lambda_{k+1} + \left( \frac{\partial L_k(x_k, u_k)}{\partial u_k} \right)^T, \qquad \lambda_{t+N} = \left( \frac{\partial V(x_{t+N})}{\partial x_{t+N}} + \frac{\partial V(x_{t+N})}{\partial e_{t+N}} \frac{\partial e_{t+N}}{\partial x_{t+N}} \right)^T,$$

$k = (t+N-1), (t+N-2), \ldots, t$, and $i = 1, \ldots, m$ ($m$ is the dimension of the control vector $u$). From Equations 10.2 and 10.5,

$$\frac{\partial L_k(x_k, u_k)}{\partial x_k} = \frac{\partial L_k(x_k, u_k)}{\partial e_k} \frac{\partial e_k}{\partial x_k} = 2 e_k^T Q \left( I - \frac{\partial (T(x_k)\, x_k^{tar})}{\partial x_k} \right), \tag{10.11}$$

where the partial is evaluated analytically by specifying $T(x_k)$. Each element $i$ of the gradient vector $\partial L_k(e_k, u_k) / \partial u_k$ is

$$\left( \frac{\partial L_k(e_k, u_k)}{\partial u_k} \right)_i = \begin{cases} 2 u_k^T R_i, & \text{if } |u_{k,i}| \le u_i^{sat} \\ 2 u_k^T R_i + 2 (u_k^{over})^T R_{sat,i}, & \text{if } |u_{k,i}| > u_i^{sat}, \end{cases}$$

where $R_i$ and $R_{sat,i}$ denote the $i$-th columns of $R$ and $R_{sat}$. These equations correspond to an adjoint system (shown graphically in Figure 10.2), with optimality condition

$$\frac{\partial J_t}{\partial \mathbf{w}} = \alpha \sum_{k=t}^{t+N-1} \mu_k^T \frac{\partial N_{net}(x_k, e_k, \mathbf{w})}{\partial \mathbf{w}} = 0.$$

Figure 10.2: Adjoint system

The overall training procedure for the NN can now be summarized as follows:

1. Simulate the system forward in time for N time steps (Figure 10.1). Note that the SDRE controller is updated at each time step.
2. Run the adjoint system backward in time to accumulate the Lagrange multipliers (Figure 10.2). Jacobians are evaluated analytically or by perturbation.
3. Update the weights using gradient descent⁸, $\Delta \mathbf{w} = -\eta\, (\partial J_t / \partial \mathbf{w})^T$.
4. Repeat until an acceptable level of cost reduction is achieved, or simply for a preset number of iterations.

⁸ In practice we use an adaptive learning rate for each weight in the network using a procedure similar to delta-bar-delta [9].

For the first horizon, the NN weights are initialized by pre-training the NN to behave similarly to the SDRE controller. The SDRE controller provides for stable tracking and good conditions for subsequent training. As training progresses, the NN decreases the tracking error. This reduces the SDRE control output, which in turn gives more authority to the neural controller. The training process is repeated at the update interval, with weights from the previous horizon used as the initialization.

Stability of MPNC is closely related to that of traditional MPC. Ideally, in the case of unconstrained optimization, stability is guaranteed provided $V(x_{t+N})$ is a CLF and is an (incremental) upper bound on the cost-to-go [10]. In this case, the minimum region of attraction of the receding horizon optimal control is determined by the CLF used and the horizon length. The guaranteed region of operation contains that of the CLF controller and may be made as large as desired by increasing the optimization horizon (restricted to the infinite horizon domain) [11]. In our case, the minimum region of attraction of the receding horizon MPNC is determined by the SDRE solution used as the CLF to approximate the terminal cost. In addition, we also restrict the controls to be of the form given by Equation 10.6, and the optimization is performed with respect to the NN weights $\mathbf{w}$. In theory, the universal mapping capability of neural networks implies that the stability guarantees are equivalent to those of the traditional MPC framework. However, in practice stability is affected by the chosen size of the NN (which affects the actual mapping capabilities), as well as the horizon length and update interval length (how often the NN is re-optimized). This can be clearly traced in Table 10.2 (Section 10.4), where additional hidden neurons result in a higher overall performance.
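The four-step training procedure (forward simulation, backward adjoint pass, weight update, repeat) can be sketched on a scalar toy problem. Everything specific here is an assumption made for illustration and is not the chapter's setup: the plant is the scalar example $x_{k+1} = \sin x_k + x_k \cos x_k\, u_k$ regulated to zero (so $e_k = x_k$), the stabilizing gain `K_SD` is held fixed rather than recomputed by an SDRE, the "network" is a single tanh unit with two weights, the saturation penalty is omitted, the terminal cost is a fixed quadratic, and the learning rate is simply halved whenever a step fails to reduce the cost (instead of a delta-bar-delta rule).

```python
import math

ALPHA, K_SD = 0.5, -0.5        # blending weight and fixed stabilizing gain (assumed)
Q, R, P_TERM = 1.0, 0.1, 1.0   # stage and terminal weights (assumed)


def plant(x, u):
    return math.sin(x) + x * math.cos(x) * u


def plant_jacobians(x, u):
    """Analytic df/dx and df/du of the toy plant."""
    return math.cos(x) + u * (math.cos(x) - x * math.sin(x)), x * math.cos(x)


def net(e, w):
    """One tanh unit: output, d(output)/de, and d(output)/dw."""
    h = math.tanh(w[0] * e)
    dh = 1.0 - h * h
    return w[1] * h, w[1] * w[0] * dh, [w[1] * e * dh, h]


def forward(x0, w, n):
    """Step 1: simulate n steps under u = alpha*net + (1-alpha)*K*e."""
    xs, us = [x0], []
    for _ in range(n):
        x = xs[-1]
        u = ALPHA * net(x, w)[0] + (1.0 - ALPHA) * K_SD * x
        us.append(u)
        xs.append(plant(x, u))
    cost = sum(Q * x * x + R * u * u for x, u in zip(xs, us)) + P_TERM * xs[-1] ** 2
    return xs, us, cost


def adjoint_gradient(xs, us, w):
    """Step 2: run the adjoint recursion backward and accumulate dJ/dw."""
    lam = 2.0 * P_TERM * xs[-1]                  # lambda_{t+N} = dV/dx
    grad = [0.0, 0.0]
    for k in reversed(range(len(us))):
        x, u = xs[k], us[k]
        f_x, f_u = plant_jacobians(x, u)
        _, dn_de, dn_dw = net(x, w)
        mu = f_u * lam + 2.0 * R * u             # mu_k
        grad = [g + ALPHA * mu * d for g, d in zip(grad, dn_dw)]
        # lambda_k; d(K e)/dx = K because the gain is fixed in this toy
        lam = f_x * lam + 2.0 * Q * x + (ALPHA * dn_de + (1.0 - ALPHA) * K_SD) * mu
    return grad


def train(x0, w, n=10, epochs=50, rate=0.1):
    """Steps 3 and 4: gradient descent, halving the rate on a failed step."""
    for _ in range(epochs):
        xs, us, cost = forward(x0, w, n)
        g = adjoint_gradient(xs, us, w)
        trial = [wi - rate * gi for wi, gi in zip(w, g)]
        if forward(x0, trial, n)[2] < cost:
            w = trial
        else:
            rate *= 0.5
    return w, forward(x0, w, n)[2]
```

A quick way to gain confidence in such an adjoint implementation is to compare `adjoint_gradient` against a central finite difference of the cost returned by `forward`; for this toy problem the two agree closely because the gain is constant and every Jacobian is exact.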
When the horizon is short, performance is more affected by the chosen CLF. On the other hand, when the horizon is long, performance is limited by the NN properties. An additional factor affecting stability is the specific algorithm used for numeric optimization. Gradient descent, which we use to minimize the cost function (Equation 10.8), is guaranteed to converge only to a local minimum (the cost function is not guaranteed to be convex with respect to the NN weights), and thus depends on the initial conditions. In addition, convergence is assured only if the learning rate is kept sufficiently small.

To summarize these points, stability of the MPNC is guaranteed under certain restricted ideal conditions. In practice, the designer must select appropriate settings and perform sufficient flight experiments to assure stability and performance over a desired flight envelope.

10.3.3. ENGINE SPEED CONTROL

Typically, engine speed is maintained by a separate throttle regulator. If this is not implemented correctly, a stall situation may arise due to an improper coupling with the main flight controller. During aggressive maneuvers, increased engine load may result in a reduced rotor speed and a loss of lift. To overcome the altitude loss, the controller reacts by increasing the collective. However, this results in even higher loads and further slowing down of the rotor, and thus further loss of lift and altitude. To prevent this, one must decrease the engine load and regain rotor speed, while momentarily sacrificing tracking performance.

In our implementation, we chose to bypass a separate throttle regulator, and build this directly into the MPNC framework. We accomplish this by adding an additional state variable corresponding to the main rotor speed as well as a direct throttle (rotor speed) control command. For the SDRE controller, we simply augment $\Phi(x_k)$ with an additional row and column found by linearization with respect to the rotor speed at hover.
The neural network is provided with extra inputs corresponding to the main rotor speed and its error, as well as an additional throttle (rotor speed) command output. Optimization within the MPC framework remains the same.

10.3.4. INCORPORATING WIND FLOWS

With the increased sophistication in atmospheric modeling and sensing, it is becoming more feasible to directly optimize for wind effects. This is in contrast to traditional approaches that often treat wind as a disturbance to be countered through feedback error correction. Modeling predictions of atmospheric conditions are well established at global scales and coarse resolution, and are the subject of intense research at regional scales and fine resolution⁹. Global and regional predictions, however, are most supportive of mission-planning level control of autonomous vehicles. For aggressive maneuvers, data from on-board or land-based radar appears to offer the most promising path to realistically incorporating atmospheric conditions in on-line control strategies.

⁹ The Medium-Range Forecast Model, MRF [13, 14], maintained by the National Oceanic and Atmospheric Administration, is an example of a global model with a long track record. It covers the entire earth surface, and extends in the vertical to a layer at the pressure of 2 hPa. The horizontal grid has 512×256 cells, each roughly equivalent to 0.7 × 0.7 degrees latitude/longitude. In the vertical, the grid has 42 unequally spaced sigma levels. Recent efforts in regional prediction are exemplified by the Advanced Regional Prediction System (ARPS), based on a multi-scale non-hydrostatic atmospheric model [24]. ARPS targets storm-scale and mountain-scale predictions, with horizontal and vertical resolutions as fine as 1 km and 100 m, respectively. Applications at this level of resolution have the potential for simulation of boundary layer eddies and wind gusts. The impetus for predictions at these scales is credited to the deployment of Doppler radars in the US, and of techniques for retrieving unobserved quantities from Doppler radar data to yield mass and wind fields appropriate for initialization of storm-scale prediction models [20].

FlightLab has the built-in capability of incorporating some forms of winds (e.g., sinusoidal gusts, stochastic atmospheric turbulent wind, etc.) into the calculation of total airloads on the rotors as well as the fuselage. For our purposes, we replaced the existing atmospheric turbulent wind component (ATMTUR) with one we developed to allow inputs from an external wind server. The new component was compiled separately and linked to other components that require wind information (e.g., the aerodynamic components for the main rotor, etc.). This enables us to study the responses of a helicopter to controlled wind patterns (e.g., wind shear and Large-eddy Simulations (LES), Section 10.4.3), as well as future integration with measurements from on-board or land-based radar.

Given the helicopter and wind interaction model, no additional modifications are necessary for incorporation into the MPNC framework. In Section 10.4.3 we report a number of simulation experiments illustrating performance trade-offs associated with how the wind is approximated.

10.3.5. COMPUTATIONAL SIMPLIFICATIONS

Generally, MPC design implies multiple simulations of the system forward in time and the adjoint system backward in time. Computations scale with the number of training epochs and the horizon length. The most computationally demanding operations correspond to solving the Riccati equation for the SDRE controller in the forward simulation, and the numeric (i.e., by perturbation) computation of FlightLab Jacobians in the backward simulation.
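Computing Jacobians "by perturbation" for a black-box simulator can be sketched as a forward finite difference: each state or control component is bumped in turn and the one-step model is re-evaluated. The sketch below is a generic illustration and not FlightLab code; `step` stands for any hypothetical function mapping a state list and a control list to the next state list, and the perturbation size `eps` is an assumed value.

```python
def jacobians_by_perturbation(step, x, u, eps=1e-6):
    """Forward-difference Jacobians of a black-box one-step model.

    step(x, u) -> next state (list). Returns (df/dx, df/du) as lists of rows.
    Costs one baseline call plus one call per state and control component.
    """
    base = step(x, u)

    def column(vec, j, perturb_state):
        bumped = list(vec)
        bumped[j] += eps
        nxt = step(bumped, u) if perturb_state else step(x, bumped)
        return [(n - b) / eps for n, b in zip(nxt, base)]

    cols_x = [column(x, j, True) for j in range(len(x))]
    cols_u = [column(u, j, False) for j in range(len(u))]
    # Each perturbation yields one column; transpose into row-major matrices.
    f_x = [list(row) for row in zip(*cols_x)]
    f_u = [list(row) for row in zip(*cols_u)]
    return f_x, f_u
```

The cost structure is visible directly in the code: with n states and m controls, every time step of the backward pass requires n + m + 1 simulator evaluations, which is why these Jacobians dominate the computation when the simulator itself is expensive.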
In order to approach real-time feasibility, we consider possible trade-offs between computational effort and controller performance through a number of successive simplifications:

1. The SDRE Jacobian is approximated as $\partial (K(x_k) e_k) / \partial x_k \approx K(x_k)\, \partial e_k / \partial x_k$, where $\partial e_k / \partial x_k$ can then be calculated analytically as in Equation 10.11.
2. Same as (1), with the additional simplification that the matrices $K(x_k)$ are memoized (i.e., stored) at each time step $k$ during the first epoch, and used again for all subsequent epochs within the current horizon. This simplification allows us to avoid re-solving the SDRE in the multiple forward simulations.
3. Same as (2), with the addition that the Jacobians $\partial f(x_k, u_k) / \partial x_k$ and $\partial f(x_k, u_k) / \partial u_k$ (which must be calculated by perturbation of FlightLab) are memoized during the first epoch and used again for all subsequent epochs within the current horizon. This assumes the aircraft trajectory, and thus the Jacobians, are not changing significantly during training within the same horizon from epoch to epoch.

In the following section, we will provide experimental results to illustrate the performance of the MPNC approach, including the computational versus performance trade-offs associated with these simplifications.

10.4. EXPERIMENTAL RESULTS

Figure 10.3a and b shows a test trajectory for the helicopter (vertical rise, forward flight, left turn, u-turn, forward flight to hover). The figure compares tracking performance at a velocity of 6.0 m/s for the MPNC system (SDRE+NN) versus a pure SDRE controller (MPNC settings are: horizon = 10, update interval = 2, training epochs = 10, sampling time = 0.097 sec, $\alpha$ = 0.7). (Note that a standard LQ controller based on linearization exhibits loss of tracking in executing this maneuver and crashes for velocities above 3.0 m/s.) The smaller tracking error for the MPNC controller is apparent (note the reduced overshooting and oscillations at mode transitions).
The total normalized accumulated cost (see footnote 10) for the MPNC is 49.25 in comparison to 100.00 for the SDRE controller. Table 10.1 illustrates the trade-offs between computational effort and control performance (accumulated cost) for the simplifications discussed in the previous section. Clearly, a substantial speed-up in CPU time can be achieved with only a minor loss of performance. The actual CPU times should only be viewed as indicative of relative requirements. The experiments were performed in MATLAB (with a C module for the vehicle model generated by FlightLab) and were not optimized for efficient implementation or hardware considerations. Note that with all simplifications the MPNC control still achieves a substantial performance improvement over the standard SDRE controller. For all subsequent simulations we use simplification level 3.

Table 10.2 summarizes comparisons of the accumulated cost with respect to variation of the horizon length and MPNC update interval, the number of neurons in the hidden layer, and omitting the use of the cost-to-go, $V(x_{t+N})$. The weights of the NN are trained for 10 epochs for each horizon (number of simulated trajectories).

Footnote 10: For comparisons, we specify the normalized accumulated cost: $\frac{1}{t_{final} - t_0} \sum_{t_k = t_0}^{t_{final}} \left( e_k^T Q e_k + u_k^T R u_k + (u_k^{over})^T R_{sat} \, u_k^{over} \right)$.

Figure 10.3: Test trajectory (axes X, Y, Z in m): a) SDRE trajectory, cost = 100.00. b) MPNC trajectory, cost = 49.25.

Table 10.1: Simplification levels vs. CPU time and performance costs (Pentium-3 750 MHz, Linux). Note that the maneuver corresponds to 40 seconds in actual time.

Simplification level    Cost     CPU time (sec)
Accurate                49.25    28882
1                       51.66    3048
2                       51.49    1793
3                       52.83    694
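The trade-off in Table 10.1 can be read off with a little arithmetic. The sketch below uses only the table's values and the 40-second maneuver duration from its caption; the function name `tradeoff` and the dictionary layout are illustrative assumptions. For level 3 it gives a speed-up of roughly 42 times for a cost increase of about 7%, while the computation is still about 17 times slower than real time on the stated hardware.

```python
# (cost, CPU seconds) per simplification level, copied from Table 10.1.
TABLE_10_1 = {
    "Accurate": (49.25, 28882.0),
    "1": (51.66, 3048.0),
    "2": (51.49, 1793.0),
    "3": (52.83, 694.0),
}
MANEUVER_SECONDS = 40.0  # flight time of the maneuver, from the table caption


def tradeoff(level: str) -> dict:
    """Compare one simplification level against the 'Accurate' baseline."""
    base_cost, base_cpu = TABLE_10_1["Accurate"]
    cost, cpu = TABLE_10_1[level]
    return {
        "speedup": base_cpu / cpu,
        "cost_increase_pct": 100.0 * (cost - base_cost) / base_cost,
        "realtime_factor": cpu / MANEUVER_SECONDS,  # > 1 means slower than real time
    }


for name in TABLE_10_1:
    r = tradeoff(name)
    print(f"{name:>8}: speed-up {r['speedup']:6.2f}x, "
          f"cost {r['cost_increase_pct']:+5.2f}%, "
          f"{r['realtime_factor']:7.2f}x real time")
```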
Missing entries in Table 10.2 correspond to cases where the states and control inputs exceeded the envelope within which the FlightLab model is consistent. Results indicate that a sufficient horizon length is between 10 and 25 time steps (1-2.5 seconds). The importance of the cost-to-go function is apparent for short horizon lengths. On the other hand, inclusion of the cost-to-go does not appear to help for longer horizons. Overall, significant performance improvement is clearly achieved with the MPNC controller relative to the pure SDRE controller.

Table 10.2: Performance cost comparisons for various MPNC options.
Columns: neurons in hidden layer; horizon / update interval; MPNC cost with $V(x_{t+N})$; MPNC cost w/o $V(x_{t+N})$; SDRE cost.
5/1 69.27 50 neurons 10/2 25/5 74.55 64.84 126.34 69.39 50/10 5/1 70.74 75.84 73.54 100.00 200 neurons 10/2 25/5 52.83 54.75 62.72 50/10 68.15 69.60

10.4.1. RAPID TAKE-OFF AND LANDING

The next example illustrates performance of a rapid take-off, side slip, and landing. The tracking ability for this aggressive maneuver is illustrated in Figure 10.4a and b. While in the static figure the SDRE trajectory appears very accurate, the actual trajectory of the controlled aircraft lags the desired trajectory in time. In contrast, the MPNC-controlled helicopter maintains the desired velocity of approximately 15 m/s throughout the flight. Note that this maneuver involves a number of "mode transitions" (take-off, vertical rise to side slip, landing, etc.) that would have required a combination of different controllers and appropriate gain scheduling using more traditional approaches. During the landing, the FlightLab model includes aerodynamic ground effects and provides information on landing gear (LG) forces and compression. To achieve a smooth landing, we simply add a quadratic cost associated with deviations from a desired smooth force curve.
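The added landing term can be pictured as a weighted sum of squared deviations between the measured and desired LG forces at the three contact points. The following is a minimal sketch under that reading; the function names, the scalar `weight`, and the sample force values are illustrative assumptions and are not taken from the chapter.

```python
from typing import List, Sequence


def lg_force_cost(actual: Sequence[float], desired: Sequence[float],
                  weight: float = 1e-6) -> float:
    """Quadratic penalty on landing-gear force deviations (hypothetical weight)."""
    if len(actual) != len(desired):
        raise ValueError("actual and desired must have the same length")
    return weight * sum((a - d) ** 2 for a, d in zip(actual, desired))


def lg_force_cost_gradient(actual: Sequence[float], desired: Sequence[float],
                           weight: float = 1e-6) -> List[float]:
    """Gradient of the penalty with respect to the actual forces."""
    if len(actual) != len(desired):
        raise ValueError("actual and desired must have the same length")
    return [2.0 * weight * (a - d) for a, d in zip(actual, desired)]


# Illustrative forces in N for (tail, left, right) gear; values are made up.
desired = [-1500.0, -2000.0, -2000.0]
actual = [-1000.0, -2000.0, -2000.0]
penalty = lg_force_cost(actual, desired)
grad = lg_force_cost_gradient(actual, desired)
```

A gradient-based training scheme would need the derivative of this term with respect to the forces, which is why the sketch returns it alongside the penalty.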
Referring to Figure 10.1, the NN is provided with two additional inputs corresponding to the LG forces (as output from the helicopter model) and the LG force deviation vector. Then, in Figure 10.2, an additional Lagrange multiplier vector is propagated backward through the vehicle and NN Jacobians. Note that such additional constraints are not possible with the SDRE controller using the analytic 6DOF model (which does not include ground effects, force dynamics, or the loss of degrees of freedom at touchdown, and hence cannot be directly optimized for landing). Figures 10.4c and d show the desired and actual LG forces (3 points of contact), indicating a much smoother landing for the MPNC controller.

[Figure 10.4 panels: "SDRE vertical take-off and landing" and "MPNC vertical take-off and landing" (axes X, Y, Z in m); "SDRE landing" and "MPNC landing" (landing gear force in N, scaled by 10^4, and landing gear compression in m, versus time in s).]

Figure 10.4: Take-off and landing: a) SDRE, cost = 2254.90; b) MPNC, cost = 158.88 (desired trajectory is plotted for comparison). Landing trajectories: c) SDRE, d) MPNC. Desired forces at landing are plotted as downward triangles for the tail LG and upward triangles for the left and right LG. The dashed line shows the actual force readings at the tail LG; the solid lines show the left and right LG.

10.4.2. ELLIPTIC MANEUVER

In this example, we execute an extremely difficult "elliptic" maneuver (see footnote 11), consisting of (hover to) straight flight at 22.8 m/s while performing a constant yaw rotation of 120 deg/sec.
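To see what this reference demands of the vehicle, the sketch below generates the steady-state part of such a reference: constant inertial velocity along x with a linearly increasing yaw angle. It ignores the initial acceleration from hover, and it assumes a body-axis convention with u forward, v to the right, and positive yaw clockwise seen from above; those conventions and the function name `reference` are assumptions, not taken from the chapter.

```python
import math

SPEED = 22.8       # m/s, from the text
YAW_RATE = 120.0   # deg/s, from the text

YAW_PERIOD = 360.0 / YAW_RATE      # seconds per full yaw revolution
DIST_PER_REV = SPEED * YAW_PERIOD  # metres travelled per revolution


def reference(t: float):
    """Return (x, yaw_deg, u_body, v_body) at time t in steady flight along x."""
    yaw_deg = (YAW_RATE * t) % 360.0
    psi = math.radians(yaw_deg)
    x = SPEED * t
    u_body = SPEED * math.cos(psi)    # component along the nose
    v_body = -SPEED * math.sin(psi)   # component to the right (assumed convention)
    return x, yaw_deg, u_body, v_body


for t in (0.0, 0.75, 1.5, 2.25):
    x, yaw, u, v = reference(t)
    print(f"t={t:4.2f} s  x={x:6.2f} m  yaw={yaw:5.1f} deg  u={u:7.2f}  v={v:7.2f}")
```

Under these assumptions the helicopter completes one yaw revolution every 3 seconds, so relative to its own body axes it cycles through forward, sideways, and backward flight roughly every 68 m of travel.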
Trajectories are illustrated for the MPNC and SDRE controllers in Figure 10.5. Note the tight execution performance of the MPNC, which has a total cost of 14.43 versus 267.28 for the SDRE. The SDRE had a maximum yaw lag error of up to 84 degrees. In contrast, the MPNC had a maximum yaw error of only 12 degrees.

[Figure 10.5 panels: "SDRE elliptic maneuver" and "MPNC elliptic maneuver" (axes X, Y, Z in m).]

Figure 10.5: Elliptic maneuver: 22.8 m/s straight flight, 120 deg/sec yaw rate. a) SDRE, b) MPNC.

Footnote 11: The mass of the helicopter was reduced to $M_a$ = 4999 lbs.

10.4.3. WIND DISTURBANCE

Figure 10.6 shows the effect of an artificial wind shear (15 m/s) on the helicopter trajectory (3 m/s vertical rise). The displacement with the MPNC (< 1.9 m) is noticeably improved over the SDRE (< 7.5 m).

[Figure 10.6 panels: "SDRE" and "MPNC" (axes X, Y, Z in m).]

Figure 10.6: Vertical lift through wind shear. a) SDRE, cost = 33.24; b) MPNC, cost = 16.40.

In addition to the simple wind shear experiment, output from a large-eddy simulation (LES) model was used as forcing for the external wind server. This model [25] was specifically designed to examine small-scale atmospheric flows, especially those involving cumulus convection, entrainment, and turbulence. In this instance, the LES was used to simulate a microburst downdraft of the type associated with strong convective storms. The modeled domain enclosed an area of approximately 8 km x 8 km horizontally and 4.2 km vertically. The grid spacing was 60 m in all directions. In the core downdraft, vertical velocities reached a magnitude of over 16 m/s. Upon impacting the surface, the downdraft spread out radially, forming a gust front, an area of strong turbulence which also includes substantial vertical and horizontal wind shear.
In addition to the resolved wind fields, the LES also predicts subgrid-scale turbulent kinetic energy, which is decomposed into a rapidly evolving small-scale turbulence field. This small-scale turbulence was defined using continuous functions over a series of length scales ranging from the resolved grid scale down to viscous scales (i.e., 1 m), and it varied at time scales on the order of 10 seconds (largest-scale subgrid turbulence) down to less than a second (smallest-scale subgrid turbulence). Due to the nature of turbulence within the inertial subrange, the majority of the turbulent kinetic energy resides in the larger scales. Features of the wind field are displayed in Figure 10.7. The helicopter flight path (straight trajectory at 24 m/s) took it through the gust front (with winds exceeding 18 m/s). The relative performance and improvement with the MPNC controller is illustrated in Figure 10.8. Note that for this simulation, the MPNC uses the resolved wind field for prediction and optimization, while the smaller-scale turbulence is assumed unknown.

In general, we consider 4 possible scenarios for how to approximate the wind flow with the MPNC design:

1. MPNC trained with no information available about wind.
2. MPNC trained using constant wind fields as measured from the start of each horizon.
3. MPNC trained using the resolved wind field (turbulence is assumed unknown and neglected for training purposes).
4. MPNC trained using knowledge of both resolved and actual turbulent wind flow (ideal case).

Table 10.3 summarizes the resulting performance using the same experimental flight path as before. Even partial wind flow information clearly improves performance greatly. However, perfect knowledge of the wind flow (ideal case 4) does not provide significant improvement over using just the constant wind approximation (case 2) or the resolved wind field (case 3).
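The four scenarios differ only in which wind sequence is handed to the model during training over a horizon. The sketch below is one way to express that selection, under stated assumptions: winds are 3-component tuples, scenario 1 is interpreted as training with zero wind, and the function name `wind_for_training` and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple

Wind = Tuple[float, float, float]  # (wx, wy, wz) in m/s


def wind_for_training(scenario: int, horizon: int, measured_at_start: Wind,
                      resolved: Sequence[Wind],
                      turbulence: Sequence[Wind]) -> List[Wind]:
    """Wind sequence assumed during training over one horizon."""
    if len(resolved) < horizon or len(turbulence) < horizon:
        raise ValueError("wind sequences must cover the whole horizon")
    if scenario == 1:      # no wind information: train as if there were no wind
        return [(0.0, 0.0, 0.0)] * horizon
    if scenario == 2:      # hold the wind measured at the start of the horizon
        return [measured_at_start] * horizon
    if scenario == 3:      # resolved field only, turbulence neglected
        return list(resolved[:horizon])
    if scenario == 4:      # resolved field plus actual turbulence (ideal case)
        return [(r[0] + s[0], r[1] + s[1], r[2] + s[2])
                for r, s in zip(resolved[:horizon], turbulence[:horizon])]
    raise ValueError("scenario must be 1, 2, 3, or 4")


# Illustrative values only.
resolved = [(10.0, 0.0, -2.0), (11.0, 0.0, -2.5), (12.0, 0.0, -3.0)]
turbulence = [(0.5, -0.2, 0.1), (-0.3, 0.4, 0.0), (0.2, 0.1, -0.1)]
measured = (10.5, -0.2, -1.9)
sequences = {s: wind_for_training(s, 3, measured, resolved, turbulence)
             for s in (1, 2, 3, 4)}
```

Only scenario 2 can be built from a single on-board measurement; scenarios 3 and 4 require a wind prediction over the horizon.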
We speculate that advantages of the full wind prediction might be achieved using a larger NN with more training epochs. In practice, the most realistic approximation is to assume a constant wind velocity over the horizon length (1 to 2 seconds), which could be measured using on-board sensors.

Table 10.3: Cost comparisons for wind flow incorporation.

MPNC training scenario    Cost at 18 m/s    Cost at 24 m/s
1                         49.46             178.02
2                         28.32             57.16
3                         26.51             38.61
4                         24.23             48.75
SDRE                      111.49            162.69

Figure 10.7: x-z cross sections (x and z in m) focusing on the gust front formed in conjunction with a microburst downdraft. The panels display the resolved x and z velocity components in m/s (left and center, respectively), and subgrid-scale turbulent kinetic energy in m^2/s^2 (right). The subgrid-scale turbulent kinetic energy is decomposed into a high-resolution turbulent velocity field which is summed with the resolved velocities.

Figure 10.8: Trajectory through the gust front (axes X, Y, Z in m). a) SDRE, b) MPNC.

10.5. CONCLUSIONS

In this paper, we have presented a new approach to receding horizon MPC based on an NN feedback controller in combination with an SDRE controller. The approach exploits both a sophisticated numerical model of the vehicle (FlightLab) and its analytical nonlinear approximation (6DOF model). The NN is optimized using properties of the full FlightLab simulator to minimize the MPC cost, while the SDRE controller is designed using the approximate model and provides a baseline stabilizing control trajectory. In addition, we considered a number of simplifications in order to reduce the computational requirements of the approach.
Overall, results verify the superior performance of the approach over traditional SDRE (and LQ) control. Future work includes determination of optimal settings for the horizon length and update intervals, the influence of training epochs on the NN, and the effects of modeling errors and disturbances. It should be noted that the processing power necessary to run the FlightLab model currently precludes the use of this approach in real time. However, with ever-increasing CPU speeds and on-board resources, we feel this approach will be viable in the near future. In the interim, our plans are to investigate an alternative model designed specifically for small helicopters that can be run in real time [8] (see also Chapter 15 of this text). Using this model, we are working towards a field demonstration using a model RC helicopter, as well as software integration and algorithm optimization on the Open Control Platform (OCP) described in this book.

10.6. Acknowledgements

This work was sponsored by DARPA under SEC grant F33615-98-C-3516. The authors would also like to thank Andy Moran for his programming assistance in the early phases of this research.

10.7. References

[1] FlightLab release note, version 2.8.4. Advanced Rotorcraft Technology, Inc., 1999.
[2] G. Balas, I. Fialho, A. Packard, J. Renfrow, and C. Mullaney. On the design of LPV controllers for the F-14 aircraft lateral-directional axis during powered approach. In American Control Conference, 1997.
[3] A. A. Bogdanov, E. A. Wan, M. Carlsson, Y. Zhang, R. Kieburtz, and A. Baptista. Model predictive neural control of a high fidelity helicopter model. In AIAA Guidance Navigation and Control Conference, Montreal, Quebec, Canada, August 2001.
[4] A. Calise and R. Rysdyk. Nonlinear adaptive flight control using neural networks. IEEE Control Systems Magazine, 18(6), December 1998.
[5] J. R. Cloutier, C. N. D’Souza, and C. P. Mracek.
Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 1, Theory. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.
[6] J. R. Cloutier, C. N. D’Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 2, Examples. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.
[7] G. F. Franklin, J. D. Powell, and M. L. Workman. Digital Control of Dynamic Systems. Addison-Wesley, Reading, MA, second edition, 1990.
[8] V. Gavrilets, B. Mettler, and E. Feron. Nonlinear model for a small-size acrobatic helicopter. In AIAA Guidance Navigation and Control Conference, Montreal, Quebec, Canada, August 2001.
[9] R. A. Jacobs. Increasing rates of convergence through learning rate adaptation. Neural Networks, 1(4):295–307, 1988.
[10] A. Jadbabaie, J. Yu, and J. Hauser. Stabilizing receding horizon control of nonlinear systems: a control Lyapunov function approach. In Proceedings of American Control Conference, 1999.
[11] A. Jadbabaie, J. Yu, and J. Hauser. Unconstrained receding horizon control of nonlinear systems. In Proceedings of IEEE Conference on Decision and Control, 1999.
[12] E. Johnson, A. Calise, R. Rysdyk, and H. El-Shirbiny. Feedback linearization with neural network augmentation applied to X-33 attitude control. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, 2000.
[13] M. K. Kalnay and W. Baker. Global numerical weather prediction at the National Meteorological Center. Bull. Amer. Meteor. Soc., 71:1410–1428, 1990.
[14] M. Kanamitsu, J. Alpert, K. Campana, P. Caplan, D. Deaven, M. Iredell, B. Katz, H.-L. Pan, J. Sela, and G. White. Recent changes implemented into the global forecast system at NMC. Weather and Forecasting, 6:425–435, 1991.
[15] E. S. Meadows and J.
B. Rawlings. Model predictive control. In Nonlinear Process Control. Prentice Hall, 1997.
[16] D. Peters and C. He. Finite state induced flow models part II, three dimensional rotor disk. Journal of Aircraft, 32(2), March-April 1995.
[17] S. W. Piche, B. Sayyar-Rodsari, D. Johnson, and M. Gerules. Nonlinear model predictive control using neural networks. IEEE Control Systems Magazine, 20(3), 2000.
[18] S. J. Qin and T. A. Badgwell. An overview of industrial model predictive control technology. Chemical Process Control - AIChE Symposium Series, pages 232–256, 1997.
[19] J. Shamma and J. Cloutier. Gain-scheduled missile autopilot design using linear parameter varying transformations. AIAA J. on Guidance, Control and Dynamics, 16(2):256–263, 1993.
[20] A. Shapiro, L. Zhao, S. Weygandt, K. Brewster, S. Lazarus, and K. Droegemeier. Initial forecast fields from single-Doppler wind retrieval, thermodynamic retrieval and ADAS. In 11th Conf. on Numerical Weather Prediction, Amer. Meteor. Soc., pages 119–121, Norfolk, VA, 1996.
[21] M. Sznaier, J. Cloutier, R. Hull, D. Jacques, and C. Mracek. Receding horizon control Lyapunov function approach to suboptimal regulation of nonlinear systems. The Journal of Guidance, Control, and Dynamics, 23(3):399–405, May-June 2000.
[22] E. A. Wan and A. A. Bogdanov. Model predictive neural control with applications to a 6 DoF helicopter model. In Proc. of IEEE American Control Conference, Arlington, VA, June 2001.
[23] F. Wu, A. Packard, and G. Balas. LPV control design for pitch-axis missile autopilots. In Proc. 34th IEEE Conf. on Decision and Control, pages 53–6, New Orleans, LA, 1995.
[24] M. Xue, K. K. Droegemeier, and V. Wong. The Advanced Regional Prediction System (ARPS) - a multiscale nonhydrostatic atmospheric simulation and prediction tool. Part I: Model dynamics and verification. Meteor. Atmos. Physics, 75, 2000.
[25] M. A. Zulauf.
Modeling the Effects of Boundary Layer Circulations Generated by Cumulus Convection and Leads on Large-Scale Surface Fluxes. Ph.D. thesis, University of Utah, Salt Lake City, UT, 2001.