2022 Neural-Fly
Executing safe and precise flight maneuvers in dynamic high-speed winds is important for the ongoing commoditization of uninhabited aerial vehicles (UAVs). However, because the relationship between various wind conditions and their effects on aircraft maneuverability is not well understood, it is challenging to design effective robot controllers using traditional control design methods. We present Neural-Fly, a learning-based approach that allows rapid online adaptation by incorporating pretrained representations through deep learning. Neural-Fly builds on two key observations: that aerodynamics in different wind conditions share a common representation and that the wind-specific part lies in a low-dimensional space. To that end, Neural-Fly uses a proposed learning algorithm, domain adversarially invariant meta-learning (DAIML), to learn the shared representation using only 12 minutes of flight data. With the learned representation as a basis, Neural-Fly then uses a composite adaptation law to update a set of linear coefficients for mixing the basis elements. When evaluated under challenging wind conditions generated with the Caltech Real Weather Wind Tunnel, with wind speeds up to 43.6 kilometers/hour (12.1 meters/second), Neural-Fly achieves precise flight control with substantially smaller tracking error than state-of-the-art nonlinear and adaptive controllers. In addition to strong empirical performance, the exponential stability of Neural-Fly results in robustness guarantees. Last, our control design extrapolates to unseen wind conditions, is shown to be effective for outdoor flights with only onboard sensors, and can transfer across drones with minimal performance degradation.
Fig. 1. Agile flight through narrow gates. (A) Caltech Real Weather Wind Tunnel system, the quadrotor UAV, and the gate. In our flight tests, the UAV follows an agile
trajectory through narrow gates, which are slightly wider than the UAV itself, under challenging wind conditions. (B and C) Trajectories used for the gate tests. In (B), the
UAV follows a figure-8 through one gate, with a wind speed of 3.1 m/s or time-varying wind condition. In (C), the UAV follows an ellipse in the horizontal plane through
two gates, with a wind speed of 3.1 m/s. (D and E) Long-exposure photos (with an exposure time of 5 s) showing one lap in two tasks. (F to I) High-speed photos (with a
shutter speed of 1/200 s) showing the moment the UAV passed through the gate and the interaction between the UAV and the wind.
For the online adaptive control phase, we have developed a regularized composite adaptive control law, which we derived from a fundamental understanding of how the learned representation interacts with the closed-loop control system and which we support with rigorous theory. The adaptation law updates the wind-dependent linear coefficients using a composite of the position tracking error term and the aerodynamic force prediction error term. Such a principled approach effectively guarantees stable and fast adaptation to any wind condition and robustness against imperfect learning. Although this adaptive control law could be used with a number of learned models, the speed of adaptation is further aided by the concise representation learned from DAIML.
Using Neural-Fly, we report an average improvement of 66% over a nonlinear tracking controller, 42% over an ℒ1 adaptive controller, and 35% over an incremental nonlinear dynamics inversion (INDI) controller. These results are all accomplished using standard quadrotor UAV hardware while running PX4's default regulation attitude control. Our tracking performance is competitive even compared with related work without external wind disturbances and with more complex hardware [for example, (4) requires a 10 times higher control frequency and onboard optical sensors for direct motor speed feedback]. We also compare Neural-Fly with two variants of our method: Neural-Fly-Transfer, which uses a learned representation trained on data from a different drone, and Neural-Fly-Constant, which only uses our adaptive control law with a trivial nonlearning basis. Neural-Fly-Transfer demonstrates that our method is robust to changes in vehicle configuration and model mismatch. Neural-Fly-Constant, ℒ1, and INDI all directly adapt to the unknown dynamics without assuming the structure of the underlying physics, and they have similar performance. Furthermore, we demonstrate that our method enables a new set of capabilities that allow the UAV to fly through low-clearance gates following agile trajectories in gusty wind conditions (Fig. 1).
Related work for precise quadrotor control
Typical quadrotor control consists of a cascaded or hierarchical control structure that separates the design of the position controller, attitude controller, and thrust mixer (allocation).
Fig. 2. Offline meta-learning and online adaptive control design. (A) The online adaptation block in our adaptive controller. Our controller leverages the meta-trained basis function ϕ, which is a wind-invariant representation of the aerodynamic effects, and uses composite adaptation (that is, including tracking error-based and prediction error-based adaptation) to update the wind-specific linear weights â. The output of this block is the wind-effect force estimate, f̂ = ϕâ. (B) The illustration of our meta-learning algorithm DAIML. We collected data from wind conditions {w_1, ⋯, w_K} and applied Algorithm 1 to train the ϕ net. (C) The diagram of our control method, where the gray part corresponds to (A). Interpreting the learned block as an aerodynamic force allows it to be incorporated into the feedback control easily.
Commonly used off-the-shelf controllers, such as PX4, design each of these loops as proportional-integral-derivative (PID) regulation controllers (13). The control performance can be substantially improved by designing each layer of the cascaded controller as a tracking controller using the concept of differential flatness (14) or, as has recently been popular, using a single optimization-based controller, such as model predictive control (MPC), to directly compute motor speed commands from desired trajectories. State-of-the-art tracking performance relies on MPC with fast adaptive inner loops to correct for modeling errors (4, 7); however, this approach requires fully custom flight controllers. In contrast, our method is designed to be integrated with a typical PX4 flight controller, yet it achieves state-of-the-art flight performance in wind.
Prior work on agile quadrotor control has achieved impressive results by considering aerodynamics (2, 4, 7, 11). However, those approaches require specialized onboard hardware (4), entail fully custom flight control stacks (4, 7), or cannot adapt to external wind disturbances (2, 11). For example, state-of-the-art tracking performance has been demonstrated using INDI to estimate aerodynamic disturbance forces, with a root mean square tracking error of 6.6 cm and drone ground speeds up to 12.9 m/s (4). However, Tal and Karaman (4) rely on high-frequency control updates (500 Hz) and direct motor speed feedback using optical encoders to rapidly estimate external disturbances. Both are challenging to deploy on standard systems. Hanover et al. (7) simplify the hardware setup, do not require optical motor speed sensors, and have demonstrated state-of-the-art tracking performance. However, Hanover et al. (7) rely on a high-rate ℒ1 adaptive controller inside a model predictive controller and use a racing drone with a fully customized control stack. Torrente et al. (11) leverage an aerodynamic model learned offline, represented as Gaussian processes. However, Torrente et al. (11) cannot adapt to unknown or changing wind conditions and provide no theoretical guarantees. Another recent work focuses on deriving simplified rotor-drag models that are differentially flat (2). However, the work of Faessler et al. (2) focuses on horizontal, x-y plane trajectories at ground speeds of 4 m/s without external wind, where the thrust is more constant than in our tasks; it achieves ~6-cm tracking error (2), uses an attitude controller running at 4000 Hz, and is not extensible to faster flights, as pointed out in (11).
Relation between Neural-Fly and conventional adaptive control
Adaptive control theory has been extensively studied for online control and identification problems with parametric uncertainty, for example, unknown linear coefficients for mixing known basis functions (15-20). There are three common aspects of adaptive control that must be addressed carefully in any well-designed system and that we address in Neural-Fly: designing suitable basis functions for online adaptation, stability of the closed-loop system, and persistence of excitation, which is a property related to robustness against disturbances. These challenges arise because of the coupling between the unknown underlying dynamics and the online adaptation. This coupling precludes naive combinations of online learning and control. For example, gradient-based parameter adaptation has well-known stability and robustness issues, as discussed in (15).
The basis functions play a crucial role in the performance of adaptive control, but designing or selecting proper basis functions might be challenging. A good set of basis functions should reflect important features of the underlying physics. In practice, basis functions are often designed using physics-informed modeling of the system, such as the nonlinear aerodynamic modeling in (21). However, physics-informed modeling requires a tremendous amount of prior knowledge and human labor and is often still inaccurate. Another approach is to use random features as the basis set, such as random Fourier features (22, 23), which can model all possible underlying physics as long as the number of features is large enough. However, the high-dimensional feature space is not optimal for a specific system because many of the features might be redundant or irrelevant. Such suboptimality and redundancy not only increase the computational burden but also slow down the convergence speed of the adaptation process.
Given a set of basis functions, naive adaptive control designs may cause instability and fragility in the closed-loop system because of the nontrivial coupling between the adapted model and the system dynamics. In particular, asymptotically stable adaptive control cannot guarantee robustness against disturbances, and so exponential stability is desired. Even so, existing adaptive control methods often only guarantee exponential stability when the desired trajectory is persistently exciting, by which information about all of the coefficients (including irrelevant ones) is constantly provided at the required spatial and time scales. In practice, persistent excitation requires either a succinct set of basis functions or perturbing the desired trajectory, which compromises tracking performance.
Recent multirotor flight control methods, including INDI (4) and ℒ1 adaptive control, presented in (5) and demonstrated inside an MPC loop in (7), achieve good results by abandoning complex basis functions. Instead, these methods directly estimate the aerodynamic residual force vector. The residual force is observable; thus, these methods bypass the challenge of designing good basis functions and the associated stability and persistent excitation issues. However, these methods suffer from lag in estimating the residual force and encounter the filter design performance trade-off of reduced lag versus amplified noise. Neural-Fly-Constant only uses Neural-Fly's composite adaptation law to estimate the residual force, and therefore, Neural-Fly-Constant also falls into this class of adaptive control structures. The results of this article demonstrate that the inherent estimation lag in these existing methods limits performance on agile trajectories and in strong wind conditions.
Neural-Fly solves the aforementioned issues of basis function design and adaptive control stability using newly developed methods for meta-learning and composite adaptation that can be seamlessly integrated together. Neural-Fly uses DAIML and flight data to learn an effective and compact set of basis functions, represented as a DNN. The regularized composite adaptation law uses the learned basis functions to quickly respond to wind conditions. Neural-Fly enjoys fast adaptation because of the conciseness of the feature space, and it guarantees closed-loop exponential stability and robustness without assuming persistent excitation.
Related to Neural-Fly, neural network-based adaptive control has been researched extensively, but by and large it was limited to shallow or single-layer neural networks without pretraining. Some early works focus on shallow or single-layer neural networks with unknown parameters that are adapted online (19, 24-27). A recent work applies this idea to perform an impressive quadrotor flip (28). However, the existing neural network-based adaptive control work does not use multilayer DNNs and lacks a principled and efficient mechanism to pretrain the neural network before deployment. Instead of using shallow neural networks, recent trends in machine learning rely heavily on DNNs because of their representation power (29). In this work, we leverage modern deep learning advances to pretrain a DNN that represents the underlying physics compactly and effectively.
Related work in multienvironment deep learning for robot control
Recently, researchers have been addressing the data and computation requirements for DNNs to help the field progress toward the fast online-learning paradigm. In turn, this progress has been enabling adaptable DNN-based control in dynamic environments. The most popular learning scheme in dynamic environments is meta-learning, or "learning to learn," which aims to learn an efficient model from data across different tasks or environments (30, 31). The learned model, typically represented as a DNN, ideally should be capable of rapid adaptation to a new task or an unseen environment given limited data. For robotic applications, meta-learning has shown great potential for enabling autonomy in highly dynamic environments. For example, it has enabled quick adaptation against unseen terrain or slopes for legged robots (32, 33), changing suspended payloads for drones (34), and unknown operating conditions for wheeled robots (35).
In general, learning algorithms can typically be decomposed into two phases: offline learning and online adaptation. In the offline learning phase, the goal is to learn a model from data collected in different environments, such that the model contains shared knowledge or features across all environments, for example, learning aerodynamic features shared by all wind conditions. In the online adaptation phase, the goal is to adapt the offline-learned model given limited online data from a new environment or a new task, for example, fine-tuning the aerodynamic features in a specific wind condition.
There are two ways that the offline-learned model can be adapted. In the first class, the adaptation phase adapts the whole neural network model, typically using one or more gradient descent steps (30, 32, 34, 36). However, because of the notoriously data-hungry and high-dimensional nature of neural networks, for real-world robots, it is still impossible to run such adaptation onboard as fast as the feedback control loop (e.g., ~100 Hz for a quadrotor).
Furthermore, adapting the whole neural network often lacks explainability and robustness and could generate unpredictable outputs that make the closed loop unstable.
In the second class (including Neural-Fly), the online adaptation only adapts a relatively small part of the learned model, for example, the last layer of the neural network (35, 37-39). The intuition is that different environments share a common representation (e.g., the wind-invariant representation in Fig. 2A), and the environment-specific part is in a low-dimensional space (e.g., the wind-specific linear weight in Fig. 2A), which enables real-time adaptation as fast as the control loop. In particular, the idea of integrating meta-learning with adaptive control was first presented in our prior work (37), later followed by Richards et al. (38). However, the representation learned in (37) is ineffective, and the tracking performance in (37) is similar to that of the baselines; Richards et al. (38) focus on a planar and fully actuated rotorcraft simulation without experimental validation, and there is no stability or robustness analysis. Neural-Fly instead learns an effective representation using our meta-learning algorithm called DAIML, demonstrates state-of-the-art tracking performance on real drones, and achieves nontrivial stability and robustness guarantees.
Another popular deep-learning approach for control in dynamic environments is robust policy learning via domain randomization (40-42). The key idea is to train the policy with random physical parameters such that the controller is robust to a range of conditions. For example, the quadrupedal locomotion controller in (40) retains its robustness over challenging natural terrains. However, robust policy learning optimizes average performance under a broad range of conditions rather than achieving precise control by adapting to specific environments.
RESULTS
In this section, we first discuss the experimental platform for data collection and experiments. Second, we discuss the key conceptual reasoning behind our combined method of our meta-learning algorithm, called DAIML, and our composite adaptive controller with stability guarantees. Third, we discuss several experiments to quantitatively compare the closed-loop trajectory-tracking performance of our methods to a nonlinear baseline method and two state-of-the-art adaptive flight control methods, and we observe that our methods reduce the average tracking error substantially. To demonstrate the new capabilities brought by our methods, we present agile flight results in gusty winds, where the UAV must quickly fly through narrow gates that are only slightly wider than the vehicle. Last, we show that our methods are also applicable in outdoor agile tracking tasks without external motion capture systems.
Experimental platform
All of our experiments were conducted at Caltech's Center for Autonomous Systems and Technologies. The experimental setup consisted of an OptiTrack motion capture system with 12 infrared cameras for localization, streaming position measurements at 50 Hz; a WiFi router for communication; the Caltech Real Weather Wind Tunnel for generating dynamic wind conditions; and a custom-built quadrotor UAV. The Real Weather Wind Tunnel is composed of 1296 individually controlled fans and can generate uniform wind speeds of up to 43.6 km/hour in its 3 m by 3 m by 5 m test section. For outdoor flight, the drone was also equipped with a Global Positioning System (GPS) module and an external antenna. We now discuss the design of the UAV and the wind conditions in detail.
UAV design
We built a quadrotor UAV for our primary data collection and all experiments, shown in Fig. 1A. The quadrotor weighs 2.6 kg with a thrust-to-weight ratio of 2.2. The UAV is equipped with a Pixhawk flight controller running PX4, a commonly used open-source drone autopilot platform (13). The UAV incorporates a Raspberry Pi 4 onboard computer running a Linux operating system, which performs real-time computation and adaptive control and interfaces with the flight controller through the Robot Operating System (ROS). State estimation is performed using the built-in PX4 extended Kalman filter (EKF), which fuses inertial measurement unit data with global position estimates from the OptiTrack motion capture system (or the GPS module for outdoor flight tasks). The UAV platform features a wide-X configuration (measuring 85 cm in width, 75 cm in length, and 93 cm diagonally) and tilted motors for improved yaw authority. This general hardware setup is standard and similar to many quadrotors. We refer to the Supplementary Materials (section S1) for further configuration details.
We implemented our control algorithm and the baseline control methods in the position control loop in Python and ran them on the onboard Linux computer at 50 Hz. The PX4 was set to the offboard flight mode and received thrust and attitude commands from the position control loop. The built-in PX4 multicopter attitude controller was then executed at the default rate, which is a linear PID regulation controller on the quaternion error. The online inference of the learned representation is also in Python via PyTorch, an open-source deep learning framework.
To study the generalizability and robustness of our approach, we also used an Intel Aero Ready to Fly drone for data collection. This dataset was used to train a representation of the wind effects on the Intel Aero drone, which we tested on our custom UAV. The Intel Aero drone (weighing 1.4 kg) has a symmetric X configuration, 52 cm in width and 52 cm in length, without tilted motors (see the Supplementary Materials for further details).
Wind condition design
To generate dynamic and diverse wind conditions for the data collection and experiments, we leveraged the state-of-the-art Caltech Real Weather Wind Tunnel system (Fig. 1A). The wind tunnel is a 3 m by 3 m array of 1296 independently controllable fans capable of generating wind conditions up to 43.6 km/hour. The distributed fans are controlled in real time by a Python-based application programming interface (API). For data collection and flight experiments, we designed two types of wind conditions. For the first type, each fan has a uniform and constant wind speed between 0 and 43.6 km/hour (12.1 m/s). The second type of wind follows a sinusoidal function in time, e.g., 30.6 + 8.6 sin(t) km/hour. Note that the training data only cover constant wind speeds up to 6.1 m/s. To visualize the wind, we used five smoke generators to indicate the direction and intensity of the wind condition (see examples in Fig. 1 and Movie 1).
Offline learning and online adaptive control development
Data collection and meta-learning using DAIML
To learn an effective representation of the aerodynamic effects, we had a custom-built drone follow a randomized trajectory for 2 min each in six different static wind conditions, with speeds ranging from 0 to 22.0 km/hour. However, in experiments, we used wind speeds up to 43.6 km/hour (12.1 m/s) (e.g., Fig. 6). Data were collected at 50 Hz, for a total of 36,000 data points. Figure 3A shows the data collection process, and Fig. 3B shows the inputs and labels of the training data under one wind condition of 13.3 km/hour (3.7 m/s). Figure 3C shows the distributions of input data (pitch) and label data (x component of the aerodynamic force) in different wind conditions. A shift in wind conditions causes distribution shifts in both the input domain and the label domain, which motivates the algorithm design of DAIML. The same data collection process was repeated on the Intel Aero drone to study whether the learned representation can generalize to a different drone.
On the collected datasets for both our custom drone and the Intel Aero drone, we applied the DAIML algorithm to learn two representations of the wind effects. The learning process was done offline on a normal desktop computer and is depicted in Fig. 2B. Figure 4 shows the evolution of the linear coefficients (a*) during the learning process, where DAIML learns a representation of the aerodynamic effects shared by all wind conditions, and the linear coefficient contains the wind-specific information. Moreover, the learned representation is explainable in the sense that the linear coefficients in different wind conditions are well disentangled (see Fig. 4). We refer to Materials and Methods for more details.
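To make the wind condition design above concrete, the following minimal Python sketch computes per-fan speed commands for the two condition types; the fan-array API call at the end is hypothetical, because the actual interface is not documented here.

```python
import numpy as np

def fan_speeds_kmh(t, n_fans=1296, mean=30.6, amplitude=8.6, sinusoidal=True):
    """Uniform fan speed command in km/hour: constant, or sinusoidal in
    time, e.g., 30.6 + 8.6*sin(t) km/hour as used in the experiments."""
    speed = mean + amplitude * np.sin(t) if sinusoidal else mean
    return np.full(n_fans, speed)

# Hypothetical use of the Python-based fan API (illustrative name only):
# wind_tunnel.set_speeds(fan_speeds_kmh(t))
```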
Baselines and the variants of our method
We briefly introduce three variants of our method and the three
baseline methods considered (details are provided in Materials and
Methods). Each of the controllers is implemented in the position
control loop and outputs a force command. The force command is
fed into a kinematics block to determine a corresponding attitude
and thrust, similar to (14), which is sent to the PX4 flight controller.
The three baselines include the following: a globally exponentially stabilizing nonlinear tracking controller for quadrotor control (8, 43, 44), INDI linear acceleration control (4), and ℒ1 adaptive
control (5, 7). The primary difference between these baseline methods and Neural-Fly is how the controller compensates for the unmodeled residual force (that is, each baseline method has the same control structure, shown in Fig. 2C, except for the estimation of f̂). In the case of the nonlinear baseline controller, an integral term accumulates error to correct for the modeling error. The integral gain is limited by the stability of the interaction with the position and velocity error feedback, leading to slow model correction. In contrast, both INDI and ℒ1 decouple the adaptation rate from the PD gains, which allows for fast adaptation. Instead, these methods are limited by more fundamental design factors, such as system delay, measurement noise, and controller rate.
Our method is illustrated in Fig. 2 (A and C) and replaces the integral feedback term with an adapted learning term. The deployment of our approach depends on the learned representation function ϕ, and our primary method and two variants consider a different choice of ϕ. Neural-Fly is our primary method, using a representation learned from the dataset collected by the custom-built drone, which is the same drone used in experiments. Neural-Fly-Transfer uses the Neural-Fly algorithm, where the representation is trained using the dataset collected by the aforementioned Intel Aero drone. Neural-Fly-Constant uses the online adaptation algorithm from Neural-Fly, but the representation is an artificially designed constant mapping. Neural-Fly-Transfer is included to show the generalizability and robustness of our approach with drone transfer, i.e., using a different drone in experiments compared with data collection. Last, Neural-Fly-Constant demonstrates the benefit of using a better representation learned from the proposed meta-learning method DAIML. Note that Neural-Fly-Constant is a composite adaptation form of a Kalman filter disturbance observer, that is, a Kalman filter augmented with a tracking error update term.
Movie 1. Neural-Fly enables agile quadrotor flights through low-clearance gates.
Fig. 3. Training data collection. (A) The xyz position along a 2-min randomized trajectory for data collection with a wind speed of 13.3 km/hour (3.7 m/s), in the Caltech Real Weather Wind Tunnel. (B) A typical 10-s trajectory of the inputs (velocity, attitude quaternion, and motor speed PWM command) and label (offline calculation of aerodynamic residual force) for our learning model, corresponding to the highlighted part in (A). (C) Histograms showing data distributions in different wind conditions. Left: Distributions of the x-component of the wind-effect force, fx, showing that the aerodynamic effect changes as the wind varies. Right: Distributions of the pitch, a component of the state used as an input to the learning model, showing that the shift in wind conditions causes a distribution shift in the input.
Trajectory tracking performance
We quantitatively compare the performance of the aforementioned control methods when the UAV follows a 2.5-m-wide, 1.5-m-tall figure-8 trajectory with a lap time of 6.28 s under constant, uniform wind speeds of 0 km/hour, 15.1 km/hour (4.2 m/s), 30.6 km/hour (8.5 m/s), and 43.6 km/hour (12.1 m/s), and under time-varying wind speeds of 30.6 + 8.6 sin(t) km/hour [8.5 + 2.4 sin(t) m/s]. The flight trajectory for each of the experiments is shown in Fig. 5, which includes a warm-up lap and six 6.28-s laps. The nonlinear baseline integral term compensates for the mean model error within the first lap. As the wind speed increases, the aerodynamic force variation becomes larger, and we notice a substantial performance degradation. INDI and ℒ1 both improve over the nonlinear baseline, but INDI is more robust than ℒ1 at high wind speeds. Neural-Fly-Constant outperforms INDI except during the two most challenging tasks: 43.6 km/hour and sinusoidal wind speeds. The learning-based methods, Neural-Fly and Neural-Fly-Transfer, outperform all other methods in all tests. Neural-Fly outperforms Neural-Fly-Transfer slightly, which is because the learned model was trained on data from the same drone and thus better matches the dynamics of the vehicle.
In Table 1, we tabulate the root mean square position error and mean position error values over the six laps for each experiment. Figure 6 shows how the mean tracking error changes for each controller as the wind speed increases and includes the SD for the mean lap position error. In all cases, Neural-Fly and Neural-Fly-Transfer outperform the state-of-the-art baseline methods, including the 30.6 km/hour, 43.6 km/hour, and sinusoidal wind speeds, all of which exceed the wind speed in the training data. All of these results present a trend: Adaptive control substantially outperforms the nonlinear baseline that relies on integral control, and learning markedly improves adaptive control.
Agile flight through narrow gates
Precise flight control in dynamic and strong wind conditions has many applications, such as search and rescue, delivery, and transportation. In this section, we present a challenging drone flight task in strong winds, where the drone must follow agile trajectories through narrow gates, which are only slightly wider than the drone. The overall result is depicted in Fig. 1 and Movie 1. As shown in Fig. 1A, the gates used in our experiments are 110 cm in width, which is only slightly wider than the drone (85 cm wide and 75 cm long). To visualize the trajectory using long-exposure photography, our drone was deployed with four main light-emitting diodes (LEDs) on its legs, where the two rear LEDs were red and the front two were white. There were also several small LEDs on the flight controller, the computer, and the motor controllers, which can be seen in the long-exposure shots.
Task design
We tested our method on three different tasks. In the first task [see Fig. 1 (B, D, and F to I) and Movie 1], the desired trajectory is a 3 m by 1.5 m figure-8 in the x-z plane with a lap time of 5 s. A gate is placed on the left bottom part of the trajectory. The minimum clearance is about 10 cm (see Fig. 1I), which requires that the controller precisely tracks the trajectory. The maximum speed and acceleration of the desired trajectory are 2.7 m/s and 5.0 m/s², respectively. The wind speed was 3.1 m/s. The second task (see Movie 1) is the same as the first one, except that it uses a more challenging, time-varying wind condition, 3.1 + 1.8 sin(2πt/5) m/s. In the third task [see Fig. 1 (C and E) and Movie 1], the desired trajectory is a 3 m by 2.5 m ellipse in the x-y plane with a lap time of 5 s. We placed two gates on the left and right sides of the ellipse. As with the first task, the wind speed was 3.1 m/s.
Fig. 5. Depiction of the trajectory tracking performance of each controller in several wind conditions. The baseline nonlinear controller can track the trajectory well;
however, the performance substantially degrades at higher wind speeds. INDI, ℒ1, and Neural-Fly-Constant have similar performance and improve over the nonlinear
baseline by estimating the aerodynamic disturbance force quickly. Neural-Fly and Neural-Fly-Transfer use a learned model of the aerodynamic effects and adapt the
model in real time to achieve lower tracking error than the other methods.
Table 1. Tracking error statistics in centimeters for different wind conditions. Two metrics are considered: root mean square (RMS) and mean. NF, Neural-Fly.

                        Wind speed (m/s)
              0            4.2          8.5          12.1         8.5 + 2.4 sin(t)
Method        RMS   Mean   RMS   Mean   RMS   Mean   RMS   Mean   RMS   Mean
Nonlinear     11.9  10.8   10.7  9.9    16.3  14.7   23.9  21.6   31.2  28.2
INDI          7.3   6.3    6.4   5.9    8.5   8.2    10.7  10.1   11.1  10.3
ℒ1            4.6   4.2    5.8   5.2    12.1  11.1   22.7  21.3   13.0  11.6
NF-Constant   5.4   5.0    6.1   5.7    7.5   6.9    12.7  11.2   12.7  12.1
NF-Transfer   3.7   3.4    4.8   4.4    6.2   5.9    10.2  9.4    8.8   8.0
NF            3.2   2.9    4.0   3.7    5.8   5.3    9.4   8.7    7.6   6.9
Performance
For all three tasks, we used our primary method, Neural-Fly, where the representation is learned using the dataset collected by the custom-built drone. Figure 1 (D and E) shows two long-exposure photos with an exposure time of 5 s, which is the same as the lap time of the desired trajectory. We see that our method precisely tracked the desired trajectories and flew safely through the gates (see Movie 1). These long-exposure photos also captured the smoke visualization of the wind condition. We would like to emphasize that the drone is wider than the LED light region, because the LEDs are located on the legs (see Fig. 1A). Figure 1 (F to I) shows four high-speed photos with a shutter speed of 1/200 s. These four photos captured the moment the drone passed through the gate in the first task and the complex interaction between the drone and the wind. We see that the aerodynamic effects are complex and nonstationary and depend on the UAV attitude, the relative velocity, and aerodynamic interactions between the propellers and the wind.
Outdoor experiments
We tested our algorithm outdoors in gentle breeze conditions (wind speeds measured up to 17 km/hour). An onboard GPS receiver provided position information to the EKF, giving lower-precision state estimation and therefore less precise aerodynamic residual force estimation. Following the same aforementioned figure-8 trajectory, the controller reached 7.5-cm mean tracking error, shown in Fig. 7.
Fig. 6. Mean tracking errors of each lap in different wind conditions. This figure shows position tracking errors of different methods as wind speed increases. Solid lines show the mean error over six laps, and the shaded areas show the SD of the mean error on each lap. The gray area indicates the extrapolation region, where the wind speeds are not covered in training. Our primary method (Neural-Fly) achieves state-of-the-art performance even with a strong wind disturbance.
Fig. 7. Outdoor flight setup and performance. Left: In outdoor experiments, a GPS module is deployed for state estimation, and a weather station records wind profiles. The maximum wind speed during the test was around 17 km/hour (4.9 m/s). Right: Trajectory tracking performance of Neural-Fly.
DISCUSSION
State-of-the-art tracking performance
When measuring position tracking errors, we observe that our Neural-Fly method outperforms state-of-the-art flight controllers in all wind conditions. Neural-Fly uses deep learning to obtain a compact representation of the aerodynamic disturbances and incorporates that representation into an adaptive control design to achieve high-precision tracking performance. The benchmark methods used in this article are nonlinear control, INDI, and ℒ1, and performance was compared tracking an agile figure-8 in constant and time-varying wind speeds up to 43.6 km/hour (12.1 m/s). Furthermore, we observe a mean tracking error of 2.9 cm in 0 km/hour wind, which is comparable with state-of-the-art tracking performance demonstrated on more aggressive racing drones (4, 7), despite several architectural limitations such as a limited control rate in offboard mode, a larger, less maneuverable vehicle, and the lack of direct motor speed measurements. All our experiments were conducted using the standard PX4 attitude controller, with Neural-Fly implemented on an onboard, low-cost, "credit card-sized" Raspberry Pi 4 computer. Furthermore, Neural-Fly is robust to changes in vehicle configuration, as demonstrated by the similar performance of Neural-Fly-Transfer.
To understand the fundamental tracking-error limit, we estimate that the localization precision from the OptiTrack system is about 1 cm, which is a practical lower bound for the average tracking error in our system (see more details in the Supplementary Materials, section S8). This is based on the fact that the difference between the OptiTrack […] communication layer (such as using ROS2's real-time features) could allow us to achieve tracking errors on the order of the localization accuracy. Attitude tracking delay can be substantially reduced through the use of a nonlinear attitude controller [e.g., (44)]. Our method is also directly extensible to attitude control because attitude dynamics match the Euler-Lagrange dynamics used in our derivations. However, further work is needed to understand the interaction of the learned dynamics with the cascaded control design when implementing a tracking attitude controller.
We have tested our control method in outdoor flight to demonstrate that it is robust to less precise state estimation and does not rely on any particular features of our test facility. Although control and estimation are usually separately designed parts of an autonomous system, aggressive adaptive control requires minimal noise in force measurement to effectively and quickly compensate for unmodeled dynamics. Testing our method in outdoor flight, the quadrotor maintains precise tracking with only a 7.5-cm tracking error on a gentle, breezy day with wind speeds around 17 km/hour.
Challenges caused by unknown and time-varying wind conditions
In the real world, the wind conditions are not only unknown but also constantly changing, and the vehicle must continuously adapt. We designed the sinusoidal wind test to emulate unsteady or gusty wind conditions. Although our learned model was trained on static and approximately uniform wind condition data, Neural-Fly can quickly identify changing wind speeds and maintains precise tracking even in the sinusoidal wind experiment. Moreover, in each of our experiments, the wind conditions were unknown to the UAV before starting the test yet were quickly identified by the adaptation algorithm.
Our work demonstrated that it is possible to repeatedly and quantitatively test quadrotor flight in time-varying wind. Our method separately learns the wind effect's dependence on the vehicle state (i.e., the wind-invariant representation in Fig. 2A) and on the wind condition (i.e., the wind-specific linear weight in Fig. 2A). This separation allows Neural-Fly to quickly adapt to the time-varying wind even as the UAV follows a dynamic trajectory, with an average tracking error below 8.7 cm in Table 1.
Computational efficiency of our method
In the offline meta-learning phase, the proposed DAIML algorithm is able to learn an effective representation of the aerodynamic effect
in a data-efficient manner. This requires only 12 min of flight data at 50 Hz, for a total of 36,000 data points. The training procedure only takes 5 min on a normal desktop computer. In the online adaptation phase, our adaptive control method only takes 10 ms to compute on a compact onboard Linux computer (Raspberry Pi 4). In particular, the feedforward inference time via the learned basis function ϕ is about 3.5 ms, and the adaptation update is about 3.0 ms, which implies the compactness of the learned representation.
Generalization to new trajectories and new aircraft
Our control method is orthogonal to the design of the desired trajectory. In this article, we focus on the figure-8 trajectory, which is a commonly used control benchmark. We also demonstrate our method flying a horizontal ellipse during the narrow gate demonstration (Fig. 1). Note that our method supports any trajectory planners, such as those of Foehn et al. (1) or learning-based planners (45, 46). In particular, for those planners that require a precise and agile downstream controller [e.g., for close-proximity flight or drone racing (1, 10)], our method immediately provides a solution and further pushes the boundary of these planners, because our state-of-the-art tracking capabilities enable tighter configurations and smaller clearances. However, further research is required to understand the coupling between planning and learning-based control near actuation limits. Future work will consider using Neural-Fly in a combined planning and control structure such as MPC, which will be able to handle actuation limits.
The comparison between Neural-Fly and Neural-Fly-Transfer shows that our approach is robust to changing vehicle design and that the learned representation does not depend on the vehicle. This demonstrates the generalizability of the proposed method running on different quadrotors. Moreover, our control algorithm is formulated generally for all robotic systems described by the Euler-Lagrange equation (see Materials and Methods), including many types of aircraft such as (21, 47).
MATERIALS AND METHODS
Overview
We consider a general robot dynamics model

M(q)q̈ + C(q, q̇)q̇ + g(q) = u + f(q, q̇, w)    (1)

where q, q̇, q̈ ∈ ℝ^n are the n-dimensional position, velocity, and acceleration vectors; M(q) is the symmetric, positive definite inertia matrix; C(q, q̇) is the Coriolis matrix; g(q) is the gravitational force vector; and u ∈ ℝ^n is the control force. f(q, q̇, w) incorporates unmodeled dynamics, and w ∈ ℝ^m is an unknown hidden state used to represent the underlying environmental conditions, which is potentially time-variant. Specifically, in this article, w represents the wind profile (for example, the wind profile in Fig. 1), and different wind profiles yield different unmodeled aerodynamic disturbances for the UAV.
Neural-Fly can be broken into two main stages, the offline meta-learning stage and the online adaptive control stage. These two stages build a model of the unknown dynamics of the form

f(q, q̇, w) ≈ ϕ(q, q̇)a(w)    (2)

where ϕ is a basis or representation function shared by all wind conditions that captures the dependence of the unmodeled dynamics on the robot state, and a is a set of linear coefficients that is updated for each condition. In the Supplementary Materials (section S2), we prove that the decomposition ϕ(q, q̇)a(w) exists for any analytic function f(q, q̇, w). In the offline meta-learning stage, we learn ϕ as a DNN using our meta-learning algorithm DAIML. This stage results in learning ϕ as a wind-invariant representation of the unmodeled dynamics, which generalizes to new trajectories and new wind conditions. In the online adaptive control stage, we adapt the linear coefficients a using adaptive control. Our adaptive control algorithm is a type of composite adaptation and was carefully designed to allow for fast adaptation while maintaining the global exponential stability and robustness of the closed-loop system. The offline learning and online control architectures are illustrated in Fig. 2B and Fig. 2 (A and C), respectively.
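To make the structure of Eq. 2 concrete, the following is a minimal PyTorch sketch of the decomposition f ≈ ϕ(x)a(w); the layer sizes and input dimension are illustrative assumptions, not the architecture used in our experiments.

```python
import torch
import torch.nn as nn

class Phi(nn.Module):
    """Wind-invariant basis ϕ(x): maps state features to an (n x h)
    basis matrix, mixed by a wind-specific coefficient vector a."""
    def __init__(self, x_dim=11, h_dim=12, n_out=3):
        super().__init__()
        self.n_out, self.h_dim = n_out, h_dim
        self.net = nn.Sequential(
            nn.Linear(x_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_out * h_dim),
        )

    def forward(self, x):                  # x: (batch, x_dim)
        return self.net(x).view(-1, self.n_out, self.h_dim)

phi = Phi()
a = torch.zeros(12)                        # adapted online per wind condition
f_hat = phi(torch.randn(1, 11)) @ a        # predicted residual force, shape (1, 3)
```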
Data collection
To generate training data to learn a wind-invariant representation of the unmodeled dynamics, the drone tracks a randomized trajectory with the baseline nonlinear controller for 2 min each in several different static wind conditions. Figure 3A illustrates one trajectory
under the wind condition of 13.3 km/hour (3.7 m/s). The set of input-output pairs for the kth such trajectory is referred to as the kth subdataset, D_wk, with the wind condition w_k. Our dataset consists of six different subdatasets with wind speeds from 0 to 22.0 km/hour (6.1 m/s), which are in the white interpolation region in Fig. 6. The trajectory follows a polynomial spline between three waypoints: the current position and two randomly generated target positions. The spline is constrained to have zero velocity, acceleration, and jerk at the starting and ending waypoints. Once the end of one spline is reached, a new random spline is generated, and the process repeats for the duration of the training data flight. This process allows us to generate a large amount of data using a trajectory very different from the trajectories used to test our method, such as the figure-8 in Fig. 1. By training and testing on different trajectories, we demonstrate that the learned model generalizes well to new trajectories.
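The following sketch illustrates one such spline segment: the eight boundary conditions (positions, plus zero velocity, acceleration, and jerk at both endpoints) determine a degree-7 polynomial per axis. The function name and segment duration are illustrative.

```python
import math
import numpy as np

def random_spline(p0, lo, hi, T=5.0):
    """One segment from p0 to a random target with zero velocity,
    acceleration, and jerk at both endpoints (degree-7 polynomial per axis)."""
    p1 = np.random.uniform(lo, hi)
    coeffs = []
    for start, end in zip(np.atleast_1d(p0), np.atleast_1d(p1)):
        A = np.zeros((8, 8))
        rhs = np.array([start, 0, 0, 0, end, 0, 0, 0], dtype=float)
        for d in range(4):                   # derivative orders 0..3
            A[d, d] = math.factorial(d)      # constraints at t = 0
            for k in range(d, 8):            # constraints at t = T
                A[4 + d, k] = math.factorial(k) / math.factorial(k - d) * T**(k - d)
        coeffs.append(np.linalg.solve(A, rhs))
    return np.array(coeffs)                  # one row of 8 coefficients per axis
```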
Along each trajectory, we collect time-stamped data [q, q̇, u]. Next, we compute the acceleration q̈ by fifth-order numerical differentiation. Combining this acceleration with Eq. 1, we get a noisy measurement of the unmodeled dynamics, y = f(x, w) + ϵ, where ϵ includes all sources of noise (e.g., sensor noise and noise from numerical differentiation), and x = [q; q̇] ∈ ℝ^2n is the state. Last, this allows us to define the dataset, D = {D_w1, …, D_wK}, where

D_wk = {x_k^(i), y_k^(i) = f(x_k^(i), w_k) + ϵ_k^(i)}_{i=1}^{N_k}    (3)

is the collection of N_k noisy input-output pairs with wind condition w_k. As we discussed in Results, to show that DAIML learns a model that can be transferred between drones, we applied this data collection process on both the custom-built drone and the Intel Aero drone.
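A minimal sketch of this label computation, specialized to the position dynamics in Eq. 13a, is shown below; the Savitzky-Golay derivative stands in for the fifth-order numerical differentiation, and the gravity convention g = [0, 0, -9.81] is an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter

def residual_force_labels(t, v, R, thrust, m=2.6):
    """Labels y = m*v_dot - m*g - R @ [0, 0, T] per Eq. 13a, where v_dot is
    a smoothed numerical derivative of the velocity history v (N x 3)."""
    g = np.array([0.0, 0.0, -9.81])
    dt = float(np.mean(np.diff(t)))
    v_dot = savgol_filter(v, window_length=9, polyorder=5, deriv=1,
                          delta=dt, axis=0)
    f_T = np.zeros((len(t), 3))
    f_T[:, 2] = thrust                       # body-frame thrust vector [0, 0, T]
    u = np.einsum('nij,nj->ni', R, f_T)      # rotate thrust into the world frame
    return m * v_dot - m * g - u             # noisy measurement of f in Eq. 13a
```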
The DAIML algorithm
In this section, we will present the methodology and details of learning the representation function ϕ. In particular, we will first introduce the goal of meta-learning, motivate the proposed algorithm DAIML by the observed domain shift problem in the collected dataset, and finally discuss key algorithmic details.
Meta-learning goal
Given the dataset, the goal of meta-learning is to learn a representation ϕ(x), such that for any wind condition w, there exists a latent variable a(w) that allows ϕ(x)a(w) to approximate f(x, w) well. Formally, an optimal representation, ϕ, solves the following optimization problem

min_{ϕ, a_1, …, a_K} ∑_{k=1}^{K} ∑_{i=1}^{N_k} ‖y_k^(i) − ϕ(x_k^(i)) a_k‖²    (4)

where ϕ(·) : ℝ^2n → ℝ^(n×h) is the representation function and a_k ∈ ℝ^h is the latent linear coefficient. Note that the optimal weight a_k is specific to each wind condition, but the optimal representation ϕ is shared by all wind conditions. In this article, we use a DNN to represent ϕ. In the Supplementary Materials (section S2), we prove that for any analytic function f(x, w), the structure ϕ(x)a(w) can approximate f(x, w) with arbitrary precision as long as the DNN has enough neurons. This result implies that the ϕ solved from the optimization in Eq. 4 is a reasonable representation of the unknown dynamics f(x, w).
optimization problem Algorithm 1: Domain adversarially invariant meta-learning (DAIML)
Hyperparameter: ≥ 0, 0 < ≤ 1, > 0
K N k 2
∑ ∑ ‖y(i) (i) Input: D = {Dw 1, ⋯ , Dw k}
,amin k − (xk ) a k‖ (4)
,⋯,a 1 K
k = 1i = 1 Initialize: Neural networks and h
Result: Trained neural networks and h
where ( ⋅ ) : ℝ2n → ℝn × h is the representation function and ak ∈ ℝh 1 repeat
is the latent linear coefficient. Note that the optimal weight ak is lines 2–9 until convergence
specific to each wind condition, but the optimal representation is 2 Randomly sample Dwk from D
shared by all wind conditions. In this article, we use a DNN to 3 Randomly sample two disjoint batches Ba (adaptation set) and
represent . In the Supplementary Materials (section S2), we prove B (training set) from Dwk 2
that for any analytic function f(x, w), the structure (x)a(w) can 4 Solve the least squares problem a *() =arg min a ∑i∈B a ‖y(i) (i)
k − (xk ) a‖
approximate f(x, w) with an arbitrary precision as long as the DNN 5 if ‖a*‖ > then *
has enough neurons. This result implies that the solved from the 6 a * ← ⋅ _ a
‖a *‖
optimization in Eq. 4 is a reasonable representation of the unknown 7 Train DNN using stochastic gradient descent (SGD) and
dynamics f(x, w). spectral normalization with loss
2
∑ (‖y (i) (i) * (i) we are ultimately interested in minimizing the position tracking
k − (xk ) a ‖ − ⋅ loss(h((xk ) ) , k ) )
i∈B error, and we can improve the adaptation using a more sophisticat-
8 if rand() ≤ then ed update law. Thus, in this section, we propose a more sophisticat-
9 Train DNN h using SGD with loss ∑ i∈B loss(h((x(i)
k ) ) , k) ed adaptation law for the linear coefficients based on a Kalman filter
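A condensed PyTorch sketch of one iteration of Algorithm 1 follows; spectral normalization is omitted, the batch-sampling helper is assumed, and the least squares step is solved through regularized normal equations so that, per the discussion below, the gradient in line 7 backpropagates through a*.

```python
import random
import torch
import torch.nn.functional as F

def daiml_step(phi, h, opt_phi, opt_h, datasets, alpha=0.1, eta=0.5, gamma=10.0):
    k = random.randrange(len(datasets))                        # line 2
    (Xa, Ya), (Xb, Yb) = sample_disjoint_batches(datasets[k])  # line 3, helper assumed
    target = torch.full((len(Xb),), k, dtype=torch.long)

    Phi_a = phi(Xa).flatten(0, 1)                     # stacked (N*n, h) basis
    A = Phi_a.T @ Phi_a + 1e-6 * torch.eye(Phi_a.shape[1])
    a_star = torch.linalg.solve(A, Phi_a.T @ Ya.flatten())  # line 4, differentiable
    if a_star.norm() > gamma:                         # lines 5-6, norm clipping
        a_star = gamma * a_star / a_star.norm()

    pred = phi(Xb) @ a_star                           # line 7: update phi
    adv = F.cross_entropy(h(phi(Xb).flatten(1)), target)
    loss_phi = F.mse_loss(pred, Yb) - alpha * adv
    opt_phi.zero_grad(); loss_phi.backward(); opt_phi.step()

    if random.random() <= eta:                        # lines 8-9: update h
        loss_h = F.cross_entropy(h(phi(Xb).detach().flatten(1)), target)
        opt_h.zero_grad(); loss_h.backward(); opt_h.step()
```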
Design of the DAIML algorithm
Last, we solve the optimization problem in Eq. 5 with the proposed algorithm DAIML (described in Algorithm 1 and illustrated in Fig. 2B), which belongs to the category of gradient-based meta-learning (31) but uses least squares as the adaptation step. DAIML contains three steps: (i) The adaptation step (lines 4 to 6) solves a least squares problem as a function of ϕ on the adaptation set B^a. (ii) The training step (line 7) updates the learned representation ϕ on the training set B, based on the optimal linear coefficient a* solved from the adaptation step. (iii) The regularization step (lines 8 and 9) updates the discriminator h on the training set.
We emphasize the following important features of DAIML: (i) After the adaptation step, a* is a function of ϕ. In other words, in the training step (line 7), the gradient with respect to the parameters in the neural network will backpropagate through a*. Note that the […]
[…] we are ultimately interested in minimizing the position tracking error, and we can improve the adaptation using a more sophisticated update law. Thus, in this section, we propose a more sophisticated adaptation law for the linear coefficients based on a Kalman filter estimator. This formulation results in automatic gain tuning for the update law, which allows the controller to quickly estimate parameters with large uncertainty. We further boost this estimator into a composite adaptation law; that is, the parameter update depends both on the prediction error in the dynamics model and on the tracking error, as illustrated in Fig. 2. This allows the system to quickly identify and adapt to new wind conditions without requiring persistent excitation. In turn, this enables online adaptation of the high-dimensional learned models from DAIML.
Our online adaptive control algorithm can be summarized by the following control law, adaptation law, and covariance update equations, respectively

u_NF = M(q)q̈_r + C(q, q̇)q̇_r + g(q) − Ks − ϕ(q, q̇)â    (7)

where the first three terms are the nominal model feedforward terms, −Ks is the PD feedback, and −ϕ(q, q̇)â is the learning-based feedforward. […]
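The exact adaptation and covariance update equations (Eqs. 8 and 9) are described in the Composite adaptation law section below; the following discrete-time sketch is an assumed Euler discretization consistent with that description: a damping (regularization) term λ, a prediction-error term with a Kalman-style gain, a tracking-error term driven by the composite error s, and the covariance update for P.

```python
import numpy as np

def adaptation_step(a_hat, P, Phi, y, s, dt, lam=0.1, Q=None, R_inv=None):
    """One assumed Euler step of a composite adaptation law with
    covariance update (P acts as an automatic adaptation gain)."""
    h = len(a_hat)
    Q = np.eye(h) if Q is None else Q
    R_inv = np.eye(Phi.shape[0]) if R_inv is None else R_inv
    pred_err = Phi @ a_hat - y                   # f_hat minus measured residual
    a_dot = -lam * a_hat - P @ Phi.T @ (R_inv @ pred_err) + P @ Phi.T @ s
    P_dot = -2 * lam * P + Q - P @ Phi.T @ R_inv @ Phi @ P
    return a_hat + dt * a_dot, P + dt * P_dot
```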
[…] an error ball around 0, q̃ will exponentially converge to a proportionate error ball around the desired trajectory q_d(t) (see section S5). Formulating our control law in terms of the composite velocity error simplifies the analysis and gain tuning without loss of rigor.
The baseline nonlinear (NL) control law using PID feedback is defined as

u_NL = M(q)q̈_r + C(q, q̇)q̇_r + g(q) − Ks − K_I ∫ s dt    (11)

where the first three terms are the nonlinear feedforward terms, the remaining terms are the PID feedback, and K and K_I are positive definite control gain matrices. Note that a standard PID controller typically only includes the PID feedback terms and gravity compensation. This only leads to local exponential stability about a fixed point, but it is often sufficient for gentle tasks such as a UAV hovering and slow trajectories in static wind conditions. In contrast, this nonlinear controller includes feedback on velocity error and feedforward terms to account for known dynamics and desired acceleration, which allows good tracking of dynamic trajectories in the presence of nonlinearities [e.g., M(q) and C(q, q̇) are nonconstant in attitude control]. However, this control law only compensates for changing wind conditions and unmodeled dynamics through an integral term, which is slow to react to changes in the unmodeled dynamics and disturbance forces.
Our method improves the controller by predicting the unmodeled dynamics and disturbance forces, and in Table 1, we see a substantial improvement gained by using our learning method. Given the learned representation of the residual dynamics, ϕ(q, q̇), and the parameter estimate â, we replace the integral term with the learned force term, f̂ = ϕâ, resulting in our control law in Eq. 7. Neural-Fly uses ϕ trained using DAIML on a dataset collected with the same drone. Neural-Fly-Transfer uses ϕ trained using DAIML on a dataset collected with a different drone, the Intel Aero drone. Neural-Fly-Constant does not use any learning but instead uses ϕ = I and is included to demonstrate that the main advantage of our method comes from the incorporation of learning. The learning-based methods, Neural-Fly and Neural-Fly-Transfer, outperform Neural-Fly-Constant because the compact learned representation can effectively and quickly predict the aerodynamic disturbances online, as seen in Fig. 5. This comparison is further discussed in the Supplementary Materials (section S7).
Composite adaptation law
We define an adaptation law that combines a tracking error update term, a prediction error update term, and a regularization term in Eqs. 8 and 9, where y is a noisy measurement of f, λ is a damping gain, P is a covariance matrix that evolves according to Eq. 9, and Q and R are two positive definite gain matrices. Some readers may note that the regularization term, prediction error term, and covariance update, when taken alone, are in the form of a Kalman-Bucy filter. This Kalman-Bucy filter can be derived as the optimal estimator that minimizes the variance of the parameter error (50). The Kalman-Bucy filter perspective provides intuition for tuning the adaptive controller: The damping gain λ corresponds to how quickly the environment returns to the nominal conditions, Q corresponds to how quickly the environment changes, and R corresponds to the combined representation error d and measurement noise ϵ. More discussion on the gain tuning process is included in section S6.
However, naively combining this parameter estimator with the controller can lead to instabilities in the closed-loop system behavior unless extra care is taken in constraining the learned model and tuning the gains. Thus, we have designed our adaptation law to include a tracking error term, making Eq. 8 a composite adaptation law, guaranteeing the stability of the closed-loop system (see Theorem 1), and in turn simplifying the gain tuning process. The regularization term allows the stability result to be independent of the persistent excitation of the learned model ϕ, which is particularly relevant when using high-dimensional learned representations. The adaptation gain and covariance matrix, P, acts as automatic gain tuning for the adaptive controller, which allows the controller to quickly adapt when a new mode in the learned model is excited.
Stability and robustness guarantees
First, we formally define the representation error d(t) as the difference between the unknown dynamics f(q, q̇, w) and the best linear weight vector a given the learned representation ϕ(q, q̇), namely, d(t) = f(q, q̇, w) − ϕ(q, q̇)a(w). The measurement noise for the measured residual force is a bounded function ϵ(t) such that y(t) = f(t) + ϵ(t). If the environment conditions are changing, we consider the case that ȧ ≠ 0. This leads to the following stability theorem.
Theorem 1. If we assume that the desired trajectory has bounded derivative and the system evolves according to the dynamics in Eq. 1, the control law (Eq. 7), and the adaptation law (Eqs. 8 and 9), then the position tracking error exponentially converges to the ball

lim_{t→∞} ‖q̃‖ ≤ sup_t [C1‖d(t)‖ + C2‖ϵ(t)‖ + C3(‖a(t)‖ + ‖ȧ(t)‖)]   (12)

where C1, C2, and C3 are three bounded constants depending on λ, R, Q, K, Λ, M, and ϕ.

Implementation details
Quadrotor dynamics
Now, we introduce the quadrotor dynamics. Consider states given by global position p ∈ ℝ³, velocity v ∈ ℝ³, attitude rotation matrix R ∈ SO(3), and body angular velocity ω ∈ ℝ³. Then, the dynamics of a quadrotor are

ṗ = v,   mv̇ = mg + Rf_T + f   (13a)

Ṙ = RS(ω),   Jω̇ = Jω × ω + τ   (13b)

where m is the mass, J is the inertia matrix of the quadrotor, S(·) is the skew-symmetric mapping, g is the gravity vector, f_T = [0, 0, T]ᵀ and τ = [τx, τy, τz]ᵀ are the total thrust and body torques from four rotors predicted by the nominal model, and f = [fx, fy, fz]ᵀ are forces resulting from unmodeled aerodynamic effects due to varying wind conditions.
We cast the position dynamics in Eq. 13a into the form of Eq. 1 by taking M(q) = mI, C(q, q̇) ≡ 0, and u = Rf_T. Note that the quadrotor attitude dynamics (Eq. 13b) is also a special case of Eq. 1 (15, 51), and thus our method can be extended to attitude control. We implement our method in the position control loop; that is, we use our method to compute a desired force u_d. Then, the desired force is decomposed into the desired attitude R_d and the desired thrust T_d using kinematics (see Fig. 2). Then, the desired attitude and thrust are sent to the onboard PX4 flight controller.
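The decomposition from a desired force to a desired attitude and thrust is standard multirotor kinematics; the following sketch shows one common construction in the spirit of the minimum-snap controller (14). The paper defers its exact construction to Fig. 2, so the yaw convention, the names, and the nonzero-force assumption here are ours.

```python
# One common kinematic decomposition of a desired force u_d into a desired
# attitude R_d and thrust T_d. This is a sketch under our own conventions;
# the paper's exact construction is summarized in Fig. 2.
import numpy as np

def decompose_force(u_d, psi_d):
    """u_d: desired force in the world frame (3,); psi_d: desired yaw (rad)."""
    z_b = u_d / np.linalg.norm(u_d)        # desired body z-axis (assumes u_d != 0)
    x_c = np.array([np.cos(psi_d), np.sin(psi_d), 0.0])  # yaw heading direction
    y_b = np.cross(z_b, x_c)
    y_b /= np.linalg.norm(y_b)             # degenerate when z_b is parallel to x_c
    x_b = np.cross(y_b, z_b)
    R_d = np.column_stack((x_b, y_b, z_b)) # desired rotation matrix in SO(3)
    T_d = float(u_d @ z_b)                 # desired collective thrust
    return R_d, T_d
```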
Neural network architectures and training details
In practice, we found that in addition to the drone velocity v, the aerodynamic effects also depend on the drone attitude and the rotor rotation speed. To that end, the input state x to the DNN is an 11-d vector, consisting of the drone velocity (3-d), the drone attitude represented as a quaternion (4-d), and the rotor speed commands as a pulse width modulation (PWM) signal (4-d) (see Figs. 2 and 3). The DNN has four fully connected hidden layers, with an architecture 11 → 50 → 60 → 50 → 4 and rectified linear unit activation. We found that the three components of the wind-effect force, fx, fy, and fz, are highly correlated and share common features, so we use ϕ as the basis function for all the components. Therefore, the wind-effect force f is approximated by

f ≈ [ ϕ(x)    0      0   ] [ ax ]
    [   0    ϕ(x)    0   ] [ ay ]   (14)
    [   0      0    ϕ(x) ] [ az ]
where ax, ay, az ∈ ℝ⁴ are the linear coefficients for each component of the wind-effect force. We followed Algorithm 1 to train ϕ in PyTorch, which is an open-source deep learning framework. We refer to the Supplementary Materials for hyperparameter details (section S3).
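As a concrete, unofficial reconstruction, the architecture described above and the block-diagonal application of Eq. 14 can be written in PyTorch as follows; the class and variable names are ours, and spectral normalization and the DAIML training loop (Algorithm 1, section S3) are omitted.

```python
# A PyTorch sketch of the described basis network, 11 -> 50 -> 60 -> 50 -> 4
# with ReLU activations. Applying the same phi(x) to each force axis, as in
# Eq. 14, reduces to a (3 x 4) coefficient matrix times the 4-d basis.
import torch
import torch.nn as nn

class Phi(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(11, 50), nn.ReLU(),
            nn.Linear(50, 60), nn.ReLU(),
            nn.Linear(60, 50), nn.ReLU(),
            nn.Linear(50, 4),
        )

    def forward(self, x):      # x: (..., 11) = [velocity, quaternion, PWM]
        return self.net(x)     # (..., 4) shared basis phi(x)

phi = Phi()
x = torch.randn(11)            # velocity (3) + attitude quaternion (4) + PWM (4)
basis = phi(x)                 # phi(x) in R^4
a = torch.randn(3, 4)          # rows a_x, a_y, a_z from the adaptation law
f_hat = a @ basis              # Eq. 14: f_i = phi(x) . a_i for each axis
```

Sharing one basis across the three force axes keeps the adapted parameter vector low-dimensional (12 coefficients in total), which is what permits the fast online adaptation described above.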
Note that we explicitly include the PWM as an input to the network. The PWM information is a function of u = Rf_T, which makes the control law (e.g., Eq. 7) nonaffine in u. We solve this issue by using the PWM from the last time step as an input to ϕ to compute the desired force u_d at the current time step. Because we train ϕ using spectral normalization (see Algorithm 1), this method is stable and guaranteed to converge to a fixed point, as discussed in (8).
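Schematically, this one-step delay amounts to feeding the previously commanded PWM into ϕ; the helper below is an illustrative sketch with our own names, not the flight code.

```python
# Illustrative sketch of the one-step-delay workaround: phi is evaluated with
# the PWM commanded at the previous control step, so the desired force u_d at
# the current step is well defined even though the PWM itself depends on u.
import numpy as np

def phi_input(v, quat, prev_pwm):
    """Assemble the 11-d network input [velocity (3), quaternion (4), PWM (4)]."""
    return np.concatenate([v, quat, prev_pwm])

# In the control loop: x_k = phi_input(v_k, quat_k, pwm_{k-1}); evaluate
# phi(x_k); compute u_d via Eq. 7; mix u_d into pwm_k and store it for k+1.
```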
Controller implementation
For experiments, we implemented a discrete form of the Neural-Fly controllers, given in section S4. For INDI, we implemented the position and acceleration controller from sections III.A and III.B in (4). For ℒ1 adaptive control, we followed the adaptation law first presented in (6) and used in (7) and augmented the nonlinear baseline control with f̂ = −u_ℒ1.
SUPPLEMENTARY MATERIALS
www.science.org/doi/10.1126/scirobotics.abm6597
Sections S1 to S8
Figs. S1 to S4
Tables S1 to S3
References (52–57)
REFERENCES AND NOTES
1. P. Foehn, A. Romero, D. Scaramuzza, Time-optimal planning for quadrotor waypoint flight. Sci. Robot. 6, eabh1221 (2021).
2. M. Faessler, A. Franchi, D. Scaramuzza, Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories. IEEE Robot. Autom. Lett. 3, 620–626 (2018).
3. P. Ventura Diaz, S. Yoon, High-fidelity computational aerodynamics of multi-rotor unmanned aerial vehicles, in 2018 AIAA Aerospace Sciences Meeting (2018), p. 1266.
4. E. Tal, S. Karaman, Accurate tracking of aggressive quadrotor trajectories using incremental nonlinear dynamic inversion and differential flatness. IEEE Trans. Control Syst. Technol. 29, 1203–1218 (2021).
5. S. Mallikarjunan, B. Nesbitt, E. Kharisov, E. Xargay, N. Hovakimyan, C. Cao, L1 adaptive controller for attitude control of multirotors, in AIAA Guidance, Navigation, and Control Conference (American Institute of Aeronautics and Astronautics, 2012).
6. J. Pravitra, K. A. Ackerman, C. Cao, N. Hovakimyan, E. A. Theodorou, L1-adaptive MPPI architecture for robust and agile control of multirotors, in Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2020), pp. 7661–7666.
7. D. Hanover, P. Foehn, S. Sun, E. Kaufmann, D. Scaramuzza, Performance, precision, and payloads: Adaptive nonlinear MPC for quadrotors. IEEE Robot. Autom. Lett. 7, 690–697 (2021).
8. G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, S.-J. Chung, Neural lander: Stable drone landing control using learned dynamics, in Proceedings of the 2019 International Conference on Robotics and Automation (ICRA) (IEEE, 2019), pp. 9784–9790.
9. G. Shi, W. Hönig, Y. Yue, S.-J. Chung, Neural-swarm: Decentralized close-proximity multirotor control using learned interactions, in Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2020), pp. 3241–3247.
10. G. Shi, W. Hönig, X. Shi, Y. Yue, S.-J. Chung, Neural-swarm2: Planning and control of heterogeneous multirotor swarms using learned interactions. IEEE Trans. Robot. 38, 1063–1079 (2022).
11. G. Torrente, E. Kaufmann, P. Föhn, D. Scaramuzza, Data-driven MPC for quadrotors. IEEE Robot. Autom. Lett. 6, 3769–3776 (2021).
12. P. L. Bartlett, D. J. Foster, M. J. Telgarsky, Spectrally-normalized margin bounds for neural networks. Adv. Neural Inform. Process. Syst. 30, 6240 (2017).
13. L. Meier, P. Tanskanen, L. Heng, G. H. Lee, F. Fraundorfer, M. Pollefeys, PIXHAWK: A micro aerial vehicle design for autonomous flight using onboard computer vision. Auton. Robots 33, 21–39 (2012).
14. D. Mellinger, V. Kumar, Minimum snap trajectory generation and control for quadrotors, in Proceedings of the 2011 IEEE International Conference on Robotics and Automation (IEEE, 2011), pp. 2520–2525.
15. J.-J. E. Slotine, W. Li, Applied Nonlinear Control (Prentice Hall, 1991).
16. P. A. Ioannou, J. Sun, Robust Adaptive Control (Prentice-Hall, 1996), vol. 1.
17. M. Krstic, P. V. Kokotovic, I. Kanellakopoulos, Nonlinear and Adaptive Control Design (John Wiley & Sons, 1995).
18. K. S. Narendra, A. M. Annaswamy, Stable Adaptive Systems (Courier Corporation, 2012).
19. J. A. Farrell, M. M. Polycarpou, Adaptive Approximation Based Control (John Wiley & Sons, 2006); https://onlinelibrary.wiley.com/doi/pdf/10.1002/0471781819.fmatter.
20. K. A. Wise, E. Lavretsky, N. Hovakimyan, Adaptive control of flight: Theory, applications, and open problems, in Proceedings of the 2006 American Control Conference (IEEE, 2006).
21. X. Shi, P. Spieler, E. Tang, E.-S. Lupu, P. Tokumaru, S.-J. Chung, Adaptive nonlinear control of fixed-wing VTOL with airflow vector sensing, in Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2020), pp. 5321–5327.
22. A. Rahimi, B. Recht, Random features for large-scale kernel machines, in Proceedings of the 20th International Conference on Neural Information Processing Systems (2007), pp. 1177–1184.
23. S. Lale, K. Azizzadenesheli, B. Hassibi, A. Anandkumar, Model learning predictive control in nonlinear dynamical systems, in Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC) (IEEE, 2021), pp. 757–762.
24. J. Nakanishi, J. Farrell, S. Schaal, A locally weighted learning composite adaptive controller with structure adaptation, in Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2002), vol. 1, pp. 882–889.
25. F.-C. Chen, H. K. Khalil, Adaptive control of a class of nonlinear discrete-time systems using neural networks. IEEE Trans. Automat. Contr. 40, 791–801 (1995).
26. E. N. Johnson, A. J. Calise, Limited authority adaptive flight control for reusable launch vehicles. J. Guid. Control Dynam. 26, 906–913 (2003).
27. K. S. Narendra, S. Mukhopadhyay, Adaptive control using neural networks and approximate models. IEEE Trans. Neural Netw. 8, 475–485 (1997).
28. M. Bisheban, T. Lee, Geometric adaptive control with neural networks for a quadrotor in wind fields. IEEE Trans. Control Syst. Technol. 29, 1533–1548 (2021).
29. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015).
30. C. Finn, P. Abbeel, S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, in Proceedings of the 34th International Conference on Machine Learning (PMLR, 2017), pp. 1126–1135.
31. T. M. Hospedales, A. Antoniou, P. Micaelli, A. J. Storkey, Meta-learning in neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. (2021).
32. A. Nagabandi, I. Clavera, S. Liu, R. S. Fearing, P. Abbeel, S. Levine, C. Finn, Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. arXiv:1803.11347 [cs.LG] (2018).
33. X. Song, Y. Yang, K. Choromanski, K. Caluwaerts, W. Gao, C. Finn, J. Tan, Rapidly adaptable legged robots via evolutionary meta-learning, in Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2020), pp. 3769–3776.
34. S. Belkhale, R. Li, G. Kahn, R. McAllister, R. Calandra, S. Levine, Model-based meta-reinforcement learning for flight with suspended payloads. IEEE Robot. Autom. Lett. 6, 1471–1478 (2021).
35. C. D. McKinnon, A. P. Schoellig, Meta learning with paired forward and inverse models for efficient receding horizon control. IEEE Robot. Autom. Lett. 6, 3240–3247 (2021).
36. I. Clavera, J. Rothfuss, J. Schulman, Y. Fujita, T. Asfour, P. Abbeel, Model-based reinforcement learning via meta-policy optimization, in Conference on Robot Learning (PMLR, 2018), pp. 617–629.
37. M. O'Connell, G. Shi, X. Shi, S.-J. Chung, Meta-learning-based robust adaptive flight control under uncertain wind conditions. arXiv:2103.01932 [cs.RO] (2021).
38. S. M. Richards, N. Azizan, J.-J. E. Slotine, M. Pavone, Adaptive-control-oriented meta-learning for nonlinear systems. arXiv:2103.04490 [cs.RO] (2021).
39. M. Peng, B. Zhu, J. Jiao, Linear representation meta-reinforcement learning for instant adaptation. arXiv:2101.04750 [cs.LG] (2021).
40. J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, M. Hutter, Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 5, eabc5986 (2020).
41. J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, P. Abbeel, Domain randomization for transferring deep neural networks from simulation to the real world, in Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2017), pp. 23–30.
42. F. Ramos, R. C. Possas, D. Fox, BayesSim: Adaptive domain randomization via probabilistic inference for robotics simulators. arXiv:1906.01728 [cs.RO] (2019).
43. D. Morgan, G. P. Subramanian, S.-J. Chung, F. Y. Hadaegh, Swarm assignment and trajectory optimization using variable-swarm, distributed auction assignment and sequential convex programming. Int. J. Robot. Res. 35, 1261–1285 (2016).
44. X. Shi, K. Kim, S. Rahili, S.-J. Chung, Nonlinear control of autonomous flying cars with wings and distributed electric propulsion, in Proceedings of the 2018 IEEE Conference on Decision and Control (CDC) (IEEE, 2018), pp. 5326–5333.
45. Y. K. Nakka, A. Liu, G. Shi, A. Anandkumar, Y. Yue, S.-J. Chung, Chance-constrained trajectory optimization for safe exploration and learning of nonlinear systems. IEEE Robot. Autom. Lett. 6, 389 (2021).
46. A. Loquercio, E. Kaufmann, R. Ranftl, M. Müller, V. Koltun, D. Scaramuzza, Learning high-speed flight in the wild. Sci. Robot. 6, eabg5810 (2021).
47. K. Kim, P. Spieler, E.-S. Lupu, A. Ramezani, S.-J. Chung, A bipedal walking robot that can fly, slackline, and skateboard. Sci. Robot. 6, eabf8136 (2021).
48. Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, V. Lempitsky, Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 2096 (2017).
49. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets. Adv. Neural Inform. Process. Syst. 27 (2014).
50. R. E. Kalman, R. S. Bucy, New results in linear filtering and prediction theory. J. Basic Eng. 83, 95 (1961).
51. R. M. Murray, Z. Li, S. S. Sastry, A Mathematical Introduction to Robotic Manipulation (CRC Press, ed. 1, 2017).
52. L. Trefethen, Multivariate polynomial approximation in the hypercube. Proc. Am. Math. Soc. 145, 4837–4844 (2017).
53. D. Yarotsky, Error bounds for approximations with deep ReLU networks. Neural Netw. 94, 103–114 (2017).
54. L. Dieci, T. Eirola, Positive definiteness in the numerical solution of Riccati differential equations. Numer. Math. 67, 303–313 (1994).
55. R. E. Kalman, A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45 (1960).
56. H. K. Khalil, Nonlinear Systems (Prentice Hall, ed. 3, 2002).
57. Multicopter PID Tuning Guide (Advanced/Detailed), PX4 User Guide.

Acknowledgments: A.A. is also affiliated with NVIDIA Corporation, and Y.Y. is also associated with Argo AI. K.A. is currently affiliated with Purdue University. We thank J. Burdick and J.-J. E. Slotine for their helpful discussions. We thank M. Anderson for help with configuring the quadrotor platform, and M. Anderson and P. Spieler for help with hardware troubleshooting. We also thank N. Badillo and L. Pabon Madrid for help in experiments. Funding: This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). This research was also conducted in part with funding from Raytheon Technologies. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. The experiments reported in this article were conducted at Caltech's Center for Autonomous Systems and Technologies (CAST). Author contributions: S.-J.C. and Y.Y. directed the research activities. G.S. and M.O. designed and implemented the meta-learning algorithm under the guidance of Y.Y., K.A., A.A., and S.-J.C., while the last-layer adaptation idea was started with a discussion by G.S., M.O., X.S., and S.-J.C. M.O. and G.S. designed and implemented the adaptive control algorithm with inputs from S.-J.C. and X.S. M.O. and G.S. performed experiments and evaluated the results. M.O. conducted the theoretical analysis of the meta-learning-based adaptive controller with input from S.-J.C., G.S., and X.S. G.S. analyzed the learning algorithm with feedback from Y.Y., K.A., A.A., and S.-J.C. G.S. and M.O. created all the figures and videos with input from the other authors. All authors prepared the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the article are present in the article or in the Supplementary Materials. We have provided the machine learning model training code, training data, and experimental data at github.com/aerorobotics/neural-fly.

Submitted 11 October 2021
Accepted 12 April 2022
Published 4 May 2022
10.1126/scirobotics.abm6597

Sci. Robot. 7 (66), eabm6597 (2022).
DOI: 10.1126/scirobotics.abm6597

Science Robotics is published by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science Robotics is a registered trademark of AAAS.
Copyright © 2022 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.