
A step toward safe and reliable autopilots for flying
MIT researchers developed a machine-learning technique that can autonomously drive a car or fly a plane through very difficult “stabilize-avoid” scenarios, in which the vehicle must stabilize its trajectory to arrive at and stay within a goal region while avoiding obstacles. Image: Courtesy of the researchers
By Adam Zewe | MIT News Office
In the movie “Top Gun: Maverick,” Maverick, played by Tom Cruise, is charged with training young pilots to complete a seemingly impossible mission: fly their jets deep into a rocky canyon, staying so low to the ground that they cannot be detected by radar, then rapidly climb out of the canyon at an extreme angle, avoiding the rock walls. Spoiler alert: With Maverick’s help, these human pilots accomplish their mission.
A machine, on the other hand, would struggle to complete the same task. For an autonomous aircraft, for example, the most direct path toward the target conflicts with what the aircraft must do to avoid colliding with the canyon walls or remaining undetected. Many existing AI methods are unable to overcome this conflict, known as the stabilize-avoid problem, and would be unable to reach their goal safely.
MIT researchers have developed a new technique that can solve complex stabilize-avoid problems better than other methods. Their machine-learning approach matches or exceeds the safety of existing methods while providing a tenfold increase in stability, meaning the agent reaches and remains stable within its goal region.
In an experiment that would make Maverick proud, their technique effectively piloted a simulated jet plane through a narrow corridor without crashing into the ground.
“This has been a longstanding, challenging problem. A lot of people have looked at it but didn’t know how to handle such high-dimensional and complex dynamics,” said Chuchu Fan, the Wilson Assistant Professor of Aeronautics and Astronautics, a member of the Laboratory for Information and Decision Systems (LIDS), and senior author of a new paper on this technique.
Fan is joined on the paper by lead author Oswin So, a graduate student. The paper will be presented at the Robotics: Science and Systems conference.
Stabilize-avoid challenges
Many approaches tackle complex stabilize-avoid problems by simplifying the system so it can be solved with straightforward math, but the simplified results often don’t hold up to real-world dynamics.
A more effective technique uses reinforcement learning, a machine-learning method in which an agent learns by trial and error, receiving a reward for behavior that gets it closer to a goal. But there are really two goals here, staying stable and avoiding obstacles, and finding the right balance is tedious.
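To see why that balance is tedious, consider a minimal sketch of the usual workaround: folding both goals into a single reward with a hand-tuned weight. This illustration is ours, not the researchers’ code, and the distance-based terms and the `penalty_weight` parameter are hypothetical.

```python
import numpy as np

def reward(state, goal, obstacles, penalty_weight=10.0):
    """Hypothetical reward that folds two competing goals into one number.

    penalty_weight must be hand-tuned: too small and the agent cuts
    through obstacles on its way to the goal; too large and it hovers
    far from obstacles without ever stabilizing at the goal.
    """
    # Stabilize objective: reward progress toward the goal region.
    goal_term = -np.linalg.norm(state - goal)

    # Avoid objective: penalize closing within one unit of any obstacle.
    min_clearance = min(np.linalg.norm(state - obs) for obs in obstacles)
    avoid_term = -penalty_weight * max(0.0, 1.0 - min_clearance)

    return goal_term + avoid_term
```

Every choice of `penalty_weight` trades one goal against the other; the constrained formulation described next removes that knob entirely.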
The MIT researchers broke the problem down into two steps. First, they reframed the stabilize-avoid problem as a constrained optimization problem. In this setup, solving the optimization enables the agent to reach and stabilize to its goal, meaning it stays within a certain region. By applying constraints, they ensure the agent avoids obstacles, So explains.
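Schematically, that first step can be written as the constrained program below. The notation is ours, chosen for illustration; it is not the paper’s exact objective or constraint.

```latex
% Illustrative notation, not the paper's exact equations.
\begin{align*}
  \max_{\pi}\quad & J_{\text{stabilize}}(\pi)
      && \text{reach the goal region and stay there}\\
  \text{subject to}\quad & h(x_t) \le 0 \quad \forall t
      && \text{the state } x_t \text{ never enters an obstacle}
\end{align*}
```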
Then, for the second step, they reformulated that constrained optimization problem into a mathematical representation known as the epigraph form and solved it using a deep reinforcement learning algorithm. The epigraph form lets them bypass the difficulties other methods face when using reinforcement learning.
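In its generic form, the epigraph trick introduces a scalar auxiliary variable z that bounds the objective, so the objective and the constraint collapse into a single max term. Here is the standard identity, sketched in textbook optimization notation rather than the paper’s:

```latex
% A standard optimization identity, not copied from the paper.
\[
  \min_{x}\; f(x) \;\; \text{s.t.} \;\; g(x) \le 0
  \qquad\Longleftrightarrow\qquad
  \min_{z}\; z \;\; \text{s.t.} \;\; \min_{x}\, \max\{\, f(x) - z,\; g(x) \,\} \le 0
\]
```

The payoff is that the inner problem over x is unconstrained, the form reinforcement learning handles naturally, while the outer problem is a one-dimensional search over z.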
“But deep reinforcement learning isn’t designed to solve the epigraph form of an optimization problem, so we couldn’t just plug it into our problem. We had to derive the mathematical expressions that work for our system. Once we had those new derivations, we combined them with some existing engineering tricks used by other methods,” said So.
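As a toy numerical illustration of that decomposition, the sketch below solves a two-variable constrained problem the same way: bisection on the auxiliary bound z on the outside, with plain subgradient descent standing in for the deep reinforcement learning inner solver. The objective and obstacle are invented for the example.

```python
import numpy as np

def f(x):
    """Objective: squared distance to a goal at (0.5, 0)."""
    return (x[0] - 0.5) ** 2 + x[1] ** 2

def g(x):
    """Constraint g(x) <= 0: stay outside the unit disk at the origin."""
    return 1.0 - (x[0] ** 2 + x[1] ** 2)

def inner(z, steps=4000, lr=0.005):
    """Approximately solve min_x max{f(x) - z, g(x)} by subgradient descent.

    In the real method a deep RL algorithm plays this role over whole
    trajectories; a two-variable toy keeps everything visible here.
    """
    x = np.array([3.0, 1.0])
    for _ in range(steps):
        if f(x) - z >= g(x):             # objective term is active
            grad = np.array([2 * (x[0] - 0.5), 2 * x[1]])
        else:                            # obstacle term is active
            grad = np.array([-2 * x[0], -2 * x[1]])
        x = x - lr * grad
    return max(f(x) - z, g(x)), x

# Outer problem: bisection on the scalar z to find the smallest
# objective bound whose inner problem is feasible (value <= 0).
lo, hi = 0.0, 10.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    value, _ = inner(mid)
    if value <= 0.0:
        hi = mid    # feasible: a tighter bound may still work
    else:
        lo = mid    # infeasible: relax the bound

print(f"approximate optimal value: {hi:.3f}")        # analytic answer: 0.25
print(f"approximate minimizer:     {inner(hi)[1]}")  # near (1, 0)
```

Up to the inner solver’s accuracy, the bisection recovers the analytic optimum of 0.25 at the point (1, 0), the closest spot to the goal that stays outside the obstacle.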
There are no points for second place
To test their approach, the researchers designed a number of control experiments with different initial conditions. For instance, in some simulations, the autonomous agent needs to reach and stay inside a goal region while making drastic maneuvers to avoid obstacles that are on a collision course with it.

This video shows how the researchers used their technique to effectively fly a simulated jet aircraft in a scenario where it had to stabilize to a target near the ground while maintaining a very low altitude and staying within a narrow flight corridor. Video: Courtesy of the researchers
When compared with several baselines, their approach was the only one that could stabilize all trajectories while maintaining safety. To push their method even further, they used it to fly a simulated jet aircraft in a scenario one might see in a “Top Gun” movie. The jet had to stabilize to a target near the ground while maintaining a very low altitude and staying within a narrow flight corridor.
This simulated jet model was open-sourced in 2018 and had been designed by flight control experts as a testing challenge: Could researchers create a scenario that their controller could not fly? But the model was so complicated it was difficult to work with, and it still couldn’t handle complex scenarios, Fan said.
The MIT researchers’ controller was able to prevent the jet from crashing or stalling while stabilizing to the goal far better than any of the baselines.
In the future, this technique could be a starting point for designing controllers for highly dynamic robots that must meet safety and stability requirements, such as autonomous delivery drones. Or it could be implemented as part of a larger system. Perhaps the algorithm is only activated when a car skids on a snowy road, helping the driver safely navigate back to a stable trajectory.
Navigating extreme scenarios that humans can’t handle is where their approach really shines, So added.
“We believe that a goal we should strive for as a field is to give reinforcement learning the safety and stability guarantees we will need to provide assurance when we deploy these controllers on mission-critical systems. We think this is a promising first step toward achieving that goal,” he said.
Going forward, the researchers want to enhance their technique so it is better able to take uncertainty into account when solving the optimization. They also want to investigate how well the algorithm works when deployed on hardware, since there will be mismatches between the model’s dynamics and those of the real world.
“Professor Fan’s group has improved reinforcement learning performance for dynamical systems where safety matters. Instead of just reaching a goal, they create controllers that ensure the system can reach its target safely and stay there indefinitely,” said Stanley Bak, an assistant professor in the Department of Computer Science at Stony Brook University, who was not involved with this research. “Their improved formulation enables the successful generation of safe controllers for complex scenarios, including a 17-state nonlinear jet aircraft model designed in part by researchers from the Air Force Research Lab (AFRL), which incorporates nonlinear differential equations with lift and drag tables.”
This work was funded, in part, by the MIT Lincoln Laboratory under the Safety in Aerobatic Flight Regimes program.