Spatial artificial intelligence: how drones navigate
(Nanowerk News) People see their surroundings in three dimensions and can quickly spot potential dangers in everyday situations. Drones have to learn this. Prof. Stefan Leutenegger calls the intelligence needed for this task ‘spatial artificial intelligence’, or spatial AI. This new approach is being used to map forests, to inspect ships and to build walls on construction sites.
In humans, this spatial understanding is completely automatic: they recognize objects and their characteristics, judge distances and hazards, and interact with other people. Stefan Leutenegger speaks of a coherent 3D representation of the surroundings that results in a uniform overall picture.
Enabling drones to distinguish between static and dynamic elements and to recognize other actors is one of the central research areas for Stefan Leutenegger, professor of machine learning in robotics at TUM and head of artificial intelligence innovation at the Munich Institute of Robotics and Machine Intelligence (MIRMI).
Spatial AI, step 1: estimate the robot’s position in space and map it
Prof. Leutenegger uses spatial AI to provide drones with the on-board intelligence needed to fly through a forest without bumping into delicate branches, to perform 3D printing or to inspect the hold of a tanker or cargo ship. Spatial AI consists of several components adapted for specific tasks. Starting with sensor selection:
– Computer vision: Drones view their surroundings with one or two cameras. For depth perception, two cameras are needed, just as humans need two eyes: Leutenegger compares the images from the two sensors to estimate depth. Alternatively, a depth camera can produce 3D images directly.
– Inertial sensor: This sensor measures acceleration and angular velocity, capturing the drone’s own movement through space.
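The two-camera depth idea above can be sketched numerically: the farther away a point is, the less it shifts between the two camera images. This shift (the disparity) determines depth. A minimal sketch, with illustrative values for focal length and camera spacing rather than parameters of any real drone:

```python
# Stereo depth sketch: with two cameras a known distance apart (the
# baseline), the shift of a feature between the two images (disparity)
# gives its depth. All numbers below are illustrative.

def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.12):
    """Depth in metres of a point seen with the given pixel disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A nearby branch shifts a lot between the two images ...
near = depth_from_disparity(140.0)
# ... while a distant tree barely shifts at all.
far = depth_from_disparity(4.0)
print(f"near: {near:.1f} m, far: {far:.1f} m")
```

With these values the large disparity maps to roughly half a metre, the small one to tens of metres, which is why fine, close-by obstacles such as branches demand high image resolution.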
“Visual and inertial sensors complement each other very well,” said Leutenegger. That’s because combining their data produces incredibly precise images of the drone’s movement and its static environment. This allows the entire system to judge its own position in space. This is necessary for applications such as autonomous deployment of robots.
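Why the two sensors complement each other can be shown in a toy one-dimensional sketch: integrating the inertial sensor alone drifts because of sensor bias, while occasional camera position fixes pull the estimate back. This is a minimal complementary-filter illustration, not the estimator actually used in Leutenegger's systems:

```python
# 1-D sketch of visual-inertial fusion: the IMU is integrated at a high
# rate to predict position, and each camera position fix corrects the
# accumulated drift. Toy numbers throughout.

def fuse(imu_accels, cam_fixes, dt=0.01, gain=1.0):
    """Integrate accelerations into a position; blend in camera fixes."""
    pos, vel = 0.0, 0.0
    for accel, fix in zip(imu_accels, cam_fixes):
        vel += accel * dt              # acceleration -> velocity
        pos += vel * dt                # velocity -> position
        if fix is not None:            # a camera fix arrived this tick
            pos += gain * (fix - pos)  # pull toward the visual estimate
    return pos

n = 200
biased = [0.5] * n                     # drone is still, but the IMU has a bias
no_fix = [None] * n
fixes = [0.0 if i % 10 == 0 else None for i in range(n)]

print(fuse(biased, no_fix))   # IMU alone: position drifts away from 0
print(fuse(biased, fixes))    # fused: the drift is repeatedly corrected
```

With `gain=1.0` the filter snaps fully to each camera fix; a real system would weight the two sources by their uncertainties.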
It also enables detailed, high-resolution mapping of the robot’s static environment – a critical requirement for obstacle avoidance. Initially, mathematical and probabilistic models were used without artificial intelligence. Leutenegger refers to this as the lowest level of “Spatial AI” – an area where he did research at Imperial College London before coming to TUM.
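A classic example of such a probabilistic model, without any neural network, is the log-odds occupancy grid: every depth measurement makes the cells the beam passed through more likely free, and the cell it hit more likely occupied. A toy one-dimensional sketch, not tied to any particular TUM system:

```python
import math

# Log-odds occupancy grid: each cell stores the log-odds that it is
# occupied. A range measurement marks traversed cells as freer and the
# hit cell as more occupied. Update constants are illustrative.

L_FREE, L_HIT = -0.4, 0.85

def integrate_ray(grid, hit_cell):
    """Fold one range measurement along a 1-D ray into the grid."""
    for cell in range(hit_cell):     # beam passed through: more likely free
        grid[cell] += L_FREE
    grid[hit_cell] += L_HIT          # beam ended here: more likely occupied

def occupancy_prob(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

grid = [0.0] * 10                 # 10 cells, all unknown (p = 0.5)
for _ in range(5):                # five consistent measurements hit cell 7
    integrate_ray(grid, 7)
print(occupancy_prob(grid[7]))    # high: repeatedly observed occupied
print(occupancy_prob(grid[3]))    # low: the beam passed through repeatedly
```

Accumulating evidence this way is what makes the map robust to individual noisy measurements.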
Spatial AI, step 2: neural networks for understanding the environment
Artificial intelligence in the form of neural networks plays an important role in the semantic mapping of a region. This involves a deeper understanding of the robot’s environment. Through deep learning, categories of information that are understandable to humans and clearly visible in images can be digitally captured and mapped. To do this, the neural network uses image recognition based on 2D images to represent them on a 3D map.
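The step from a 2D label to the 3D map can be sketched with a pinhole camera model: a pixel the network has labelled is lifted into 3D using its measured depth, and the label is stored at that location. The intrinsics below are made up for illustration:

```python
# Semantic mapping sketch: back-project a labelled 2-D pixel into 3-D
# camera coordinates using its depth (pinhole camera model).

FX = FY = 500.0          # focal lengths in pixels (illustrative)
CX, CY = 320.0, 240.0    # principal point of a 640x480 image

def backproject(u, v, depth, label):
    """Lift a labelled pixel (u, v) with known depth into camera space."""
    x = (u - CX) / FX * depth
    y = (v - CY) / FY * depth
    return (x, y, depth, label)

semantic_map = {}
# A pixel the network labelled "tree", measured 3.2 m away:
x, y, z, label = backproject(420.0, 200.0, 3.2, "tree")
semantic_map[(round(x, 1), round(y, 1), round(z, 1))] = label
print(semantic_map)
```

Repeating this for every labelled pixel, frame after frame, fills the 3D map with human-understandable categories.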
The resources needed to train such a deep learning system depend on how many details need to be captured to perform a given task. Distinguishing a tree from the sky is easier than precisely identifying a particular tree or determining its state of health. For this more demanding type of image recognition, there is often not enough data for a neural network to learn from.
Therefore, one of Leutenegger’s research goals was to develop a machine learning method that can efficiently use sparse training data and allow robots to continuously learn while operating. In more advanced forms of spatial AI, the goal is to recognize objects or parts of objects even when they are moving.
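One simple way to learn from very few labels, sketched here purely as an illustration and not as Leutenegger's actual method, is to keep a small memory of labelled feature vectors and classify new observations by their nearest neighbour. A single labelled example then suffices to add a new category mid-operation:

```python
# Toy sketch of learning from sparse labels while operating: classify a
# feature vector (in practice from a frozen network backbone; here just
# hand-made 2-D features) by its closest labelled example in memory.

def nearest_label(memory, feat):
    """Return the label of the memorised example closest to `feat`."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(memory, key=lambda item: dist(item[0], feat))[1]

memory = [((0.9, 0.1), "tree"), ((0.1, 0.9), "sky")]
print(nearest_label(memory, (0.8, 0.2)))      # "tree"

# One new labelled example is enough to teach a new category:
memory.append(((0.5, 0.5), "branch"))
print(nearest_label(memory, (0.55, 0.45)))    # "branch"
```

The appeal of such memory-based schemes is that adding knowledge requires no retraining, which matches the goal of robots that keep learning while deployed.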
Recent AI projects of the MIRMI professor: forest mapping, ship inspection, construction robots
Spatial artificial intelligence has been applied in three research projects:
– Building a wall: In construction robotics, robots equipped with manipulators are used. In the SPAICR project, funded for four years by the Georg Nemetschek Institute, they will build and dismantle a wall-like structure. A particular challenge in this project, in which Prof. Leutenegger collaborates with Prof. Kathrin Dörfler (TUM professor of digital fabrication), is enabling the robots to work without motion tracking, in other words without external infrastructure. In contrast to previous research projects, which used clearly marked laboratory rooms with orientation points, the aim is for the robot to operate precisely on any building site.
– Forest digitization: In the EU project Digiforest, the University of Bonn, the University of Oxford, ETH Zürich, the Norwegian University of Science and Technology and TUM are working to develop “digital technologies for sustainable forestry”. For that, forests need to be mapped. Where is which tree? How healthy is it? Is there a disease? Where do forests need to be thinned out, and where are new plantings needed? “This research will provide additional information for foresters to make decisions,” explained Prof. Leutenegger. TUM’s assignment: drones running Prof. Leutenegger’s AI will fly autonomously over the forest and chart it. They have to find their way around trees despite wind and small branches to produce a complete map of the forest area.
– Checking the ship: In the EU project CAR ASSESSMENT, the aim is to send drones into the interior of tankers and freighters to inspect the inner walls. They will be equipped with ultrasonic sensors, among other instruments, to detect cracks. For this task, drones must be able to fly autonomously in confined spaces with poor radio connectivity. In this application, too, motion tracking is not possible.
Spatial AI creates the basis for decisions
“We are working to provide people in various fields with sufficient amounts of data to reach the right decisions,” said Prof. Leutenegger. He added, however: “Our robots complement people. They increase human capabilities and will free them from dangerous and repetitive tasks.”