
Fully Autonomous Real-World Reinforcement Learning with Applications for Mobile Manipulation
By Jędrzej Orbik, Charles Sun, Coline Devin, Glen Berseth
Reinforcement learning provides a conceptual framework for autonomous agents to learn from experience, analogously to how one might train a pet with treats. But practical applications of reinforcement learning are often far from natural: instead of using RL to learn through trial and error by actually attempting the desired task, typical RL applications use a separate (usually simulated) training phase. For example, AlphaGo did not learn to play Go by competing against thousands of humans, but rather by playing against itself in simulation. While this kind of simulated training is appealing for games where the rules are perfectly known, applying it to real-world domains such as robotics can require a range of complex approaches, such as the use of simulated data or instrumenting real-world environments in various ways to make training feasible under laboratory conditions. Can we instead devise reinforcement learning systems for robots that allow them to learn directly “on-the-job”, while performing the task that they are required to do? In this blog post, we will discuss ReLMM, a system we developed that learns to clean up a room directly in the real world, with actual robots, through continual learning.
We evaluate our method on tasks of varying difficulty. The upper-left task has uniform white blobs for unobstructed grasping, while the other rooms have objects of diverse shapes and colors, obstacles that increase navigation difficulty and occlude objects, and patterned rugs that make the objects difficult to see against the ground.
A barrier to “on-the-job” training in the real world is the difficulty of accumulating enough experience. If we can make real-world training easier, by making the data collection process more autonomous and removing the need for human monitoring or intervention, we can further benefit from the simplicity of agents that learn from experience. In this work, we design an “on-the-job” mobile robot training system for cleaning a room by learning to grasp objects in various rooms.
Lesson 1: The benefits of modular policies for robots
People are not born one day and expected to interview for a job the next. There are many levels of tasks people master before they apply for a job, because we start with the easier ones and build on them. In ReLMM, we make use of this concept by having the robot train common, reusable skills, such as grasping, first encouraging the robot to prioritize practicing these skills before learning later skills, such as navigation. Learning in this way has two advantages for robotics. The first advantage is that when an agent focuses on learning one skill, it is more efficient at collecting data around the local state distribution for that skill.
This is shown in the figure above, where we evaluate the amount of prioritized grasping experience needed to enable efficient mobile manipulation training. The second advantage of a multi-level learning approach is that we can inspect the models trained for different tasks and ask them questions, such as “can you grasp anything right now?”, which is helpful for navigation training, as we describe next.
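To make the idea of querying the grasping module concrete, here is a minimal sketch of how a modular controller might decide between grasping and navigating at each step. The names grasp_model, grasp_policy, and nav_policy, and the threshold, are illustrative assumptions rather than the released ReLMM API.

```python
def collect_step(obs, grasp_model, grasp_policy, nav_policy,
                 grasp_threshold=0.5):
    """One step of a modular controller (illustrative sketch).

    grasp_model.success_prob(obs) is assumed to return the predicted
    probability that a grasp attempted from the current observation will
    succeed. If an object appears to be within reach, we practice the
    grasping skill; otherwise we let the navigation policy move the base.
    """
    if grasp_model.success_prob(obs) > grasp_threshold:
        return "grasp", grasp_policy.act(obs)
    return "navigate", nav_policy.act(obs)
```

Prioritizing grasping practice early in training amounts to spending more of the robot’s time in the first branch before the navigation policy is trained in earnest.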
Training this multi-level policy is not only more efficient than learning both skills at the same time, it also allows the grasping controller to inform the navigation policy. Having a model that estimates the uncertainty in its grasp success (Ours above) can be used to improve navigation exploration by skipping areas without graspable objects, in contrast to No Uncertainty Bonus, which does not use this information. The model can also be used to relabel data during training, so that in the unlucky case when the grasping model unsuccessfully attempts to grasp an object within its reach, the grasping policy can still provide some signal by indicating that an object was there but the grasping policy has not yet learned how to grasp it. In addition, learning a modular model has engineering benefits. Modular training allows for the reuse of skills that are easier to learn and can enable building intelligent systems one piece at a time. This is beneficial for many reasons, including safety evaluation and understanding.
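As a sketch of how the grasping model can feed the navigation policy, the snippet below shows one way to form a navigation reward with an uncertainty bonus and to relabel failed grasp attempts. The exact reward shaping and relabeling rules in the paper may differ; the function names, thresholds, and the particular form of the bonus here are assumptions for illustration.

```python
def navigation_reward(collected_object, grasp_success_prob, bonus_scale=0.1):
    # +1 when an object is actually collected, plus a bonus that is largest
    # where the grasp model is least certain (probability near 0.5), so the
    # navigation policy avoids roaming through areas it already knows are empty.
    uncertainty = 1.0 - 2.0 * abs(grasp_success_prob - 0.5)  # in [0, 1]
    return float(collected_object) + bonus_scale * uncertainty


def relabel_failed_grasp(transition, grasp_success_prob, detect_threshold=0.5):
    # If the grasp model was confident an object was within reach but the
    # attempt failed, keep the "object was there" signal for navigation
    # while recording the grasp itself as unsuccessful.
    transition["object_in_range"] = grasp_success_prob > detect_threshold
    transition["grasp_success"] = False
    return transition
```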
Lesson 2: Learning systems beat hand-engineered systems, given time
Many of the robotics tasks we see today can be accomplished with varying degrees of success using hand-engineered controllers. For our room cleaning task, we designed a hand-engineered controller that locates objects using image segmentation and turns towards the closest detected object at each step. This expertly designed controller works very well on the visually salient balled socks and takes reasonable paths around obstacles, but it cannot learn an optimal path to collect the objects quickly, and it struggles with visually diverse rooms. As shown in video 3 below, the scripted policy gets distracted by the white patterned carpet while trying to find more white objects to grasp.
(Videos 1–4)
We show a comparison between (1) our policy at the beginning of training, (2) our policy at the end of training, and (3) the scripted policy. In (4) we can see the robot’s performance improve over time, eventually exceeding the scripted policy at quickly collecting the objects in the room.
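For concreteness, a rough sketch of this kind of scripted baseline is given below. The color threshold, camera geometry, and action format are assumptions for illustration rather than the exact controller used in our experiments.

```python
import numpy as np

def scripted_controller(rgb_image, white_threshold=200):
    """Hand-engineered baseline sketch: segment bright pixels, turn toward
    the closest detected blob, and grasp once it reaches the bottom of the
    image (i.e., is right in front of the robot)."""
    mask = rgb_image.min(axis=-1) > white_threshold       # "white" pixels
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        # Nothing detected: spin in place to search for objects.
        return {"turn": 0.3, "forward": 0.0, "grasp": False}
    closest = np.argmax(ys)                               # lowest row ~ nearest object
    center = rgb_image.shape[1] / 2.0
    turn = (center - xs[closest]) / center                # steer toward the object
    grasp = ys[closest] > 0.9 * rgb_image.shape[0]        # object at the robot's feet
    return {"turn": float(turn), "forward": 0.2, "grasp": bool(grasp)}
```

A controller like this works well when the objects are visually distinct from the floor, but, as described above, it has no mechanism for improving its paths or adapting to new objects and backgrounds.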
Given that we can use experts to hand-code these controllers, what is the purpose of learning? An important limitation of hand-engineered controllers is that they are tuned for a particular task, such as grasping white objects. When new objects are introduced that differ in color and shape, the original tuning may no longer be optimal. Rather than requiring further manual engineering, our learning-based method adapts to various tasks by collecting its own experience.
However, the most important lesson is that even if the hand-engineered controllers are capable, the learning agent eventually surpasses them given enough time. This learning process is itself autonomous and takes place while the robot is performing its job, making it comparatively inexpensive. This shows the capability of learning agents, which can also be thought of as carrying out a general-purpose version of the “expert manual tuning” process for any kind of task. Learning systems have the ability to create the entire control algorithm for the robot, and are not limited to tuning a few parameters in a script. The key step in this work is enabling these real-world learning systems to autonomously collect the data needed for the learning methods to succeed.
This post is based on the paper “Fully Autonomous Real-World Reinforcement Learning with Applications for Mobile Manipulation”, presented at CoRL 2021. You can find more details in our paper, on our website, and in our videos. We provide code to reproduce our experiments. We thank Sergey Levine for his valuable feedback on this blog post.
BAIR Blog is the official blog of the Berkeley Artificial Intelligence Research (BAIR) Lab.