
A faster way to teach robots

Researchers from MIT and elsewhere have developed a technique that lets humans efficiently fine-tune a robot that fails to complete a desired task, such as picking up a unique mug, with minimal effort. Image: Jose-Luis Olivares/MIT, with images from iStock and The Coop

By Adam Zewe | MIT News Office

Imagine buying a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen counter, it might not recognize your mug (perhaps because it is painted with an unusual image of, say, MIT’s mascot, Tim the Beaver). So, the robot fails.

“Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘Okay, I guess we should start over.’ A critical missing component of these systems is the ability of the robot to show why it failed so the user can give feedback,” said Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

Peng and collaborators at MIT, New York University, and the University of California at Berkeley developed a framework that enables humans to teach a robot what they want it to do quickly and with minimal effort.

When the robot fails, the system uses an algorithm to generate counterfactual explanations that describe what would have needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. The system then uses this feedback, together with the counterfactual explanations, to generate new data that it uses to fine-tune the robot.
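The article doesn’t spell out the algorithm, but the core idea can be sketched as a search over perturbations of the failed scene. In this minimal Python sketch, the feature names and the `policy_succeeds` check are illustrative assumptions, not the authors’ implementation:

```python
def generate_counterfactuals(state, feature_values, policy_succeeds):
    """Enumerate single-feature changes to a failed scene that flip the
    robot's outcome from failure to success.

    state           -- dict of visual concepts, e.g. {"color": "brown"}
    feature_values  -- dict mapping each concept to candidate values
    policy_succeeds -- callable returning True if the robot's current
                       policy completes the task on the modified scene
    """
    counterfactuals = []
    for feature, candidates in feature_values.items():
        for value in candidates:
            if value == state[feature]:
                continue  # skip the value the robot already failed on
            modified = {**state, feature: value}
            if policy_succeeds(modified):
                # Reads as: "the robot would have succeeded if
                # <feature> had been <value>."
                counterfactuals.append((feature, value))
    return counterfactuals
```

Each returned pair corresponds to an explanation such as “the robot would have succeeded if the mug were white,” which is the kind of statement a user can quickly confirm or reject.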

Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task so it can perform a second, similar task.
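As a rough illustration of what that means in code, here is a minimal PyTorch-style sketch that freezes a pretrained policy’s early layers and updates only its final action head on the new demonstrations. The network shape, loss, and hyperparameters are placeholders, not details from the paper:

```python
import torch
from torch import nn

# Hypothetical pretrained policy: image features in, motor command out.
policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))

def fine_tune(policy, demos, epochs=10, lr=1e-4):
    """Update only the final layer on new (observation, action) pairs."""
    for param in policy[:-1].parameters():
        param.requires_grad = False  # keep the pretrained features fixed
    optimizer = torch.optim.Adam(policy[-1].parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, action in demos:
            optimizer.zero_grad()
            loss = loss_fn(policy(obs), action)  # match demonstrated action
            loss.backward()
            optimizer.step()
    return policy
```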

The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. Robots trained with this framework performed better, while the training process consumed less of a human’s time.

This framework can help robots learn more quickly in new environments without requiring users to have technical knowledge. In the long term, this could be a step towards enabling general-purpose robots to perform everyday tasks efficiently for the elderly or individuals with disabilities in a variety of settings.

Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

On-the-job training

Robots often fail due to distribution shift: the robot is confronted with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

One way to retrain a robot for a specific task is imitation learning: the user demonstrates the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug but demonstrates only with a white mug, the robot could learn that all mugs are white. It might then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.
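In its simplest form, imitation learning is behavioral cloning: supervised regression from observations onto the demonstrated actions. Here is a toy sketch with stand-in arrays (in reality the observations would be camera images passed through a vision model):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
observations = rng.random((200, 16))  # stand-in features for 200 demo frames
actions = rng.random((200, 4))        # stand-in end-effector commands

# Behavioral cloning: fit a policy that maps observation -> action.
policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
policy.fit(observations, actions)

# The cloned policy proposes an action for a new observation.
print(policy.predict(observations[:1]))
```

If every demonstration happened to feature a white mug, nothing in this regression stops the policy from treating “white” as part of what makes a mug a mug, which is exactly the failure described above.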

Training a robot to recognize that a mug is a mug, regardless of color, requires thousands of demonstrations.

“I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot to recognize that it can pick up a mug of any color,” said Peng.

To achieve this, the researchers’ system determines which specific object the user cares about (a mug) and which elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.
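A single augmentation step of this kind can be sketched in a few lines. The segmentation mask and helper name below are hypothetical; the point is only that the “unimportant” concept (color) changes while everything else stays fixed:

```python
import numpy as np

def recolor_object(image, mask, new_rgb):
    """Repaint the pixels of one object with a new color, leaving the
    rest of the scene untouched.

    image   -- (H, W, 3) uint8 array
    mask    -- (H, W) boolean array marking the object's pixels
    new_rgb -- sequence of three ints, the replacement color
    """
    augmented = image.copy()
    augmented[mask] = new_rgb  # broadcast the color over the object
    return augmented
```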

The framework has three steps. First, the system presents the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all the features in the space that show what would need to change for the robot to succeed.

The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not affect the desired action. It then uses this human feedback to generate many new, augmented demonstrations.

In this way, the user could demonstrate picking up one mug, and the system would produce demonstrations showing the desired action with thousands of different mugs by altering their color, as the sketch below illustrates. The system uses these data to fine-tune the robot.
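Reusing the hypothetical `recolor_object` helper from above, the fan-out might look like this, with one recorded demonstration becoming thousands of synthetic ones (again, an illustration rather than the authors’ code):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_demo(frames, actions, mask, n_copies=1000):
    """Turn one (frames, actions) demonstration into n_copies, each with
    the mug repainted a random color. The actions are reused verbatim,
    since color never influenced the demonstrated motion."""
    new_demos = []
    for _ in range(n_copies):
        color = rng.integers(0, 256, size=3)  # a random RGB mug color
        new_frames = [recolor_object(f, mask, color) for f in frames]
        new_demos.append((new_frames, actions))
    return new_demos
```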

Creating counterfactual explanations and soliciting feedback from users are critical for the technique to succeed, Peng said.

From human reasoning to robot reasoning

Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people whether counterfactual explanations helped them identify elements that could be changed without affecting the task.

“That was so clear right off the bat. Humans are very good at this type of counterfactual reasoning. And it is this counterfactual step that allows human reasoning to be translated into robot reasoning in a way that makes sense,” said Peng.

Then they applied their framework to three simulations in which a robot was tasked with navigating to a goal object, picking up a key and unlocking a door, or picking up a desired object and placing it on a table. In each case, their method enabled the robot to learn faster than other techniques, while requiring fewer demonstrations from users.

Going forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes for systems to generate new data using generative machine learning models.

“We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in these abstract spaces, where they don’t think about every property in an image. In the end, it’s really about enabling robots to learn good human-like representations at an abstract level,” said Peng.

This research was supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions.

