Recently, a team of researchers from Microsoft Research, UC Berkeley and the University of Nottingham developed a new methodology for testing the robustness of human-AI collaboration in reinforcement learning (RL) agents, using the two-player game environment Overcooked.
In this environment, the players control chefs in a kitchen to cook and serve dishes.
First, the researchers showed that agents trained via vanilla deep reinforcement learning are not robust: when tested in the Overcooked environment, none of the RL agents scored above 65 percent on the robustness tests. The researchers then evaluated three proposals for improvement: increasing state diversity, human model diversity, and human model quality.
Why This Research
In recent years, deep reinforcement learning (deep RL) has been successfully used by researchers to train agents that perform well in various environments. However, deploying a reinforcement learning agent in real-world situations calls for high robustness.
Reinforcement learning agents are also trained to work alongside humans in realistic scenarios, such as driving and surgery. Thus, it is imperative to ensure that the agents are robust.
The researchers stated: “Specifically, we suggest that when designing AI agents that collaborate with humans, designers should search for potential edge cases in possible partner behaviour and possible states encountered, as well as write tests which will verify that the behaviour of the RL agent in these edge cases is reasonable.”
To evaluate and increase the robustness of reinforcement learning agents, the researchers developed a methodology of searching for potential edge cases in possible partner behaviour and possible states encountered, in order to generate a suite of unit tests for the Overcooked-AI environment. The test suite was then used to evaluate three proposals for improving the agents’ robustness.
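As a rough illustration of the idea, an edge-case unit test checks that an agent behaves sensibly in a hand-picked state. The sketch below is our own illustration; the state fields, policy, and action names are assumptions, not the paper's actual test code:

```python
# Hypothetical sketch of an edge-case unit test in the spirit described
# above. The toy policy stands in for a trained RL agent.

def agent_policy(state):
    """Toy stand-in for a trained agent: grab an onion whenever the
    pot still needs one and the agent's hands are free."""
    if state["pot_onions"] < 3 and state["holding"] is None:
        return "pickup_onion"
    return "wait"

def test_agent_fills_partial_pot():
    # Edge case: the partner has placed two onions and walked away.
    # A robust agent should fetch the missing third onion.
    state = {"pot_onions": 2, "holding": None}
    assert agent_policy(state) == "pickup_onion"

test_agent_fills_partial_pot()
```

A suite of such tests, each probing one unusual state or partner behaviour, can surface failures that an average-reward evaluation would hide.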
The research focused mainly on building RL agents that can collaborate with humans. The researchers stated there are various approaches to improve the robustness of RL agents. For instance, rather than training a deep RL agent to play with a single human model trained with behaviour cloning, one can potentially improve:
- The quality of the human model by including the Theory of Mind (ToM)
- The human model diversity by utilising a population of human models
- The state diversity, for instance, by initialising from states visited in human-human gameplay
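To give a flavour of the second proposal, a training loop can sample a different human-model partner for each episode rather than always pairing the agent with a single behaviour-cloned model. The loop, callback, and model names below are illustrative assumptions, not the paper's implementation:

```python
import random

# Sketch of human model diversity: each episode, the learning agent is
# paired with a partner drawn from a population of human models.

def train_with_population(agent_update, human_models, episodes=100, seed=0):
    """Run `episodes` episodes, sampling a partner per episode.
    `agent_update` is a placeholder for one episode of RL training."""
    rng = random.Random(seed)
    for _ in range(episodes):
        partner = rng.choice(human_models)  # diversity across episodes
        agent_update(partner)

# Toy usage: record which (hypothetical) partner model each episode used.
episode_partners = []
train_with_population(episode_partners.append,
                      ["bc_model_a", "bc_model_b", "tom_model"],
                      episodes=6)
```

The other two proposals slot into the same loop: a Theory-of-Mind partner improves the quality of the sampled models, and resetting episodes from states seen in human-human gameplay improves state diversity.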
In the Overcooked environment, the only objects are onions, soups and dishes. The players work together to place three onions in a pot, leave them to cook for 20 timesteps, pour the soup in a bowl, and serve it. All players are given 20 reward points when the soup is served.
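The cooking cycle above can be sketched as a toy state machine; the class and method names are our own illustration, not the Overcooked-AI codebase's:

```python
# Minimal sketch of the soup cycle described above: three onions,
# a 20-timestep cook, and a shared reward of 20 on serving.

COOK_TIME = 20
SOUP_REWARD = 20

class Pot:
    def __init__(self):
        self.onions = 0
        self.cook_timer = 0

    def add_onion(self):
        if self.onions < 3:
            self.onions += 1
        if self.onions == 3:
            self.cook_timer = COOK_TIME  # third onion starts the cook

    def tick(self):
        if self.onions == 3 and self.cook_timer > 0:
            self.cook_timer -= 1

    def serve(self):
        """Return the shared reward if the soup is ready, else 0."""
        if self.onions == 3 and self.cook_timer == 0:
            self.onions = 0  # pot is emptied into a dish
            return SOUP_REWARD
        return 0

pot = Pot()
for _ in range(3):
    pot.add_onion()
for _ in range(COOK_TIME):
    pot.tick()
reward = pot.serve()  # 20 once the soup has cooked
```

Serving too early yields nothing, which is why the two players must coordinate on who fetches onions and who carries dishes.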
The researchers modelled the problem as a multiagent Markov Decision Process (MDP). Unlike in a single-agent MDP, in a multiagent MDP each agent’s history of interaction is essential for learning about the other agents. For human-AI collaboration, the researchers employed a two-player MDP in which one player is a human and the other an AI agent.
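Schematically, one step of such a two-player MDP might look like the sketch below, where both players’ actions feed a shared transition and reward function; all names and the toy dynamics are illustrative assumptions:

```python
# Schematic step of a two-player multiagent MDP: both players act
# simultaneously and receive one shared reward, as in Overcooked.

def step(state, human_action, ai_action, transition, reward):
    joint = (human_action, ai_action)
    next_state = transition(state, joint)
    r = reward(state, joint)
    return next_state, r  # the same reward r goes to both players

# Toy dynamics: the state counts onions in the pot, each player's action
# is how many onions they add, and a shared reward fires at three onions.
transition = lambda s, a: s + sum(a)
reward = lambda s, a: 20 if s + sum(a) >= 3 else 0

state, r = step(0, 1, 1, transition, reward)   # two onions in, no reward yet
state, r = step(state, 1, 0, transition, reward)  # third onion, shared reward
```

The shared-reward structure is what makes the setting collaborative: the AI agent cannot score without accounting for what its human partner does.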
The test suite offered significant insights into robustness that were not highly correlated with the information provided by the average validation reward.
According to the researchers, the test suite extracts more information about RL agents than observing rewards alone. When they tested several current deep reinforcement learning agents, none proved clearly robust, suggesting the new methodology is a significant improvement over existing evaluation practice.
Read the paper here.