Building Cooperative Embodied Agents Modularly with Large Language Models

1 University of Massachusetts Amherst 2 Tsinghua University 3 Shanghai Jiao Tong University 4 MIT 5 MIT-IBM Watson AI Lab

Abstract

Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains. However, their capacity for planning and communication in multi-agent cooperation remains unclear, even though these are crucial skills for intelligent embodied agents. In this paper, we present a novel framework that utilizes LLMs for multi-agent cooperation and test it in various embodied environments. Our framework enables embodied agents to plan, communicate, and cooperate with other embodied agents or humans to accomplish long-horizon tasks efficiently. We demonstrate that recent LLMs, such as GPT-4, can surpass strong planning-based methods and exhibit emergent effective communication using our framework without requiring fine-tuning or few-shot prompting. We also find that LLM-based agents that communicate in natural language can earn more trust and cooperate more effectively with humans. Our research underscores the potential of LLMs for embodied AI and lays the foundation for future research in multi-agent cooperation.


Demo

Here are several videos demonstrating our cooperative embodied agents, built with Large Language Models, that can think and communicate in the ThreeDWorld Multi-Agent Transport and Communicative Watch-And-Help environments.


Method

The overall modular framework consists of five modules: observation, belief, communication, reasoning, and planning. At each step, we first process the raw observation with an Observation Module, then update the agent's inner belief of the scene and the other agents through a Belief Module. This belief is then combined with the previous actions and dialogues to construct the prompts for the Communication Module and the Reasoning Module, which use Large Language Models to generate messages and decide on high-level plans, respectively. Finally, a Planning Module selects the primitive action to take at this step according to the high-level plan.

An overview of our framework, consisting of five modules: observation, belief, communication, reasoning, and planning, where the Communication Module and the Reasoning Module leverage Large Language Models to generate messages and decide on high-level plans.
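The per-step flow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class and method names (`Agent`, `Belief`, `plan_to_action`), the prompt wording, the action strings, and the `fake_llm` stub that stands in for a real model such as GPT-4 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]

MESSAGE_INSTRUCTION = "Write a short message to your teammate."
PLAN_INSTRUCTION = "Choose the next high-level plan."


@dataclass
class Belief:
    """The agent's inner belief of the scene and of the other agents."""
    objects: Dict[str, str] = field(default_factory=dict)  # object -> location
    others: Dict[str, str] = field(default_factory=dict)   # agent -> last known state


@dataclass
class Agent:
    llm: LLM
    belief: Belief = field(default_factory=Belief)
    actions: List[str] = field(default_factory=list)
    dialogue: List[str] = field(default_factory=list)

    def observe(self, raw: dict) -> dict:
        """Observation Module: normalize the raw observation."""
        return {
            "objects": dict(raw.get("objects", {})),
            "others": dict(raw.get("others", {})),
            "messages": list(raw.get("messages", [])),
        }

    def update_belief(self, obs: dict) -> None:
        """Belief Module: merge the processed observation into the belief."""
        self.belief.objects.update(obs["objects"])
        self.belief.others.update(obs["others"])
        self.dialogue.extend(obs["messages"])

    def build_prompt(self, instruction: str) -> str:
        """Combine belief, previous actions, and dialogue into one prompt."""
        return "\n".join([
            f"Known objects: {self.belief.objects}",
            f"Other agents: {self.belief.others}",
            f"Previous actions: {self.actions}",
            f"Dialogue: {self.dialogue}",
            instruction,
        ])

    def plan_to_action(self, plan: str) -> str:
        """Planning Module (toy version): map a high-level plan to one
        primitive action, assuming the plan's last word names the target."""
        location = self.belief.objects.get(plan.split()[-1])
        return f"move_to:{location}" if location else "explore"

    def step(self, raw: dict):
        self.update_belief(self.observe(raw))
        # Communication Module: the LLM generates a message.
        message = self.llm(self.build_prompt(MESSAGE_INSTRUCTION))
        self.dialogue.append(message)
        # Reasoning Module: the LLM decides on a high-level plan.
        plan = self.llm(self.build_prompt(PLAN_INSTRUCTION))
        action = self.plan_to_action(plan)
        self.actions.append(action)
        return message, action


def fake_llm(prompt: str) -> str:
    """Canned responses so the sketch runs without a real model."""
    if prompt.endswith(MESSAGE_INSTRUCTION):
        return "I will get the apple."
    return "go grasp apple"


agent = Agent(llm=fake_llm)
message, action = agent.step({"objects": {"apple": "kitchen"}})
print(message, action)
```

Keeping the LLM behind a plain callable means only the Communication and Reasoning steps depend on the model, so the other three modules in this sketch can be exercised with a stub.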



Examples

To better understand the essential factors for effective cooperation, we conducted a qualitative analysis of the agents' behaviors in our experiments and identified several cooperative behaviors.

Example cooperative behaviors demonstrating that our agents built with LLMs can communicate effectively and are good cooperators.


