Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems? (ICRA'2024)

Massachusetts Institute of Technology


We compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) and three step-history methods (full history, no history, and state-action pairs) on four coordination-dependent multi-agent 2D task scenarios as the number of agents increases.

Video

Demo

Abstract

A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination.

In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations that account for the vision-to-text problem and dynamics errors.


Prompt and process examples of the HMAS-2 framework in the BoxLift2 task. The generated 'Response from the central agent' is sent to the local agents for feedback. Once the central-local iteration terminates, the output plan is checked for syntactic correctness.


Prompt example for a local agent in the HMAS-2 framework in the BoxLift2 task. The plan generated by the central agent is sent to the local agents for feedback.
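The central-local iteration described in the captions above can be sketched as a simple loop: the central agent proposes a plan, each local agent returns feedback, and the central agent revises until every local agent agrees (or a round limit is reached), after which the plan is checked for syntactic correctness. The sketch below is illustrative only; `query_llm`, the agreement phrase, and the `PLAN:` format are hypothetical stand-ins, not the paper's actual prompts or API.

```python
def query_llm(prompt):
    # Hypothetical stub standing in for a real LLM API call.
    if "feedback" in prompt.lower():
        return "I Agree"
    return "PLAN: agent0 -> move(box1, corner)"

def hmas2_plan(num_local_agents, max_rounds=5):
    """Central agent proposes; local agents give feedback until all agree."""
    plan = query_llm("Central agent: propose a plan for the task.")
    for _ in range(max_rounds):
        # Each local agent reviews the current plan and replies with feedback.
        feedback = [
            query_llm(f"Local agent {i}: give feedback on this plan: {plan}")
            for i in range(num_local_agents)
        ]
        # The iteration terminates once every local agent agrees.
        if all(f.strip().lower().startswith("i agree") for f in feedback):
            break
        # Otherwise the central agent revises the plan using the feedback.
        plan = query_llm(
            "Central agent: revise the plan given feedback: " + "; ".join(feedback)
        )
    # After termination, the output plan is checked for syntactic correctness.
    if not plan.startswith("PLAN:"):
        raise ValueError("syntactically invalid plan")
    return plan
```

In a real system the stub would be replaced by calls to an LLM with the full prompts shown in the figures, and the syntax check would validate against the task's action grammar.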

Related Links

This work is part of a broader research thread on language-instructed task and motion planning, which transforms natural language instructions into robot control signals.

Other work from our lab on natural-language-to-STL translation and LLM-based agents includes:

BibTeX

@article{chen2023scalable,
  title={Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?},
  author={Chen, Yongchao and Arkin, Jacob and Zhang, Yang and Roy, Nicholas and Fan, Chuchu},
  journal={arXiv preprint arXiv:2309.15943},
  year={2023}
}