From words to blocks
Combining a language model with reinforcement learning enables object construction in a Minecraft-like environment from natural language instructions.
From Words to Blocks: Building Objects in Minecraft by Grounding Language Models with Reinforcement Learning
Leveraging pre-trained language models to generate action plans for embodied agents is an emerging research direction. However, executing instructions in real or simulated environments requires verifying that each action is feasible and relevant to achieving the goal. We introduce a novel method that integrates a language model with reinforcement learning to construct objects in a Minecraft-like environment from natural language instructions. Our method derives a set of consistently achievable sub-goals from the instructions and then completes the associated sub-tasks using a pre-trained RL policy. We use the IGLU competition, which is based on the Minecraft simulator, as our test environment and compare our approach to the competition’s top-performing solutions. Our approach outperforms existing solutions in terms of both the quality of the language model and the quality of the structures built within the IGLU environment.
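
The two-stage idea in the abstract (propose sub-goals from an instruction, keep only the achievable ones, then hand each one to a policy) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `SubGoal` type, the grid dimensions, the feasibility rule (inside the grid and targeting a free cell), and `stub_policy` are all assumptions introduced for illustration; a real system would obtain the proposed sub-goals from a language model and execute them with a trained RL policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int, int]   # (x, y, z) grid coordinates
World = Dict[Cell, str]       # occupied cell -> block colour

# Assumed build-zone dimensions, chosen only for this illustration.
GRID_SIZE: Cell = (11, 9, 11)


@dataclass(frozen=True)
class SubGoal:
    """Hypothetical sub-goal: place one block of a given colour at a cell."""
    cell: Cell
    color: str


def in_bounds(cell: Cell, size: Cell = GRID_SIZE) -> bool:
    """True if every coordinate lies inside the assumed build zone."""
    return all(0 <= c < n for c, n in zip(cell, size))


def filter_feasible(proposed: List[SubGoal], world: World) -> List[SubGoal]:
    """Keep sub-goals that are inside the grid and target a free cell.

    This is an assumed, simplified notion of feasibility.
    """
    taken = set(world)
    feasible = []
    for goal in proposed:
        if in_bounds(goal.cell) and goal.cell not in taken:
            feasible.append(goal)
            taken.add(goal.cell)  # later duplicates of this cell are rejected
    return feasible


def build(
    proposed: List[SubGoal],
    world: World,
    policy: Callable[[World, SubGoal], None],
) -> World:
    """Run the policy on each feasible sub-goal, on a copy of the world."""
    result = dict(world)
    for goal in filter_feasible(proposed, result):
        policy(result, goal)
    return result


def stub_policy(world: World, goal: SubGoal) -> None:
    """Stand-in for a pre-trained RL policy: places the block directly."""
    world[goal.cell] = goal.color


# Stand-in for language-model output; a real model would produce this list.
proposed = [
    SubGoal((0, 0, 0), "red"),
    SubGoal((0, 0, 0), "green"),   # duplicate cell, filtered out
    SubGoal((20, 0, 0), "red"),    # outside the grid, filtered out
    SubGoal((1, 0, 0), "blue"),
]
final_world = build(proposed, {}, stub_policy)
```

Separating the feasibility filter from the policy keeps the check independent of how sub-goals are produced or executed, so either stage can be swapped out without touching the other.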