Figure AI's Robot Did the Dishes. Then It Explained Why.


In March 2024, Figure AI released a demo of its humanoid robot Figure 01 running on a reasoning system powered by an OpenAI model. The robot sorted objects, made decisions, and described what it was doing in natural language, in real time.

By RSW Editorial · March 14, 2024 · 4 min read · robot-athletes


In March 2024, Figure AI published a video of its Figure 01 humanoid robot performing a dexterous manipulation task while describing what it was doing in natural language. A human asks the robot for something to eat. The robot surveys the table in front of it, identifies an apple, picks it up, and hands it over. Then it explains, in a natural voice, why it made that choice.

The demo is the product of a partnership between Figure AI and OpenAI, announced in February 2024 alongside a $675 million Series B round that valued Figure at $2.6 billion. The collaboration uses OpenAI's models for high-level reasoning and language understanding, while Figure's own systems handle the low-level motor control required to actually move the robot's body.

What the Demo Actually Shows

The March demo is approximately three minutes long and uncut, or at least presented as uncut. That distinction matters because edited and sped-up demo reels are a known problem in the humanoid robotics space. Several things in the video are genuinely impressive regardless of how carefully the setup was staged.

The robot identifies multiple objects on a table, including a cup and food, and responds to natural language instructions about them. It picks up and places objects with a level of dexterity that is unusual for manipulators not purpose-built for grasping. And it talks about its actions in real time in a way that appears coherent and relevant rather than scripted.

What is less clear from the video is how the perception and task planning stack works in practice, and how robust the demonstrated behaviors are across the range of conditions they would need to handle in a real deployment. Demo videos by robotics companies are, almost without exception, best-case scenarios.

The Architecture: Two Systems Working Together

The Figure 01 demo illustrates what has become one of the most widely pursued approaches to general-purpose humanoid robots: separating the reasoning problem from the motor control problem and attacking each with different tools.

High-level reasoning (what the robot should do next, how it should respond to a request, what the human is actually asking for) is handled by large models trained on vast amounts of text, and in multimodal versions on images as well. That training makes them very good at understanding context, intent, and natural language.

Low-level motor control (how to move an arm precisely enough to pick up a cup without crushing it, how to maintain balance while reaching forward, how to coordinate joint movements into a smooth trajectory) is handled by specialized control systems built with different methods, such as imitation learning, reinforcement learning, or classical control engineering.
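To make the "smooth trajectory" part concrete, here is a deliberately simple sketch using the classical minimum-jerk profile, a textbook technique from robotics and motor-control research. It is chosen purely for illustration and is not a description of Figure's controller; the function names, the 0.5 s duration, the 1.2 rad target and the 100 Hz rate are all hypothetical.

```python
def min_jerk(q0: float, q1: float, duration: float, t: float) -> float:
    """Joint position at time t on a minimum-jerk path from q0 to q1."""
    # Normalize time to [0, 1]; clamping makes the joint hold its endpoints.
    tau = min(max(t / duration, 0.0), 1.0)
    # Quintic blend with zero velocity and acceleration at both ends,
    # which is what makes the motion start and stop smoothly.
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return q0 + (q1 - q0) * s


def sample_trajectory(q0: float, q1: float, duration: float, rate_hz: float):
    """Setpoints at a fixed control rate, including both endpoints."""
    n = int(round(duration * rate_hz))
    return [min_jerk(q0, q1, duration, i / rate_hz) for i in range(n + 1)]


if __name__ == "__main__":
    # Hypothetical move: one joint from 0.0 to 1.2 rad in 0.5 s at 100 Hz.
    pts = sample_trajectory(0.0, 1.2, 0.5, 100)
    print(len(pts), pts[0], round(pts[-1], 6))  # prints: 51 0.0 1.2
```

The steps near the start and end are tiny and the largest steps fall at the midpoint, which is the "no sudden jerks" property a real controller would also have to respect while tracking these setpoints under load.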

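The interface between these two systems is where much of the engineering difficulty lies. A large model can take anywhere from a fraction of a second to several seconds to respond, while motor control has to update many times per second, so turning a language model's high-level instruction into precise motor commands without a noticeable lag requires a well-designed abstraction layer.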
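Figure has not published how its interface works, so the following is only a hypothetical sketch of one common pattern for such an abstraction layer. The reasoning model may emit only commands from a small skill vocabulary; a validation step checks each command against what perception currently sees; and a fast control loop keeps executing the most recent valid command instead of waiting on the slow planner. The skill names, the 200 Hz control rate and the 1.5 s planning latency are illustrative assumptions, not Figure's numbers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical skill vocabulary the reasoning model is allowed to emit.
SKILLS = {"pick", "place", "hand_over", "idle"}


@dataclass(frozen=True)
class SkillCommand:
    skill: str
    target: Optional[str] = None


def parse_plan(text: str, visible: Set[str]) -> SkillCommand:
    """Turn planner text such as 'pick apple' into a validated command.

    Anything outside the skill vocabulary, or aimed at an object that
    perception does not currently see, falls back to a safe 'idle'.
    """
    parts = text.strip().lower().split()
    if not parts or parts[0] not in SKILLS or parts[0] == "idle":
        return SkillCommand("idle")
    target = parts[1] if len(parts) > 1 else None
    if target not in visible:
        return SkillCommand("idle")
    return SkillCommand(parts[0], target)


def run(plans: List[str], visible: Set[str], control_hz: int = 200,
        plan_latency_s: float = 1.5, ticks: int = 1000):
    """Simulate a fast control loop fed by a slow planner.

    One planner output becomes available every plan_latency_s seconds.
    The control loop never blocks on the planner: on every tick it
    executes whatever validated command is most recent.
    """
    latency_ticks = int(plan_latency_s * control_hz)
    current = SkillCommand("idle")
    switches: List[Tuple[int, SkillCommand]] = []
    steps: Counter = Counter()
    for tick in range(ticks):
        idx = tick // latency_ticks - 1  # index of the plan arriving now
        if tick % latency_ticks == 0 and 0 <= idx < len(plans):
            new = parse_plan(plans[idx], visible)
            if new != current:
                switches.append((tick, new))
            current = new
        # Placeholder for the real low-level step (a learned policy or a
        # trajectory tracker) that would act on `current` at control_hz.
        steps[current.skill] += 1
    return switches, steps


if __name__ == "__main__":
    switches, steps = run(["pick apple", "hand_over apple", "pick hammer"],
                          visible={"apple", "cup", "plate"})
    for tick, cmd in switches:
        print(tick / 200, cmd)
    print(dict(steps))
```

In this simulation the controller runs all 1,000 ticks while only three plans arrive, and the request to pick up a "hammer" that perception never saw is downgraded to `idle` rather than passed to the motors. Constraining the model to a fixed vocabulary is one way to keep free-form language output from reaching the hardware unchecked.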

Figure AI's Position in the Humanoid Race

Figure AI was founded in 2022 by Brett Adcock, who previously co-founded Vettery (acquired by Adecco) and Archer Aviation. The company has moved unusually quickly, going from founding to a functional humanoid demo to a multi-billion-dollar valuation in under two years.

Its roadmap focuses on industrial deployment, specifically logistics and manufacturing applications where a humanoid form factor could offer a genuine advantage over purpose-built automation. BMW has publicly announced an agreement to test Figure robots at its plant in Spartanburg, South Carolina.

The competitive landscape is crowded: Boston Dynamics, Agility Robotics, Tesla, Unitree, 1X Technologies, and several other well-funded companies are all pursuing roughly similar markets with different hardware and software approaches. Which of them pulls ahead will likely depend on who first solves dexterous manipulation reliably and at scale.