Video grid: push on surface (2x), push around bowl (2x), pick-and-place on table (2x), pick-and-place on book (2x), and an unlabeled new task (?).
Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning, and present our approach, Few-Shot Task Learning through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. The core idea is to pretrain a generative model on a set of basic concepts and their demonstrations. Then, given a few demonstrations of a new concept (such as a new goal or a new action), our method learns the underlying concepts through backpropagation without updating the model weights, thanks to the invertibility of the generative model. We evaluate our method in five domains -- object rearrangement, goal-oriented navigation, motion capture of human actions, autonomous driving, and real-world table-top manipulation. Our experimental results demonstrate that via the pretrained generative model, we successfully learn novel concepts and generate agent plans or motion corresponding to these concepts (1) in unseen environments and (2) in composition with training concepts.
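At its core, inference of a new concept amounts to gradient descent on a concept representation through the frozen pretrained generative model. Below is a minimal, hypothetical PyTorch sketch of that idea; the model.loss API, embedding size, and optimizer settings are illustrative assumptions, not the released implementation.

import torch

def infer_new_concept(model, demos, embed_dim=64, steps=1000, lr=1e-2):
    # Learn a new task concept from a few demonstrations by inverting a
    # pretrained conditional generative model; the model weights stay frozen.
    # model: assumed to expose loss(demos, concept), e.g. a conditional
    #        diffusion denoising loss over demonstration trajectories.
    # demos: tensor of demonstrations, shape (num_demos, horizon, state_dim).
    for p in model.parameters():
        p.requires_grad_(False)  # freeze the pretrained weights

    # the only trainable quantity: the latent representation of the new concept
    concept = torch.zeros(embed_dim, requires_grad=True)
    optimizer = torch.optim.Adam([concept], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = model.loss(demos, concept)  # how well the demos fit this concept
        loss.backward()                    # gradients flow only to the concept
        optimizer.step()

    return concept.detach()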
We collect demonstrations with a Franka Research 3 robot via teleoperation. We then generate training- and test-task pushing behavior with our model conditioned on different representations.
training task: push on surface; representation: training push on surface (2x)
test task: push on book; representation: inferred from test demonstrations (4x)
test baseline task: push on book; representation: training push on surface (4x)
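A hypothetical usage sketch of the three conditions above (the generation API, scene variables, and representation lookup are assumptions for illustration):

# Condition the frozen pretrained model on different representations to generate pushing behavior.
surface_repr = model.concept_embedding("push on surface")     # assumed lookup of a training representation
book_repr = infer_new_concept(model, demos_push_on_book)       # inferred from a few test demonstrations

plan_train = model.generate(surface_scene, condition=surface_repr)   # training task, training representation
plan_test = model.generate(book_scene, condition=book_repr)          # test task, inferred representation
plan_baseline = model.generate(book_scene, condition=surface_repr)   # baseline: test task, training representation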
In the highway environment, the green vehicle is controlled by the model, blue vehicles are controlled by a separate controller, and red indicates a collision. In all scenarios the controlled vehicle must maintain a high speed and avoid collisions. In the highway scenarios it must also stay in the rightmost lanes, in exit it must take the exit, in merge it must let another vehicle merge, in intersection it must make a left turn, and in roundabout it must take the second exit.
Video grid of driving scenarios (highway, exit, merge, intersection) and methods (BC, VAE, In-Context, Ours).
The CMU Motion Capture Database contains recordings of real humans performing actions.
Video grid of MoCap actions (walk, run, march, jumping jacks, breaststroke) and methods (BC, VAE, In-Context, Language, Ours) alongside the demonstration (Demo).
Video grid: demonstrations (walk, jump, march, jumping jacks), compositions of the learned concept with training concepts (jumping jacks + walk, jumping jacks + jump, jumping jacks + march), jumping jacks (learned), and concepts learned with 1 vs. 2 components (jumping jacks 1 concept, jumping jacks 2 concepts, breaststroke 1 concept, breaststroke 2 concepts).
In the AGENT environment, an agent navigates to one of two targets based on their shape and/or color.
Video grid of AGENT tasks (go to red object, go to yellow object, go to cube, go to bowl) and methods (BC, VAE, In-Context, Ours).
In the Object Rearrangement environment, three objects must be positioned in a configuration that satisfies given spatial relations between them.
triangle above circle
circle above triangle
triangle right of circle
square above circle
triangle right of square + circle above square
square right of triangle + circle above triangle
circle right of square + triangle above square
line: circle right of triangle + triangle right of square
all objects on the circumference of a circle of radius 1.67
square diagonal to triangle
triangle diagonal to square
circle diagonal to triangle
square diagonal to triangle + circle above square
square diagonal to triangle + circle above triangle
square diagonal to triangle + circle right of triangle
square diagonal to triangle + triangle above circle
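As a hypothetical illustration, the spatial relations above could be evaluated from final 2D object positions roughly as follows (coordinate conventions and tolerances are assumptions, not the paper's evaluation code):

import numpy as np

def above(a, b, tol=0.1):
    # a is above b: larger y, roughly aligned in x
    return a[1] > b[1] + tol and abs(a[0] - b[0]) < tol

def right_of(a, b, tol=0.1):
    # a is to the right of b: larger x, roughly aligned in y
    return a[0] > b[0] + tol and abs(a[1] - b[1]) < tol

def diagonal_to(a, b, tol=0.1):
    # a is diagonal to b: offset in both x and y
    return abs(a[0] - b[0]) > tol and abs(a[1] - b[1]) > tol

# e.g. "triangle right of square + circle above square"
triangle, square, circle = np.array([1.0, 0.0]), np.array([0.0, 0.0]), np.array([0.0, 1.0])
assert right_of(triangle, square) and above(circle, square)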
Learned representations of new concepts that are not explicit compositions of training concepts in the natural-language symbolic space, for Driving, MoCap, and Object Rearrangement. Each data point corresponds to a concept (training concept representations in blue, learned new-concept representations in red).
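A minimal sketch of how such a 2D view of the representations could be produced, assuming the training and learned concept vectors are stored as arrays (file names and the use of PCA are placeholders):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

train_reprs = np.load("train_concepts.npy")     # (num_train, embed_dim), assumed file
new_reprs = np.load("learned_concepts.npy")     # (num_new, embed_dim), assumed file

# project all concept representations to 2D
points = PCA(n_components=2).fit_transform(np.vstack([train_reprs, new_reprs]))
n = len(train_reprs)
plt.scatter(points[:n, 0], points[:n, 1], c="blue", label="training concepts")
plt.scatter(points[n:, 0], points[n:, 1], c="red", label="learned new concepts")
plt.legend()
plt.show()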
There is excellent related work in computer vision and decision making:
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion introduces few-shot visual concept inference through inverse generative modeling.
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models demonstrates inferring visual concepts as compositions.
Is Conditional Generative Modeling All You Need For Decision-Making? demonstrates generating behavior in decision-making conditioned on task compositions.
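A common recipe for generating behavior under a composition of concepts, as in the compositions shown above, is to combine the guidance directions of the individual conditions at sampling time. Below is a minimal sketch assuming a conditional denoiser API model.eps(x_t, t, concept), in the style of classifier-free-guidance composition; it is not necessarily the exact formulation used in any of the papers above.

def composed_eps(model, x_t, t, concepts, weights, null_concept):
    # Compose several concept conditions by summing their guidance directions
    # relative to the unconditional noise prediction.
    eps_uncond = model.eps(x_t, t, null_concept)
    eps = eps_uncond
    for c, w in zip(concepts, weights):
        eps = eps + w * (model.eps(x_t, t, c) - eps_uncond)
    return eps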
@inproceedings{netanyahu2024fewshot,
author = {Netanyahu, Aviv and Du, Yilun and Bronars, Antonia and Pari, Jyothish and Tenenbaum, Joshua and Shu, Tianmin and Agrawal, Pulkit},
title = {Few-Shot Task Learning through Inverse Generative Modeling},
booktitle = {Advances in Neural Information Processing Systems},
year = {2024},
}