One-Shot Imitation Learning
Yan Duan†§, Marcin Andrychowicz‡, Bradly Stadie†‡, Jonathan Ho†§,
Jonas Schneider‡, Ilya Sutskever‡, Pieter Abbeel†§, Wojciech Zaremba‡

† Berkeley AI Research Lab, ‡ OpenAI, § Work done while at OpenAI

{rockyduan, jonathanho, pabbeel}@eecs.berkeley.edu
{marcin, bstadie, jonas, ilyasu, woj}@openai.com
Abstract
Imitation learning has been commonly applied to solve different tasks in isolation.
This usually requires either careful feature engineering, or a significant number of
samples. This is far from what we desire: ideally, robots should be able to learn
from very few demonstrations of any given task, and instantly generalize to new
situations of the same task, without requiring task-specific engineering. In this
paper, we propose a meta-learning framework for achieving such capability, which
we call one-shot imitation learning.
Specifically, we consider the setting where there is a very large (maybe infinite)
set of tasks, and each task has many instantiations. For example, a task could be
to stack all blocks on a table into a single tower, another task could be to place
all blocks on a table into two-block towers, etc. In each case, different instances
of the task would consist of different sets of blocks with different initial states.
At training time, our algorithm is presented with pairs of demonstrations for a
subset of all tasks. A neural net is trained such that when it takes as input the first
demonstration and a state sampled from the second demonstration,
it should predict the action corresponding to the sampled state. At test time, a full
demonstration of a single instance of a new task is presented, and the neural net
is expected to perform well on new instances of this new task. Our experiments
show that the use of soft attention allows the model to generalize to conditions and
tasks unseen in the training data. We anticipate that by training this model on a
much greater variety of tasks and settings, we will obtain a general system that can
turn any demonstrations into robust policies that can accomplish an overwhelming
variety of tasks.
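The abstract's training procedure can be summarized as behavioral cloning conditioned on a demonstration: sample a task, sample two demonstrations of it, and train the policy to map (first demonstration, state from second demonstration) to the corresponding expert action. Below is a minimal sketch of that loop; the helper objects `task_distribution`, `sample_demo_pair`, `sample_state_action`, and the conditional policy network `policy_net` are hypothetical placeholders introduced here for illustration, not interfaces from the paper.

```python
# Minimal sketch of the one-shot imitation meta-training loop (assumptions noted above).
import torch
import torch.nn as nn


def train_one_shot_imitation(policy_net, task_distribution, optimizer, num_iters=10000):
    """Behavioral-cloning-style training over pairs of demonstrations of the same task."""
    loss_fn = nn.MSELoss()  # assuming continuous actions; a classification loss would work for discrete ones
    for _ in range(num_iters):
        task = task_distribution.sample_task()        # e.g. one block-stacking configuration
        demo_a, demo_b = task.sample_demo_pair()      # two demonstrations of the same task
        # Pick a state and the expert action taken there from the *second* demonstration.
        state, expert_action = demo_b.sample_state_action()
        # Condition the policy on the *first* demonstration plus the sampled state.
        predicted_action = policy_net(demo_a, state)
        loss = loss_fn(predicted_action, expert_action)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

At test time the same `policy_net` would be given a single demonstration of an unseen task instance and rolled out on new instances of that task.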
1 Introduction
We are interested in robotic systems that are able to perform a variety of complex useful tasks, e.g.
tidying up a home or preparing a meal. The robot should be able to learn new tasks without long
system interaction time. To accomplish this, we must solve two broad problems. The first problem is
that of dexterity: robots should learn how to approach, grasp and pick up complex objects, and how
to place or arrange them into a desired configuration. The second problem is that of communication:
how to communicate the intent of the task at hand, so that the robot can replicate it in a broader set of
initial conditions.
Demonstrations are an extremely convenient form of information we can use to teach robots to overcome
these two challenges. Using demonstrations, we can unambiguously communicate essentially
any manipulation task, and simultaneously provide clues about the specific motor skills required to
perform the task. We can compare this with an alternative form of communication, namely natural
language. Although language is highly versatile, effective, and efficient, natural language processing