Learning Acrobatics by Watching YouTube

Xue Bin (Jason) Peng and Angjoo Kanazawa Oct 9, 2018

Simulated characters imitating skills from YouTube videos.

Whether it’s everyday tasks like washing our hands or stunning feats of acrobatic prowess, humans are able to learn an incredible array of skills by watching other humans. With the proliferation of publicly available video data from sources like YouTube, it is now easier than ever to find video clips of whatever skills we are interested in. A staggering 300 hours of video are uploaded to YouTube every minute. Unfortunately, it is still very challenging for our machines to learn skills from this vast volume of visual data. Most imitation learning approaches require concise representations, such as those recorded from motion capture (mocap). But getting mocap data can be quite a hassle, often requiring heavy instrumentation. Mocap systems also tend to be restricted to indoor environments with minimal occlusion, which can limit the types of skills that can be recorded. So wouldn’t it be nice if our agents could also learn skills by watching video clips?

In this work, we present a framework for learning skills from videos (SFV). By combining state-of-the-art techniques in computer vision and reinforcement learning, our system enables simulated characters to learn a diverse repertoire of skills from video clips. Given a single monocular video of an actor performing some skill, such as a cartwheel or a backflip, our characters are able to learn policies that reproduce that skill in a physics simulation, without requiring any manual pose annotations.

Visual Reinforcement Learning with Imagined Goals

Vitchyr Pong$^*$ and Ashvin Nair$^*$ Sep 6, 2018

We want to build agents that can accomplish arbitrary goals in unstructured complex environments, such as a personal robot that can perform household chores. A promising approach is to use deep reinforcement learning, which is a powerful framework for teaching agents to maximize a reward function. However, the typical reinforcement learning paradigm involves training an agent to solve an individual task with a manually designed reward. For example, you might train a robot to set a dinner table by designing a reward function based on the distance between each plate or utensil and its goal location. This setup requires a person to design the reward function for each task, as well as extra systems like object detectors, which can be expensive and brittle. Moreover, if we want machines that can perform a large repertoire of chores, we would have to repeat this RL training procedure on each new task.
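To make the dinner-table example concrete, here is a minimal sketch of the kind of hand-designed reward described above. The function name, the object names, and the 2D coordinates are all hypothetical, and the sketch assumes that something like an object detector already supplies each object's position.

```python
import math


def table_setting_reward(positions, goals):
    """Hand-designed reward: negative total distance of objects from their goals.

    positions and goals map an object name to (x, y) coordinates. In a real
    system the positions would come from an extra component such as an object
    detector, which is exactly the instrumentation the post wants to avoid.
    """
    return -sum(math.dist(positions[name], goals[name]) for name in goals)


# Hypothetical goal layout and current object positions (metres).
goals = {"plate": (0.0, 0.0), "fork": (-0.2, 0.0)}
positions = {"plate": (0.3, 0.4), "fork": (-0.2, 0.0)}

# The fork is already in place; the plate is about 0.5 m from its goal.
reward = table_setting_reward(positions, goals)
```

Every new chore would need its own version of this function plus the sensing behind `positions`, which is the per-task cost the paragraph above points out.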

[Figure: simulated door, simulated pick-and-place, simulated two-object pushing, and real robot pushing environments.]
While designing reward functions and setting up sensors (door angle measurement, object detectors, etc.) may be easy in simulation, it quickly becomes impractical in the real world (right image).

[Figure: simulated door, simulated pick-and-place, simulated two-object pushing, and real robot pushing environments.]
We train agents to solve various tasks from vision without extra instrumentation. The top row shows goal images and the bottom row shows our policies reaching those goals.

In this post, we discuss reinforcement learning algorithms that can be used to learn multiple different tasks simultaneously, without additional human supervision. For an agent to acquire skills without human intervention, it must be able to set goals for itself, interact with the environment, and evaluate whether it has achieved its goals to improve its behavior, all from raw observations such as images without manually engineering extra components like object detectors. We introduce a system that sets abstract goals and autonomously learns to achieve those goals. We then show that we can use these autonomously learned skills to achieve a variety of user-specified goals, such as pushing objects, grasping objects, and opening doors, without any additional learning. Lastly, we demonstrate that our method is efficient enough to work in the real world on a Sawyer robot. The robot learns to set and achieve goals involving pushing an object to a specific location, with only images as the input to the system.

Dexterous Manipulation with Reinforcement Learning: Efficient, General, and Low-Cost

Henry Zhu, Abhishek Gupta, Vikash Kumar, Aravind Rajeswaran, and Sergey Levine Aug 31, 2018

In this post, we demonstrate how deep reinforcement learning (deep RL) can be used to learn how to control dexterous hands for a variety of manipulation tasks. We discuss how such methods can make use of low-cost hardware, how they can be implemented efficiently, and how they can be complemented with techniques such as demonstrations and simulation to accelerate learning.

When Recurrent Models Don't Need to be Recurrent

John Miller Aug 6, 2018

An earlier version of this post was published on Off the Convex Path. It is reposted here with the author’s permission.

In the last few years, deep learning practitioners have proposed a litany of different sequence models. Although recurrent neural networks were once the tool of choice, models like the autoregressive WaveNet or the Transformer are now replacing RNNs on a diverse set of tasks. In this post, we explore the trade-offs between recurrent and feed-forward models. Feed-forward models can offer improvements in training stability and speed, while recurrent models are strictly more expressive. Intriguingly, this added expressivity does not seem to boost the performance of recurrent models. Several groups have shown feed-forward networks can match the results of the best recurrent models on benchmark sequence tasks. This phenomenon raises an interesting question for theoretical investigation:

When and why can feed-forward networks replace recurrent neural networks without a loss in performance?

We discuss several proposed answers to this question and highlight our recent work that offers an explanation in terms of a fundamental stability property.
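One way to build intuition for why stability might matter is a toy scalar recurrence. This is an illustrative assumption and not the model analyzed in the post. If the state update is h_t = a * h_{t-1} + x_t with |a| < 1, old inputs are discounted geometrically, so a model that only sees the last k inputs (a feed-forward-style truncation) ends up close to the full recurrent state. The function names and constants below are hypothetical.

```python
def recurrent_state(xs, a):
    """Final state of the scalar recurrence h_t = a * h_{t-1} + x_t, with h_0 = 0."""
    h = 0.0
    for x in xs:
        h = a * h + x
    return h


def truncated_state(xs, a, k):
    """Same recurrence, but run only over the last k inputs (fixed context window)."""
    return recurrent_state(xs[max(0, len(xs) - k):], a)


a, k = 0.5, 10           # |a| < 1 makes the recurrence stable
xs = [1.0] * 50          # inputs bounded by 1 in absolute value

full = recurrent_state(xs, a)
approx = truncated_state(xs, a, k)

# For inputs bounded by 1, the truncation error is at most a**k / (1 - a).
error = abs(full - approx)
bound = a ** k / (1 - a)
```

With `a` close to or above 1 the bound becomes vacuous, which matches the intuition that only stable recurrences can be safely truncated in this toy setting.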

One-Shot Imitation from Watching Videos

Tianhe Yu and Chelsea Finn Jun 28, 2018

Learning a new skill by observing another individual, the ability to imitate, is a key part of intelligence in humans and animals. Can we enable a robot to do the same, learning to manipulate a new object by simply watching a human manipulate the object, just as in the video below?

The robot learns to place the peach into the red bowl after watching the human do so.

BDD100K Blog Update

Fisher Yu and Trevor Darrell Jun 18, 2018

We are delighted by the interest and excitement generated by our BDD100K dataset. Our data release and blog post were covered in an unsolicited article by the UC Berkeley newspaper, the Daily Cal, which was then picked up by other news services without our prompting or intervention. The paper describing this dataset is under review at the ECCV 2018 conference, and we followed the rules of that conference (as communicated to us by the Program Chairs in a prompt email response when we asked for clarification following the reporter’s request; the ECCV PCs replied that ECCV follows CVPR’s long-standing policy). We thus declined to speak to the reporters after they reached out to us. We did not, and have not, communicated with any media outlets regarding this story.

While the Daily Cal article was accurate, other media outlets that followed in reporting the story unfortunately made claims that were attributed to us incorrectly and that do not represent our view. In particular, several media outlets attributed to us a claim that the BDD100K dataset was “800 times” bigger than other industrial datasets, specifically mentioning Baidu’s ApolloScape. While it is true that our dataset contains more raw images than other datasets, including Baidu’s, the stated claim is misleading, and we did not put that line or anything like it in a paper, blog post, or spoken comment to anyone. It appears that some reporter(s) viewed the data in tables in our paper and came up with this conclusory comment themselves because it made an exciting headline, yet attributed it to us. In fact, it is inappropriate in our view to summarize the difference between our dataset and Baidu’s in a single comment that ours is 800x larger. Comparing the number of raw images directly is not the most appropriate way to compare these types of datasets.

Importantly, different datasets focus on different aspects of the autonomous driving challenge. Our dataset is crowd-sourced, and covers a very large area and diverse visual phenomena (indeed significantly more diverse than previous efforts, in our view), but it is very clearly limited to monocular RGB image data and associated mobile device metadata. Other dataset collection efforts are complementary in our view. Baidu’s, KITTI, and CityScapes each contain important additional sensing modalities and are collected with fully calibrated apparatus including actuation channels. (The dataset from Mapillary is also notable, and similar to ours in being diverse, crowd-sourced, and densely annotated, but differs in that we include video and dynamic metadata relevant to driving control.) We look forward to projects at Berkeley and elsewhere that leverage both BDD100K and these other datasets as the research community brings the potential of autonomous driving to reality.

BDD100K: A Large-scale Diverse Driving Video Database

Fisher Yu May 30, 2018

Update 06/18/2018: please also check our follow-up blog post after reading this.

TL;DR, we released the largest and most diverse driving video dataset with rich annotations called BDD100K. You can access the data for research now at http://bdd-data.berkeley.edu. We have recently released an arXiv report on it. And there is still time to participate in our CVPR 2018 challenges!

Delayed Impact of Fair Machine Learning

Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt May 17, 2018

Machine learning systems trained to minimize prediction error may often exhibit discriminatory behavior based on sensitive characteristics such as race and gender. One reason could be historical bias in the data. In various application domains including lending, hiring, criminal justice, and advertising, machine learning has been criticized for its potential to harm historically underrepresented or disadvantaged groups.

In this post, we talk about our recent work on aligning decisions made by machine learning with long term social welfare goals. Commonly, machine learning models produce a score that summarizes information about an individual in order to make decisions about them. For example, a credit score summarizes an individual’s credit history and financial activities in a way that informs the bank about their creditworthiness. Let us continue to use the lending setting as a running example.

TDM: From Model-Free to Model-Based Deep Reinforcement Learning

Vitchyr Pong Apr 26, 2018

You’ve decided that you want to bike from your house by UC Berkeley to the Golden Gate Bridge. It’s a nice 20-mile ride, but there’s a problem: you’ve never ridden a bike before! To make matters worse, you are new to the Bay Area, and all you have is a good ol’ fashioned map to guide you. How do you get started?

Let’s first figure out how to ride a bike. One strategy would be to do a lot of studying and planning. Read books on how to ride bicycles. Study physics and anatomy. Plan out all the different muscle movements that you’ll make in response to each perturbation. This approach is noble, but anyone who’s ever learned to ride a bike knows that this strategy is doomed to fail. There’s only one way to learn how to ride a bike: trial and error. Some tasks like riding a bike are just too complicated to plan out in your head.

Once you’ve learned how to ride your bike, how would you get to the Golden Gate Bridge? You could reuse your trial-and-error strategy. Take a few random turns and see if you end up at the Golden Gate Bridge. Unfortunately, this strategy would take a very, very long time. For this sort of problem, planning is a much faster strategy, and requires considerably less real-world experience and trial-and-error. In reinforcement learning terms, it is more sample-efficient.

Left: some skills you learn by trial and error. Right: other times, planning ahead is better.

While simple, this thought experiment highlights some important aspects of human intelligence. For some tasks, we use a trial-and-error approach, and for others we use a planning approach. A similar phenomenon seems to have emerged in reinforcement learning (RL). In the parlance of RL, empirical results show that some tasks are better suited for model-free (trial-and-error) approaches, and others are better suited for model-based (planning) approaches.

However, the biking analogy also highlights that the two systems are not completely independent. In particular, to say that learning to ride a bike is just trial-and-error is an oversimplification. In fact, when learning to bike by trial-and-error, you’ll employ a bit of planning. Perhaps your plan will initially be, “Don’t fall over.” As you improve, you’ll make more ambitious plans, such as, “Bike forwards for two meters without falling over.” Eventually, your bike-riding skills will be so proficient that you can start to plan in very abstract terms (“Bike to the end of the road.”) to the point that all there is left to do is planning and you no longer need to worry about the nitty-gritty details of riding a bike. We see that there is a gradual transition from the model-free (trial-and-error) strategy to a model-based (planning) strategy. If we could develop artificial intelligence algorithms (and specifically RL algorithms) that mimic this behavior, the result could be an algorithm that both performs well (by using trial-and-error methods early on) and is sample-efficient (by later switching to a planning approach to achieve more abstract goals).

This post covers the temporal difference model (TDM), an RL algorithm that captures this smooth transition between model-free and model-based RL. Before describing TDMs, we first describe how a typical model-based RL algorithm works.

Shared Autonomy via Deep Reinforcement Learning

Siddharth Reddy Apr 18, 2018

A blind, autonomous pilot (left), suboptimal human pilot (center), and combined human-machine team (right) play the Lunar Lander game.

Imagine a drone pilot remotely flying a quadrotor, using an onboard camera to navigate and land. Unfamiliar flight dynamics, terrain, and network latency can make this system challenging for a human to control. One approach to this problem is to train an autonomous agent to perform tasks like patrolling and mapping without human intervention. This strategy works well when the task is clearly specified and the agent can observe all the information it needs to succeed. Unfortunately, many real-world applications that involve human users do not satisfy these conditions: the user's intent is often private information that the agent cannot directly access, and the task may be too complicated for the user to precisely define. For example, the pilot may want to track a set of moving objects (e.g., a herd of animals) and change object priorities on the fly (e.g., focus on individuals who unexpectedly appear injured). Shared autonomy addresses this problem by combining user input with automated assistance; in other words, augmenting human control instead of replacing it.
