We present a benchmark for studying generalization in deep reinforcement
learning (RL). Systematic empirical evaluation shows that vanilla deep RL
algorithms generalize better than specialized deep RL algorithms designed
specifically for generalization. In other words, simply training on varied
environments is so far the most effective strategy for generalization. The code
can be found at https://github.com/sunblaze-ucb/rl-generalization and the
full paper is at https://arxiv.org/abs/1810.12282.
“Scientific research has changed the world. Now it needs to change itself.”
- The Economist, 2013
There has been growing concern about the validity of scientific findings. A multitude of journals, papers, and reports have documented the shrinking fraction of scientific studies that can be replicated. In 2016, one of the giants of scientific publishing, Nature, surveyed about 1,500 researchers across many different disciplines, asking where they stand on the status of reproducibility in their area of research. One of the many takeaways from the survey's worrisome results is the following: 90% of the respondents agreed that there is a reproducibility crisis, and the top overall answer for boosting reproducibility was a “better understanding of statistics”. Indeed, many of the factors contributing to the explosion of irreproducible research stem from neglecting the fact that statistical practice is no longer as static as it was in the first half of the 20th century, when statistical hypothesis testing came into prominence as a theoretically rigorous way to make valid discoveries with high confidence.
It would be great if we could all have household robots do our chores for us.
Chores are tasks that we want done to make our houses cater more to our
preferences; they are a way in which we want our house to be different from
the way it currently is. However, most “different” states are not very
desirable: picture, for instance, a robot that knocks over and shatters a vase
on its way across the room.
Surely our robot wouldn’t be so dumb as to go around breaking stuff when we ask
it to clean our house? Unfortunately, AI systems trained with reinforcement
learning only optimize features specified in the reward function and are
indifferent to anything we might’ve inadvertently left out. Generally, it is
easy to get the reward wrong by forgetting to include preferences for things
that should stay the same, since we are so used to having these preferences
satisfied, and there are so many of them. Consider the room below, and imagine
that we want a robot waiter that serves people at the dining table efficiently.
We might implement this using a reward function that provides 1 reward whenever
the robot serves a dish, and use discounting so that the robot is incentivized
to be efficient. What could go wrong with such a reward function, and how would
we need to modify the reward to prevent it? Take a minute to think about it.
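One concrete failure mode, sketched below with a toy, entirely hypothetical setup (the environment, reward, and names are ours, not from the post): because nothing in the reward mentions the vase, a trajectory that shatters it but serves one step sooner earns a strictly higher discounted return than a careful one.

```python
# Minimal sketch of the naive discounted reward described above (toy example).
GAMMA = 0.99  # discount factor: earlier rewards are worth more


def naive_reward(dish_served):
    """+1 whenever a dish is served; silent about everything else."""
    return 1.0 if dish_served else 0.0


def discounted_return(rewards, gamma=GAMMA):
    """sum_t gamma^t * r_t -- the quantity the robot actually optimizes."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# A trajectory that knocks over the vase but serves one step sooner scores
# higher than a careful trajectory, because breaking the vase costs nothing.
fast_but_destructive = [naive_reward(d) for d in (False, False, True)]
careful = [naive_reward(d) for d in (False, False, False, True)]
assert discounted_return(fast_but_destructive) > discounted_return(careful)
```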
We are announcing the release of our state-of-the-art off-policy model-free
reinforcement learning algorithm, soft actor-critic (SAC). This algorithm has
been developed jointly at UC Berkeley and Google, and we have been using
it internally for our robotics experiments. Soft actor-critic is, to our
knowledge, one of the most efficient model-free algorithms available today,
making it especially well-suited for real-world robotic learning. In this post,
we will benchmark SAC against state-of-the-art model-free RL algorithms and
showcase a spectrum of real-world robot examples, ranging from manipulation to
locomotion. We also release our implementation of SAC, which is specifically
designed for real-world robotic systems.
We just rolled out general support for multi-agent reinforcement learning in
Ray RLlib 0.6.0. This blog post is a brief tutorial on multi-agent RL and
how we designed for it in RLlib. Our goal is to enable multi-agent RL across a
range of use cases, from leveraging existing single-agent algorithms to training
with custom algorithms at large scale.
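As a taste of what that looks like in practice, here is a minimal sketch of the per-agent dict convention RLlib's multi-agent API is built around: reset and step exchange dicts keyed by agent id. The toy two-agent environment is our own invention, not from the post, and the exact import path and signatures may differ across RLlib versions.

```python
# Toy multi-agent environment following RLlib's MultiAgentEnv convention:
# observations, rewards, and dones are dicts keyed by agent id.
import random

from ray.rllib.env import MultiAgentEnv


class TwoAgentGuess(MultiAgentEnv):
    """Each agent guesses 0 or 1 and is rewarded when it matches the target."""

    def reset(self):
        self.target = random.randint(0, 1)
        # One observation per agent, keyed by agent id.
        return {"agent_0": 0, "agent_1": 0}

    def step(self, action_dict):
        obs = {agent: 0 for agent in action_dict}
        rewards = {
            agent: float(action == self.target)
            for agent, action in action_dict.items()
        }
        # Per-agent done flags, plus "__all__" to end the whole episode.
        dones = {agent: True for agent in action_dict}
        dones["__all__"] = True
        return obs, rewards, dones, {}
```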
Figure: An artistic representation of single-cell RNA sequencing. The
stars in the sky represent cells in a heterogeneous tissue. The projection of
the stars onto the river reveals relationships among them that are not apparent
by looking directly at the sky. Like the river, our Bayesian model, called scVI,
reveals relationships among cells.
The diversity of gene regulatory states in our body is one of the main reasons
why such an amazing array of biological functions can be encoded in a single
genome. Recent advances in microfluidics and sequencing technologies (such as
inDrops) have enabled measurement of gene expression at the single-cell level
and have provided tremendous opportunities to unravel the mechanisms that relate
individual genes to specific biological phenomena. These experiments yield
approximate measurements of mRNA counts for the entire transcriptome (i.e.,
around $d = 20,000$ protein-coding genes) across a large number of cells $n$,
which can vary from tens of thousands to a million. Early computational methods
for interpreting this data relied on linear models and empirical Bayes shrinkage
approaches because sample sizes were initially extremely small.
While current research focuses on providing more accurate models of this gene
expression data, most of the resulting algorithms either suffer from prohibitive
scalability issues or remain limited to a single downstream analysis task.
Consequently, common practice in the field still relies on ad hoc preprocessing
pipelines and task-specific algorithmic procedures, which limits the ability to
capture the underlying data-generating process.
In this post, we propose to build on the increased sample sizes and recent
developments in Bayesian approximate inference to improve both model
expressiveness and algorithmic scalability. In particular, we present our recent
work on deep generative models for single-cell transcriptomics, which addresses
all of the limitations above by formalizing biological questions as statistical
queries over a single graphical model tailored to single-cell RNA sequencing
(scRNA-seq) datasets. The resulting inference procedure, which we call
Single-cell Variational Inference (scVI), is open-source and
scales to over a million cells.
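To give a rough sense of the kind of model involved, here is an illustrative sketch of a variational autoencoder over a cells-by-genes count matrix. This is not the authors' scVI implementation: the architecture, the plain Poisson likelihood, and all dimensions are simplified placeholders.

```python
# Illustrative VAE over an (n cells x d genes) count matrix; all choices here
# (hidden sizes, Poisson likelihood, log1p input transform) are placeholders.
import torch
import torch.nn as nn

D_GENES, D_LATENT, D_HIDDEN = 20000, 10, 128


class CountVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(D_GENES, D_HIDDEN), nn.ReLU())
        self.z_mean = nn.Linear(D_HIDDEN, D_LATENT)
        self.z_logvar = nn.Linear(D_HIDDEN, D_LATENT)
        self.decoder = nn.Sequential(
            nn.Linear(D_LATENT, D_HIDDEN), nn.ReLU(),
            nn.Linear(D_HIDDEN, D_GENES), nn.Softplus(),  # positive Poisson rates
        )

    def forward(self, x):
        h = self.encoder(torch.log1p(x))
        mu, logvar = self.z_mean(h), self.z_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        rate = self.decoder(z)
        # Negative ELBO: -(reconstruction log-likelihood - KL to the N(0, I) prior)
        recon = torch.distributions.Poisson(rate).log_prob(x).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (kl - recon).mean()


# A mini-batch of raw counts (cells x genes); train by minimizing the loss.
x = torch.randint(0, 5, (32, D_GENES)).float()
loss = CountVAE()(x)
loss.backward()
```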
With very little explicit supervision and feedback, humans are able to learn a
wide range of motor skills by simply interacting with and observing the world
through their senses. While there has been significant progress towards building
machines that can learn complex skills and can learn from raw sensory
information such as image pixels, acquiring large and diverse repertoires of
general skills remains an open challenge. Our goal is
to build a generalist: a robot that can perform many different tasks, like
arranging objects, picking up toys, and folding towels, and can do so with many
different objects in the real world without re-learning for each object or task.
While these basic motor skills are much simpler and less impressive than mastering chess or even using a spatula, we think that
being able to achieve such generality with a single model is a fundamental
aspect of intelligence.
The key to acquiring generality is diversity. If you deploy a learning
algorithm in a narrow, closed-world environment, the agent will recover skills
that are successful only in a narrow range of settings. That’s why an algorithm
trained to play Breakout will struggle when anything about the images or the
game changes. Indeed, the success of image classifiers relies on large, diverse
datasets like ImageNet. However, having a robot autonomously learn from large
and diverse datasets is quite challenging. While collecting diverse sensory data
is relatively straightforward, it is simply not practical for a person to
annotate all of the robot’s experiences. It is more scalable to collect
completely unlabeled experiences. Then, given only sensory data, akin to what
humans have, what can you learn? With raw sensory data there is no notion of
progress, reward, or success. Unlike games like Breakout, the real world doesn’t
give us a score or extra lives.
We have developed an algorithm that can learn a general-purpose predictive model
using unlabeled sensory experiences, and then use this single model to perform a
wide range of tasks.
With a single model, our approach can perform a wide range of tasks, including
lifting objects, folding shorts, placing an apple onto a plate, rearranging
objects, and covering a fork with a towel.
In this post, we will describe how this works. We will discuss how we can learn
based on only raw sensory interaction data (i.e. image pixels, without requiring
object detectors or hand-engineered perception components). We will show how we
can use what was learned to accomplish many different user-specified tasks. And,
we will demonstrate how this approach can control a real robot from raw pixels,
performing tasks and interacting with objects that the robot has never seen
before.
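To preview the planning idea in the simplest possible terms, here is a sketch of random-shooting model-predictive control with a learned visual prediction model. It is not the authors' exact pipeline, and `predict_frames` and `goal_cost` are hypothetical placeholders for the learned model rollout and the user-specified objective.

```python
# Sketch: plan with a learned action-conditioned video prediction model by
# sampling candidate action sequences, predicting their outcomes, scoring the
# predictions against a goal, and executing the best first action.
import numpy as np


def plan_action(model, current_frame, goal_cost, horizon=10, n_candidates=200):
    """Random-shooting planner over a learned pixel-level dynamics model."""
    # Candidate action sequences, e.g. 2-D end-effector displacements.
    candidates = np.random.uniform(-1.0, 1.0, size=(n_candidates, horizon, 2))

    best_cost, best_action = np.inf, None
    for actions in candidates:
        # The model rolls out predicted future frames for this action sequence.
        predicted = model.predict_frames(current_frame, actions)
        # The cost measures how far the predicted outcome is from the goal
        # (e.g. how far a designated pixel is from where the user wants it).
        cost = goal_cost(predicted)
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]

    # Execute only the first action, then replan (model-predictive control).
    return best_action
```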
Figure 1: (left) LED Array Microscope constructed using a standard
commercial microscope and an LED array. (middle) Close-up of the LED array dome
mounted on the microscope. (right) LED array displaying patterns at 100Hz.
Computational imaging systems marry the design of hardware and image
reconstruction. For example, in optical microscopy, tomographic,
super-resolution, and phase imaging systems can be constructed from
simple hardware modifications to a commercial microscope (Fig. 1) and
computational reconstruction. Traditionally, recovering these quantities
requires a large number of measurements; however, for live-cell imaging
applications, we are limited in the number of measurements we can acquire
because the sample is in motion. Naturally, we want to know which measurements
are best to acquire. In this post, we highlight our latest work, which learns
the experimental design that maximizes the performance of a non-linear
computational imaging system.
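To make the idea of “learning the experimental design” concrete, here is a heavily simplified sketch that is not the actual optical physics or the authors' architecture: the LED illumination patterns are treated as trainable weights and optimized jointly with a reconstruction network, so gradient descent chooses which few measurements are most informative.

```python
# Simplified illustration: learn illumination patterns (the experimental
# design) jointly with a reconstruction network. The linear measurement model
# below is a stand-in for the real, non-linear imaging physics.
import torch
import torch.nn as nn

N_LEDS, N_MEASUREMENTS, IMG_PIXELS = 64, 4, 32 * 32


class LearnedDesign(nn.Module):
    def __init__(self):
        super().__init__()
        # Trainable weights over LEDs for each of the few allowed measurements.
        self.led_weights = nn.Parameter(torch.rand(N_MEASUREMENTS, N_LEDS))
        self.reconstruct = nn.Sequential(
            nn.Linear(N_MEASUREMENTS * IMG_PIXELS, 256), nn.ReLU(),
            nn.Linear(256, IMG_PIXELS),
        )

    def forward(self, per_led_images):
        # per_led_images: (batch, N_LEDS, IMG_PIXELS), one image per single LED.
        # Each measurement is modeled (simplistically) as a weighted combination.
        patterns = torch.softmax(self.led_weights, dim=-1)
        measurements = torch.einsum("mk,bkp->bmp", patterns, per_led_images)
        return self.reconstruct(measurements.flatten(1))


# Training minimizes reconstruction error while updating BOTH the network and
# the illumination patterns, i.e. the experimental design itself is learned.
model = LearnedDesign()
per_led = torch.rand(8, N_LEDS, IMG_PIXELS)
target = torch.rand(8, IMG_PIXELS)
loss = nn.functional.mse_loss(model(per_led), target)
loss.backward()
```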
In many machine learning tasks, we want to answer questions using fixed,
pre-collected datasets. In some applications, however, we are not
given data a priori; instead, we must collect the data we require to answer the
questions of interest. This situation arises, for example, in environmental
contaminant monitoring and census-style surveys. Collecting the data ourselves
allows us to focus our attention on just the most relevant sources of
information. However, determining which of these sources of information will
yield useful measurements can be difficult. Furthermore, when data is collected
by a physical agent (e.g., a robot, satellite, or human), we must plan our
measurements so as to reduce the costs associated with the motion of the agent over
time. We call this abstract problem embodied adaptive sensing.
We introduce a new approach to the embodied adaptive sensing problem, in which a
robot must traverse its environment to identify locations or items of interest.
Adaptive sensing encompasses many well-studied problems in robotics, including
the rapid identification of accidental contamination leaks and radioactive
sources, and finding individuals in search and rescue missions. In such
settings, it is often critical to devise a sensing trajectory that returns a
correct solution as quickly as possible.
Top left: image of a 3D cube. Top right: example depth image, with darker points
representing areas closer to the camera (source: Wikipedia). Next two
rows: examples of depth and RGB image pairs for grasping objects in a bin. Last
two rows: similar examples for bed-making.
This post explores two independent innovations and the potential for combining
them in robotics. Two years before the AlexNet results on ImageNet
were released in 2012, Microsoft rolled out the Kinect for the Xbox. This class
of low-cost depth sensors emerged just as Deep Learning boosted
Artificial Intelligence by accelerating the performance of hyper-parametric
function approximators, leading to surprising advances in image classification,
speech recognition, and language translation. Today, Deep Learning is
also showing promise for end-to-end learning of playing video games and
performing robotic manipulation tasks.
For robot perception, convolutional neural networks (CNNs), such as
VGG or ResNet, with three RGB color channels have become standard. For
robotics and computer vision tasks, it is common to borrow one of these
architectures (along with pre-trained weights) and then to perform transfer
learning or fine-tuning on task-specific data. But in some tasks, knowing
the colors in an image may provide only limited benefits. Consider training a
robot to grasp novel, previously unseen objects. It may be more important to
understand the geometry of the environment rather than colors and textures. The
physical process of manipulation — controlling one or more objects by applying
forces through contact — depends on object geometry, pose, and other factors
which are largely color-invariant. When you manipulate a pen with your hand, for
instance, you can often move it seamlessly without looking at the actual pen, so
long as you have a good understanding of the location and orientation of contact
points. Thus, before proceeding, one might ask: does it make sense to use
color images?
There is an alternative: depth images. These are single-channel grayscale
images whose pixel values measure distance from the camera, giving us invariance
to the colors of objects in the scene. We can also use depth to “filter” out
points beyond a certain distance, which removes background noise, as we
demonstrate later with robot bed-making. Examples of paired depth and RGB images
are shown above.
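As a small, self-contained illustration of that filtering step (the threshold and values below are made up, not taken from the projects discussed here), depth pixels beyond a chosen distance can simply be zeroed out before the image is fed to a network:

```python
# Sketch of depth "filtering": zero out pixels farther than a chosen distance
# to remove background clutter. Units and threshold here are illustrative.
import numpy as np


def filter_background(depth_image, max_distance_m=1.2):
    """Zero out everything farther than max_distance_m from the camera."""
    filtered = depth_image.copy()
    filtered[filtered > max_distance_m] = 0.0
    return filtered


# Example: a fake 4x4 depth image in meters; the far wall at 2.0 m disappears.
depth = np.array([[0.6, 0.7, 2.0, 2.0],
                  [0.6, 0.8, 2.0, 2.0],
                  [0.7, 0.9, 2.0, 2.0],
                  [0.7, 0.9, 2.0, 2.0]])
print(filter_background(depth))
```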
In this post, we consider the potential for combining depth images and deep
learning in the context of three ongoing projects in the UC Berkeley
AUTOLab: Dex-Net for robot grasping, segmenting objects in heaps, and robot
bed-making.