In many animals, tool-use skills emerge from a combination of observational
learning and experimentation. For example, by watching one another, chimpanzees
can learn how to use twigs to “fish” for insects. Similarly, capuchin
monkeys demonstrate the ability to wield sticks as sweeping tools to pull
food closer to themselves. While one might wonder whether these are just
illustrations of “monkey see, monkey do,” we believe these tool-use abilities
indicate a greater level of intelligence.
Left: A chimpanzee fishing for termites. Right: A gorilla using a stick to
gather herbs.
The question our new work explores is: can we enable robots to use tools in the
same way — through observation and experimentation?
A requisite for performing complex multi-object manipulation tasks, such as
those involved in tool use, is an understanding of physical cause-and-effect
relationships. Therefore, the ability to predict how one object might
interact with another is crucial. Our prior work has investigated how
visual predictive models of cause-and-effect can be learned from unsupervised
robot interaction with the world. After learning such a model, the robot can
plan to accomplish a diverse set of simple tasks, including cloth folding and
object arrangement. However, if we consider the more complex interactions that
occur in tool-use tasks, such as how a broom can sweep dirt into a dustpan,
undirected experimentation isn’t enough.
Hence, taking inspiration from how animals learn, we designed an algorithm that
allows robots to learn tool-use skills through a similar paradigm of imitation
and interaction. In particular, we show that, with a mix of demonstration data
and unsupervised experience, a robot can use novel objects as tools and even
improvise tools in the absence of traditional ones. Further, depending on the
demands of the task, our method demonstrates the ability to decide whether to
use the provided tools. In this post, we will describe how this works.
We all dream of a future in which autonomous cars can drive us to every corner
of the world. Numerous researchers and companies are working day and night to
chase this dream by overcoming scientific and technological barriers. One of the
greatest challenges we still face is developing machine learning models that can
be trained in a local environment and also perform well in new, unseen
situations. For example, self-driving cars may utilize perception models to
recognize drivable areas from images. Companies in Silicon Valley can build and
perfect such a model using large local datasets from the Bay Area for training.
However, if the same model were deployed in a snowy area such as Boston, it
would likely perform miserably, because it has never seen snow before. Boston,
during winter, and Silicon Valley, during any time of the year, can be labeled
as separate domains for perception models, since they present clear differences
in climate and challenges in perception. In other cases, domains may be much
closer in nature, such as a city street and a nearby highway. The process of
transferring knowledge and models between different domains in machine learning
is called domain adaptation.
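To make the idea of a domain gap concrete, here is a minimal toy sketch. It is not a real perception model: the single "brightness" feature, the data values, and the helper names `fit_threshold` and `accuracy` are all made up for illustration. A threshold classifier is fit on a sunny source domain and then evaluated, unchanged, on a snowy target domain.

```python
# Toy illustration of domain shift; all numbers are invented for this example.

def fit_threshold(samples):
    """Pick the brightness threshold that best separates road (1) from non-road (0)."""
    best_threshold, best_correct = None, -1
    for threshold in sorted(brightness for brightness, _ in samples):
        # Predict "road" when the pixel is at most as bright as the threshold.
        correct = sum((b <= threshold) == bool(label) for b, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, samples):
    """Fraction of samples the fixed threshold rule classifies correctly."""
    hits = sum((b <= threshold) == bool(label) for b, label in samples)
    return hits / len(samples)


# (brightness, is_road) pairs.
# Source domain: dark asphalt, brighter surroundings.
source = [(0.20, 1), (0.25, 1), (0.30, 1), (0.70, 0), (0.75, 0), (0.80, 0)]
# Target domain: the road is covered in snow and is now the brightest region.
target = [(0.85, 1), (0.90, 1), (0.95, 1), (0.40, 0), (0.45, 0), (0.50, 0)]

threshold = fit_threshold(source)
source_acc = accuracy(threshold, source)
target_acc = accuracy(threshold, target)
print(f"threshold={threshold}, source={source_acc:.2f}, target={target_acc:.2f}")
```

The rule that is perfect on the source data drops to chance on the target data, which is the kind of failure domain adaptation methods try to prevent.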
A large number of papers on domain adaptation of perception models have appeared
in top publishing venues for machine learning and computer vision. However, most
of these works focus on image classification and semantic segmentation. Hardly
any attention has been paid to instance-level tasks, such as object detection
and tracking, even though localization of nearby objects is arguably more
important for autonomous driving. To foster the study of domain adaptation of
perception models, Berkeley DeepDrive and Didi Chuxing are co-hosting two
competitions at the CVPR 2019 Workshop on Autonomous Driving. The challenges
will focus on domain adaptation of object detection and tracking, using the
BDD100K dataset from Berkeley DeepDrive and the D2-City dataset from Didi
Chuxing. BDD100K covers US scenes, while D2-City was collected on streets in
China. The competitions ask participants to transfer object detectors from
BDD100K to D2-City and object trackers from D2-City to BDD100K. More
information about the challenges can be found on our website and on the D2-City
website.
Following our introduction of the BDD100K dataset, we have been busy working to
provide more temporal annotations. Above is an example of object tracking
annotation, created by our open-source annotation platform Scalabel. Some of the tracking labels are
used in the domain adaptation challenge for object tracking. More data will be
released this summer. Of course, we also have object tracking at night.
The University of California Berkeley Artificial Intelligence Research (BAIR)
Lab is pleased to announce the BAIR Open Research Commons, a new industrial
affiliate program launched to accelerate cutting-edge AI research. AI research
is advancing rapidly in both university and corporate research settings, with
collaborations already underway, driven by individual researcher-to-researcher
connections. The BAIR Commons is designed to enhance
and streamline such collaborative cutting-edge research by students, faculty,
and corporate research scholars.
The Commons agreement has been framed with the goal of promoting open research
in AI: all on-campus effort, data, and results in the Commons program will be
non-exclusive with open publication and open-source code release expected.
Fostering an environment of excellence in graduate student research is the
primary motivation of the new program: Berkeley students will lead the design of
projects in the Commons, and the program of research must be approved by their
home departments before a project commences. Students are expected to benefit
from collaboration with leading researchers in industrial research labs, as well
as from access to partner resources useful for investigating certain open
questions in state-of-the-art AI research. The University will benefit from
membership fees paid by partners to participate in the program. The Commons
agreement provides for collaborative joint projects between the partners and
Berkeley, with intellectual property shared jointly and equally by the parties.
The agreement also provides for joint research “lablets”, which will be embedded
collaborative open research spaces inside BAIR’s 27,000 sq. ft. research
facility opening this summer in the Berkeley Way West facility on the Berkeley
campus. More than a dozen faculty and 120 students will be assigned space in
the new lab, with an equal number of visiting positions allocated for
researchers from other BAIR labs and for visiting industrial partners.
Initial alliance participants include Amazon, Facebook, Google, Samsung, and
Wave Computing. Funding for over twenty joint projects has been committed in the
initial launch of the program, which will support both BAIR facilities and
research efforts. Over 30 faculty and 200 graduate students and postdocs at
Berkeley are affiliated with BAIR. For more information about BAIR or the
Commons program, please contact firstname.lastname@example.org.
BAIR will occupy the top floor of Berkeley Way West.
Guiding our fingers while typing, enabling us to nimbly strike a matchstick, and
inserting a key in a keyhole all rely on our sense of touch. It has been shown
that the sense of touch is very important for dexterous manipulation in humans.
Similarly, for many robotic manipulation tasks, vision alone may not be
sufficient: often, it may be difficult to resolve subtle details such as the
exact position of an edge, shear forces, or surface textures at points of
contact, and robotic arms and fingers can block the line of sight between a
camera and its quarry.
Augmenting robots with this crucial sense, however, remains a challenging task.
Our goal is to provide a framework for learning how to perform tactile servoing,
which means precisely relocating an object based on tactile information. To
provide our robot with tactile feedback, we utilize a custom-built tactile
sensor, based on principles similar to those of the GelSight sensor developed
at MIT. The sensor is
composed of a deformable, elastomer-based gel, backlit by three colored LEDs,
and provides high-resolution RGB images of contact at the gel surface. Compared
to other sensors, this tactile sensor naturally provides geometric
information in the form of rich visual information from which attributes such as
force can be inferred. Previous work using similar sensors has leveraged
this kind of tactile sensor on tasks such as learning how to grasp, improving
success rates when grasping a variety of objects.
We present a benchmark for studying generalization in deep reinforcement
learning (RL). Systematic empirical evaluation shows that vanilla deep RL
algorithms generalize better than specialized deep RL algorithms designed
specifically for generalization. In other words, simply training on varied
environments is so far the most effective strategy for generalization. The code
can be found at https://github.com/sunblaze-ucb/rl-generalization and the
full paper is at https://arxiv.org/abs/1810.12282.
“Scientific research has changed the world. Now it needs to change itself.”
- The Economist, 2013
There has been growing concern about the validity of scientific findings. A multitude of journals, papers, and reports have recognized the ever-smaller number of replicable scientific studies. In 2016, one of the giants of scientific publishing, Nature, surveyed about 1,500 researchers across many different disciplines, asking for their stance on the status of reproducibility in their area of research. One of the many takeaways from the worrisome results of this survey is the following: 90% of the respondents agreed that there is a reproducibility crisis, and the overall top answer for boosting reproducibility was “better understanding of statistics”. Indeed, many factors contributing to the explosion of irreproducible research stem from neglecting the fact that statistics is no longer as static as it was in the first half of the 20th century, when statistical hypothesis testing came into prominence as a theoretically rigorous proposal for making valid discoveries with high confidence.
It would be great if we could all have household robots do our chores for us.
Chores are tasks that we want done to make our houses cater more to our
preferences; they are a way in which we want our house to be different from
the way it currently is. However, most “different” states are not very
desirable.
Surely our robot wouldn’t be so dumb as to go around breaking stuff when we ask
it to clean our house? Unfortunately, AI systems trained with reinforcement
learning only optimize features specified in the reward function and are
indifferent to anything we might’ve inadvertently left out. Generally, it is
easy to get the reward wrong by forgetting to include preferences for things
that should stay the same, since we are so used to having these preferences
satisfied, and there are so many of them. Consider the room below, and imagine
that we want a robot waiter that serves people at the dining table efficiently.
We might implement this using a reward function that provides a reward of 1 whenever
the robot serves a dish, and use discounting so that the robot is incentivized
to be efficient. What could go wrong with such a reward function? How would we
need to modify the reward function to take this into account? Take a minute to
think about it.
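As a minimal sketch of the reward just described, the snippet below computes the discounted return for two hypothetical serving schedules. The function name `discounted_return`, the discount factor of 0.9, and the time steps are assumptions chosen for illustration, not values from our experiments.

```python
def discounted_return(serve_times, gamma=0.9):
    """Return the sum of gamma**t over the time steps t at which a dish is served.

    Each served dish yields a reward of 1; every other step yields 0.
    """
    return sum(gamma ** t for t in serve_times)


# Two hypothetical waiters serve three dishes each, at different speeds.
fast = discounted_return([1, 2, 3])
slow = discounted_return([5, 10, 15])

print(f"fast waiter: {fast:.3f}")
print(f"slow waiter: {slow:.3f}")
```

Because later rewards are discounted more heavily, the faster schedule earns the higher return, which is how discounting encourages efficiency. Note that the function only counts served dishes; nothing else about the room enters the reward.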
We are announcing the release of our state-of-the-art off-policy model-free
reinforcement learning algorithm, soft actor-critic (SAC). This algorithm has
been developed jointly at UC Berkeley and Google, and we have been using
it internally for our robotics experiments. Soft actor-critic is, to our
knowledge, one of the most efficient model-free algorithms available today,
making it especially well-suited for real-world robotic learning. In this post,
we will benchmark SAC against state-of-the-art model-free RL algorithms and
showcase a spectrum of real-world robot examples, ranging from manipulation to
locomotion. We also release our implementation of SAC, which is designed
specifically for real-world robotic systems.
We just rolled out general support for multi-agent reinforcement learning in
Ray RLlib 0.6.0. This blog post is a brief tutorial on multi-agent RL and
how we designed for it in RLlib. Our goal is to enable multi-agent RL across a
range of use cases, from leveraging existing single-agent algorithms to training
with custom algorithms at large scale.
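One way to picture how existing single-agent policies can be reused in a multi-agent setting is to route each agent's observation to a policy chosen by a mapping function. The sketch below is library-free and purely illustrative: `policy_mapping`, `compute_actions`, the agent names, and the toy policies are hypothetical and are not RLlib APIs.

```python
# Library-free sketch; none of these names are RLlib APIs.
from typing import Callable, Dict

Policy = Callable[[int], int]  # maps an observation to an action


def policy_mapping(agent_id: str) -> str:
    """Hypothetical rule: all cars share one policy, traffic lights share another."""
    return "car_policy" if agent_id.startswith("car") else "light_policy"


def compute_actions(
    observations: Dict[str, int], policies: Dict[str, Policy]
) -> Dict[str, int]:
    """Route each agent's observation to the policy that agent is mapped to."""
    return {
        agent_id: policies[policy_mapping(agent_id)](obs)
        for agent_id, obs in observations.items()
    }


# Toy stand-ins for trained single-agent policies.
policies: Dict[str, Policy] = {
    "car_policy": lambda obs: obs % 2,
    "light_policy": lambda obs: 0,
}

# One step of a hypothetical environment: observations keyed by agent id.
observations = {"car_0": 3, "car_1": 4, "light_0": 7}
actions = compute_actions(observations, policies)
print(actions)
```

Keying observations and actions by agent id keeps each policy a plain single-agent function, so several agents can share one policy or each can have its own.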
Figure: An artistic representation of single-cell RNA sequencing. The
stars in the sky represent cells in a heterogeneous tissue. The projection of
the stars onto the river reveals relationships among them that are not apparent
by looking directly at the sky. Like the river, our Bayesian model, called scVI,
reveals relationships among cells.
The diversity of gene regulatory states in our body is one of the main reasons
why such an amazing array of biological functions can be encoded in a single
genome. Recent advances in microfluidics and sequencing technologies (such as
inDrops) have enabled measurement of gene expression at the single-cell level and
have provided tremendous opportunities to unravel the underlying mechanisms of
relationships between individual genes and specific biological phenomena. These
experiments yield approximate measurements of mRNA counts for the entire
transcriptome (i.e., around $d = 20{,}000$ protein-coding genes) across a large
number of cells $n$, which can vary from tens of thousands to a million. The
early computational methods to interpret this data relied on linear models and
empirical Bayes shrinkage approaches because sample sizes were initially
extremely low. While current research focuses on providing more accurate models
for this gene expression data, most of the subsequent algorithms either exhibit
prohibitive scalability issues or remain limited to a single downstream analysis
task. Consequently, common practices in the field still rely on ad-hoc
preprocessing pipelines and specific algorithmic procedures, which limits the
ability to capture the underlying data generating process.
In this post, we propose to build upon the increased sample size and recent
developments in Bayesian approximate inference to improve modeling complexity as
well as algorithmic scalability. Notably, we present our recent work on deep
generative models for single-cell transcriptomics, which addresses all the
mentioned limitations by formalizing biological questions into statistical
queries over a single graphical model, tailored to single-cell RNA sequencing
(scRNA-seq) datasets. The resulting algorithmic inference procedure, which we
named Single-cell Variational Inference (scVI), is open-source and
scales to over a million cells.
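For readers less familiar with variational inference, the generic objective that such methods maximize is the evidence lower bound (ELBO). The form below is the textbook version for a latent-variable model, written with assumed notation: $x$ is the observed expression vector of a cell, $z$ a latent variable, $p_\theta$ the generative model, and $q_\phi$ the approximate posterior. It is shown only as background and is not necessarily the exact objective used in scVI.

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
$$

Because the bound decomposes as a sum over cells, it can typically be optimized with stochastic gradients on mini-batches, which is one common reason variational methods scale to large datasets.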