Learning to Learn with Probabilistic Task Embeddings


To operate successfully in a complex and changing environment, learning agents must be able to acquire new skills quickly. Humans are remarkably good at this: we can learn to recognize a new object from one example, adapt to driving a different car in a matter of minutes, and add a new slang word to our vocabulary after hearing it once. Meta-learning is a promising approach for enabling such capabilities in machines. In this paradigm, the agent adapts to a new task from limited data by leveraging a wealth of experience collected while performing related tasks. For agents that must take actions and collect their own experience, meta-reinforcement learning (meta-RL) holds the promise of enabling fast adaptation to new scenarios. Unfortunately, while the trained policy can adapt quickly to new tasks, the meta-training process requires large amounts of data from a range of training tasks, exacerbating the sample inefficiency that plagues RL algorithms. As a result, existing meta-RL algorithms are largely feasible only in simulated environments. In this post, we’ll briefly survey the current landscape of meta-RL and then introduce a new algorithm called PEARL that improves meta-training sample efficiency by orders of magnitude. (Check out the research paper and the code.)


1000x Faster Data Augmentation


Effect of Population Based Augmentation applied to images; the augmentation applied differs depending on the percentage of training completed.

In this blog post we introduce Population Based Augmentation (PBA), an algorithm that quickly and efficiently learns state-of-the-art data augmentation policies for neural network training. PBA matches the previous best results on CIFAR and SVHN but uses one thousand times less compute, enabling researchers and practitioners to effectively learn new augmentation policies on a single workstation GPU. You can use PBA broadly to improve deep learning performance on image recognition tasks.

We discuss the PBA results from our recent paper and then show how to easily run PBA yourself on a new dataset using the Tune framework.
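PBA is built on population based training (PBT): a population of models trains in parallel, and periodically the weaker members adopt a stronger member's state (exploit) and perturb its hyperparameters (explore), so the hyperparameters effectively follow a schedule over the course of training. The toy sketch below illustrates only that exploit-and-explore loop applied to augmentation hyperparameters. It is a hypothetical simplification, not PBA's implementation and not the Tune API: the operation names, the (probability, magnitude) encoding, the bottom-quarter replacement rule, and especially `toy_train_and_eval`, which scores settings with a made-up formula instead of training a network, are all assumptions.

```python
import random
from dataclasses import dataclass, field

# Hypothetical augmentation operations; each gets a (probability, magnitude) pair in [0, 1].
OPS = ["rotate", "shear_x", "color"]


@dataclass
class Worker:
    hparams: dict                                  # op name -> (probability, magnitude)
    score: float = 0.0                             # most recent (toy) validation score
    history: list = field(default_factory=list)    # hparams used at each step so far


def toy_train_and_eval(worker, step, rng):
    # Placeholder for "train a few epochs, then measure validation accuracy".
    # It only rewards settings close to a made-up target that grows with training
    # progress, plus a little noise. No model is trained here.
    target = min(1.0, 0.2 + 0.1 * step)
    penalty = sum(abs(p - target) + abs(m - target) for p, m in worker.hparams.values())
    worker.score = -penalty + rng.gauss(0.0, 0.01)


def explore(hparams, rng):
    # Perturb every probability and magnitude by up to +/-0.1, clipped to [0, 1].
    def clip(v):
        return min(1.0, max(0.0, v))
    return {op: (clip(p + rng.uniform(-0.1, 0.1)), clip(m + rng.uniform(-0.1, 0.1)))
            for op, (p, m) in hparams.items()}


def pbt_search(pop_size=8, steps=10, seed=0):
    rng = random.Random(seed)
    population = [Worker({op: (rng.random(), rng.random()) for op in OPS})
                  for _ in range(pop_size)]
    for step in range(steps):
        for w in population:
            toy_train_and_eval(w, step, rng)
            w.history.append(dict(w.hparams))  # settings used during this step
        ranked = sorted(population, key=lambda w: w.score, reverse=True)
        cutoff = max(1, pop_size // 4)
        for loser in ranked[-cutoff:]:
            winner = rng.choice(ranked[:cutoff])
            # Exploit: adopt the winner's state. Real PBT also copies model weights;
            # here copying the winner's history stands in for that.
            loser.history = list(winner.history)
            # Explore: continue from perturbed copies of the winner's hyperparameters.
            loser.hparams = explore(winner.hparams, rng)
    return max(population, key=lambda w: w.score)


best = pbt_search()
for step, settings in enumerate(best.history):
    print(step, {op: (round(p, 2), round(m, 2)) for op, (p, m) in settings.items()})
```

The returned worker's `history` is the per-step sequence of augmentation settings that it, or the workers it copied from, actually used: a learned schedule rather than a single fixed policy, which is the kind of output the caption above describes.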


Autonomous Vehicles for Social Good: Learning to Solve Congestion


We are in the midst of an unprecedented convergence of two rapidly growing trends on our roadways: sharply increasing congestion and the deployment of autonomous vehicles. Year after year, highways get slower and slower: famously, China’s roadways were paralyzed by a two-week-long traffic jam in 2010. At the same time as congestion worsens, hundreds of thousands of semi-autonomous vehicles (AVs), which are vehicles with automated distance-keeping and lane-keeping capabilities, are being deployed on highways worldwide. The second trend offers a clear opportunity to alleviate the first. The current generation of AVs, while still far from full autonomy, already holds a multitude of advantages over human drivers that make it well positioned to tackle this congestion. Humans are imperfect drivers: we accelerate when we shouldn’t, brake aggressively, and make short-sighted decisions, all of which create and amplify patterns of congestion.


End-to-End Deep Reinforcement Learning without Reward Engineering


Communicating the goal of a task to another person is easy: we can use language, show them an image of the desired outcome, point them to a how-to video, or use some combination of all of these. Specifying a task to a robot for reinforcement learning, on the other hand, requires substantial effort. Most prior work that has applied deep reinforcement learning to real robots makes use of specialized sensors to obtain rewards, or studies tasks where the robot’s internal sensors can be used to measure reward. Examples include thermal cameras for tracking fluids and purpose-built computer vision systems for tracking objects. Since such instrumentation must be set up for every new task we may wish to learn, it poses a significant bottleneck to the widespread adoption of reinforcement learning for robotics, and it precludes the use of these methods directly in open-world environments that lack this instrumentation.

We have developed an end-to-end method that allows robots to learn from a modest number of images that depict successful completion of a task, without any manual reward engineering. The robot initiates learning from this information alone (around 80 images), and occasionally queries a user for additional labels. In these queries, the robot shows the user an image and asks for a label to determine whether that image represents successful completion of the task or not. We require a small number of such queries (around 25-75), and using these queries, the robot is able to learn directly in the real world in 1-4 hours of interaction time, resulting in one of the most efficient real-world image-based robotic RL methods. We have open-sourced our implementation.

Our method allows us to solve a host of real-world robotics problems from pixels in an end-to-end fashion without any hand-engineered reward functions.
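One simple way to turn a set of example success images plus occasional yes/no labels into a reward is sketched below, under assumptions that are ours rather than the paper's: a binary success classifier is trained with the example images as positives and other observations as negatives, its log-probability of success serves as the RL reward, and each query shows the user the visited observation the classifier currently rates as most likely successful, records the answer as a new label, and refits. Observations are hypothetical low-dimensional feature vectors and the classifier is plain logistic regression, whereas the actual method learns end-to-end from pixels; `SuccessClassifier` and `query_step` are illustrative names.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SuccessClassifier:
    """Logistic-regression stand-in for an image-based success classifier.
    Inputs are hypothetical feature vectors, not raw pixels."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x):
        # Estimated probability that observation x shows task success.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, xs, ys, lr=0.5, epochs=200):
        # Batch gradient descent on the cross-entropy loss, continuing from current weights.
        n = len(xs)
        for _ in range(epochs):
            gw = [0.0] * len(self.w)
            gb = 0.0
            for x, y in zip(xs, ys):
                err = self.prob(x) - y
                for i, xi in enumerate(x):
                    gw[i] += err * xi
                gb += err
            self.w = [wi - lr * g / n for wi, g in zip(self.w, gw)]
            self.b -= lr * gb / n

    def reward(self, x):
        # RL reward: log-probability of success (always <= 0).
        return math.log(max(self.prob(x), 1e-8))


def query_step(clf, positives, negatives, visited, ask_user):
    # Hypothetical query rule: ask about the visited observation the classifier
    # currently rates most likely to be a success, store the label, and refit.
    candidate = max(visited, key=clf.prob)
    (positives if ask_user(candidate) else negatives).append(candidate)
    xs = positives + negatives
    ys = [1] * len(positives) + [0] * len(negatives)
    clf.fit(xs, ys)
    return candidate


rng = random.Random(0)
# ~80 example "success images" (here: 2-D features near (1, 1)) and some failures.
positives = [[rng.uniform(0.8, 1.0), rng.uniform(0.8, 1.0)] for _ in range(80)]
negatives = [[rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5)] for _ in range(80)]
clf = SuccessClassifier(dim=2)
clf.fit(positives + negatives, [1] * 80 + [0] * 80)

visited = [[rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)] for _ in range(50)]
asked = query_step(clf, positives, negatives, visited,
                   ask_user=lambda x: x[0] > 0.8 and x[1] > 0.8)  # simulated user
print(asked, clf.reward([0.9, 0.9]), clf.reward([0.1, 0.1]))
```

Querying the highest-scoring candidate is one plausible choice: it checks whether the classifier is rewarding an observation that is not actually a success, which is where a wrong reward would mislead the policy most.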


Model-Based Reinforcement Learning from Pixels with Structured Latent Variable Models


Imagine a robot trying to learn how to stack blocks and push objects using visual inputs from a camera feed. To minimize cost and safety risks, we want our robot to learn these skills with minimal interaction time, but efficient learning from complex sensory inputs such as images is difficult. This work introduces SOLAR, a new model-based reinforcement learning (RL) method that can learn skills – including manipulation tasks on a real Sawyer robot arm – directly from visual inputs with under an hour of interaction. To our knowledge, SOLAR is the most efficient RL method for solving real-world image-based robotics tasks.

Our robot learns to stack a Lego block and push a mug onto a coaster using only images from a camera pointed at the robot. Each task takes an hour or less of interaction to learn.


Large-Scale Long-Tailed Recognition in an Open World


Existing Computer Vision Setting vs. Real-World Scenario

One day, an ecologist came to us. He wanted to use modern computer vision techniques to perform automatic animal identification on his wildlife camera trap image datasets. We were confident, because it sounded just like a basic image classification problem. However, we failed. The dataset he provided was extremely long-tailed and open-ended. As usual, when we did not have enough training data, we asked whether it was possible to provide more data for the tail classes and simply ignore the open classes that might appear in the test set. Unfortunately, collecting more data was not an option. It could take an extremely long time for these ecologists to photograph rare and secluded animals in the wild; for some endangered animals, they had to wait years for a single shot. At the same time, new animal species kept arriving and old ones kept leaving, so the total number of classes was never fixed in such a dynamic system. Moreover, identifying rare and new animals has greater conservation value than identifying abundant ones, so a method that did well only on the abundant classes would never be practically usable. We tried every method we could think of (data augmentation, sampling techniques, few-shot learning, imbalanced classification, etc.), but none of the existing methods could handle abundant classes, scarce classes, and open classes at the same time (Fig. 1).

Figure 1: There exists a considerable gap between the existing computer vision setting and the real-world scenario.


Robots that Learn to Adapt


Figure 1: Our model-based meta reinforcement learning algorithm enables a legged robot to adapt online in the face of an unexpected system malfunction (note the broken front right leg).

Humans have the ability to seamlessly adapt to changes in their environments: adults can learn to walk on crutches in just a few seconds, people can adapt almost instantaneously to picking up an object that is unexpectedly heavy, and children who can walk on flat ground can quickly adapt their gait to walk uphill without having to relearn how to walk. This adaptation is critical for functioning in the real world.


Robots that Learn to Use Improvised Tools


In many animals, tool-use skills emerge from a combination of observational learning and experimentation. For example, by watching one another, chimpanzees can learn how to use twigs to “fish” for insects. Similarly, capuchin monkeys demonstrate the ability to wield sticks as sweeping tools to pull food closer to themselves. While one might wonder whether these are just illustrations of “monkey see, monkey do,” we believe these tool-use abilities indicate a greater level of intelligence.

Left: A chimpanzee fishing for termites. Right: A gorilla using a stick to gather herbs. (source)

The question our new work explores is: can we enable robots to use tools in the same way — through observation and experimentation?

A requisite for performing complex multi-object manipulation tasks, such as those involved in tool use, is an understanding of physical cause-and-effect relationships. Therefore, the ability to predict how one object might interact with another is crucial. Our prior work has investigated how visual predictive models of cause-and-effect can be learned from unsupervised robot interaction with the world. After learning such a model, the robot can plan to accomplish a diverse set of simple tasks, including cloth folding and object arrangement. However, if we consider the more complex interactions that occur in tool-use tasks, such as how a broom can sweep dirt into a dustpan, undirected experimentation isn’t enough.

Hence, taking inspiration from how animals learn, we designed an algorithm that allows robots to learn tool-use skills through a similar paradigm of imitation and interaction. In particular, we show that, with a mix of demonstration data and unsupervised experience, a robot can use novel objects as tools and even improvise tools in the absence of traditional ones. Further, depending on the demands of the task, our method demonstrates the ability to decide whether to use the provided tools. In this post, we will describe how this works.


CVPR 2019 Challenges on Domain Adaptation in Autonomous Driving


We all dream of a future in which autonomous cars can drive us to every corner of the world. Numerous researchers and companies are working day and night to chase this dream by overcoming scientific and technological barriers. One of the greatest challenges we still face is developing machine learning models that are trained in one local environment and still perform well in new, unseen situations. For example, self-driving cars may use perception models to recognize drivable areas from images. Companies in Silicon Valley can build and perfect such a model by training on large local datasets from the Bay Area. However, if the same model were deployed in a snowy area such as Boston, it would likely perform poorly, because it has never seen snow. Boston in winter and Silicon Valley at any time of year can be treated as separate domains for perception models, since they differ clearly in climate and in the perception challenges they present. In other cases, domains may be much closer in nature, such as a city street and a nearby highway. The process of transferring knowledge and models between different domains in machine learning is called domain adaptation.

A large number of papers on domain adaptation of perception models have appeared in top machine learning and computer vision venues. However, most of these works focus on image classification and semantic segmentation. Hardly any attention has been paid to instance-level tasks, such as object detection and tracking, even though localizing nearby objects is arguably more important for autonomous driving. To foster the study of domain adaptation of perception models, Berkeley DeepDrive and Didi Chuxing are co-hosting two competitions at the CVPR 2019 Workshop on Autonomous Driving. The challenges will focus on domain adaptation of object detection and tracking, based on the BDD100K dataset from Berkeley DeepDrive and the D2-City dataset from Didi Chuxing. BDD100K covers US scenes, while D2-City was collected on China’s streets. The competitions ask participants to transfer object detectors from BDD100K to D2-City and object trackers from D2-City to BDD100K. More information about the challenges can be found on our website and on the D2-City website.

Following our introduction of the BDD100K dataset, we have been busy working to provide more temporal annotations. Above is an example of object tracking annotation, created by our open-source annotation platform Scalabel. Some of the tracking labels are used in the domain adaptation challenge for object tracking. More data will be released this summer. Of course, we also have object tracking at night.


Announcing the BAIR Open Research Commons


The University of California Berkeley Artificial Intelligence Research (BAIR) Lab is pleased to announce the BAIR Open Research Commons, a new industrial affiliate program launched to accelerate cutting-edge AI research. AI research is advancing rapidly in both university and corporate settings, and many collaborations driven by individual researchers are already underway. The BAIR Commons is designed to enhance and streamline such collaborative cutting-edge research by students, faculty, and corporate research scholars.

The Commons agreement has been framed with the goal of promoting open research in AI: all on-campus effort, data, and results in the Commons program will be non-exclusive, with open publication and open-source code release expected. Fostering an environment of excellence in graduate student research is the primary motivation of the new program: Berkeley students will lead the design of projects in the Commons, and the program of research must be approved by their home departments before a project commences. Students are expected to benefit from collaboration with leading researchers in industrial research labs, as well as from the availability of partner resources useful for investigating certain open questions in state-of-the-art AI research. The University will benefit from membership fees paid by partners to participate in the program. The Commons agreement provides for collaborative joint projects between the partners and Berkeley, with intellectual property shared jointly and equally by the parties.

The agreement also provides for joint research “lablets”, which will be collaborative open research spaces embedded inside BAIR’s 27,000 sq. ft. research facility, opening this summer in the Berkeley Way West building on the Berkeley campus. More than a dozen faculty and 120 students will be assigned space in the new lab, with an equal number of visiting positions allocated to researchers from other BAIR labs and to visiting industrial partners.

Initial alliance participants include Amazon, Facebook, Google, Samsung, and Wave Computing. Funding for over twenty joint projects has been committed in the initial launch of the program, which will support both BAIR facilities and research efforts. Over 30 faculty and 200 graduate students and postdocs at Berkeley are affiliated with BAIR. For more information about BAIR or the Commons program please contact bair-admin@berkeley.edu.

BAIR will occupy the top floor of Berkeley Way West.