Simulated humanoid performing a variety of highly dynamic and acrobatic skills.
Motion control problems have become standard benchmarks for reinforcement
learning, and deep RL methods have been shown to be effective for a diverse
suite of tasks ranging from manipulation to locomotion. However, characters
trained with deep RL often exhibit unnatural behaviours, with artifacts such
as jittering, asymmetric gaits, and excessive movement of the limbs. Can we
train our characters to produce more natural behaviours?
Left: Given movie poster, Right: New movie title generated by MC-GAN.
Text is a prominent visual element of 2D design. Artists invest significant time
into designing glyphs that are visually compatible with other elements in their
shape and texture. This process is labor-intensive, and artists often design only
the subset of glyphs that are necessary for a title or an annotation, which
makes it difficult to alter the text after the design is created, or to transfer
an observed instance of a font to a new project.
Early research on glyph synthesis focused on geometric modeling of outlines,
which is limited to particular glyph topologies (e.g., it cannot be applied to
decorative or hand-written glyphs) and cannot be used with image input.
With the rise of deep neural networks, researchers have looked at modeling
glyphs from images. More broadly, synthesizing data consistent with
partial observations is an interesting problem in computer vision and graphics,
with applications such as multi-view image generation, completing missing
regions in images, and generating 3D shapes. Font data is one such example: it
provides a clean factorization of style and content.
Recent advances in conditional generative adversarial networks (cGANs) [1] have
been successful in many generative applications. However, they perform best in
fairly specialized domains rather than in general or multi-domain style transfer.
Similarly, when used directly to generate fonts, cGAN models produce significant
artifacts. For instance, given the following five letters,
a conditional GAN model is not successful in generating all 26 letters with the same style:
Humans physically interact with each other every day – from grabbing someone’s hand when they are about to spill their drink, to giving your friend a nudge to steer them in the right direction, physical interaction is an intuitive way to convey information about personal preferences and how to perform a task correctly.
So why aren’t we physically interacting with current robots the way we do with each other? Seamless physical interaction between a human and a robot requires a lot: lightweight robot designs, reliable torque or force sensors, safe and reactive control schemes, the ability to predict the intentions of human collaborators, and more! Luckily, robotics has made many advances in the design of personal robots specifically developed with humans in mind.
However, consider the example from the beginning, where you grab your friend’s hand as they are about to spill their drink. Now imagine that, instead of your friend, it was a robot doing the spilling. Because state-of-the-art robot planning and control algorithms typically treat human physical interventions as disturbances, once you let go of the robot, it will resume its erroneous trajectory and continue spilling the drink. The root of this gap lies in how robots reason about physical interaction: instead of thinking about why the human physically intervened and replanning in accordance with what the human wants, most robots simply resume their original behavior after the interaction ends.
We argue that robots should treat physical human interaction as useful information about how they should be doing the task. We formalize reacting to physical interaction as an objective (or reward) learning problem and propose a solution that enables robots to change their behaviors while they are performing a task according to the information gained during these interactions.
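As a rough sketch of what this kind of objective learning can look like, consider a robot whose reward is a linear combination of hand-designed trajectory features; after a physical correction, the feature weights can be nudged toward the trajectory the human pushed the robot onto. The features, their values, and the learning rate below are illustrative placeholders, not the quantities used in our method; the point of the sketch is the shift in perspective, where the interaction becomes evidence about the objective rather than a disturbance to be rejected.

```python
import numpy as np

def update_objective(theta, phi_planned, phi_corrected, alpha=0.1):
    """Online update of reward weights after a physical correction.

    theta:         current weights over trajectory features
    phi_planned:   feature values of the robot's planned trajectory
    phi_corrected: feature values of the trajectory implied by the human's push
    """
    # Shift the weights so the corrected trajectory scores higher than the plan.
    return theta + alpha * (phi_corrected - phi_planned)

# Illustrative usage with two hypothetical features, e.g. "cup tilt" and "speed".
theta = np.array([0.0, 1.0])
phi_planned = np.array([0.8, 0.3])    # what the robot was about to do
phi_corrected = np.array([0.2, 0.3])  # what the human's correction suggests
theta = update_objective(theta, phi_planned, phi_corrected)
print(theta)  # the weight on "cup tilt" drops, so future plans avoid tilting the cup
```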
Feature selection is a common method for dimensionality reduction that
encourages model interpretability. With large data sets becoming ever more
prevalent, feature selection has seen widespread usage across a variety of
real-world tasks in recent years, including text classification, gene selection
from microarray data, and face recognition. We study the problem of supervised
feature selection, which entails finding a subset of the input features that
explains the output well. This practice can reduce the computational expense of
downstream learning by removing features that are redundant or noisy, while
simultaneously providing insight into the data through the features that remain.
Feature selection algorithms can generally be divided into three main
categories: filter methods, wrapper methods, and embedded methods. Filter
methods select features based on intrinsic properties of the data, independent
of the learning algorithm to be used. For example, we may compute the
correlation between each feature and the response variable, and select the
variables with the highest correlation. Wrapper methods, in contrast, are more
specialized, aiming to find features that optimize the performance of a specific
predictor. For example, we may train multiple SVMs, each with a different subset
of features, and choose the subset of features with the lowest loss on the
training data. Because there are exponentially many subsets of features, wrapper
methods often employ greedy algorithms. Finally, embedded methods incorporate
feature selection and prediction into a single problem, often by optimizing an
objective that combines a goodness-of-fit term with a penalty on the number of
parameters. One example is the LASSO method
for constructing a linear model, which penalizes the coefficients with an
$\ell_1$ penalty.
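To make the three categories concrete, here is a small illustration using scikit-learn on synthetic data; the particular estimators, hyperparameters, and the choice of five features are arbitrary examples, not part of the method introduced below.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import Lasso
from sklearn.svm import SVR

# Synthetic data: 50 features, only 5 of which actually drive the response.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5, random_state=0)

# Filter method: rank features by a univariate statistic, independent of any predictor.
filter_idx = SelectKBest(f_regression, k=5).fit(X, y).get_support(indices=True)

# Wrapper method: recursively drop the features that matter least to a specific
# predictor (here a linear SVM), refitting it along the way.
wrapper_idx = RFE(SVR(kernel="linear"), n_features_to_select=5).fit(X, y).get_support(indices=True)

# Embedded method: the LASSO's l1 penalty drives many coefficients exactly to zero,
# so selection falls out of fitting the model itself.
embedded_idx = np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_)

print(filter_idx, wrapper_idx, embedded_idx)
```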
In this post, we propose conditional covariance minimization (CCM), a feature
selection method that aims to unify the first two perspectives (filter and
wrapper methods). We first describe our approach in the sections that follow,
and then demonstrate through
several synthetic experiments that our method is capable of capturing joint
nonlinear relationships between collections of features. Finally, we show that
our algorithm has performance comparable to or better than several other popular
feature selection algorithms on a variety of real-world tasks.
As machine learning algorithms and techniques have advanced, more and more machine learning applications require multiple machines and must exploit parallelism.
However, the infrastructure for doing machine learning on clusters remains ad hoc. While good solutions exist for specific use cases (e.g., parameter servers or hyperparameter search), as do high-quality distributed systems outside of AI (e.g., Hadoop or Spark), practitioners developing algorithms at the frontier often build their own systems infrastructure from scratch. This amounts to a lot of redundant effort.
As an example, take a conceptually simple algorithm like Evolution Strategies for reinforcement learning. The algorithm is about a dozen lines of pseudocode, and its Python implementation doesn’t take much more than that. However, running the algorithm efficiently on a larger machine or cluster requires significantly more software engineering. The authors’ implementation involves thousands of lines of code and must define communication protocols, message serialization and deserialization strategies, and various data handling strategies.
One of Ray’s goals is to enable practitioners to turn a prototype algorithm that runs on a laptop into a high-performance distributed application that runs efficiently on a cluster (or on a single multi-core machine) with relatively few additional lines of code. Such a framework should include the performance benefits of a hand-optimized system without requiring the user to reason about scheduling, data transfers, and machine failures.
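As a rough illustration of that goal, here is a minimal example of Ray's remote-function API; the task itself is a toy placeholder rather than an actual Evolution Strategies implementation. The same code runs unchanged on a laptop or on a cluster, with Ray handling scheduling and data movement.

```python
import random
import ray

ray.init()  # start Ray locally; on a cluster this connects to the existing head node

@ray.remote
def rollout(seed):
    """Toy stand-in for an expensive task, e.g. evaluating one perturbed policy."""
    random.seed(seed)
    return sum(random.random() for _ in range(1000))

# Launch tasks in parallel across available cores (or machines) and gather the results.
futures = [rollout.remote(seed) for seed in range(8)]
returns = ray.get(futures)
print(returns)
```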
Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, and Bo Li
Dec 30, 2017
This post is based on recent research by Ivan Evtimov, Kevin Eykholt, Earlence
Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song, and
Florian Tramèr.
Deep neural networks (DNNs) have enabled great progress in a variety of
application areas, including image processing, text analysis, and speech
recognition. DNNs are also being incorporated as an important component in many
cyber-physical systems. For instance, the vision system of a self-driving car
can take advantage of DNNs to better recognize pedestrians, vehicles, and road
signs. However, recent research has shown that DNNs are vulnerable to
adversarial examples: adding carefully crafted adversarial perturbations to the
inputs can mislead the target DNN into mislabeling them at run time. Such
adversarial examples raise security and safety concerns when applying DNNs in
the real world. For example, adversarially perturbed inputs could mislead the
perceptual systems of an autonomous vehicle into misclassifying road signs, with
potentially catastrophic consequences.
Several techniques have been proposed to generate adversarial examples
and to defend against them. In this blog post, we will briefly introduce
state-of-the-art algorithms to generate digital adversarial examples, and
discuss our algorithm to generate physical adversarial examples on real
objects under varying environmental conditions. We will also provide an update
on our efforts to generate physical adversarial examples for object detectors.
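For a concrete picture of the digital attacks mentioned above, below is a minimal sketch of the fast gradient sign method, one standard attack; the model, inputs, and perturbation budget epsilon are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    """Fast gradient sign method: perturb x so the model misclassifies it."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Take one step in the direction that increases the loss, then keep
    # pixel values in a valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Usage (with any image classifier `model`, a batch of images `x` in [0, 1],
# and integer labels `y`):
#   x_adv = fgsm_attack(model, x, y, epsilon=8 / 255)
```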
Reinforcement Learning (RL) is a powerful technique capable of solving complex tasks such as locomotion, Atari games, racing games, and robotic manipulation tasks, all through training an agent to optimize behaviors over a reward function. There are many tasks, however, for which it is hard to design a reward function that is both easy to train and that yields the desired behavior once optimized. Suppose we want a robotic arm to learn how to place a ring onto a peg. The most natural reward function would be for the agent to receive a reward of 1 at the desired end configuration and 0 everywhere else. However, the required motion for this task – aligning the ring at the top of the peg and then sliding it to the bottom – is impractical to learn under such a binary reward, because the usual random exploration of our initial policy is unlikely to ever reach the goal, as seen in Video 1a. Alternatively, one can try to shape the reward function to alleviate this problem, but finding a good shaping requires considerable expertise and experimentation. For example, directly minimizing the distance between the center of the ring and the bottom of the peg leads to an unsuccessful policy that smashes the ring against the peg, as in Video 1b. We propose a method to learn efficiently without modifying the reward function, by automatically generating a curriculum over start positions.
Video 1a: A randomly initialized policy is unable to reach the goal from most start positions, hence being unable to learn.
Video 1b: Shaping the reward with a penalty on the distance from the ring center to the peg bottom yields an undesired behavior.
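To sketch the start-position curriculum idea (not the full method), the snippet below keeps only start states of intermediate difficulty for the current policy and grows the set outward from the goal. The thresholds, the Gaussian perturbation, and the `goal_state` / `train_policy_and_evaluate` names in the outer loop are stand-ins for the actual procedure.

```python
import numpy as np

def select_curriculum_starts(starts, success_rates, r_min=0.1, r_max=0.9):
    """Keep start states of intermediate difficulty: the policy sometimes,
    but not always, reaches the goal from them."""
    return [s for s, r in zip(starts, success_rates) if r_min < r < r_max]

def expand_starts(starts, n_new=10, noise=0.05, rng=np.random):
    """Grow the start distribution outward from the goal by perturbing known
    good starts. (The actual method uses short random-action rollouts from
    these states; Gaussian noise is just a stand-in here.)"""
    starts = np.asarray(starts, dtype=float)
    picks = starts[rng.randint(len(starts), size=n_new)]
    return list(picks + noise * rng.randn(*picks.shape))

# Sketch of the outer loop: start near the goal, train, keep the starts of
# intermediate difficulty, expand outward, and repeat.
#   starts = [goal_state]
#   for iteration in range(n_iterations):
#       starts = expand_starts(starts)
#       success_rates = train_policy_and_evaluate(starts)   # hypothetical helper
#       starts = select_curriculum_starts(starts, success_rates) or starts
```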
In modern factories, human workers and robots are two major workforces. Out of safety concerns, the two are normally kept apart, with robots confined in metal cages, which limits both the productivity and the flexibility of production lines. In recent years, attention has been directed toward removing the cages so that human workers and robots may collaborate, creating a human-robot co-existing factory. Manufacturers are interested in combining humans’ flexibility with robots’ productivity in flexible production lines. The potential benefits of industrial co-robots are huge and extensive: for example, they may be placed in human-robot teams in flexible production lines, where robot arms and human workers cooperate in handling workpieces, and automated guided vehicles (AGVs) share the floor with human workers to facilitate factory logistics. In the factories of the future, more and more human-robot interactions are anticipated to take place. Unlike traditional robots that work in structured and deterministic environments, co-robots need to operate in highly unstructured and stochastic environments. The fundamental problem is how to ensure that co-robots operate efficiently and safely in dynamic, uncertain environments. In this post, we introduce the robot safe interaction system developed in the Mechanical System Control (MSC) lab.
Fig. 1. The factory of the future with human-robot collaborations.
Sylvia Herbert, David Fridovich-Keil, and Claire Tomlin
Dec 5, 2017
The Problem: Fast and Safe Motion Planning
Real-time autonomous motion planning and navigation is hard, especially when we
care about safety. This becomes even more difficult when we have systems with
complicated dynamics, external disturbances (like wind), and a priori unknown
environments. Our goal in this work is to “robustify” existing real-time motion
planners to guarantee safety during navigation of dynamic systems.
Fig 1. A learned neural network dynamics model enables a hexapod robot to learn
to run and follow desired trajectories, using just 17 minutes of real-world
experience.
Enabling robots to act autonomously in the real-world is difficult. Really,
really difficult. Even with expensive robots and teams of world-class
researchers, robots still have difficulty autonomously navigating and
interacting in complex, unstructured environments.
Why are autonomous robots not out in the world among us? Engineering systems
that can cope with all the complexities of our world is hard. From nonlinear
dynamics and partial observability to unpredictable terrain and sensor
malfunctions, robots are particularly susceptible to Murphy’s law: everything
that can go wrong, will go wrong. Instead of fighting Murphy’s law by coding
each possible scenario that our robots may encounter, we can embrace this
possibility of failure and enable our robots to learn from it.
Learning control strategies from experience is advantageous because, unlike
hand-engineered controllers, learned controllers can adapt and improve with more
data. Therefore, when presented with a scenario in which everything does go
wrong, although the robot will still fail, the learned controller will hopefully
correct its mistake the next time it is presented with a similar scenario. In
order to deal with the complexities of real-world tasks, current
learning-based methods often use deep neural networks, which are powerful but
not data-efficient: these trial-and-error learners will most often still
fail a second time, and a third time, and often thousands to millions of times.
The sample inefficiency of modern deep reinforcement learning methods is one of
the main bottlenecks to leveraging learning-based methods in the real-world.
We have been investigating sample-efficient learning-based approaches with
neural networks for robot control. For complex and contact-rich simulated
robots, as well as real-world robots (Fig. 1), our approach is able to learn
trajectory-following locomotion skills using only minutes of data collected
from the robot acting randomly in the environment. In this blog post, we’ll
provide an overview of our approach and results. More details can be found in
our research papers listed at the bottom of this post, including this paper
with code here.
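For a rough sense of how a learned dynamics model can be used for control, here is a heavily simplified sketch: a neural network is trained to predict state changes from state-action pairs (standard supervised regression, omitted here), and actions are chosen with random-shooting model-predictive control. The network size, horizon, number of candidate sequences, and the reward function are placeholders rather than the settings used in the papers.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next state given the current state and action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # The network predicts the change in state; adding it back improves conditioning.
        return state + self.net(torch.cat([state, action], dim=-1))

def mpc_action(model, state, reward_fn, action_dim, horizon=10, n_candidates=1000):
    """Random-shooting MPC: sample action sequences, roll them out through the
    learned model, and execute the first action of the best sequence."""
    state = torch.as_tensor(state, dtype=torch.float32).repeat(n_candidates, 1)
    actions = torch.rand(n_candidates, horizon, action_dim) * 2 - 1  # actions in [-1, 1]
    returns = torch.zeros(n_candidates)
    with torch.no_grad():
        s = state
        for t in range(horizon):
            s = model(s, actions[:, t])
            returns += reward_fn(s, actions[:, t])  # reward_fn is task-specific
    return actions[returns.argmax(), 0].numpy()
```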