Do they live in a simulation? Training models for dynamic environments

Generation of synthetic data for machine learning


Hallison Paz

Generation of synthetic data for machine learning

1. Why does synthetic data matter?

2. How to generate synthetic data and train a model with it

3. Do they live in a simulation? Training models for dynamic environments

Recap

Two weeks ago ...

  • We need a lot of data to train machine learning models
  • Creating large scale datasets is hard and presents both technical and ethical challenges
  • Synthetic data can help us overcome these challenges
  • But we must pay the cost of the Reality Gap
    • Domain randomization
    • Domain adaptation

Last week ...

  • There are many tools available to generate synthetic datasets
  • It's important to have a strategy to generate good data
    • We can iterate dataset versions
  • Even a small team can generate large scale synthetic datasets
    • There are already tutorials to help with this task

Today's agenda

  • Why simulations?
  • Techniques to train models for dynamic environments
  • Challenges of simulations for machine learning
  • Tools to generate simulations and capture data

Why Simulations?

Train agents for dynamic and complex environments

Tasks | Self-driving cars


Johnson-Roberson et al., 2017

Tasks | Unmanned Aircraft Systems

  • Aerial photography
  • Search and rescue
  • Package delivery...
Jeremy Castagno, Yu Yao and Ella Atkins. Realtime Rooftop Landing Site Identification and Selection in Urban City Simulation. 2019

Tasks | Human-like manipulation

Tasks | Perform alongside humans

BMW Factory Digital Twin

Behavior | Digital Humans

  • How to train an agent to interact with humans?
    • Simulating humans


Anderson, C.; Du, X.; Vasudevan, R.; & Johnson-Roberson, M. Stochastic Sampling Simulation for Pedestrian Trajectory Prediction. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Behavior | Crowds


Behavior | Crowds


Computer Graphics | Stunts

DeepMotion: Physically Simulated AI Agents | Can They Replace Stuntmen?

Computer Graphics | Animation


New possibilities | Digital Humans...or dogs

What if we perform tasks in the virtual world?

Why learn on simulations

We want to...

  • Train intelligent agents to perform tasks in the real world
  • Simulate behaviors consistent with the real world

Why learn on simulations

We also want to...

  • Get better, more accurate results
    • More data; dataset diversity...
  • Have a cheaper and easier process
    • Automatic annotations
  • Train Faster
    • "Time is an illusion": we can go faster than real-time
  • Train Safer
    • Some scenarios are dangerous to experiment in real life

Techniques

Reinforcement Learning

A model of the idea of trial and error, used to train agents to make decisions in complex environments

Reinforcement Learning

  • Markov Decision Process (MDP): a 5-tuple $\langle S, A, R, P, \rho_0 \rangle$, where
    • $S$ is the set of all valid states
    • $A$ is the set of all valid actions
    • $R : S \times A \times S \to \mathbb{R}$ is the reward function, with $r_t = R(s_t, a_t, s_{t+1})$
    • $P : S \times A \to \mathcal{P}(S)$ is the transition probability function, with $P(s' \mid s, a)$ being the probability of transitioning into state $s'$ if you start in state $s$ and take action $a$
    • $\rho_0$ is the starting state distribution.
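To make the tuple concrete, here is a minimal toy MDP in plain Python; the states, actions, rewards, and transition probabilities below are invented purely for illustration.

```python
import random

# A toy MDP <S, A, R, P, rho_0> with two states and two actions (illustrative only)
S = ["s0", "s1"]                       # set of all valid states
A = ["stay", "move"]                   # set of all valid actions

def R(s, a, s_next):                   # reward function: r_t = R(s_t, a_t, s_{t+1})
    return 1.0 if s_next == "s1" else 0.0

P = {                                  # transition probabilities P(s' | s, a)
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

def rho_0():                           # starting state distribution (deterministic here)
    return "s0"

def step(s, a):                        # sample s' ~ P(. | s, a) and compute the reward
    probs = P[(s, a)]
    s_next = random.choices(list(probs), weights=list(probs.values()))[0]
    return s_next, R(s, a, s_next)
```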

Reinforcement Learning

  • Policy: rule used to decide what actions to take
    • Deterministic: $a_t = \mu_{\theta}(s_t)$
    • Stochastic: $a_t \sim \pi_{\theta}(\cdot \mid s_t)$
  • Deep Reinforcement Learning: $\theta$ are the parameters of the network
  • Trajectory: sequence of states and actions
    • $\tau = (s_0, a_0, s_1, a_1, \dots)$
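As a rough PyTorch sketch (state and action dimensions, network sizes, and the Gaussian parameterization are placeholder choices): a deterministic policy maps a state directly to an action, while a stochastic policy defines a distribution we sample from.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2   # placeholder dimensions

# Deterministic policy: a_t = mu_theta(s_t)
mu = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))

# Stochastic (Gaussian) policy: a_t ~ pi_theta(. | s_t)
mean_net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
log_std = nn.Parameter(torch.zeros(action_dim))

def sample_action(s):
    dist = torch.distributions.Normal(mean_net(s), log_std.exp())
    a = dist.sample()
    return a, dist.log_prob(a).sum(-1)   # log-probability, used later in policy-gradient updates

s = torch.randn(state_dim)
a_det = mu(s)                   # deterministic action
a_sto, logp = sample_action(s)  # stochastic action and its log-probability
```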

Reinforcement Learning

  • Reward or return

$$r_t = R(s_t, a_t, s_{t+1})$$

  • Cumulative reward (return) over a trajectory, with discount factor $\gamma \in (0, 1)$

$$R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t$$

or, over a finite horizon $T$,

$$R(\tau) = \sum_{t=0}^{T} r_t$$
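As a quick illustration of both definitions (the reward values below are made up):

```python
def discounted_return(rewards, gamma=0.99):
    # R(tau) = sum_t gamma^t * r_t (infinite-horizon form, truncated at len(rewards))
    return sum(gamma**t * r for t, r in enumerate(rewards))

def undiscounted_return(rewards):
    # R(tau) = sum_{t=0}^{T} r_t (finite-horizon form)
    return sum(rewards)

rewards = [0.0, 0.0, 1.0, 1.0]        # hypothetical rewards along one trajectory
print(discounted_return(rewards))     # ≈ 1.95
print(undiscounted_return(rewards))   # 2.0
```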

Reinforcement Learning

  • The goal in RL is to select a policy $\pi$ which maximizes the expected return

$$\pi^* = \arg\max_{\pi} J(\pi)$$

$$J(\pi) = \int_{\tau} P(\tau \mid \pi)\, R(\tau) = \underset{\tau \sim \pi}{E}\left[ R(\tau) \right]$$

$$P(\tau \mid \pi) = \rho_0(s_0) \prod_{t=0}^{T-1} P(s_{t+1} \mid s_t, a_t)\, \pi(a_t \mid s_t)$$
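In practice the integral over trajectories is approximated by sampling: roll out the policy many times and average the returns. A minimal Monte Carlo sketch, reusing the toy MDP helpers (`rho_0`, `step`, `A`) and `discounted_return` from the earlier sketches:

```python
import random

def rollout(policy, horizon=100):
    # Sample one trajectory tau = (s_0, a_0, s_1, a_1, ...) by following the policy
    s, rewards = rho_0(), []
    for _ in range(horizon):
        a = policy(s)
        s, r = step(s, a)
        rewards.append(r)
    return rewards

def estimate_J(policy, n_trajectories=1000):
    # Monte Carlo estimate of J(pi) = E_{tau ~ pi}[R(tau)]
    returns = [discounted_return(rollout(policy)) for _ in range(n_trajectories)]
    return sum(returns) / len(returns)

# e.g., evaluate a uniformly random policy on the toy MDP
print(estimate_J(lambda s: random.choice(A)))
```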

Reinforcement Learning

Imitation Learning

IDEA: learn from an expert demonstration, rather than a carefully designed reward function.

  • Provides prior knowledge to the system, rather than relying on trial and error.

Imitation Learning

  • Behavioral Cloning
    • Directly replicating the desired behavior
  • Inverse Reinforcement Learning / Inverse Optimal Control
    • Learning the hidden objectives of the desired behavior
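Behavioral cloning reduces to supervised learning on expert state-action pairs. A minimal PyTorch sketch, with random tensors standing in for real expert demonstrations:

```python
import torch
import torch.nn as nn

# Hypothetical expert demonstrations: observed states and the actions the expert took
expert_states = torch.randn(1000, 4)
expert_actions = torch.randn(1000, 2)

policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(100):
    pred = policy(expert_states)
    loss = nn.functional.mse_loss(pred, expert_actions)  # imitate the expert's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```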

Challenges

Challenges

  • We know the goal, we know some methods, but...
  • We still need to deal with the Reality Gap
    • Now it's a dynamic environment (more parameters)
    • We have physical constraints
  • Agents usually must act in the real world
    • Sometimes models learn to "cheat" the simulator
  • How to reduce the effort to generate coherent simulations?

Challenges

  • Domain randomization might require careful, task-specific selection of parameters
Chebotar, Y.; Handa, A.; Makoviychuk, V.; Macklin, M.; Issac, J.; Ratliff, N.; & Fox, D. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience. ICRA, 2019.
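For intuition, a tiny sketch of what selecting randomization parameters looks like; the parameter names and ranges below are invented and would need careful, task-specific tuning:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical randomization ranges: too narrow overfits the simulator,
# too wide produces unrealistic scenarios
RANDOMIZATION_RANGES = {
    "friction":          (0.5, 1.2),
    "mass_scale":        (0.8, 1.2),
    "light_intensity":   (0.3, 2.0),
    "camera_jitter_deg": (0.0, 5.0),
}

def sample_sim_params():
    # Draw one randomized parameter set for the next simulated episode
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

print(sample_sim_params())
```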

Closing the Sim-to-Real Loop

  • An overly wide randomization distribution might be disadvantageous
    • Unrealistic scenarios
    • Stress on physical components
  • Randomization might be biased by the expertise of the practitioner
Chebotar, Y.; Handa, A.; Makoviychuk, V.; Macklin, M.; Issac, J.; Ratliff, N.; & Fox, D. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience. ICRA, 2019.

Closing the Sim-to-Real Loop

  • Start with some initial distribution of the simulation parameters
  • Learn in simulation
  • Use real world roll-outs of learned policies to gradually change the simulation randomization
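A toy, single-parameter sketch of that loop (not the authors' algorithm, which adapts a full randomization distribution from policy roll-outs with a more principled objective): the simulator has one unknown physical parameter, and we shift a Gaussian over it until simulated roll-outs match real ones.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_FRICTION = 0.7                      # unknown real-world value (hypothetical)

def simulated_rollout(friction):
    # Stand-in for running the learned policy in a simulator with this parameter
    return 10.0 * friction + rng.normal(0.0, 0.1)

def real_rollout():
    # Stand-in for a real-world roll-out of the same policy
    return 10.0 * TRUE_FRICTION + rng.normal(0.0, 0.1)

mean, std = 0.3, 0.2                     # initial randomization distribution over friction
for _ in range(50):
    samples = rng.normal(mean, std, size=64)                  # sample simulation parameters
    sim_outcomes = np.array([simulated_rollout(f) for f in samples])
    real_outcome = np.mean([real_rollout() for _ in range(4)])

    # Re-weight parameter samples by how closely their simulated roll-outs match
    # the real roll-outs, then refit the Gaussian (a crude CEM-style update)
    weights = np.exp(-(sim_outcomes - real_outcome) ** 2)
    weights /= weights.sum()
    mean = float(np.sum(weights * samples))
    std = max(float(np.sqrt(np.sum(weights * (samples - mean) ** 2))), 0.02)

print(mean)   # drifts toward TRUE_FRICTION over the iterations
```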

Challenges

  • Domain adaptation might be useful for general tasks

GraspGAN

  • Pixel-level domain adaptation
  • Pseudo-real images correct some of the sim-to-real gap
  • However, multi-pixel features or structures may be arbitrarily modified or removed
Konstantinos Bousmalis, Alex Irpan, Paul Wohlhart, Yunfei Bai, Matthew Kelcey, Mrinal Kalakrishnan, Laura Downs, Julian Ibarz, Peter Pastor Sampedro, Kurt Konolige, Sergey Levine, & Vincent Vanhoucke (2018). Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping.
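Schematically, pixel-level adaptation translates simulated images into pseudo-real ones and trains the task model on them, so the simulator's annotations carry over unchanged. The sketch below is illustrative only (tiny stand-in networks, random tensors), not the GraspGAN architecture:

```python
import torch
import torch.nn as nn

# Stand-ins (hypothetical): a pixel-level generator and a downstream grasping model
generator = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 3, 3, padding=1))     # sim image -> pseudo-real image
grasp_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))

sim_images = torch.rand(8, 3, 64, 64)   # placeholder simulated images
sim_labels = torch.rand(8, 1)           # annotations come for free from the simulator

# Train the task model on pseudo-real images; labels transfer directly because
# the adaptation happens only at the pixel level
pseudo_real = generator(sim_images)
loss = nn.functional.binary_cross_entropy_with_logits(grasp_model(pseudo_real), sim_labels)
loss.backward()
```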

RetinaGAN

  • RetinaGAN involves a CycleGAN
  • Strong object semantic awareness through an object detection consistency loss
  • Tests object detection feature understanding for robotics applications
Daniel Ho, Kanishka Rao, Zhuo Xu, Eric Jang, Mohi Khansari, & Yunfei Bai. (2020). RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer.
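The object-detection consistency loss can be sketched as: run a frozen perception model on a simulated image and on its translated version, and penalize differences between the two predictions. Everything below is a toy stand-in (not the RetinaGAN code, which uses a pretrained object detector inside a full CycleGAN):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the generator and the perception model
generator = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 3, 3, padding=1))      # sim -> pseudo-real
detector = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, 5))                       # toy "detection" outputs
for p in detector.parameters():
    p.requires_grad_(False)     # the perception model stays frozen during adaptation

sim_batch = torch.rand(4, 3, 64, 64)    # placeholder simulated images
translated = generator(sim_batch)

# Object-awareness constraint: detections should not change under translation
consistency_loss = nn.functional.l1_loss(detector(translated), detector(sim_batch))
total_loss = consistency_loss           # + adversarial and cycle losses in the full method
total_loss.backward()
```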

RetinaGAN


RetinaGAN

  • Different task and method
  • Training on data from three separate RetinaGAN models
    • different random seeds and consistency loss weights

Tools

Tools

Conclusion

Conclusion

  • Simulations are important resources for training intelligent agents
  • Reinforcement Learning is a powerful technique for this task
  • Dynamic environments present new possibilities, but also new challenges
  • Simulation platforms keep evolving to support this paradigm - even adding differentiability!

Wow! What's next?

We made it!

Generation of synthetic data for machine learning

1. Why does synthetic data matter?

2. How to generate synthetic data and train a model with it

3. Do they live in a simulation? Training models for dynamic environments

THANK YOU!

hallpaz@impa.br

_footer: [Deep Dribble](https://blog.deepmotion.com/2018/08/07/deepdribble-simulating-basketball-with-ai/): Simulating Basketball with AI

Simulating many years of robotic interaction is quite feasible with modern parallel computing, physics simulation, and rendering technology. Moreover, the resulting data comes with automatically-generated annotations, which is particularly important for tasks where success is hard to infer automatically. The challenge with simulated training is that even the best available simulators do not perfectly capture reality. Models trained purely on synthetic data fail to generalize to the real world, as there is a discrepancy between simulated and real environments, in terms of both visual and physical properties. In fact, the more we increase the fidelity of our simulations, the more effort we have to expend in order to build them, both in terms of implementing complex physical phenomena and in terms of creating the content (e.g., objects, backgrounds) to populate these simulations. This difficulty is compounded by the fact that powerful optimization methods based on deep learning are exceptionally proficient at exploiting simulator flaws: the more powerful the machine learning algorithm, the more likely it is to discover how to "cheat" the simulator to succeed in ways that are infeasible in the real world. The question then becomes: how can a robot utilize simulation to enable it to perform useful tasks in the real world?