Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency
Shanghai Jiao Tong University¹
UC Berkeley²
New York University³
UC San Diego⁴


   
Abstract:

At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires obtaining correspondence between humans and robots; sim-to-real requires correspondence between physics simulators and the real world; transfer learning requires correspondences between different robotics environments. This paper aims to learn correspondence across domains differing in representation (vision vs. internal state), physics parameters (mass and friction), and morphology (number of limbs). Importantly, correspondences are learned using unpaired and randomly collected data from the two domains. We propose dynamics cycles that align dynamic robot behavior across two domains using a cycle-consistency constraint. Once this correspondence is found, we can directly transfer the policy trained on one domain to the other, without needing any additional fine-tuning on the second domain. We perform experiments across a variety of problem domains, both in simulation and on a real robot. Our framework is able to align uncalibrated monocular video of a real robot arm to dynamic state-action trajectories of a simulated arm without paired data.
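To make the dynamics cycle concrete, here is a minimal Python sketch of one consistency term, written under assumed notation rather than the paper's own: G maps a domain-X state into domain Y, H maps a domain-X state-action pair to a domain-Y action, and F is a forward dynamics model in domain Y. The cycle asks that stepping F from the mapped state with the mapped action lands where G sends the observed next state, F(G(x_t), H(x_t, a_t)) ≈ G(x_{t+1}). The linear maps, the names `state_map`, `action_map`, `forward_model` and `dynamics_cycle_loss`, and the L1 penalty are all hypothetical choices for illustration; the full method presumably uses neural networks and other training losses that this sketch omits.

```python
import numpy as np


def state_map(x, W_g):
    """Hypothetical G: domain-X states (N, dx) -> domain-Y states (N, dy)."""
    return x @ W_g


def action_map(x, a, W_h):
    """Hypothetical H: domain-X state and action (N, dx + da) -> domain-Y action (N, db)."""
    return np.concatenate([x, a], axis=-1) @ W_h


def forward_model(y, b, W_f):
    """Hypothetical F: domain-Y state and action (N, dy + db) -> next domain-Y state (N, dy)."""
    return np.concatenate([y, b], axis=-1) @ W_f


def dynamics_cycle_loss(x_t, a_t, x_next, W_g, W_h, W_f):
    """Mean L1 gap between the two routes from (x_t, a_t) to a domain-Y next state."""
    # Route 1: translate state and action into domain Y, then step the domain-Y model.
    y_pred = forward_model(state_map(x_t, W_g), action_map(x_t, a_t, W_h), W_f)
    # Route 2: take the observed domain-X transition, then translate its result.
    y_target = state_map(x_next, W_g)
    return float(np.mean(np.abs(y_pred - y_target)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy domains (assumed): X evolves as x' = x + a, Y evolves as y' = y + 2b.
    x = rng.normal(size=(8, 2))
    a = rng.normal(size=(8, 2))
    x_next = x + a
    W_f = np.vstack([np.eye(2), 2 * np.eye(2)])     # F(y, b) = y + 2b
    W_h = np.vstack([np.zeros((2, 2)), np.eye(2)])  # H(x, a) = a
    # With G(x) = 2x the cycle closes, because 2x + 2a equals 2(x + a).
    print(dynamics_cycle_loss(x, a, x_next, 2 * np.eye(2), W_h, W_f))
    # With G(x) = x it does not close; the per-sample residual equals a.
    print(dynamics_cycle_loss(x, a, x_next, np.eye(2), W_h, W_f))
```

With H and F fixed as in the toy run, the residual is a(2I - W_g), so the loss vanishes only for G(x) = 2x. Minimizing a term like this over the mappings is what pushes them toward a correspondence that respects both domains' dynamics, rather than one that merely matches state distributions.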







Real xArm Robot Experiment Visualization


Estimating the robot arm's joint pose without any labels.



(a) Real image input. (b) Our model's prediction. (c) Cycle-GAN model prediction.



Cross-Morphology Policy Transfer


Cross-morphology policy transfer visualization. Note that no new reward is used for fine-tuning at test time.

             

(a) Policy trained on the two-leg cheetah and tested on the three-leg cheetah. (b) Policy trained on the three-limb swimmer and tested on the four-limb swimmer.
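Transfer without fine-tuning can be pictured as a thin wrapper: a policy trained in one domain acts in the other by translating the observation in, querying the frozen policy, and translating the action back. The helper `transfer_policy` and the mapping directions (test-domain state to training-domain state, training-domain action to test-domain action) are hypothetical choices for this sketch; in the paper's setting the mappings are learned, whereas the lambdas below are hand-written toy stand-ins.

```python
from typing import Callable

import numpy as np

Policy = Callable[[np.ndarray], np.ndarray]


def transfer_policy(
    policy_src: Policy,
    state_to_src: Callable[[np.ndarray], np.ndarray],
    action_to_tgt: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> Policy:
    """Wrap a frozen source-domain policy so it acts on target-domain states."""

    def policy_tgt(s_tgt: np.ndarray) -> np.ndarray:
        s_src = state_to_src(s_tgt)          # translate the observation into the source domain
        a_src = policy_src(s_src)            # query the unchanged source policy
        return action_to_tgt(s_tgt, a_src)   # translate the action back to the target domain

    return policy_tgt


if __name__ == "__main__":
    # Toy stand-ins (assumed): target states are source states scaled by 2,
    # the source policy pushes the state toward zero, and target actions are doubled.
    policy = transfer_policy(
        policy_src=lambda s: -s,
        state_to_src=lambda s: s / 2.0,
        action_to_tgt=lambda s, a: 2.0 * a,
    )
    print(policy(np.array([4.0, -2.0])))  # -> [-4.  2.]
```

Keeping the source policy frozen and putting all domain-specific logic in the two mappings is what lets the same wrapper cover state and action spaces of different sizes, such as a two-leg versus a three-leg cheetah.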




HalfCheetah Training Process Visualization


Estimated states during training: (a) A randomly sampled image from the dataset. (b) The state estimated from that image and re-rendered as an image, at different training iterations. (c) L1 error between the model's estimated state and the ground-truth state over the course of training; the horizontal axis is the training iteration and the vertical axis is the L1 error. Note that no paired data is used for training.

      
(a) A randomly sampled image. (b) Estimated state, re-rendered as an image. (c) L1 error vs. training iteration.
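The L1 curve in panel (c) compares estimated states with ground-truth simulator states, which, given that no paired data is used in training, serve only as an evaluation signal. A minimal sketch of such a metric follows; whether the reported error averages or sums over state dimensions is not stated here, so averaging over both samples and dimensions is an assumption, and `l1_state_error` is a hypothetical name.

```python
import numpy as np


def l1_state_error(pred_states, true_states):
    """Mean absolute difference between estimated and ground-truth states.

    Both inputs are array-likes of shape (num_samples, state_dim).
    Assumption: the error is averaged over samples and state dimensions.
    """
    pred = np.asarray(pred_states, dtype=float)
    true = np.asarray(true_states, dtype=float)
    if pred.shape != true.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {true.shape}")
    return float(np.mean(np.abs(pred - true)))


if __name__ == "__main__":
    # Two samples with two state dimensions each; absolute errors are 1, 0, 0, 2.
    print(l1_state_error([[0.0, 1.0], [2.0, 3.0]], [[1.0, 1.0], [2.0, 1.0]]))  # -> 0.75
```

Logging this value at each iteration tracks progress without feeding any labels back into the model.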




HalfCheetah State Estimation Comparison


Self-supervised state estimation: the state is estimated from an image observation and then re-rendered as a new image.

             

(a) Our (Cycle-Dynamics) results. (b) Cycle-GAN baseline results.





BibTeX:
   
      @inproceedings{zhang2021learning,
          title={Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency},
          author={Qiang Zhang and Tete Xiao and Alexei A Efros and Lerrel Pinto and Xiaolong Wang},
          booktitle={International Conference on Learning Representations},
          year={2021},
          url={https://openreview.net/forum?id=QIRlze3I6hX}
      }
    

Acknowledgements:
We would like to thank Jeannette Bohg for helpful discussions on this project. This work was supported, in part, by grants from DARPA, NSF 1730158 CI-New: Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI), NSF ACI-1541349 CC*DNI Pacific Research Platform, research grants from Qualcomm, Berkeley DeepDrive, and SAP, and a gift from TuSimple.