Shuang Ma

I am a research scientist at Apple AI/ML. My research interests center on multimodal learning, foundation models, reinforcement learning, and robotics. Recently, I have focused primarily on developing large-scale foundation models for perception, decision making, and reasoning. Before joining Apple, I was a researcher at Microsoft Research. I received my Ph.D. in Computer Science from SUNY Buffalo. I was lucky to have Prof. Chang Wen Chen as my advisor and Daniel McDuff, Yale Song, and Ashish Kapoor as my mentors.

Email: yunyikristy <AT> gmail <DOT> com


Research Interests

  • Foundation models

  • Multimodal Learning

  • Computer Vision

  • Robotics and Reinforcement Learning


  • Our paper 'TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning' is accepted by NeurIPS 2023. [Paper]

  • Our paper 'Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training' is accepted by ICCV 2023. [Paper][Code]

  • I am organizing the workshop "PerDream: PERception, Decision making and REAsoning through Multimodal foundational modeling" at ICCV 2023. [Website]

  • I am co-organizing the "Workshop on Robot Learning and SLAM" at ICCV 2023. [Website]

  • I am serving as an Area Chair for NeurIPS 2023.

  • Our paper 'PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training' is accepted by IROS 2023. [Paper][Code]

  • We announced SMART and released our code! [Blog][MSR Research Focus][Code]

  • Our paper 'SMART: Self-supervised Multi-task pretrAining with contRol Transformers' is accepted by ICLR 2023 as notable top 25% (Spotlight).

  • Our paper 'LaTTe: Language Trajectory TransformEr' is accepted by ICRA 2023. [Blog][Code]

  • Our team EgoMotion-COMPASS won 2nd place on two tasks of the Ego4D challenge (ECCV 2022).

  • We announced COMPASS and released our code! [Blog][Code]

  • Our paper 'COMPASS: Contrastive multimodal pretraining for autonomous systems' is accepted by IROS 2022.

  • Our paper 'Reshaping robot trajectories using natural language commands: A study of multi-modal data alignment using transformers' is accepted by IROS 2022.

  • Our paper 'CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning' is accepted by CLeaR 2022. [Blog][Code]


  • Jianchen Lei, Zhejiang University 2023

  • Yao Wei, Zhejiang University 2023

  • Ruijie Zheng, University of Maryland 2023

  • Yanchao Sun, University of Maryland 2022

  • Arthur Fender Coelho Bucker, Technical University of Munich (TUM) 2022

  • Weijian Xu, UC San Diego 2021

  • Cherie Ho, Carnegie Mellon University 2021

  • Zhaoyang Zeng, Sun Yat-sen University 2020

  • Mingzhi Yu, University of Pittsburgh 2020
