
Shuang Ma


I am a senior researcher in the Autonomous Systems Research team at Microsoft. I work in computer vision, machine learning, and robotics. My recent research focuses on building foundation models for perception and control. I am particularly interested in unifying perception and control from unlabeled ego-view videos and spatio-temporal multimodal data.


I obtained my Ph.D. degree in Computer Science from SUNY Buffalo, where I was a member of the Ubiquitous Multimedia Lab (UBMM). I was lucky to have Prof. Chang Wen Chen as my advisor and Daniel McDuff, Yale Song, Mary Czerwinski, Jianlong Fu, and Tao Mei as my mentors.


Research Interests

  • Multimodal Learning

  • Computer Vision

  • Robotics and Reinforcement Learning


  • Our paper 'SMART: Self-supervised Multi-task pretrAining with contRol Transformers' was accepted to ICLR 2023 as a notable-top-25% paper (Spotlight).

  • Our paper 'LaTTe: Language Trajectory TransformEr' was accepted to ICRA 2023.

  • Our team EgoMotion-COMPASS took 2nd place on two tasks of the Ego4D challenge (ECCV 2022).

  • Our paper 'Compass: Contrastive multimodal pretraining for autonomous systems' was accepted to IROS 2022.

  • Our paper 'Reshaping robot trajectories using natural language commands: A study of multi-modal data alignment using transformers' was accepted to IROS 2022.

  • Our paper 'CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning' was accepted to CLeaR 2022.


  • Jianchen Lei, Zhejiang University 2023

  • Yao Wei, Zhejiang University 2023

  • Ruijie Zheng, University of Maryland 2023

  • Yanchao Sun, University of Maryland 2022

  • Arthur Fender Coelho Bucker, Technical University of Munich (TUM) 2022

  • Weijian Xu, UC San Diego 2021

  • Cherie Ho, Carnegie Mellon University 2021

  • Zhaoyang Zeng, Sun Yat-sen University 2020

  • Mingzhi Yu, University of Pittsburgh 2020
