I am Hao Shi (石昊), a Master’s student in the Department of Automation at Tsinghua University, in a joint program with MEGVII Research. I am advised by Prof. Gao Huang and Xiangyu Zhang, and I also work closely with Tiancai Wang.

My research focuses on Embodied AI, Robot Learning, Vision-Language-Action (VLA) models, and World Models, with the goal of building foundation models for general robotic systems.

I expect to join HKU MMLab as a Ph.D. student in Fall 2026, advised by Prof. Ping Luo.

📝 Research

Under Review 2026

MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models

Hao Shi, Weiye Li, Bin Xie, Yulin Wang, Renping Zhou, Tiancai Wang, Xiangyu Zhang, Ping Luo, Gao Huang✉

Under Review 2026 | Paper | Code | Homepage | Huggingface

  • MemoryVLA++ is the extended journal version of MemoryVLA, advancing it from past-only memory modeling to full temporal modeling with both past memory and future imagination.
Under Review 2026

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design

Tianxing Chen*, Yuran Wang*, Mingleyang Li*, Yan Qin*, Hao Shi, Zixuan Li, Yifan Hu, Yingsheng Zhang, Kaixuan Wang, Yue Chen, Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong✉, Ping Luo✉

Under Review 2026 | Paper | Code | Homepage | Huggingface

  • RMBench is a memory-oriented benchmark built on RoboTwin; it also provides Mem-0, a memory-enhanced hierarchical VLA model.
ICLR 2026

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation

Hao Shi, Bin Xie, Yingfei Liu, Lin Sun, Fengrong Liu, Tiancai Wang, Erjin Zhou, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉

ICLR 2026 | CVPR 2026 Workshop Oral | Paper | Code | Homepage | Huggingface

  • MemoryVLA is among the early works to explore memory in VLA models, introducing a hippocampus-inspired memory to capture temporal dependencies. It has since been cited over 100 times, including by Physical Intelligence.
AAAI 2026 (Oral)

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation

Hao Shi, Bin Xie, Yingfei Liu, Yang Yue, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉

AAAI 2026 Oral (acceptance rate ≈ 4%) | Paper | Code | Homepage | Huggingface

  • SpatialActor is a framework that learns disentangled spatial representations for robust robotic manipulation.
IROS 2026

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models

Lin Sun*, Bin Xie*, Yingfei Liu, Hao Shi, Tiancai Wang, Jiale Cao✉

IROS 2026 | Paper | Code | Homepage

  • GeoVLA is a unified VLA framework that bridges 2D semantics and 3D geometry.
CVPR 2026 Findings

DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework

Yani Zhang*, Dongming Wu*, Hao Shi, Yingfei Liu, Tiancai Wang, Haoqiang Fan, Xingping Dong✉

CVPR 2026 Findings | Paper | Code

  • DEGround is an embodied perception framework for 3D grounding, achieving 1st place on EmbodiedScan.
Technical Report 2025

Dexbotic: Open-Source Vision-Language-Action Toolbox

Dexbotic Team

Technical Report 2025 | Paper | Code | Homepage | Huggingface

  • Dexbotic is an open-source VLA codebase, similar to MMDetection, that unifies mainstream VLA frameworks and benchmarks, provides strong pretrained models, and has received 1200+ GitHub stars.
ICLR 2025

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng*, Hao Shi*, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang✉

*: equal contribution, ✉: corresponding author.

ICLR 2025 | CVPR 2024 Workshop Oral | Paper | Code

  • DenseGrounding is an embodied perception framework for multi-view 3D visual grounding, which won 1st Place and the Innovation Award in the CVPR 2024 Autonomous Grand Challenge ($9000).
NeurIPS 2023

Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation

Tingliang Feng*, Hao Shi*, Xueyang Liu, Wei Feng, Liang Wan, Yanlin Zhou, Di Lin✉

NeurIPS 2023 | Paper | Code

  • We propose a memory-bank-based object-style compensation method for open compound domain adaptation.

🎖 Honors and Awards

  • 2026.06, 3rd Prize in CVPR 2026 ManiSkill-ViTac Challenge
  • 2026.05, Beijing Outstanding Graduate Award. (Only 1 Master’s student in the Dept. of Automation, THU)
  • 2026.05, ICML Gold Reviewer Award.
  • 2026.01, Tsinghua Deng Feng Fund, Tsinghua University. (¥15000)
  • 2025.11, Minghong Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% in THU, ¥10000)
  • 2024.11, Philobiblion Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% in THU, ¥10000)
  • 2024.06, 1st Place and Innovation Award in the CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track. (1st of 154 submissions, $9000)
  • 2023.11, CXMT Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% in THU, ¥10000)
  • 2023.06, Outstanding Bachelor’s Thesis Award, Tianjin University.
  • 2021.12, Huawei Intelligent Base Scholarship, Ministry of Education-Huawei Intelligent Base Future Stars.

📖 Education

The University of Hong Kong
2026.09 – 2030.06 (expected)
Incoming Ph.D. student @ MMLab, HKU, Hong Kong.
Advisor: Prof. Ping Luo
Tsinghua University
2023.09 – 2026.06 (expected)
M.Eng. @ LeapLab, Tsinghua University, Beijing.
Advisors: Prof. Gao Huang and Dr. Xiangyu Zhang
Tianjin University
2020.06 – 2023.06
B.Eng. in Computer Science, Tianjin University.
Academic advisor: Prof. Di Lin
Tianjin University
2019.09 – 2020.06
B.Eng. student in Materials Science, Tianjin University.

💻 Internship

Dexmal
2025.03 – present
Dexmal, Embodied Foundation Algorithm Group, Beijing
Mentors: Tiancai Wang, Yingfei Liu and Bin Xie
MEGVII
2024.08 – 2025.02
MEGVII, Foundation Model Group, Beijing
Mentors: Tiancai Wang and Yingfei Liu

💬 Invited Talks

🎓 Service

Reviewer / PC Member:

  • International Conference on Learning Representations (ICLR)
  • International Conference on Machine Learning (ICML)
  • Annual Conference on Neural Information Processing Systems (NeurIPS)
  • IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR)
  • IEEE / CVF International Conference on Computer Vision (ICCV)
  • Annual AAAI Conference on Artificial Intelligence (AAAI)
  • IEEE / RSJ International Conference on Intelligent Robots and Systems (IROS)
  • Transactions on Machine Learning Research (TMLR)