I am Hao Shi (石昊), a Master's student in the Department of Automation at Tsinghua University, in a joint program with MEGVII Research, advised by Prof. Gao Huang and Xiangyu Zhang. I also work closely with Tiancai Wang.
My research focuses on Embodied AI, Robot Learning, Vision-Language-Action (VLA) models, and world models, with the goal of building foundation models for general-purpose robotic systems.
I will be joining HKU MMLab as a Ph.D. student in Fall 2026, advised by Prof. Ping Luo.
📖 Education
M.Eng. in AI, Department of Automation, Tsinghua University, Beijing.
Advisors: Prof. Gao Huang and Xiangyu Zhang
B.Eng. in Materials Science, Tianjin University.
📝 Research
(*: equal contribution, ✉: corresponding author)

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design
Tianxing Chen*, Yuran Wang*, Mingleyang Li*, Yan Qin*, Hao Shi, Zixuan Li, Yifan Hu, Yingsheng Zhang, Kaixuan Wang, Yue Chen, Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong✉, Ping Luo✉
Under Review 2026 | Paper | Code | Homepage | Huggingface
- RMBench is a memory-dependent manipulation benchmark built on the RoboTwin platform; it also introduces Mem-0, a memory-enhanced hierarchical VLA model.

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
Hao Shi, Bin Xie, Yingfei Liu, Lin Sun, Fengrong Liu, Tiancai Wang, Erjin Zhou, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉
ICLR 2026 | Paper | Code | Homepage | Huggingface
- MemoryVLA is among the early works exploring memory in Vision-Language-Action models. Inspired by human memory systems, it builds a hippocampus-like memory to capture temporal dependencies. It has since been cited by researchers at numerous institutions, including Physical Intelligence.

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation
Hao Shi, Bin Xie, Yingfei Liu, Yang Yue, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉
AAAI 2026 Oral (acceptance rate ≈ 4%) | Paper | Code | Homepage | Huggingface
- SpatialActor is a framework that learns disentangled spatial representations for robust robotic manipulation.

Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding
Yani Zhang*, Dongming Wu*, Hao Shi, Yingfei Liu, Tiancai Wang, Haoqiang Fan, Xingping Dong✉
CVPR 2026 Findings | Paper | Code
- DEGround is an embodied perception framework for 3D grounding, achieving 1st place on EmbodiedScan.

Dexbotic: Open-Source Vision-Language-Action Toolbox
Dexbotic Team
Technical Report 2025 | Paper | Code | Homepage | Huggingface
- Dexbotic is an open-source VLA codebase in the spirit of MMDetection. It unifies multiple mainstream VLA frameworks and benchmarks, provides strong pretrained models, and has garnered nearly 1,000 GitHub stars.

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
Lin Sun*, Bin Xie*, Yingfei Liu, Hao Shi, Tiancai Wang, Jiale Cao✉
Under Review 2025 | Paper | Code | Homepage
- GeoVLA is a framework that bridges 2D semantics and 3D geometry for VLA models, achieving robustness across diverse camera views, object heights, and sizes.

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng*, Hao Shi*, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang✉
- DenseGrounding is an embodied perception framework for multi-view 3D visual grounding that achieved 1st place on EmbodiedScan.

DenseG: Alleviating Vision-Language Feature Sparsity in Multi-View 3D Visual Grounding
Henry Zheng*, Hao Shi*, Yong Xien Chng, Rui Huang, Zanlin Ni, Tianyi Tan, Qihang Peng, Yepeng Weng, Zhongchao Shi, Gao Huang✉
CVPR 2024 Workshop (Oral) | Paper | Code
- 1st Place and Innovation Award in CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track (1/64 teams, 1/154 submissions).

Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation
Tingliang Feng*, Hao Shi*, Xueyang Liu, Wei Feng, Liang Wan, Yanlin Zhou, Di Lin✉
- We propose a memory-bank-based object-style compensation method for open compound domain adaptation in semantic segmentation.
🎖 Honors and Awards
- 2026.01, Tsinghua Deng Feng Fund, Tsinghua University. (¥15,000)
- 2025.11, Minghong Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
- 2024.11, Philobiblion Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
- 2024.06, 1st Place and Innovation Award, CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track. (1/154 submissions, $9,000)
- 2023.11, CXMT Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
- 2023.06, Outstanding Bachelor’s Thesis Award, Tianjin University.
- 2023.06, Excellent Graduate Award, Tianjin University.
- 2021.12, Future Star, Ministry of Education-Huawei Intelligent Base Program.
- 2021.12, Huawei Intelligent Base Scholarship.
💻 Internship
Dexmal, Beijing
Department: Embodied Foundation Research Group
Mentors: Tiancai Wang, Yingfei Liu, and Bin Xie
MEGVII, Beijing
Department: Foundation Model Group
Mentors: Tiancai Wang and Yingfei Liu
💬 Invited Talks
- 2026.01, Invited talk on SpatialActor, AAAI 2026 Main Conference, Singapore
- 2025.09, Invited talk on MemoryVLA, 3D视觉工坊, Online
- 2025.09, Invited talk on MemoryVLA, 具身智能之心, Online
- 2024.06, Invited talk on DenseGrounding, CVPR 2024 Workshop on Foundation Models for Autonomous Systems, Seattle
- 2024.06, Invited talk on DenseGrounding, Technical Seminar on End-to-End Embodied Agent, Shanghai
🎓 Service
Reviewer / PC Member:
- ICLR 2026, ICLR 2025
- ICML 2026
- CVPR 2026
- ICCV 2025
- AAAI 2026
- IROS 2026