I am Hao Shi (石昊), a Master's student in the Department of Automation at Tsinghua University, in a joint program with MEGVII Research, advised by Prof. Gao Huang and Xiangyu Zhang. I also work closely with Tiancai Wang.
My research focuses on Embodied AI, Robot Learning, Vision-Language-Action (VLA) models, and world models, with the goal of building foundation models for general-purpose robotic systems.
I will be joining HKU MMLab as a Ph.D. student in Fall 2026, advised by Prof. Ping Luo.
📖 Education
M.Eng. in AI, Department of Automation, Tsinghua University, Beijing.
Advisors: Prof. Gao Huang and Xiangyu Zhang
B.Eng. in Materials Science, Tianjin University.
📝 Research
(*: equal contribution, ✉: corresponding author)

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design
Tianxing Chen*, Yuran Wang*, Mingleyang Li*, Yan Qin*, Hao Shi, Zixuan Li, Yifan Hu, Yingsheng Zhang, Kaixuan Wang, Yue Chen, Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong✉, Ping Luo✉
Under Review 2026 | Paper | Code | Homepage | Huggingface
- RMBench is a memory-dependent manipulation benchmark built on the RoboTwin platform; it also introduces Mem-0, a memory-enhanced hierarchical VLA model.

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
Hao Shi, Bin Xie, Yingfei Liu, Lin Sun, Fengrong Liu, Tiancai Wang, Erjin Zhou, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉
ICLR 2026 | Paper | Code | Homepage | Huggingface
- MemoryVLA is among the early works exploring memory in Vision-Language-Action models. Inspired by human memory systems, it builds a hippocampus-like memory to capture temporal dependencies. It has since been cited by researchers at numerous institutions, including Physical Intelligence.

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation
Hao Shi, Bin Xie, Yingfei Liu, Yang Yue, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉
AAAI 2026 Oral (acceptance rate ≈ 4%) | Paper | Code | Homepage | Huggingface
- SpatialActor is a framework that learns disentangled spatial representations for robust robotic manipulation.

Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding
Yani Zhang*, Dongming Wu*, Hao Shi, Yingfei Liu, Tiancai Wang, Haoqiang Fan, Xingping Dong✉
CVPR 2026 Findings | Paper | Code
- DEGround is an embodied perception framework for 3D grounding, achieving 1st place on EmbodiedScan.

Dexbotic: Open-Source Vision-Language-Action Toolbox
Dexbotic Team
Technical Report 2025 | Paper | Code | Homepage | Huggingface
- Dexbotic is an open-source VLA codebase in the spirit of MMDetection. It unifies multiple mainstream VLA frameworks and benchmarks, provides strong pretrained models, and has garnered nearly 1,000 GitHub stars.

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
Lin Sun*, Bin Xie*, Yingfei Liu, Hao Shi, Tiancai Wang, Jiale Cao✉
Under Review 2025 | Paper | Code | Homepage
- GeoVLA is a framework that bridges 2D semantics and 3D geometry for VLA models, achieving robustness across diverse camera views, object heights, and sizes.

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng*, Hao Shi*, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang✉
- DenseGrounding is an embodied perception framework for multi-view 3D visual grounding that achieved 1st place on EmbodiedScan.

DenseG: Alleviating Vision-Language Feature Sparsity in Multi-View 3D Visual Grounding
Henry Zheng*, Hao Shi*, Yong Xien Chng, Rui Huang, Zanlin Ni, Tianyi Tan, Qihang Peng, Yepeng Weng, Zhongchao Shi, Gao Huang✉
CVPR 2024 Workshop (Oral) | Paper | Code
- 1st Place and Innovation Award in CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track (1/64 teams, 1/154 submissions).

Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation
Tingliang Feng*, Hao Shi*, Xueyang Liu, Wei Feng, Liang Wan, Yanlin Zhou, Di Lin✉
- We propose a memory-bank-based object-style compensation method for open compound domain adaptation in semantic segmentation.
🎖 Honors and Awards
- 2026.01, Tsinghua Deng Feng Fund, Tsinghua University. (¥15,000)
- 2025.11, Minghong Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
- 2024.11, Philobiblion Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
- 2024.06, 1st Place and Innovation Award, CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track. (1/154 submissions, $9,000)
- 2023.11, CXMT Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
- 2023.06, Outstanding Bachelor’s Thesis Award, Tianjin University.
- 2023.06, Excellent Graduate Award, Tianjin University.
- 2021.12, Future Star, Ministry of Education-Huawei Intelligent Base Program.
- 2021.12, Huawei Intelligent Base Scholarship.
💻 Internship
Dexmal, Beijing
Department: Embodied Foundation Research Group
Mentors: Tiancai Wang, Yingfei Liu, and Bin Xie
MEGVII, Beijing
Department: Foundation Model Group
Mentors: Tiancai Wang and Yingfei Liu
💬 Invited Talks
- 2026.01, Invited talk on SpatialActor, AAAI 2026 Main Conference, Singapore
- 2025.09, Invited talk on MemoryVLA, 3D视觉工坊, Online
- 2025.09, Invited talk on MemoryVLA, 具身智能之心, Online
- 2024.06, Invited talk on DenseGrounding, CVPR 2024 Workshop on Foundation Models for Autonomous Systems, Seattle
- 2024.06, Invited talk on DenseGrounding, Technical Seminar on End-to-End Embodied Agent, Shanghai
🎓 Service
Reviewer / PC Member:
- ICLR 2026, ICLR 2025
- ICML 2026
- CVPR 2026
- ICCV 2025
- AAAI 2026
- IROS 2026