I am Hao Shi (石昊), a Master's student in the Department of Automation at Tsinghua University, in a joint program with MEGVII Research. I am advised by Prof. Gao Huang and Xiangyu Zhang, and I also work closely with Tiancai Wang.

My research focuses on Embodied AI, Robot Learning, Vision-Language-Action (VLA) models, and World Models, aiming to build foundation models for general-purpose robotic systems.

I will join HKU MMLab as a Ph.D. student in Fall 2026, advised by Prof. Ping Luo.

📖 Education

Tsinghua University
2023.09 – 2026.06 (expected)
M.Eng. in AI, Department of Automation, Tsinghua University, Beijing.
Advisors: Prof. Gao Huang and Xiangyu Zhang
Tianjin University
2020.06 – 2023.06
B.Eng. in Computer Science, Tianjin University.
Academic advisor: Prof. Di Lin
Tianjin University
2019.09 – 2020.06
B.Eng. student in Materials Science, Tianjin University.

📝 Research

(*: equal contribution, ✉: corresponding author)

Under Review 2026

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design

Tianxing Chen*, Yuran Wang*, Mingleyang Li*, Yan Qin*, Hao Shi, Zixuan Li, Yifan Hu, Yingsheng Zhang, Kaixuan Wang, Yue Chen, Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong✉, Ping Luo✉

Under Review 2026 | Paper | Code | Homepage | Huggingface

  • RMBench is a memory-dependent manipulation benchmark built on the RoboTwin platform; it also introduces Mem-0, a memory-enhanced hierarchical VLA model.
ICLR 2026

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation

Hao Shi, Bin Xie, Yingfei Liu, Lin Sun, Fengrong Liu, Tiancai Wang, Erjin Zhou, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉

ICLR 2026 | Paper | Code | Homepage | Huggingface

  • MemoryVLA is among the early works exploring memory in Vision-Language-Action models. Inspired by human memory systems, it builds a hippocampus-like memory module to capture temporal dependencies. It has since been cited by researchers at numerous institutions, including Physical Intelligence.
AAAI 2026 (Oral)

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation

Hao Shi, Bin Xie, Yingfei Liu, Yang Yue, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉

AAAI 2026 Oral (acceptance rate ≈ 4%) | Paper | Code | Homepage | Huggingface

  • SpatialActor is a framework built on disentangled spatial representations for robust robotic manipulation.
CVPR 2026 Findings

Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding

Yani Zhang*, Dongming Wu*, Hao Shi, Yingfei Liu, Tiancai Wang, Haoqiang Fan, Xingping Dong✉

CVPR 2026 Findings | Paper | Code

  • DEGround is an embodied perception framework for 3D grounding that achieved 1st place on the EmbodiedScan benchmark.
Technical Report 2025

Dexbotic: Open-Source Vision-Language-Action Toolbox

Dexbotic Team

Technical Report 2025 | Paper | Code | Homepage | Huggingface

  • Dexbotic is an open-source VLA codebase in the spirit of MMDetection. It unifies multiple mainstream VLA frameworks and benchmarks, provides strong pretrained models, and has garnered nearly 1,000 GitHub stars.
Under Review 2025

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models

Lin Sun*, Bin Xie*, Yingfei Liu, Hao Shi, Tiancai Wang, Jiale Cao✉

Under Review 2025 | Paper | Code | Homepage

  • GeoVLA is a framework that bridges 2D semantics and 3D geometry for VLA models, achieving robustness across diverse camera views, object heights, and object sizes.
ICLR 2025

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng*, Hao Shi*, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang✉

ICLR 2025 | Paper | Code

  • DenseGrounding is an embodied perception framework for multi-view 3D visual grounding that achieved 1st place on the EmbodiedScan benchmark.
CVPRW 2024 (Oral)

DenseG: Alleviating Vision-Language Feature Sparsity in Multi-View 3D Visual Grounding

Henry Zheng*, Hao Shi*, Yong Xien Chng, Rui Huang, Zanlin Ni, Tianyi Tan, Qihang Peng, Yepeng Weng, Zhongchao Shi, Gao Huang✉

CVPR 2024 Workshop (Oral) | Paper | Code

  • 1st Place and Innovation Award in CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track (1/64 teams, 1/154 submissions).
NeurIPS 2023

Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation

Tingliang Feng*, Hao Shi*, Xueyang Liu, Wei Feng, Liang Wan, Yanlin Zhou, Di Lin✉

NeurIPS 2023 | Paper | Code

  • We propose a memory-bank-based object-style compensation method for open compound domain adaptation in semantic segmentation.

🎖 Honors and Awards

  • 2026.01, Tsinghua Deng Feng Fund, Tsinghua University. (¥15,000)
  • 2025.11, Minghong Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
  • 2024.11, Philobiblion Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
  • 2024.06, 1st Place and Innovation Award in CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track. (1/154 submissions, $9,000)
  • 2023.11, CXMT Scholarship, Comprehensive Excellence 1st Prize, Tsinghua University. (Top 10% at THU, ¥10,000)
  • 2023.06, Outstanding Bachelor’s Thesis Award, Tianjin University.
  • 2023.06, Excellent Graduate Award, Tianjin University.
  • 2021.12, Ministry of Education-Huawei Intelligent Base Future Stars.
  • 2021.12, Huawei Intelligent Base Scholarship.

💻 Internship

Dexmal
2025.03 – present
Dexmal, Beijing
Department: Embodied Foundation Research Group
Mentors: Tiancai Wang, Yingfei Liu, and Bin Xie
MEGVII
2024.08 – 2025.02
MEGVII, Beijing
Department: Foundation Model Group
Mentors: Tiancai Wang and Yingfei Liu

🎓 Service

Reviewer / PC Member:

  • ICLR 2026, ICLR 2025
  • ICML 2026
  • CVPR 2026
  • ICCV 2025
  • AAAI 2026
  • IROS 2026