My name is Hao Shi (石昊). I am an incoming Ph.D. student at HKU MMLab, where I will be advised by Prof. Ping Luo starting in Fall 2026.

Previously, I received my Master’s degree from Tsinghua University, where I was a member of Tsinghua Leap Lab and was advised by Prof. Gao Huang and Dr. Xiangyu Zhang.

My research focuses on Embodied AI, Robot Learning, Vision-Language-Action (VLA) models, and World Models, aiming to build foundation models for general robotic systems.

📝 Research

Under Review 2026

MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models

Hao Shi, Weiye Li, Bin Xie, Yulin Wang, Renping Zhou, Tiancai Wang, Xiangyu Zhang, Ping Luo, Gao Huang✉

Under Review 2026 | Paper | Code | Homepage | Huggingface

  • MemoryVLA++ is the extended journal version of MemoryVLA, advancing it from past-only memory modeling to full temporal modeling with both past memory and future imagination.
Under Review 2026

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design

Tianxing Chen*, Yuran Wang*, Mingleyang Li*, Yan Qin*, Hao Shi, Zixuan Li, Yifan Hu, Yingsheng Zhang, Kaixuan Wang, Yue Chen, Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong✉, Ping Luo✉

Under Review 2026 | Paper | Code | Homepage | Huggingface

  • RMBench is a memory-oriented benchmark built on RoboTwin. It also provides Mem-0, a memory-enhanced hierarchical VLA model.
ICLR 2026

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation

Hao Shi, Bin Xie, Yingfei Liu, Lin Sun, Fengrong Liu, Tiancai Wang, Erjin Zhou, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉

ICLR 2026 | CVPR 2026 Workshop Oral | Paper | Code | Homepage | Huggingface

  • MemoryVLA is among the early works to explore memory in VLA models, introducing a hippocampus-inspired memory to capture temporal dependencies. It has since been cited over 100 times, including by Physical Intelligence.
AAAI 2026 (Oral)

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation

Hao Shi, Bin Xie, Yingfei Liu, Yang Yue, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Gao Huang✉

AAAI 2026 Oral (acceptance rate ≈ 4%) | Paper | Code | Homepage | Huggingface

  • SpatialActor is a framework that uses disentangled spatial representations for robust robotic manipulation.
IROS 2026

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models

Lin Sun*, Bin Xie*, Yingfei Liu, Hao Shi, Tiancai Wang, Jiale Cao✉

IROS 2026 | Paper | Code | Homepage

  • GeoVLA is a unified VLA framework that bridges 2D semantics and 3D geometry.
CVPR 2026 Findings

DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework

Yani Zhang*, Dongming Wu*, Hao Shi, Yingfei Liu, Tiancai Wang, Haoqiang Fan, Xingping Dong✉

CVPR 2026 Findings | Paper | Code

  • DEGround is an embodied perception framework for 3D grounding, achieving 1st place on EmbodiedScan.
Technical Report 2025

Dexbotic: Open-Source Vision-Language-Action Toolbox

Dexbotic Team

Technical Report 2025 | Paper | Code | Homepage | Huggingface

  • Dexbotic is an open-source VLA codebase, similar to MMDetection, that unifies mainstream VLA frameworks and benchmarks, provides strong pretrained models, and has received 1200+ GitHub stars.
ICLR 2025

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng*, Hao Shi*, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang✉

*: equal contribution, ✉: corresponding author.

ICLR 2025 | CVPR 2024 Workshop Oral | Paper | Code

  • DenseGrounding is an embodied perception framework for multi-view 3D visual grounding that won 1st Place and the Innovation Award in the CVPR 2024 Autonomous Grand Challenge ($9000).
NeurIPS 2023

Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation

Tingliang Feng*, Hao Shi*, Xueyang Liu, Wei Feng, Liang Wan, Yanlin Zhou, Di Lin✉

NeurIPS 2023 | Paper | Code

  • We propose a memory-bank-based object-style compensation method for open compound domain adaptation.

🎖 Honors and Awards

  • 2026.06, Tsinghua University Outstanding Master’s Thesis Award. (Top 5% in THU)
  • 2026.06, Beijing Outstanding Graduate Award. (Only 1 Master’s student in the Dept. of Automation, THU)
  • 2026.06, 3rd Prize in CVPR 2026 ManiSkill-ViTac Challenge.
  • 2026.05, ICML Gold Reviewer Award.
  • 2026.01, Tsinghua University Deng Feng Fund. (¥15000)
  • 2025.11, Minghong Scholarship, Tsinghua University Comprehensive Excellence 1st Prize. (Top 10% in THU, ¥10000)
  • 2024.11, Philobiblion Scholarship, Tsinghua University Comprehensive Excellence 1st Prize. (Top 10% in THU, ¥10000)
  • 2024.06, 1st Place and Innovation Award in CVPR 2024 Autonomous Grand Challenge, Embodied 3D Grounding Track. (1/154 submissions, $9000)
  • 2023.11, CXMT Scholarship, Tsinghua University Comprehensive Excellence 1st Prize. (Top 10% in THU, ¥10000)
  • 2023.06, Tianjin University Outstanding Bachelor’s Thesis Award.
  • 2021.12, Huawei Intelligent Base Scholarship, Ministry of Education-Huawei Intelligent Base Future Star.

📖 Education

The University of Hong Kong
2026.09 – 2030.06 (expected)
Incoming Ph.D. student @ MMLab, HKU, Hong Kong.
Advisor: Prof. Ping Luo
Tsinghua University
2023.09 – 2026.06
M.Eng. @ LeapLab, Tsinghua University, Beijing.
Advisors: Prof. Gao Huang and Dr. Xiangyu Zhang
Tianjin University
2020.06 – 2023.06
B.Eng. in Computer Science, Tianjin University.
Academic advisor: Prof. Di Lin
Tianjin University
2019.09 – 2020.06
B.Eng. student in Materials Science, Tianjin University.

💻 Internship

Dexmal
2025.03 – present
Dexmal, Embodied Foundation Algorithm Group, Beijing
Mentors: Tiancai Wang, Yingfei Liu and Bin Xie
MEGVII
2024.08 – 2025.02
MEGVII, Foundation Model Group, Beijing
Mentors: Tiancai Wang and Yingfei Liu

💬 Invited Talks

🎓 Service

Reviewer / PC Member:

  • ICML, ICLR, NeurIPS, CVPR, ICCV, AAAI, IROS, TMLR