Robotic manipulation requires precise spatial understanding to interact with objects in the real world. Point-based methods suffer from sparse sampling, which loses fine-grained semantics. Image-based methods typically feed RGB and depth into 2D backbones pre-trained on 3D auxiliary tasks, but their entangled semantics and geometry are sensitive to the depth noise inherent in real-world settings, which disrupts semantic understanding. Moreover, these methods focus on high-level geometry while overlooking low-level spatial cues essential for precise interaction. We propose SpatialActor, a disentangled framework for robust robotic manipulation that explicitly decouples semantics and geometry. The Semantic-guided Geometric Module adaptively fuses two complementary geometric representations, one from noisy depth and one from semantic-guided expert priors. In addition, a Spatial Transformer leverages low-level spatial cues for accurate 2D-3D mapping and enables interaction among spatial features. We evaluate SpatialActor in multiple simulation and real-world scenarios across 50+ tasks. It achieves state-of-the-art performance of 87.4% on RLBench and improves by 13.9% to 19.4% under varying noise conditions, showing strong robustness. Moreover, it significantly enhances few-shot generalization to new tasks and maintains robustness under various spatial perturbations.
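As a rough illustration of what adaptively fusing two geometric feature sources can look like, the sketch below blends a feature vector from raw depth with one from an expert prior using a per-element sigmoid gate. This is an assumption made for illustration, not the Semantic-guided Geometric Module itself: the function name `gated_fusion`, the scalar gate parameters `w_raw`, `w_prior`, and `bias`, and the use of plain Python lists are all hypothetical, and a trained model would learn the gate from data rather than use fixed scalars.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(raw_geo, prior_geo, w_raw=1.0, w_prior=-1.0, bias=0.0):
    """Blend two equal-length feature vectors with a per-element gate.

    raw_geo   -- features derived from (possibly noisy) depth; hypothetical input
    prior_geo -- features derived from an expert prior; hypothetical input
    w_raw, w_prior, bias -- fixed scalar gate parameters (assumed here;
                            a real model would learn these)
    """
    if len(raw_geo) != len(prior_geo):
        raise ValueError("feature vectors must have the same length")

    fused = []
    for r, p in zip(raw_geo, prior_geo):
        # The gate decides, per element, how much to trust the raw feature.
        gate = sigmoid(w_raw * r + w_prior * p + bias)
        # Convex combination: the result always lies between r and p.
        fused.append(gate * r + (1.0 - gate) * p)
    return fused


if __name__ == "__main__":
    raw = [0.2, 1.5, -0.7]
    prior = [0.1, 1.0, -0.5]
    print(gated_fusion(raw, prior))
```

Because the output is a convex combination, each fused value stays between the two inputs, so a gate that leans toward the prior limits how far a noisy depth feature can pull the result.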
RLBench Demos
Close Jar
Insert Onto Square Peg
Light Bulb In
Meat Off Grill
Open Drawer
Place Cups
Place Shape In Shape Sorter
Place Wine At Rack Location
Push Buttons
Put Groceries In Cupboard
Put Item In Drawer
Put Money In Safe
Reach And Drag
Slide Block To Color Target
Stack Blocks
Stack Cups
Sweep To Dustpan Of Size
Turn Tap
Insert Ring Onto Cone
Pick Glue To Box
Place Carrot To Box
Push Button
Slide Block
Stack Block
Stack Cup
Wipe Table
@inproceedings{shi2026spatialactor,
title={{SpatialActor}: Exploring disentangled spatial representations for robust robotic manipulation},
author={Shi, Hao and Xie, Bin and Liu, Yingfei and Yue, Yang and Wang, Tiancai and Fan, Haoqiang and Zhang, Xiangyu and Huang, Gao},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={40},
number={11},
pages={8969--8977},
year={2026}
}