Let the robot make a hamburger from scratch! Incredible progress on long-horizon dexterous manipulation. Vision alone often falls short, so this work smartly integrates high-resolution tactile sensing via cross-modal learning. ViTacFormer not only anticipates contact but also enables robust imitation learning with anthropomorphic hands. A milestone for multi-modal robotic control.
Haoran Geng · Jul 8, 23:15
🤖 What if a humanoid robot could make a hamburger from raw ingredients—all the way to your plate? 🔥 Excited to announce ViTacFormer: our new pipeline for next-level dexterous manipulation with active vision + high-resolution touch. 🎯 For the first time ever, we demonstrate ~2.5 minutes of continuous, autonomous control—combining active vision, high-resolution touch, and high-DoF SharpaWave robot hands—to complete complex, real-world tasks. Code is fully released; check out our: Homepage: Paper link: Github: