Qihang Peng | 彭启航

I'm a senior undergraduate student at Xingjian College, Tsinghua University, and an incoming Ph.D. student at MMLab, CUHK.

My recent research mainly focuses on native multimodal pretraining and spatial intelligence. I also work with collaborators on visual generation, embodied intelligence, and LLM/VLM reasoning.

I currently work closely with Prof. Gao Huang and Prof. Hongsheng Li. If you are interested in my research, please feel free to contact me.

CV / Google Scholar / Github / Twitter

WeChat: qihang_peng / Email: pengqihang22@gmail.com

News

2026-06: Released three technical reports of the Qwen-Robot series.
2026-02: One paper accepted to CVPR 2026.
2025-06: Awarded the SenseTime Scholarship, given to 30 undergraduate students nationwide.
2025-02: One paper accepted to CVPR 2025. My first first-author paper!
2025-01: One paper accepted to ICLR 2025.
2024-10: Awarded the National Scholarship, the highest honor for undergraduates in China.
2024-09: Supported by the Beijing Natural Science Foundation Undergraduate Research Program.
2024-06: Won the Outstanding Championship and Innovation Award in the Multi-View 3D Visual Grounding track of the Autonomous Grand Challenge (AGC) at CVPR 2024.
2023-10: Awarded the Wang Dazhong Scholarship, Tsinghua University, given to 1 student per major.

Selected Publications

*Equal contribution

	Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models
	Qwen Team (*Qihang Peng* as *Core Contributor*)
	*Tech Report*, 2026
	[arXiv] [Blog]
	A generalist robotic manipulation foundation model that aligns representations, motions, and behaviors across 15 robot platforms. Trained on ~38,100 hours of pre-training data. Ranks first on RoboChallenge and surpasses π0.5 across 6 OOD benchmarks.

	Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System
	Qwen Team (*Qihang Peng* as *Core Contributor* & *Co-first Author*)
	*Tech Report*, 2026
	[arXiv] [Blog]
	A unified navigation model built on Qwen3-VL, spanning 5 task families with 15.6M training samples. Achieves SOTA on VLN-CE RxR, HM3Dv2, EVT-Bench, and NAVSIM, with real-world deployment on a Unitree Go2 quadruped robot.

	Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation
	Qwen Team (*Qihang Peng* as Contributor)
	*Tech Report*, 2026
	[arXiv] [Blog]
	A language-conditioned video world model with dual-stream MMDiT and MLLM action encoding, trained on 8.6M video-text pairs across 20+ robot morphologies. Ranks #1 on EWMBench and DreamGen Bench, and leads open-source models on WorldModelBench and PBench.

	ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving
	*Qihang Peng*, Xuesong Chen, Chenye Yang, Shaoshuai Shi, Hongsheng Li
	*IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2026
	[arXiv] [Code] [Project Page]
	ColaVLA moves VLM reasoning into a compact latent space and decodes multi-scale causal trajectories in one pass. It achieves state-of-the-art results in both open-loop and closed-loop settings on the nuScenes benchmark, with favorable efficiency and robustness.

	ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
	*Qihang Peng*, Henry Zheng, Gao Huang
	*IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2025
	[arXiv] [Code] [Project Page]
	Makes full use of multimodal information for point cloud enhancement in ego-centric 3D visual grounding. State-of-the-art on the EmbodiedScan benchmark.

	DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
	Henry Zheng, Shi Hao, *Qihang Peng*, et al.
	*International Conference on Learning Representations (ICLR)*, 2025
	[arXiv] [ICLR 2025] [AGC 2024]
	Uses an LLM and ground truth to enrich the semantic details of prompts, reducing ambiguity during training. Extracts per-view semantics and enriches visual representations with global scene-level semantics.

Education

Tsinghua University
B.Eng. in Mechanics & Vehicle Engineering
Sep. 2022 - Jun. 2026 (Expected)
Ranked 1st in major; National Scholarship recipient

Experience

	Qwen Team, Alibaba Group
	Research Intern, working on Embodied Intelligence and VL Pretraining
	Apr. 2026 - Present
	Advised by Dr. Xiong-Hui Chen and Shuai Bai

	Voyager Research, Didi AutoDriving
	Research Intern, working on VLAs for autonomous driving
	Jul. 2025 - Mar. 2026
	Advised by Dr. Shaoshuai Shi

	LeapLab, Tsinghua University
	Research Assistant, working on 3D visual grounding
	Feb. 2024 - May 2025
	Advised by Prof. Gao Huang

Honors and Awards

SenseTime Scholarship, 30 undergraduate students nationwide (2025)

National Scholarship, Highest honor for undergraduates in China (2024)

Outstanding Championship & Innovation Award, 3D Visual Grounding Track, Autonomous Grand Challenge at CVPR 2024

Beijing Natural Science Foundation Undergraduate Research Program (2024)

Wang Dazhong Scholarship, Tsinghua University, 1 student per major (2023)

Teaching

Volunteer Lecturer, Advanced Algebra, Xingjian College, Tsinghua University, 2024
[Midterm Lecture] [Final Lecture]

Miscellaneous

🏀 I really like playing basketball. My favorite basketball player is Kevin Durant.

🎮 I often play games in my leisure time, such as Sekiro, Elden Ring, and Black Myth: Wukong.
