|
Qihang Peng | 彭启航
I'm a senior undergraduate student at Xingjian College, Tsinghua University, and an incoming Ph.D. student at MMLab, CUHK. My research interests lie in Physical World Models and Embodied Foundation Models.
My research objective is to build an embodied system that can independently handle understanding, planning, and world modeling across diverse scenes, with applications in fields such as robotics and autonomous driving.
I'm currently working closely with Prof. Gao Huang and Prof. Hongsheng Li. If you are also interested in this area, please feel free to contact me.
CV / Google Scholar / GitHub / Twitter
WeChat: qihang_peng Email: pengqihang22@gmail.com
|
|
News
- 2026-02: One paper accepted to CVPR 2026.
- 2025-06: Awarded the SenseTime Scholarship, given to 30 undergraduate students nationwide.
- 2025-02: One paper accepted to CVPR 2025. My first paper as first author!
- 2025-01: One paper accepted to ICLR 2025.
- 2024-10: Awarded the National Scholarship, the highest honor for undergraduates in China.
- 2024-09: Supported by the Beijing Natural Science Foundation Undergraduate Research Program.
- 2024-06: Outstanding Championship and Innovation Award in the Multi-View 3D Visual Grounding track of the Autonomous Grand Challenge (AGC) at CVPR 2024.
- 2023-10: Awarded the Wang Dazhong Scholarship, Tsinghua University, given to 1 student per major.
|
|
Selected Publications
*Equal contribution
|
|
ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving
Qihang Peng,
Xuesong Chen,
Chenye Yang,
Shaoshuai Shi,
Hongsheng Li
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
[arXiv]
[Code]
[Project Page]
ColaVLA moves VLM reasoning into a compact latent space and decodes multi-scale causal trajectories in one pass. It achieves state-of-the-art results in both open-loop and closed-loop settings on the nuScenes benchmark, with favorable efficiency and robustness.
|
|
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
Qihang Peng,
Henry Zheng,
Gao Huang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[arXiv]
[Code]
[Project Page]
Makes full use of multimodal information in ego-centric 3D visual grounding for point cloud enhancement. Achieves state-of-the-art results on the EmbodiedScan benchmark.
|
|
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng*,
Shi Hao*,
Qihang Peng,
et al.
International Conference on Learning Representations (ICLR), 2025
[arXiv]
[ICLR 2025]
[AGC 2024]
Uses an LLM and ground truth to enrich the semantic details of prompts, reducing ambiguity during training. Extracts per-view semantics and enriches the visual representation with global scene-level semantics.
|
|
Tsinghua University
B.Eng. in Mechanics & Vehicle Engineering
Sep. 2022 - Jun. 2026 (Expected)
Ranked 1st in major; National Scholarship recipient
|
|
Qwen Team, Alibaba Group
Research Intern, working on VLAs for navigation and manipulation.
Apr. 2026 - Present
|
|
Voyager Research, Didi AutoDriving
Research Intern, working on VLAs for autonomous driving.
Jul. 2025 - Mar. 2026
Advised by Dr. Shaoshuai Shi
|
|
LeapLab, Tsinghua University
Research Assistant, working on 3D visual grounding.
Feb. 2024 - May 2025
Advised by Prof. Gao Huang
|
|
Honors and Awards
SenseTime Scholarship, 30 undergraduate students nationwide (2025)
National Scholarship, Highest honor for undergraduates in China (2024)
Outstanding Championship & Innovation Award, 3D Visual Grounding Track, Autonomous Grand Challenge at CVPR 2024
Beijing Natural Science Foundation Undergraduate Research Program (2024)
Wang Dazhong Scholarship, Tsinghua University, 1 student per major (2023)
|
|