Peihao Chen

20 papers · 2019–2026 · 8 top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🌈 Renaissance Researcher (7) 🌍 Conference Polyglot (8) 🗺️ Taxonomy Completionist (39)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (17) 🏆 Grand Slam 🧬 Topic Evolution ⚡ Prolific Year (5) 🗃️ Keyword Collector (88) 💎 Century Club (19) 🔥 Unstoppable (7)

Conferences

CVPR (6) NIPS (4) AAAI (3) ECCV (2) ICCV (2) ICLR (1) ICML (1) IJCAI (1)

Top co-authors

Chuang Gan (17) Mingkui Tan (11) Yining Hong (5) Junyan Li (4) Runhao Zeng (4) Kunyang Lin (4) Yilun Du (3) Deng Huang (3) Xinyu Sun (3) Thomas Li (3)

Keywords

large language model (4) self-supervised learning (3) embodied ai (2) action recognition (2) video representation learning (2) vision-language model (2) spatial reasoning (2) video representation (2) curriculum learning (1) trajectory prediction (1) vision-language navigation (1) 3d vision (1) multi-modal learning (1) video understanding (1) zero-shot learning (1) cross-modal learning (1) natural language understanding (1) audio-visual learning (1) temporal modeling (1) object tracking (1)

Papers

NaVLA$^2$: A Vision-Language-Audio-Action Model for Multimodal Instruction Navigation · AAAI 2026
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences · CVPR 2025
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning · CVPR 2025
Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance · IJCAI 2025
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World · CVPR 2024
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding · ICLR 2024
FlexAttention for Efficient High-Resolution Vision-Language Models · ECCV 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model · ICML 2024
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation · CVPR 2024
Learning Vision-and-Language Navigation from YouTube Videos · ICCV 2023
FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation · NIPS 2023
Masked Motion Encoding for Self-Supervised Video Representation Learning · CVPR 2023
3D-LLM: Injecting the 3D World into Large Language Models · NIPS 2023
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation · NIPS 2022
Learning Active Camera for Multi-Object Navigation · NIPS 2022
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning · AAAI 2021
Foley Music: Learning to Generate Music from Videos · ECCV 2020
Dense Regression Network for Video Grounding · CVPR 2020
Location-Aware Graph Convolutional Networks for Video Question Answering · AAAI 2020
Self-Supervised Moving Vehicle Tracking With Stereo Sound · ICCV 2019