Shizhe Chen

26 papers · 2019–2025 · 13 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🏃 Academic Marathon (6) 🌍 Conference Polyglot (13) 🗺️ Taxonomy Completionist (46)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (12) 🧬 Topic Evolution 🏆 Keyword Champion 🚀 Conference Pioneer 💎 Century Club (26) 🔥 Unstoppable (7) 🗃️ Keyword Collector (127) 📈 Trend Setter ⚡ Prolific Year (5)

Conferences

CVPR (8) ICCV (4) CORL (2) ECCV (2) NIPS (2) AAAI (1) ACL (1) COLING (1) EMNLP (1) ICLR (1) IJCAI (1) IJCNLP (1) INTERSPEECH (1)

Top co-authors

Qin Jin (12) Cordelia Schmid (10) Ivan Laptev (9) Pierre-Louis Guhur (6) Makarand Tapaswi (5) Qi Wu (3) Anwen Hu (2) Liang Zhang (2) Yongcheng Wang (2) Zerui Chen (2)

Keywords

video understanding (4) robotic manipulation (3) point cloud (3) video captioning (3) instructional video (2) scene understanding (2) multimodal learning (2) multimodal transformer (2) vision-and-language navigation (2) visual question answering (2) event detection (2) paragraph generation (2) 3d reconstruction (2) embodied agent (2) graph neural network (2) multilingual nlp (1) machine translation (1) self-supervised learning (1) vision-language navigation (1) zero-shot learning (1)

Papers

NextBestPath: Efficient 3D Mapping of Unseen Environments ICLR 2025
HORT: Monocular Hand-held Objects Reconstruction with Transformers ICCV 2025
MuKA: Multimodal Knowledge Augmented Visual Information-Seeking COLING 2025
SUGAR: Pre-training 3D Visual Representations for Robotics CVPR 2024
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation CORL 2023
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation ACL 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments ICCV 2023
gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction CVPR 2023
VRDFormer: End-to-End Video Visual Relation Detection With Transformers CVPR 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding NIPS 2022
Instruction-driven history-aware policies for robotic manipulations CORL 2022
Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation CVPR 2022
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation ECCV 2022
Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning ECCV 2022
History Aware Multimodal Transformer for Vision-and-Language Navigation NIPS 2021
Sketch, Ground, and Refine: Top-Down Dense Video Captioning CVPR 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos CVPR 2021
Elaborative Rehearsal for Zero-Shot Action Recognition ICCV 2021
Airbert: In-Domain Pretraining for Vision-and-Language Navigation ICCV 2021
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning CVPR 2020
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs CVPR 2020
From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots IJCAI 2019
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension EMNLP 2019
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension IJCNLP 2019
Unsupervised Bilingual Lexicon Induction from Mono-Lingual Multimodal Data AAAI 2019
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling INTERSPEECH 2019