Shizhe Chen
26 papers · 2019–2025 · 13 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Renaissance Researcher (7)
Academic Marathon (6)
Conference Polyglot (13)
Taxonomy Completionist (46)
Keyword Pioneer
Hot Topic Early Bird
Dynamic Duo (12)
Topic Evolution
Keyword Champion
Conference Pioneer
Century Club (26)
Unstoppable (7)
Keyword Collector (127)
Trend Setter
Prolific Year (5)
Conferences
CVPR (8)
ICCV (4)
CORL (2)
ECCV (2)
NIPS (2)
AAAI (1)
ACL (1)
COLING (1)
EMNLP (1)
ICLR (1)
IJCAI (1)
IJCNLP (1)
INTERSPEECH (1)
Keywords
video understanding (4)
robotic manipulation (3)
point cloud (3)
video captioning (3)
instructional video (2)
scene understanding (2)
multimodal learning (2)
multimodal transformer (2)
vision-and-language navigation (2)
visual question answering (2)
event detection (2)
paragraph generation (2)
3d reconstruction (2)
embodied agent (2)
graph neural network (2)
multilingual nlp (1)
machine translation (1)
self-supervised learning (1)
vision-language navigation (1)
zero-shot learning (1)
Papers
NextBestPath: Efficient 3D Mapping of Unseen Environments
ICLR 2025
HORT: Monocular Hand-held Objects Reconstruction with Transformers
ICCV 2025
MuKA: Multimodal Knowledge Augmented Visual Information-Seeking
COLING 2025
SUGAR: Pre-training 3D Visual Representations for Robotics
CVPR 2024
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
CORL 2023
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation
ACL 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
ICCV 2023
gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
CVPR 2023
VRDFormer: End-to-End Video Visual Relation Detection With Transformers
CVPR 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
NIPS 2022
Instruction-driven history-aware policies for robotic manipulations
CORL 2022
Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation
CVPR 2022
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
ECCV 2022
Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning
ECCV 2022
History Aware Multimodal Transformer for Vision-and-Language Navigation
NIPS 2021
Sketch, Ground, and Refine: Top-Down Dense Video Captioning
CVPR 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
CVPR 2021
Elaborative Rehearsal for Zero-Shot Action Recognition
ICCV 2021
Airbert: In-Domain Pretraining for Vision-and-Language Navigation
ICCV 2021
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
CVPR 2020
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
CVPR 2020
From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
IJCAI 2019
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
EMNLP 2019
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
IJCNLP 2019
Unsupervised Bilingual Lexicon Induction from Mono-Lingual Multimodal Data
AAAI 2019
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling
INTERSPEECH 2019