Ronghang Hu

20 papers · 2014–2025 · 6 top CS/AI conferences

Achievements


🌍 Conference Polyglot (6) 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (11) 🌟 Keyword Trendsetter Combo (3) 🤝 Dynamic Duo (12) 🧬 Topic Evolution 📈 Trend Setter 🚀 Conference Pioneer 💎 Century Club (20) ❓ The Questioner 🗃️ Keyword Collector (84) 🔥 Unstoppable (10)

Conferences

CVPR (7) ICCV (6) ECCV (3) NIPS (2) ACL (1) ICLR (1)

Top co-authors

Trevor Darrell (12) Kate Saenko (9) Marcus Rohrbach (7) Jacob Andreas (4) Amanpreet Singh (4) Anna Rohrbach (3) Ross Girshick (3) Christoph Feichtenhofer (2) Kaiming He (2) Xinlei Chen (2)

Keywords

multimodal learning (4) transformer architecture (3) object detection (3) visual grounding (2) image classification (2) vision-language navigation (2) semantic segmentation (2) vision language model (2) multi-modal learning (2) visual question answering (2) vision-and-language navigation (2) transfer learning (2) convolutional neural network (2) cross-modal learning (1) data augmentation (1) self-supervised learning (1) multi-task learning (1) instruction following (1) instance segmentation (1) object localization (1)

Papers

SAM 2: Segment Anything in Images and Videos · ICLR 2025
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders · CVPR 2023
Scaling Language-Image Pre-Training via Masking · CVPR 2023
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding · ICCV 2023
FLAVA: A Foundational Language and Vision Alignment Model · CVPR 2022
Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis From a Single Image · ICCV 2021
UniT: Multimodal Multitask Learning With a Unified Transformer · ICCV 2021
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA · CVPR 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension · ECCV 2020
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation · ACL 2019
Language-Conditioned Graph Networks for Relational Reasoning · ICCV 2019
Grounding Visual Explanations · ECCV 2018
Speaker-Follower Models for Vision-and-Language Navigation · NIPS 2018
Learning to Segment Every Thing · CVPR 2018
Explainable Neural Computation via Stack Neural Module Networks · ECCV 2018
Learning to Reason: End-To-End Module Networks for Visual Question Answering · ICCV 2017
Modeling Relationships in Referential Expressions With Compositional Modular Networks · CVPR 2017
Natural Language Object Retrieval · CVPR 2016
Spatial Semantic Regularisation for Large Scale Object Detection · ICCV 2015
LSDA: Large Scale Detection through Adaptation · NIPS 2014