Ronghang Hu
20 papers · 2014–2025 · 6 top CS/AI conferences
Achievements
🌍 Conference Polyglot (6)
🐣 Hot Topic Early Bird
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🏃 Academic Marathon (11)
🌟 Keyword Trendsetter Combo (3)
🤝 Dynamic Duo (12)
🧬 Topic Evolution
📈 Trend Setter
🚀 Conference Pioneer
💎 Century Club (20)
❓ The Questioner
🗃️ Keyword Collector (84)
🔥 Unstoppable (10)
Conferences
CVPR (7)
ICCV (6)
ECCV (3)
NIPS (2)
ACL (1)
ICLR (1)
Keywords
multimodal learning (4)
transformer architecture (3)
object detection (3)
visual grounding (2)
image classification (2)
vision-language navigation (2)
semantic segmentation (2)
vision language model (2)
multi-modal learning (2)
visual question answering (2)
vision-and-language navigation (2)
transfer learning (2)
convolutional neural network (2)
cross-modal learning (1)
data augmentation (1)
self-supervised learning (1)
multi-task learning (1)
instruction following (1)
instance segmentation (1)
object localization (1)
Papers
SAM 2: Segment Anything in Images and Videos
ICLR 2025
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders
CVPR 2023
Scaling Language-Image Pre-Training via Masking
CVPR 2023
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
ICCV 2023
FLAVA: A Foundational Language and Vision Alignment Model
CVPR 2022
Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis From a Single Image
ICCV 2021
UniT: Multimodal Multitask Learning With a Unified Transformer
ICCV 2021
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA
CVPR 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
ECCV 2020
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
ACL 2019
Language-Conditioned Graph Networks for Relational Reasoning
ICCV 2019
Grounding Visual Explanations
ECCV 2018
Speaker-Follower Models for Vision-and-Language Navigation
NIPS 2018
Learning to Segment Every Thing
CVPR 2018
Explainable Neural Computation via Stack Neural Module Networks
ECCV 2018
Learning to Reason: End-To-End Module Networks for Visual Question Answering
ICCV 2017
Modeling Relationships in Referential Expressions With Compositional Modular Networks
CVPR 2017
Natural Language Object Retrieval
CVPR 2016
Spatial Semantic Regularisation for Large Scale Object Detection
ICCV 2015
LSDA: Large Scale Detection through Adaptation
NIPS 2014