Hexiang Hu

36 papers · 2016–2025 · 10 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (7) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (9) 🌍 Conference Polyglot (10) 🗺️ Taxonomy Completionist (68)

🌟 Keyword Trendsetter Combo (3) 👥 Mega-Team (43) 🔬 Deep Specialist (12) 🧬 Topic Evolution 👑 Triple Crown 🤝 Dynamic Duo (12) ❓ The Questioner (2) 🚀 Conference Pioneer ⚡ Prolific Year (5) 🔥 Unstoppable (10) 🗃️ Keyword Collector (152) 💎 Century Club (36) 📈 Trend Setter

Conferences

CVPR (12) EMNLP (5) NIPS (5) ICCV (3) ICLR (3) ECCV (2) ICML (2) NAACL (2) ACL (1) COLING (1)

Top co-authors

Fei Sha (12) Wenhu Chen (7) Ming-Wei Chang (7) Wei-Lun Chao (6) Yandong Li (6) Soravit Changpinyo (6) Kenton Lee (5) Boqing Gong (5) Bowen Zhang (4) Peter Shaw (4)

Keywords

multimodal learning (7) visual question answering (6) transfer learning (4) object detection (3) imitation learning (3) few-shot learning (3) multi-modal learning (3) vision language model (2) instruction following (2) retrieval-augmented generation (2) instance segmentation (2) embedding learning (2) domain adaptation (2) diffusion model (2) image generation (2) reinforcement learning (2) in-context learning (2) image captioning (2) visual grounding (2) long-tailed distribution (2)

Papers

LOFT: Scalable and More Realistic Long-Context Evaluation · NAACL 2025
Scaling Inference Time Compute for Diffusion Models · CVPR 2025
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities · ICLR 2025
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks · ICLR 2025
On Scaling Up a Multilingual Vision and Language Model · CVPR 2024
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers · ECCV 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions · ICML 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction · CVPR 2024
Subject-driven Text-to-Image Generation via Apprenticeship Learning · NIPS 2023
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? · EMNLP 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities · ICCV 2023
PreSTU: Pre-Training for Scene-Text Understanding · ICCV 2023
Re-Imagen: Retrieval-Augmented Text-to-Image Generator · ICLR 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding · ICML 2023
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces · NIPS 2023
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text · EMNLP 2022
On Model Calibration for Long-Tailed Object Detection and Instance Segmentation · NIPS 2021
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next? · EMNLP 2021
Visually Grounded Concept Composition · EMNLP 2021
MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection · ICCV 2021
Learning the Best Pooling Strategy for Visual Semantic Embedding · CVPR 2021
Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions · CVPR 2020
BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps · ACL 2020
Learning to Represent Image and Text with Denotation Graph · EMNLP 2020
Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation · NIPS 2019
Engaging Image Captioning via Personality · CVPR 2019
Multi-Task Learning for Sequence Tagging: An Empirical Study · COLING 2018
Synthesized Policies for Transfer and Adaptation across Tasks and Environments · NIPS 2018
Cross-Modal and Hierarchical Modeling of Video and Text · ECCV 2018
Learning Answer Embeddings for Visual Question Answering · CVPR 2018
Compressed Video Action Recognition · CVPR 2018
Cross-Dataset Adaptation for Visual Question Answering · CVPR 2018
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets · NAACL 2018
FastMask: Segment Multi-Scale Object Candidates in One Shot · CVPR 2017
Learning Structured Inference Neural Networks With Label Relations · CVPR 2016
Structure Inference Machines: Recurrent Neural Networks for Analyzing Relations in Group Activity Recognition · CVPR 2016