Haoxuan You

24 papers · 2019–2025 · 10 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🌍 Conference Polyglot (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (54) 👥 Mega-Team (23) 🤝 Dynamic Duo (12) ❓ The Questioner ⚡ Prolific Year (7) 🚀 Conference Pioneer 💎 Century Club (24) 📈 Trend Setter 🗃️ Keyword Collector (103) 🔥 Unstoppable (7)

Conferences

ICLR (5) AAAI (4) EMNLP (4) ACL (2) CVPR (2) ECCV (2) NIPS (2) ICCV (1) IJCAI (1) NAACL (1)

Top co-authors

Shih-fu Chang (12) Zhecan Wang (11) Kai-Wei Chang (9) Rui Sun (4) Yinfei Yang (3) Yue Gao (3) Yifan Feng (3) Can Qin (3) Wenhao Li (3) Zhe Gan (3)

Keywords

visual question answering (5) multimodal learning (3) vision-language model (3) visual commonsense (2) point cloud (2) adversarial training (2) visual commonsense reasoning (2) vision language model (2) commonsense reasoning (2) domain generalization (1) commonsense knowledge (1) image generation (1) image captioning (1) representation learning (1) question answering (1) geometric deep learning (1) benchmark evaluation (1) visual reasoning (1) hypergraph learning (1) 3d vision (1)

Papers

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA ICLR 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning ICLR 2025
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models CVPR 2025
CoBIT: A Contrastive Bi-directional Image-Text Generation Model ICLR 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images NIPS 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity ICLR 2024
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond EMNLP 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding ACL 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models EMNLP 2023
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning AAAI 2022
Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework ICLR 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training ECCV 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense EMNLP 2022
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding EMNLP 2022
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks ACL 2022
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions NAACL 2021
Learning Visual Commonsense for Robust Scene Graph Generation ECCV 2020
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering CVPR 2019
PVRNet: Point-View Relation Neural Network for 3D Shape Recognition AAAI 2019
MeshNet: Mesh Neural Network for 3D Shape Representation AAAI 2019
Hypergraph Neural Networks AAAI 2019
Decoding EEG by Visual-guided Deep Neural Networks IJCAI 2019
Multi-Modality Latent Interaction Network for Visual Question Answering ICCV 2019
PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation NIPS 2019