Haoxuan You
24 papers · 2019–2025 · 10 conferences · across top CS/AI conferences
Achievements
Academic Marathon (6)
Conference Polyglot (10)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (12)
Renaissance Researcher (8)
Taxonomy Completionist (54)
Mega-Team (23)
Dynamic Duo (12)
The Questioner
Prolific Year (7)
Conference Pioneer
Century Club (24)
Trend Setter
Keyword Collector (103)
Unstoppable (7)
Conferences
ICLR (5)
AAAI (4)
EMNLP (4)
ACL (2)
CVPR (2)
ECCV (2)
NIPS (2)
ICCV (1)
IJCAI (1)
NAACL (1)
Keywords
visual question answering (5)
multimodal learning (3)
vision-language model (3)
visual commonsense (2)
point cloud (2)
adversarial training (2)
visual commonsense reasoning (2)
vision language model (2)
commonsense reasoning (2)
domain generalization (1)
commonsense knowledge (1)
image generation (1)
image captioning (1)
representation learning (1)
question answering (1)
geometric deep learning (1)
benchmark evaluation (1)
visual reasoning (1)
hypergraph learning (1)
3d vision (1)
Papers
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
ICLR 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
ICLR 2025
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
CVPR 2025
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
ICLR 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
NIPS 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity
ICLR 2024
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
EMNLP 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
ACL 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
EMNLP 2023
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
AAAI 2022
Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework
ICLR 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
ECCV 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
EMNLP 2022
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
EMNLP 2022
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks
ACL 2022
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
NAACL 2021
Learning Visual Commonsense for Robust Scene Graph Generation
ECCV 2020
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
CVPR 2019
PVRNet: Point-View Relation Neural Network for 3D Shape Recognition
AAAI 2019
MeshNet: Mesh Neural Network for 3D Shape Representation
AAAI 2019
Hypergraph Neural Networks
AAAI 2019
Decoding EEG by Visual-guided Deep Neural Networks
IJCAI 2019
Multi-Modality Latent Interaction Network for Visual Question Answering
ICCV 2019
PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation
NIPS 2019