Wenhu Chen
90 papers · 2018–2026 · across 15 top CS/AI conferences
Achievements
Taxonomy Completionist (13)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Cross-Pollinator (14)
Conference Loyalist (23)
Dynamic Duo (29)
Triple Crown
Keyword Champion (2)
Grand Slam
Mega-Team (32)
Deep Specialist (21)
Topic Evolution
Trend Setter
The Questioner
Keyword Collector (368)
Century Club (89)
Prolific Year (11)
Unstoppable (8)
Conferences
ACL (24)
EMNLP (17)
ICLR (12)
NIPS (11)
NAACL (6)
CVPR (5)
IJCNLP (3)
AAAI (2)
EACL (2)
ICML (2)
WACV (2)
AACL (1)
ECCV (1)
ICCV (1)
INTERSPEECH (1)
Keywords
large language model (16)
multimodal learning (11)
question answering (11)
vision-language model (9)
few-shot learning (8)
benchmark evaluation (6)
in-context learning (5)
video generation (5)
visual reasoning (4)
video understanding (4)
instruction tuning (4)
multimodal reasoning (4)
question-answer pair (4)
pre-trained language model (4)
zero-shot learning (4)
retrieval-augmented generation (4)
text generation (3)
reinforcement learning (3)
knowledge base (3)
data augmentation (3)
Papers
BrowseComp-Plus: A Fair and Disentangled Evaluation Benchmark for Deep Search Agents
ACL 2026
VISA: Retrieval Augmented Generation with Visual Source Attribution
ACL 2025
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
ACL 2025
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
ACL 2025
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
ACL 2025
TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding
ACL 2025
UniRAG: Universal Retrieval Augmentation for Large Vision Language Models
NAACL 2025
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
ICLR 2025
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
ICLR 2025
T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design
ICLR 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
ICLR 2025
Harnessing Webpage UIs for Text-Rich Visual Understanding
ICLR 2025
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
ICCV 2025
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
EMNLP 2025
Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem
EMNLP 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
EMNLP 2025
TC-Bench: Benchmarking Temporal Compositionality in Conditional Video Generation
ACL 2025
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
CVPR 2025
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
ICLR 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
NIPS 2024
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
NIPS 2024
GenAI Arena: An Open Evaluation Platform for Generative Models
NIPS 2024
MAmmoTH2: Scaling Instructions from the Web
NIPS 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
NIPS 2024
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
ACL 2024
E2-LLM: Efficient and Extreme Length Extension of Large Language Models
ACL 2024
ChatMusician: Understanding and Generating Music Intrinsically with LLM
ACL 2024
Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models
ACL 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
ACL 2024
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
ACL 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
CVPR 2024
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
ECCV 2024
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
EMNLP 2024
Unifying Multimodal Retrieval via Document Screenshot Embedding
EMNLP 2024
Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering
EMNLP 2024
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
ICLR 2024
ImagenHub: Standardizing the evaluation of conditional image generation models
ICLR 2024
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
ICLR 2024
Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
ICML 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
ICML 2024
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
NAACL 2024
Synthesizing Coherent Story With Auto-Regressive Latent Diffusion Models
WACV 2024
QA Is the New KR: Question-Answer Pairs as Knowledge Bases
AAAI 2023
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
ICLR 2023
Attacking Open-domain Question Answering by Injecting Misinformation
AACL 2023
Large Language Models are few(1)-shot Table Reasoners
EACL 2023
Few-shot In-context Learning on Knowledge Base Question Answering
ACL 2023
Subject-driven Text-to-Image Generation via Apprenticeship Learning
NIPS 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
NIPS 2023
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
NIPS 2023
Attacking Open-domain Question Answering by Injecting Misinformation
IJCNLP 2023
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
EACL 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
EMNLP 2023
TheoremQA: A Theorem-driven Question Answering Dataset
EMNLP 2023
On the Risk of Misinformation Pollution with Large Language Models
EMNLP 2023
DePlot: One-shot visual language reasoning by plot-to-table translation
ACL 2023
HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data
ACL 2022
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
EMNLP 2022
Controllable Dialogue Simulation with In-context Learning
EMNLP 2022
Counterfactual Maximum Likelihood Estimation for Training Deep Networks
NIPS 2021
Zero-shot Fact Verification by Claim Generation
IJCNLP 2021
Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding
EMNLP 2021
FinQA: A Dataset of Numerical Reasoning over Financial Data
EMNLP 2021
Open Question Answering over Tables and Text
ICLR 2021
A Systematic Investigation of KB-Text Embedding Alignment at Scale
ACL 2021
Zero-shot Fact Verification by Claim Generation
ACL 2021
Meta Module Network for Compositional Visual Reasoning
WACV 2021
Unsupervised Multi-hop Question Answering by Question Generation
NAACL 2021
Local Explanation of Dialogue Response Generation
NIPS 2021
A Systematic Investigation of KB-Text Embedding Alignment at Scale
IJCNLP 2021
Few-Shot NLG with Pre-Trained Language Model
ACL 2020
Violin: A Large-Scale Dataset for Video-and-Language Inference
CVPR 2020
TabFact: A Large-scale Dataset for Table-based Fact Verification
ICLR 2020
Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs
AAAI 2020
Logic2Text: High-Fidelity Natural Language Generation from Logical Forms
EMNLP 2020
HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data
EMNLP 2020
KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation
EMNLP 2020
Logical Natural Language Generation from Open-Domain Tables
ACL 2020
Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention
ACL 2019
How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection
NAACL 2019
Interpreting and Improving Deep Neural SLU Models via Vocabulary Importance
INTERSPEECH 2019
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
NIPS 2019
Global Textual Relation Embedding for Relational Understanding
ACL 2019
XL-NBT: A Cross-lingual Neural Belief Tracking Framework
EMNLP 2018
Triangular Architecture for Rare Language Translation
ACL 2018
Variational Knowledge Graph Reasoning
NAACL 2018
Generative Bridging Network for Neural Sequence Prediction
NAACL 2018
Video Captioning via Hierarchical Reinforcement Learning
CVPR 2018
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling
ACL 2018