Xiang Yue
49 papers · 2020–2026 · 9 top CS/AI conferences
Achievements
Taxonomy Completionist (12)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Conference Polyglot (9)
Academic Marathon (5)
Deep Specialist (12)
Triple Crown
Topic Evolution
Dynamic Duo (14)
Mega-Team (28)
Keyword Collector (165)
The Questioner (3)
Prolific Year (13)
Century Club (47)
Unstoppable (6)
Trend Setter
Conferences
ACL (19)
EMNLP (7)
ICLR (7)
ICML (4)
NAACL (4)
NIPS (4)
CVPR (2)
IJCAI (1)
IJCNLP (1)
Research topics
Keywords
large language model (13)
benchmark evaluation (8)
question answering (5)
instruction tuning (5)
vision-language model (5)
multimodal large language model (3)
multimodal reasoning (3)
synthetic data (3)
chain-of-thought reasoning (3)
data augmentation (3)
reasoning benchmark (3)
multimodal understanding (2)
mathematical reasoning (2)
knowledge distillation (2)
visual question answering (2)
multimodal learning (2)
code generation (2)
model evaluation (2)
text generation (2)
visual reasoning (2)
Papers
Temporal Sampling for Forgotten Reasoning in LLMs
ACL 2026
Video-MMMU: Evaluating Knowledge Acquisition from Multidisciplinary Professional Videos
ACL 2026
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
EMNLP 2025
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories
EMNLP 2025
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
EMNLP 2025
Harnessing Webpage UIs for Text-Rich Visual Understanding
ICLR 2025
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
ICLR 2025
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
ICLR 2025
MuPT: A Generative Symbolic Music Pretrained Transformer
ICLR 2025
MixEval-X: Any-to-any Evaluations from Real-world Data Mixture
ICLR 2025
SimulBench: Evaluating Language Models with Creative Simulation Tasks
NAACL 2025
ESPnet-SpeechLM: An Open Speech Language Model Toolkit
NAACL 2025
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
NAACL 2025
Evaluating Language Models as Synthetic Data Generators
ACL 2025
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
ACL 2025
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
ACL 2025
Synthetic Data in the Era of Large Language Models
ACL 2025
LIME: Less Is More for MLLM Evaluation
ACL 2025
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA
ACL 2025
Small Models Struggle to Learn from Strong Reasoners
ACL 2025
Evaluating Vision-Language Models as Evaluators in Path Planning
CVPR 2025
Demystifying Long Chain-of-Thought Reasoning
ICML 2025
Overtrained Language Models Are Harder to Fine-Tune
ICML 2025
Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning
ICML 2025
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
ICLR 2025
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
ICLR 2024
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization
NIPS 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
NIPS 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
NIPS 2024
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents
ACL 2024
Machine Unlearning of Pre-trained Large Language Models
ACL 2024
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
ACL 2024
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
ACL 2024
AttributionBench: How Hard is Automatic Attribution Evaluation?
ACL 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
MAmmoTH2: Scaling Instructions from the Web
NIPS 2024
Data Engineering for Scaling Language Models to 128K Context
ICML 2024
TableLlama: Towards Open Large Generalist Models for Tables
NAACL 2024
Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe
ACL 2023
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate
EMNLP 2023
Automatic Evaluation of Attribution by Large Language Models
EMNLP 2023
Synthetic Question Value Estimation for Domain Adaptation of Question Answering
ACL 2022
C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References
ACL 2022
Differential Privacy for Text Analytics via Natural Text Sanitization
ACL 2021
COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval
EMNLP 2021
Differential Privacy for Text Analytics via Natural Text Sanitization
IJCNLP 2021
Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset
ACL 2020
Towards Making the Most of Context in Neural Machine Translation
IJCAI 2020
PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation
EMNLP 2020