Xiang Yue

49 papers · 2020–2026 · 9 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird

🌍 Conference Polyglot (9) 🏃 Academic Marathon (5) 🔬 Deep Specialist (12) 👑 Triple Crown 🧬 Topic Evolution 🤝 Dynamic Duo (14) 👥 Mega-Team (28) 🗃️ Keyword Collector (165) ❓ The Questioner (3) ⚡ Prolific Year (13) 💎 Century Club (47) 🔥 Unstoppable (6) 📈 Trend Setter

Conferences

ACL (19) EMNLP (7) ICLR (7) ICML (4) NAACL (4) NIPS (4) CVPR (2) IJCAI (1) IJCNLP (1)

Top co-authors

Huan Sun (14) Graham Neubig (12) Wenhu Chen (12) Ge Zhang (9) Bo Li (6) Tianyu Zheng (6) Wenhao Huang (5) Bill Yuchen Lin (5) Yu Su (5) Yuansheng Ni (5)

Research topics

Reasoning (1) Privacy (1)

Keywords

large language model (13) benchmark evaluation (8) question answering (5) instruction tuning (5) vision-language model (5) multimodal large language model (3) multimodal reasoning (3) synthetic data (3) chain-of-thought reasoning (3) data augmentation (3) reasoning benchmark (3) multimodal understanding (2) mathematical reasoning (2) knowledge distillation (2) visual question answering (2) multimodal learning (2) code generation (2) model evaluation (2) text generation (2) visual reasoning (2)

Papers

Temporal Sampling for Forgotten Reasoning in LLMs · ACL 2026
Video-MMMU: Evaluating Knowledge Acquisition from Multidisciplinary Professional Videos · ACL 2026
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search · EMNLP 2025
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories · EMNLP 2025
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation · EMNLP 2025
Harnessing Webpage UIs for Text-Rich Visual Understanding · ICLR 2025
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks · ICLR 2025
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages · ICLR 2025
MuPT: A Generative Symbolic Music Pretrained Transformer · ICLR 2025
MixEval-X: Any-to-any Evaluations from Real-world Data Mixture · ICLR 2025
SimulBench: Evaluating Language Models with Creative Simulation Tasks · NAACL 2025
ESPnet-SpeechLM: An Open Speech Language Model Toolkit · NAACL 2025
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation · NAACL 2025
Evaluating Language Models as Synthetic Data Generators · ACL 2025
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale · ACL 2025
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark · ACL 2025
Synthetic Data in the Era of Large Language Models · ACL 2025
LIME: Less Is More for MLLM Evaluation · ACL 2025
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA · ACL 2025
Small Models Struggle to Learn from Strong Reasoners · ACL 2025
Evaluating Vision-Language Models as Evaluators in Path Planning · CVPR 2025
Demystifying Long Chain-of-Thought Reasoning · ICML 2025
Overtrained Language Models Are Harder to Fine-Tune · ICML 2025
Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning · ICML 2025
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks · ICLR 2025
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning · ICLR 2024
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization · NIPS 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark · NIPS 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures · NIPS 2024
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents · ACL 2024
Machine Unlearning of Pre-trained Large Language Models · ACL 2024
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation · ACL 2024
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement · ACL 2024
AttributionBench: How Hard is Automatic Attribution Evaluation? · ACL 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI · CVPR 2024
MAmmoTH2: Scaling Instructions from the Web · NIPS 2024
Data Engineering for Scaling Language Models to 128K Context · ICML 2024
TableLlama: Towards Open Large Generalist Models for Tables · NAACL 2024
Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe · ACL 2023
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate · EMNLP 2023
Automatic Evaluation of Attribution by Large Language Models · EMNLP 2023
Synthetic Question Value Estimation for Domain Adaptation of Question Answering · ACL 2022
C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References · ACL 2022
Differential Privacy for Text Analytics via Natural Text Sanitization · ACL 2021
COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval · EMNLP 2021
Differential Privacy for Text Analytics via Natural Text Sanitization · IJCNLP 2021
Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset · ACL 2020
Towards Making the Most of Context in Neural Machine Translation · IJCAI 2020
PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation · EMNLP 2020