conftrace_

Yilun Zhao

68 papers · 2022–2026 · 7 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (7)
🐝 Cross-Pollinator (15)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌈 Renaissance Researcher (10)
🐣 Hot Topic Early Bird
🏠 Conference Loyalist (26)
👥 Mega-Team (35)
🤝 Dynamic Duo (41)
🔬 Deep Specialist (18)
🏆 Keyword Champion (4)
🗃️ Keyword Collector (220)
Prolific Year (8)
The Questioner (6)
🔥 Unstoppable (5)
💎 Century Club (55)

Conferences

ACL (28) EMNLP (26) NAACL (7) EACL (3) ICLR (2) CVPR (1) NIPS (1)

Papers

MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application · ACL 2026
SciMDR: Advancing Scientific Multimodal Document Reasoning · ACL 2026
A Survey of Multimodal Mathematical Reasoning: From Perception, Alignment to Reasoning · ACL 2026
A Survey of Reasoning-Intensive Retrieval: Progress and Challenges · ACL 2026
Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing · ACL 2026
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL · EACL 2026
SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature · EACL 2026
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems · ACL 2026
TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction · ACL 2026
MMSciCode: Real-world Evaluation of Multilingual Multi-Discipline Scientific Research Coding · ACL 2026
Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future · ACL 2026
Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA · ACL 2026
Anchor: Branch-Point Data Generation for GUI Agents · ACL 2026
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding · CVPR 2025
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval · NAACL 2025
Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning · NAACL 2025
ReIFE: Re-evaluating Instruction-Following Evaluation · NAACL 2025
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification · ACL 2025
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research · ACL 2025
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers · ACL 2025
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos · ACL 2025
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure · ACL 2025
SportReason: Evaluating Retrieval-Augmented Reasoning across Tables and Text for Sports Question Answering · EMNLP 2025
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective · EMNLP 2025
Table-R1: Inference-Time Scaling for Table Reasoning Tasks · EMNLP 2025
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain · EMNLP 2025
LimRank: Less is More for Reasoning-Intensive Information Reranking · EMNLP 2025
SciSketch: An Open-source Framework for Automated Schematic Diagram Generation in Scientific Papers · EMNLP 2025
Z1: Efficient Test-time Scaling with Code · EMNLP 2025
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers · EMNLP 2025
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search · EMNLP 2025
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering · EMNLP 2025
Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplification and Resistance in Multi-Agent Based LLM-as-Judge · EMNLP 2025
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models · ICLR 2025
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning · ICLR 2025
Physics: Benchmarking Foundation Models on University-Level Physics Problem Solving · ACL 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task · ACL 2025
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers · ACL 2025
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains · EMNLP 2024
TaPERA: Enhancing Faithfulness and Interpretability in Long-Form Table QA by Content Planning and Execution-based Reasoning · ACL 2024
FinanceMATH: Knowledge-Intensive Math Reasoning in Finance Domains · ACL 2024
DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents · ACL 2024
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning · ACL 2024
Unveiling the Spectrum of Data Contamination in Language Model: A Survey from Detection to Remediation · ACL 2024
Revisiting Automated Evaluation for Long-form Table Question Answering · EMNLP 2024
FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents · EMNLP 2024
FOLIO: Natural Language Reasoning with First-Order Logic · EMNLP 2024
TAIL: A Toolkit for Automatic and Realistic Long-Context Large Language Model Evaluation · EMNLP 2024
OpenT2T: An Open-Source Toolkit for Table-to-Text Generation · EMNLP 2024
MIMIR: A Customizable Agent Tuning Platform for Enhanced Scientific Applications · EMNLP 2024
OMG-QA: Building Open-Domain Multi-Modal Generative Question Answering Systems · EMNLP 2024
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models · EMNLP 2024
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs · NIPS 2024
Investigating Data Contamination in Modern Benchmarks for Large Language Models · NAACL 2024
Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data? · NAACL 2024
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization · NAACL 2024
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering · NAACL 2024
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations · ACL 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation · ACL 2023
Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies · EMNLP 2023
QTSumm: Query-Focused Summarization over Tabular Data · EMNLP 2023
Investigating Table-to-Text Generation Capabilities of Large Language Models in Real-World Information Seeking Scenarios · EMNLP 2023
Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation · EMNLP 2023
LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control · EACL 2023
OpenRT: An Open-source Framework for Reasoning Over Tabular Data · ACL 2023
MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data · ACL 2022
ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples · EMNLP 2022
R2D2: Robust Data-to-Text with Replacement Detection · EMNLP 2022