
Benchmarking LLMs on Semantic Overlap Summarization

John Salvador, Naman Bansal, Mousumi Akter et al.

2025 EMNLP

ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection

Jeonghye Kim, Sojeong Rhee, Minbeom Kim et al.

2025 EMNLP

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Shudong Liu, Hongwei Liu, Junnan Liu et al.

2025 EMNLP

A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making

Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng et al.

2025 EMNLP

NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls

Kinjal Basu, Ibrahim Abdelaziz, Kiran Kate et al.

2025 EMNLP

DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context

Pramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar

2025 EMNLP

seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs

Mohammad Ramezanali, Mo Vazifeh, Paolo Santi

2025 EMNLP

SATBench: Benchmarking LLMs’ Logical Reasoning via Automated Puzzle Generation from SAT Formulas

Anjiang Wei, Yuheng Wu, Yingjia Wan et al.

2025 EMNLP

Personalized LLM Decoding via Contrasting Personal Preference

Hyungjune Bu, ChanJoo Jung, Minjae Kang et al.

2025 EMNLP

MPCG: Multi-Round Persona-Conditioned Generation for Modeling the Evolution of Misinformation with LLMs

Chong Jun Rong Brian, Yixuan Tang, Anthony Kum Hoe Tung

2025 EMNLP

Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?

Luca Moroni, Javier Aula-Blasco, Simone Conia et al.

2025 EMNLP

EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs

Numaan Naeem, Abdellah El Mekki, Muhammad Abdul-Mageed

2025 EMNLP

NitiBench: Benchmarking LLM Frameworks on Thai Legal Question Answering Capabilities

Pawitsapak Akarajaradwong, Pirat Pothavorn, Chompakorn Chaksangchaichot et al.

2025 EMNLP

Conflicting Needles in a Haystack: How LLMs behave when faced with contradictory information

Murathan Kurfali, Robert Östling

2025 EMNLP

Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs

Daehoon Gwak, Minseo Jung, Junwoo Park et al.

2025 EMNLP

AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts

Esra Dönmez, Maximilian Maurer, Gabriella Lapesa et al.

2025 EMNLP

The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs

Denis Janiak, Jakub Binkowski, Albert Sawczyn et al.

2025 EMNLP

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification

Boyang Zhang, Yicong Tan, Yun Shen et al.

2025 EMNLP

Trojsten Benchmark: Evaluating LLM Problem-Solving in Slovak STEM Competition Problems

Adam Zahradník, Marek Suppa

2025 EMNLP

A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs

Shaona Ghosh, Amrita Bhattacharjee, Yftah Ziser et al.

2025 EMNLP

so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs

Sriharsh Bhyravajjula, Melanie Walsh, Anna Preus et al.

2025 EMNLP

Certified Mitigation of Worst-Case LLM Copyright Infringement

Jingyu Zhang, Jiacan Yu, Marc Marone et al.

2025 EMNLP

CourtReasoner: Can LLM Agents Reason Like Judges?

Sophia Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen et al.

2025 EMNLP

Retracing the Past: LLMs Emit Training Data When They Get Lost

Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen et al.

2025 EMNLP

Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Fine-tuning

Junjie Xing, Yeye He, Mengyu Zhou et al.

2025 EMNLP
