Research Explorer

Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews

Hyungyu Shin, Jingyu Tang, Yoonjoo Lee et al.

2025 EMNLP

A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs

Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun et al.

2025 EMNLP

AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories

Tianyue Ou, Wanyao Guo, Apurva Gandhi et al.

2025 EMNLP

MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education

Dongsuk Jang, Ziyao Shangguan, Kyle Tegtmeyer et al.

2025 EMNLP

LLM×MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System

Yu Chao, Siyu Lin, Xiaorong Wang et al.

2025 EMNLP

TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs

Duygu Nur Yaldiz, Yavuz Faruk Bakman, Sungmin Kang et al.

2025 EMNLP

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Ziyang Miao, Qiyu Sun, Jingyuan Wang et al.

2025 EMNLP

SAGE: A Generic Framework for LLM Safety Evaluation

Madhur Jindal, Hari Shrawgi, Parag Agrawal et al.

2025 EMNLP

CRAB: A Benchmark for Evaluating Curation of Retrieval-Augmented LLMs in Biomedicine

Hanmeng Zhong, Linqing Chen, Wentao Wu et al.

2025 EMNLP

Aligning LLMs for Multilingual Consistency in Enterprise Applications

Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel et al.

2025 EMNLP

Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents

Zhao Wang, Bowen Chen, Yotaro Shimose et al.

2025 EMNLP

ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?

Haoxin Wang, Xianhan Peng, Huang Cheng et al.

2025 EMNLP

ProCut: LLM Prompt Compression via Attribution Estimation

Zhentao Xu, Fengyi Li, Albert C. Chen et al.

2025 EMNLP

Select-then-Route : Taxonomy guided Routing for LLMs

Soham Shah, Kumar Shridhar

2025 EMNLP

AutoCVSS: Assessing the Performance of LLMs for Automated Software Vulnerability Scoring

Davide Sanvito, Giovanni Arriciati, Giuseppe Siracusano et al.

2025 EMNLP

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation

Daniel Schwartz, Dmitriy Bespalov, Zhe Wang et al.

2025 EMNLP

Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices

Congzheng Song, Xinyu Tang

2025 EMNLP

Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards

Manveer Singh Tamber, Forrest Sheng Bao, Chenyu Xu et al.

2025 EMNLP

Group Preference Alignment: Customizing LLM Responses from In-Situ Conversations Only When Needed

Ishani Mondal, Jack W. Stokes, Sujay Kumar Jauhar et al.

2025 EMNLP

Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs

Jyotika Singh, Weiyi Sun, Amit Agarwal et al.

2025 EMNLP

Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs

Soham Satyadharma, Fatemeh Sheikholeslami, Swati Kaul et al.

2025 EMNLP

LLMs on a Budget? Say HOLA

Zohaib Hasan Siddiqui, Jiechao Gao, Ebad Shabbir et al.

2025 EMNLP

Learning from LLM Agents: In-Context Generative Models for Text Casing in E-Commerce Ads

Yingxue Zhou, Tan Zhu, Tao Zeng et al.

2025 EMNLP

AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment

Xiaochong Lan, Jie Feng, Yinxing Liu et al.

2025 EMNLP

JSON Whisperer: Efficient JSON Editing with LLMs

Sarel Duanis, Asnat Greenstein-Messica, Eliya Habba

2025 EMNLP
