Papers
Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs
Daehoon Gwak, Minseo Jung, Junwoo Park et al.
AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
Esra Dönmez, Maximilian Maurer, Gabriella Lapesa et al.
The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
Denis Janiak, Jakub Binkowski, Albert Sawczyn et al.
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen et al.
Trojsten Benchmark: Evaluating LLM Problem-Solving in Slovak STEM Competition Problems
Adam Zahradník, Marek Suppa
A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs
Shaona Ghosh, Amrita Bhattacharjee, Yftah Ziser et al.
so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs
Sriharsh Bhyravajjula, Melanie Walsh, Anna Preus et al.
Certified Mitigation of Worst-Case LLM Copyright Infringement
Jingyu Zhang, Jiacan Yu, Marc Marone et al.
CourtReasoner: Can LLM Agents Reason Like Judges?
Sophia Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen et al.
Retracing the Past: LLMs Emit Training Data When They Get Lost
Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen et al.
Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Fine-tuning
Junjie Xing, Yeye He, Mengyu Zhou et al.
Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews
Hyungyu Shin, Jingyu Tang, Yoonjoo Lee et al.
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun et al.
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories
Tianyue Ou, Wanyao Guo, Apurva Gandhi et al.
MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education
Dongsuk Jang, Ziyao Shangguan, Kyle Tegtmeyer et al.
LLM×MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System
Yu Chao, Siyu Lin, Xiaorong Wang et al.
TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs
Duygu Nur Yaldiz, Yavuz Faruk Bakman, Sungmin Kang et al.
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
Ziyang Miao, Qiyu Sun, Jingyuan Wang et al.
SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal et al.
CRAB: A Benchmark for Evaluating Curation of Retrieval-Augmented LLMs in Biomedicine
Hanmeng Zhong, Linqing Chen, Wentao Wu et al.
Aligning LLMs for Multilingual Consistency in Enterprise Applications
Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel et al.
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
Zhao Wang, Bowen Chen, Yotaro Shimose et al.
ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
Haoxin Wang, Xianhan Peng, Huang Cheng et al.
ProCut: LLM Prompt Compression via Attribution Estimation
Zhentao Xu, Fengyi Li, Albert C. Chen et al.
Select-then-Route: Taxonomy guided Routing for LLMs
Soham Shah, Kumar Shridhar