Papers
Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews
Hyungyu Shin, Jingyu Tang, Yoonjoo Lee et al.
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun et al.
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories
Tianyue Ou, Wanyao Guo, Apurva Gandhi et al.
MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education
Dongsuk Jang, Ziyao Shangguan, Kyle Tegtmeyer et al.
LLM×MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System
Yu Chao, Siyu Lin, Xiaorong Wang et al.
TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs
Duygu Nur Yaldiz, Yavuz Faruk Bakman, Sungmin Kang et al.
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
Ziyang Miao, Qiyu Sun, Jingyuan Wang et al.
SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal et al.
CRAB: A Benchmark for Evaluating Curation of Retrieval-Augmented LLMs in Biomedicine
Hanmeng Zhong, Linqing Chen, Wentao Wu et al.
Aligning LLMs for Multilingual Consistency in Enterprise Applications
Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel et al.
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
Zhao Wang, Bowen Chen, Yotaro Shimose et al.
ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
Haoxin Wang, Xianhan Peng, Huang Cheng et al.
ProCut: LLM Prompt Compression via Attribution Estimation
Zhentao Xu, Fengyi Li, Albert C. Chen et al.
Select-then-Route : Taxonomy guided Routing for LLMs
Soham Shah, Kumar Shridhar
AutoCVSS: Assessing the Performance of LLMs for Automated Software Vulnerability Scoring
Davide Sanvito, Giovanni Arriciati, Giuseppe Siracusano et al.
Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
Daniel Schwartz, Dmitriy Bespalov, Zhe Wang et al.
Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
Congzheng Song, Xinyu Tang
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber, Forrest Sheng Bao, Chenyu Xu et al.
Group Preference Alignment: Customizing LLM Responses from In-Situ Conversations Only When Needed
Ishani Mondal, Jack W. Stokes, Sujay Kumar Jauhar et al.
Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs
Jyotika Singh, Weiyi Sun, Amit Agarwal et al.
Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs
Soham Satyadharma, Fatemeh Sheikholeslami, Swati Kaul et al.
LLMs on a Budget? Say HOLA
Zohaib Hasan Siddiqui, Jiechao Gao, Ebad Shabbir et al.
Learning from LLM Agents: In-Context Generative Models for Text Casing in E-Commerce Ads
Yingxue Zhou, Tan Zhu, Tao Zeng et al.
AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment
Xiaochong Lan, Jie Feng, Yinxing Liu et al.
JSON Whisperer: Efficient JSON Editing with LLMs
Sarel Duanis, Asnat Greenstein-Messica, Eliya Habba