Papers
What Does Infect Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs
Xinlan Yan, Di Wu, Yibin Lei et al.
Redefining Retrieval Evaluation in the Era of LLMs
Giovanni Trappolini, Florin Cuconasu, Simone Filice et al.
Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation
Abir Harrasse, Chaithanya Bandi, Hari Bandi
Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs’ Legal Reasoning Capabilities
Hongseok Oh, Wonseok Hwang, Kyoung-Woon On
Measuring Linguistic Competence of LLMs on Indigenous Languages of the Americas
Justin Vasselli, Arturo Mp, Frederikus Hudi et al.
Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches
Hachem Madmoun, Salem Lahlou
Beyond Tokens: Concept-Level Training Objectives for LLMs
Laya Iyer, Pranav Somani, Alice Guo et al.
Persuasion Tokens for Editing Factual Knowledge in LLMs
Paul Youssef, Christin Seifert, Jörg Schlötterer
Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs
Arya Labroo, Ivaxi Sheth, Vyas Raina et al.
Confidence Leaps in LLM Reasoning: Early Stopping and Cross-Model Transfer
Pavel Tikhonov, Ivan Oseledets, Elena Tutubalina
LLMs Know More About Numbers than They Can Say
Fengting Yuchi, Li Du, Jason Eisner
DeepPavlov Strikes Back: A Toolkit for Improving LLM Reliability and Trustworthiness
Evgenii Nikolaev, Timur Ionov, Anna Korzanova et al.
IntelliCode: A Multi-Agent LLM Tutoring System with Centralized Learner Modeling
Jones David, Shreya Ghosh
EvalSense: A Framework for Domain-Specific LLM (Meta-)Evaluation
Adam Dejl, Jonathan Pearson
Machine Translation for Low-Resource Languages through Monolingual Data and LLM: A Case Study of English-to-Basque
Nam Luu, Aitor Soroa, German Rigau et al.
Generalising LLM Routing using Past Performance Retrieval: A Few-Shot Router is Sufficient
Clovis Varangot-Reille, Christophe Bouvard, Antoine Gourru
Evaluating the Impact of SAE-based Language Steering on LLM Performance
Sebastian Zwirner, Wentao Hu, Koshiro Aoki et al.
Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts
Maida Aizaz, Quang Minh Nguyen
From Detection to Explanation: Modeling Fine-Grained Emotional Social Influence Techniques with LLMs and Human Preferences
Maciej Markiewicz, Wiktoria Mieleszczenko-Kowszewicz, Beata Bajcar et al.
Evaluating Cost-Efficiency of LLMs in a RAG Setup on Polish Wikipedia: Quality vs. Energy Consumption
Patrycja Smits, Tomasz Walkowiak
An Evaluation of Classifiers for Mapping Generative LLM Responses to Answer Options of Multiple-choice Questionnaires
Alisea Stroligo, Anna Shamray, Julian Schelb et al.
PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents
Minjia Wang, Yunfeng Wang, Xiao Ma et al.
Evaluating the Pre-Consultation Ability of LLMs using Diagnostic Guidelines
Jean Seo, Gibaeg Kim, Kihun Shin et al.
SELENE: Selective and Evidence-Weighted LLM Debating for Efficient and Reliable Reasoning
Akshay Verma, Swapnil Gupta, Deepak Gupta et al.