Papers
Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs
Arya Labroo, Ivaxi Sheth, Vyas Raina et al.
Confidence Leaps in LLM Reasoning: Early Stopping and Cross-Model Transfer
Pavel Tikhonov, Ivan Oseledets, Elena Tutubalina
LLMs Know More About Numbers than They Can Say
Fengting Yuchi, Li Du, Jason Eisner
DeepPavlov Strikes Back: A Toolkit for Improving LLM Reliability and Trustworthiness
Evgenii Nikolaev, Timur Ionov, Anna Korzanova et al.
IntelliCode: A Multi-Agent LLM Tutoring System with Centralized Learner Modeling
Jones David, Shreya Ghosh
EvalSense: A Framework for Domain-Specific LLM (Meta-)Evaluation
Adam Dejl, Jonathan Pearson
Machine Translation for Low-Resource Languages through Monolingual Data and LLM: A Case Study of English-to-Basque
Nam Luu, Aitor Soroa, German Rigau et al.
Generalising LLM Routing using Past Performance Retrieval: A Few-Shot Router is Sufficient
Clovis Varangot-Reille, Christophe Bouvard, Antoine Gourru
Evaluating the Impact of SAE-based Language Steering on LLM Performance
Sebastian Zwirner, Wentao Hu, Koshiro Aoki et al.
Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts
Maida Aizaz, Quang Minh Nguyen
From Detection to Explanation: Modeling Fine-Grained Emotional Social Influence Techniques with LLMs and Human Preferences
Maciej Markiewicz, Wiktoria Mieleszczenko-Kowszewicz, Beata Bajcar et al.
Evaluating Cost-Efficiency of LLMs in a RAG Setup on Polish Wikipedia: Quality vs. Energy Consumption
Patrycja Smits, Tomasz Walkowiak
An Evaluation of Classifiers for Mapping Generative LLM Responses to Answer Options of Multiple-choice Questionnaires
Alisea Stroligo, Anna Shamray, Julian Schelb et al.
PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents
Minjia Wang, Yunfeng Wang, Xiao Ma et al.
Evaluating the Pre-Consultation Ability of LLMs using Diagnostic Guidelines
Jean Seo, Gibaeg Kim, Kihun Shin et al.
SELENE: Selective and Evidence-Weighted LLM Debating for Efficient and Reliable Reasoning
Akshay Verma, Swapnil Gupta, Deepak Gupta et al.
Scaling Intent Understanding: A Framework for Classification with Clarification using Lightweight LLMs
Subhadip Nandi, Tanishka Agarwal, Anshika Singh et al.
Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence
Sumanth Balaji, Piyush Mishra, Aashraya Sachdeva et al.
LingVarBench: Benchmarking LLMs on Entity Recognitions and Linguistic Verbalization Patterns in Phone-Call Transcripts
Seyedali Mohammadi, Manas Paldhe, Amit Chhabra et al.
The Subtle Art of Defection: Understanding Uncooperative Behaviors in LLM based Multi-Agent Systems
Devang Kulshreshtha, Wanyu Du, Raghav Jain et al.
Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning
Minseok Kim, Jingxiang Chen, Seong-Gyun Leem et al.
ELO: Efficient Layer-Specific Optimization for Continual Pretraining of Multilingual LLMs
Hangyeol Yoo, ChangSu Choi, Minjun Kim et al.
A Hybrid Supervised-LLM Pipeline for Actionable Suggestion Mining in Unstructured Customer Reviews
Aakash Trivedi, Aniket Upadhyay, Pratik Narang et al.
Balanced Accuracy: The Right Metric for Evaluating LLM Judges - Explained through Youden’s J statistic
Stephane Collot, Colin Fraser, Justin Zhao et al.
Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs
Sewon Kim, Jiwon Kim, SeungWoo Shin et al.