Papers
Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context
Marion Bartl, Thomas Brendan Murphy, Susan Leavy
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
Javier Conde, Miguel González Saiz, María Grandury et al.
The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation
Samee Arif, Sualeha Farid, Abdul Hameed Azeemi et al.
Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons
Isik Baran Sandan, Tu Anh Dinh, Jan Niehues
Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?
Evangelia Gogoulou, Shorouq Zahra, Liane Guillou et al.
Evaluating LLMs with Multiple Problems at once
Zhengxiang Wang, Jordan Kodner, Owen Rambow
Modeling the One-to-Many Property in Open-Domain Dialogue with LLMs
Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan
Cleanse: Uncertainty Estimation Approach Using Clustering-based Semantic Consistency in LLMs
Minsuh Joo, Hyunsoo Cho
Clustering Zero-Shot Uncertainty Estimations to Assess LLM Response Accuracy for Yes/No Q&A
Christopher T. Franck, Amy Vennos, W. Graham Mueller et al.
Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP
Rudali Huidrom, Anya Belz
HuGME: A benchmark system for evaluating Hungarian generative LLMs
Noémi Ligeti-Nagy, Gabor Madarasz, Flora Foldesi et al.
ELAB: Extensive LLM Alignment Benchmark in Persian Language
Zahra Pourbahman, Fatemeh Rajabi, Mohammadhossein Sadeghi et al.
Fine-Tune on the Format: First Improving Multiple-Choice Evaluation for Intermediate LLM Checkpoints
Alec Bunn, Sarah Wiegreffe, Ben Bogin
Prompt, Translate, Fine-Tune, Re-Initialize, or Instruction-Tune? Adapting LLMs for In-Context Learning in Low-Resource Languages
Christopher Toukmaji, Jeffrey Flanigan
From Calculation to Adjudication: Examining LLM Judges on Mathematical Reasoning Tasks
Andreas Stephan, Dawei Zhu, Matthias Aßenmacher et al.
Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources
Joachim De Baer, A. Seza Doğruöz, Thomas Demeester et al.
SparQLe: Speech Queries to Text Translation Through LLMs
Amirbek Djanibekov, Hanan Aldarmaki
Prompting LLMs: Length Control for Isometric Machine Translation
Dávid Javorský, Ondřej Bojar, François Yvon
Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025
Dominik Macháček, Peter Polák
Can LLMs Recognize Their Own Analogical Hallucinations? Evaluating Uncertainty Estimation for Analogical Reasoning
Zheng Chen, Zhaoxin Feng, Jianfei Ma et al.
Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones
Jianfei Ma, Zhaoxin Feng, Huacheng Song et al.
On the Way to LLM Personalization: Learning to Remember User Conversations
Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang et al.
Understanding Verbatim Memorization in LLMs Through Circuit Discovery
Ilya Lasy, Peter Knees, Stefan Woltran
Memorization is Language-Sensitive: Analyzing Memorization and Inference Risks of LLMs in a Multilingual Setting
Ali Satvaty, Anna Visman, Dan Seidel et al.