Research Explorer

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Yuji Zhang, Sha Li, Cheng Qian et al.

2025 ACL

Gender Bias in Nepali-English Machine Translation: A Comparison of LLMs and Existing MT Systems

Supriya Khadka, Bijayan Bhattarai

2025 ACL

Assessing the Reliability of LLMs Annotations in the Context of Demographic Bias and Model Explanation

Hadi Mohammadi, Tina Shahedi, Pablo Mosteiro et al.

2025 ACL

Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs

Elisa Forcada Rodríguez, Olatz Perez-de-Vinaspre, Jon Ander Campos et al.

2025 ACL

Examining the Cultural Encoding of Gender Bias in LLMs for Low-Resourced African Languages

Abigail Oppong, Hellina Hailu Nigatu, Chinasa T. Okolo

2025 ACL

Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context

Marion Bartl, Thomas Brendan Murphy, Susan Leavy

2025 ACL

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans

Javier Conde, Miguel González Saiz, María Grandury et al.

2025 ACL

The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation

Samee Arif, Sualeha Farid, Abdul Hameed Azeemi et al.

2025 ACL

Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons

Isik Baran Sandan, Tu Anh Dinh, Jan Niehues

2025 ACL

Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?

Evangelia Gogoulou, Shorouq Zahra, Liane Guillou et al.

2025 ACL

Evaluating LLMs with Multiple Problems at once

Zhengxiang Wang, Jordan Kodner, Owen Rambow

2025 ACL

Modeling the One-to-Many Property in Open-Domain Dialogue with LLMs

Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan

2025 ACL

Cleanse: Uncertainty Estimation Approach Using Clustering-based Semantic Consistency in LLMs

Minsuh Joo, Hyunsoo Cho

2025 ACL

Clustering Zero-Shot Uncertainty Estimations to Assess LLM Response Accuracy for Yes/No Q&A

Christopher T. Franck, Amy Vennos, W. Graham Mueller et al.

2025 ACL

Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP

Rudali Huidrom, Anya Belz

2025 ACL

HuGME: A benchmark system for evaluating Hungarian generative LLMs

Noémi Ligeti-Nagy, Gabor Madarasz, Flora Foldesi et al.

2025 ACL

ELAB: Extensive LLM Alignment Benchmark in Persian Language

Zahra Pourbahman, Fatemeh Rajabi, Mohammadhossein Sadeghi et al.

2025 ACL

Fine-Tune on the Format: First Improving Multiple-Choice Evaluation for Intermediate LLM Checkpoints

Alec Bunn, Sarah Wiegreffe, Ben Bogin

2025 ACL

Prompt, Translate, Fine-Tune, Re-Initialize, or Instruction-Tune? Adapting LLMs for In-Context Learning in Low-Resource Languages

Christopher Toukmaji, Jeffrey Flanigan

2025 ACL

From Calculation to Adjudication: Examining LLM Judges on Mathematical Reasoning Tasks

Andreas Stephan, Dawei Zhu, Matthias Aßenmacher et al.

2025 ACL

Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources

Joachim De Baer, A. Seza Doğruöz, Thomas Demeester et al.

2025 ACL

SparQLe: Speech Queries to Text Translation Through LLMs

Amirbek Djanibekov, Hanan Aldarmaki

2025 ACL

Prompting LLMs: Length Control for Isometric Machine Translation

Dávid Javorský, Ondřej Bojar, François Yvon

2025 ACL

CUNI-NL@IWSLT 2025: End-to-end Offline Speech Translation and Instruction Following with LLMs

Nam Luu, Ondřej Bojar

2025 ACL

Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025

Dominik Macháček, Peter Polák

2025 ACL
