Papers
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts
Shuang Liu, Zelong Li, Ruoyun Ma et al.
Contemporary LLMs struggle with extracting formal legal arguments
Lena Held, Ivan Habernal
Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards
Pawitsapak Akarajaradwong, Chompakorn Chaksangchaichot, Pirat Pothavorn et al.
Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments
Abhishek Purushothama, Junghyun Min, Brandon Waldon et al.
Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
Kush Juvekar, Arghya Bhattacharya, Sai Khadloya et al.
Explanations explained. Influence of Free-text Explanations on LLMs and the Role of Implicit Knowledge
Andrea Zaninello, Roberto Dessì, Malvina Nissim et al.
Latent Traits and Cross-Task Transfer: Deconstructing Dataset Interactions in LLM Fine-tuning
Shambhavi Krishna, Atharva Naik, Chaitali Agarwal et al.
LLMs as annotators of argumentation
Anna Lindahl
Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding
Maciej Skorski, Alina Landowska
On the Role of Unobserved Sequences on Sample-based Uncertainty Quantification for LLMs
Lucie Kunitomo-Jacquin, Edison Marrese-Taylor, Ken Fukuda
Confidence-Based Response Abstinence: Improving LLM Trustworthiness via Activation-Based Uncertainty Estimation
Zhiqi Huang, Vivek Datla, Chenyang Zhu et al.
Towards Trustworthy Summarization of Cardiovascular Articles: A Factuality-and-Uncertainty-Aware Biomedical LLM Approach
Eleni Partalidou, Tatiana Passali, Chrysoula Zerva et al.
Causal Understanding by LLMs: The Role of Uncertainty
Oscar William Lithgow-Serrano, Vani Kanjirangat, Alessandro Antonucci
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs
Jakub Podolak, Rajeev Verma
Probing Gender Bias in Multilingual LLMs: A Case Study of Stereotypes in Persian
Ghazal Kalhor, Behnam Bahrak
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen, Tiffany Knearem, Reshmi Ghosh et al.
That Ain’t Right: Assessing LLM Performance on QA in African American and West African English Dialects
William Coggins, Jasmine McKenzie, Sangpil Youm et al.
Findings of the WMT 2025 Shared Task on Model Compression: Early Insights on Compressing LLMs for Machine Translation
Marco Gaido, Roman Grundkiewicz, Thamme Gowda et al.
Findings of the WMT 2025 Shared Task LLMs with Limited Resources for Slavic Languages: MT and QA
Shu Okabe, Daryna Dementieva, Marion Di Marco et al.
Marco Large Translation Model at WMT2025: Transforming Translation Capability in LLMs via Quality-Aware Training and Decoding
Hao Wang, Linlong Xu, Heng Liu et al.
A* Decoding for Machine Translation in LLMs - SRPOL Participation in WMT2025
Adam Dobrowolski, Paweł Przewłocki, Paweł Przybysz et al.
IRB-MT at WMT25 Translation Task: A Simple Agentic System Using an Off-the-Shelf LLM
Ivan Grubišić, Damir Korenčić
Evaluation of LLM for English to Hindi Legal Domain Machine Translation Systems
Kshetrimayum Boynao Singh, Deepak Kumar, Asif Ekbal
Tagged Span Annotation for Detecting Translation Errors in Reasoning LLMs
Taemin Yeom, Yonghyun Ryu, Yoonjung Choi et al.