Papers
BanglaMATH : A Bangla benchmark dataset for testing LLM mathematical reasoning at grades 6, 7, and 8
Tabia Tanzin Prama, Christopher M. Danforth, Peter Dodds
Synthetic Proofs with Tool-Integrated Reasoning: Contrastive Alignment for LLM Mathematics with Lean
Mark Obozov, Michael Diskin, Aleksandr Beznosikov et al.
CoCo-CoLa: Evaluating and Improving Language Adherence in Multilingual LLMs
Elnaz Rahmati, Alireza Salkhordeh Ziabari, Morteza Dehghani
Unlocking LLM Safeguards for Low-Resource Languages via Reasoning and Alignment with Minimal Training Data
Zhuowei Chen, Bowei Zhang, Nankai Lin et al.
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs
Lucas Bandarkar, Nanyun Peng
Reassessing Speech Translation for Low-Resource Languages: Do LLMs Redefine the State-of-the-Art Against Cascaded Models?
Jonah Dauvet, Min Ma, Jessica Ojo et al.
TenseLoC: Tense Localization and Control in a Multilingual LLM
Ariun-Erdene Tumurchuluun, Yusser Al Ghussin, David Mareček et al.
Translating Tax Law to Code with LLMs: A Benchmark and Evaluation Framework
Gabriele Lorenzo, Aldo Pietromatera, Nils Holzenberger
Modeling Motivated Reasoning in Law: Evaluating Strategic Role Conditioning in LLM Summarization
Eunjung Cho, Alexander Hoyle, Yoan Hermstrüwer
Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification
M. Mikail Demir, M Abdullah Canbaz
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts
Shuang Liu, Zelong Li, Ruoyun Ma et al.
Contemporary LLMs struggle with extracting formal legal arguments
Lena Held, Ivan Habernal
Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards
Pawitsapak Akarajaradwong, Chompakorn Chaksangchaichot, Pirat Pothavorn et al.
Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments
Abhishek Purushothama, Junghyun Min, Brandon Waldon et al.
Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
Kush Juvekar, Arghya Bhattacharya, Sai Khadloya et al.
Explanations explained. Influence of Free-text Explanations on LLMs and the Role of Implicit Knowledge
Andrea Zaninello, Roberto Dessi, Malvina Nissim et al.
Latent Traits and Cross-Task Transfer: Deconstructing Dataset Interactions in LLM Fine-tuning
Shambhavi Krishna, Atharva Naik, Chaitali Agarwal et al.
LLMs as annotators of argumentation
Anna Lindahl
Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding
Maciej Skorski, Alina Landowska
On the Role of Unobserved Sequences on Sample-based Uncertainty Quantification for LLMs
Lucie Kunitomo-Jacquin, Edison Marrese-Taylor, Ken Fukuda
Confidence-Based Response Abstinence: Improving LLM Trustworthiness via Activation-Based Uncertainty Estimation
Zhiqi Huang, Vivek Datla, Chenyang Zhu et al.
Towards Trustworthy Summarization of Cardiovascular Articles: A Factuality-and-Uncertainty-Aware Biomedical LLM Approach
Eleni Partalidou, Tatiana Passali, Chrysoula Zerva et al.
Causal Understanding by LLMs: The Role of Uncertainty
Oscar William Lithgow-Serrano, Vani Kanjirangat, Alessandro Antonucci
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs
Jakub Podolak, Rajeev Verma