Large Language Models

6405 directly classified papers

Papers

XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering EMNLP 2025

Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models EMNLP 2025

The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It EMNLP 2025

MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs EMNLP 2025

Machine-generated text detection prevents language model collapse EMNLP 2025

Too Helpful, Too Harmless, Too Honest or Just Right? EMNLP 2025

GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection EMNLP 2025

Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework EMNLP 2025

Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs EMNLP 2025

Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty Quantification EMNLP 2025

Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations EMNLP 2025

SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models EMNLP 2025

Beyond WER: Probing Whisper’s Sub‐token Decoder Across Diverse Language Resource Levels EMNLP 2025

ThinkTuning: Instilling Cognitive Reflections without Distillation EMNLP 2025

Pluralistic Alignment for Healthcare: A Role-Driven Framework EMNLP 2025

Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing EMNLP 2025

Explicit Learning and the LLM in Machine Translation EMNLP 2025

Label Set Optimization via Activation Distribution Kurtosis for Zero-Shot Classification with Generative Models EMNLP 2025

All Roads Lead to Rome: Graph-Based Confidence Estimation for Large Language Model Reasoning EMNLP 2025

Beyond Human Labels: A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume Parsing EMNLP 2025

Measuring scalar constructs in social science with LLMs EMNLP 2025

Africa Health Check: Probing Cultural Bias in Medical LLMs EMNLP 2025

PruneCD: Contrasting Pruned Self Model to Improve Decoding Factuality EMNLP 2025

ThinkSLM: Towards Reasoning in Small Language Models EMNLP 2025

MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning EMNLP 2025