conftrace_

Large Language Models

6,405 papers

Papers

Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions EMNLP 2025

Culture Cartography: Mapping the Landscape of Cultural Knowledge EMNLP 2025

We Politely Insist: Your LLM Must Learn the Persian Art of Taarof EMNLP 2025

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs EMNLP 2025

TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games EMNLP 2025

Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling EMNLP 2025

Direct Judgement Preference Optimization EMNLP 2025

F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations EMNLP 2025

CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering EMNLP 2025

FilBench: Can LLMs Understand and Generate Filipino? EMNLP 2025

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation EMNLP 2025

Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs EMNLP 2025

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations EMNLP 2025

Language-to-Space Programming for Training-Free 3D Visual Grounding EMNLP 2025

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding EMNLP 2025

Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs EMNLP 2025

P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs EMNLP 2025

RESF: Regularized-Entropy-Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models EMNLP 2025

Model-based Large Language Model Customization as Service EMNLP 2025

Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents EMNLP 2025

Through the Valley: Path to Effective Long CoT Training for Small Language Models EMNLP 2025

RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution EMNLP 2025

SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models EMNLP 2025

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles EMNLP 2025

DART: Distilling Autoregressive Reasoning to Silent Thought EMNLP 2025