Large Language Models

6,405 papers

Papers

VISaGE: Understanding Visual Generics and Exceptions EMNLP 2025

ThinkSLM: Towards Reasoning in Small Language Models EMNLP 2025

MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning EMNLP 2025

Batched Self-Consistency Improves LLM Relevance Assessment and Ranking EMNLP 2025

DEBATE, TRAIN, EVOLVE: Self‐Evolution of Language Model Reasoning EMNLP 2025

CARE: Multilingual Human Preference Learning for Cultural Awareness EMNLP 2025

Language Models Identify Ambiguities and Exploit Loopholes EMNLP 2025

Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and Elegance EMNLP 2025

AraEval: An Arabic Multi-Task Evaluation Suite for Large Language Models EMNLP 2025

A Systematic Survey of Automatic Prompt Optimization Techniques EMNLP 2025

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation EMNLP 2025

MemInsight: Autonomous Memory Augmentation for LLM Agents EMNLP 2025

No Need for Explanations: LLMs can implicitly learn from mistakes in-context EMNLP 2025

Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing EMNLP 2025

Benchmarking LLMs on Semantic Overlap Summarization EMNLP 2025

ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection EMNLP 2025

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward EMNLP 2025

A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making EMNLP 2025

Castle: Causal Cascade Updates in Relational Databases with Large Language Models EMNLP 2025

NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls EMNLP 2025

Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models EMNLP 2025

Can Large Language Models Unlock Novel Scientific Research Ideas? EMNLP 2025

Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly EMNLP 2025

DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context EMNLP 2025

SYNC: A Synthetic Long-Context Understanding Benchmark for Controlled Comparisons of Model Capabilities EMNLP 2025