Reasoning

2595 directly classified papers

Papers

TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications EMNLP 2025

U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in Large Language Models ACL 2025

Temporal Information Retrieval via Time-Specifier Model Merging ACL 2025

Can LLMs Recognize Their Own Analogical Hallucinations? Evaluating Uncertainty Estimation for Analogical Reasoning ACL 2025

Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models ACL 2025

ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving ACL 2025

Does “Reasoning” with Large Language Models Improve Recognizing, Generating and Reframing Unhelpful Thoughts? ACL 2025

The Art of Tool Interface Design ACL 2025

ToolReflection: Improving Large Language Models for Real-World API Calls with Self-Generated Data ACL 2025

Snap Out of It: A Dual-Process Approach to Mitigating Overthinking in Language Model Reasoning ACL 2025

StateAct: Enhancing LLM Base Agents via Self-prompting and State-tracking ACL 2025

The ClimateCheck Shared Task: Scientific Fact-Checking of Social Media Claims about Climate Change ACL 2025

iai_MSU at SemEval-2025 Task-3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes in English ACL 2025

UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection ACL 2025

NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SeflCheckGPT ACL 2025

Tables as Thought: Exploring Structured Thoughts in LLM Reasoning ACL 2025

Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning ACL 2025

DiaDP@XLLM25: Advancing Chinese Dialogue Parsing via Unified Pretrained Language Models and Biaffine Dependency Scoring ACL 2025

LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation ACL 2025

R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning EMNLP 2025

ModeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models NAACL 2025

LATTE: Learning to Think with Vision Specialists EMNLP 2025

From Causal Parrots to Causal Prophets? Towards Sound Causal Reasoning with Large Language Models NAACL 2025

Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs EMNLP 2025

Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning CVPR 2025