Reasoning

2595 directly classified papers

Papers

Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models ACL 2025

Evaluating Generalization Capability of Language Models across Abductive, Deductive and Inductive Logical Reasoning COLING 2025

Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics ACL 2025

Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents NAACL 2025

U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in Large Language Models ACL 2025

Revealing the Barriers of Language Agents in Planning NAACL 2025

Temporal Information Retrieval via Time-Specifier Model Merging ACL 2025

ThoughtSculpt: Reasoning with Intermediate Revision and Search NAACL 2025

Can LLMs Recognize Their Own Analogical Hallucinations? Evaluating Uncertainty Estimation for Analogical Reasoning ACL 2025

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models ICCV 2025

Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models ACL 2025

Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning NAACL 2025

ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving ACL 2025

Decoupling Metacognition from Cognition: A Framework for Quantifying Metacognitive Ability in LLMs AAAI 2025

Does “Reasoning” with Large Language Models Improve Recognizing, Generating and Reframing Unhelpful Thoughts? ACL 2025

Meta-Reasoning Improves Tool Use in Large Language Models NAACL 2025

The Art of Tool Interface Design ACL 2025

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models? AAAI 2025

ToolReflection: Improving Large Language Models for Real-World API Calls with Self-Generated Data ACL 2025

GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets NAACL 2025

Snap Out of It: A Dual-Process Approach to Mitigating Overthinking in Language Model Reasoning ACL 2025

Temporal Numeric Planning with Patterns AAAI 2025

StateAct: Enhancing LLM Base Agents via Self-prompting and State-tracking ACL 2025

WisPerMed @ PerAnsSumm 2025: Strong Reasoning Through Structured Prompting and Careful Answer Selection Enhances Perspective Extraction and Summarization of Healthcare Forum Threads NAACL 2025

Self-Taught Agentic Long Context Understanding ACL 2025