mathematical reasoning

355 papers

Also known as

MWP

Co-occurring keywords

large language model (12755), chain-of-thought reasoning (469), benchmark evaluation (1539), reinforcement learning (4122), language model (4573), chain of thought (274), math word problem (103), multimodal learning (4622), chain-of-thought prompting (306), question answering (2904)

Papers

GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers ACL 2024

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models ACL 2024

Generation of Visual Representations for Multi-Modal Mathematical Knowledge AAAI 2024

What Makes Math Word Problems Challenging for LLMs? NAACL 2024

How Do Humans Write Code? Large Models Do It the Same Way Too EMNLP 2024

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning EMNLP 2024

ControlMath: Controllable Data Generation Promotes Math Generalist Models EMNLP 2024

MinT: Boosting Generalization in Mathematical Reasoning via Multi-view Fine-tuning COLING 2024

mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models ACL 2024

Forward-Backward Reasoning in Large Language Models for Mathematical Verification ACL 2024

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models ACL 2024

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark ACL 2024

Rationales for Answers to Simple Math Word Problems Confuse Large Language Models ACL 2024

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction ACL 2024

CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs’ Mathematical Reasoning Capabilities ACL 2024

Exploring Reversal Mathematical Reasoning Ability for Large Language Models ACL 2024

NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models ACL 2024

DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents ACL 2024

FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in LLMs EMNLP 2024

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts EMNLP 2024

Pre-trained Large Language Models Use Fourier Features to Compute Addition NeurIPS 2024

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing NeurIPS 2024

Proving Olympiad Algebraic Inequalities without Human Demonstrations NeurIPS 2024

Multi-language Diversity Benefits Autoformalization NeurIPS 2024

Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers NeurIPS 2024