Foundation Models

4845 directly classified papers

Papers

K-12EduBench: A Benchmark for Evaluating Large Language Models’ Knowledge, Problem-Solving, and Educational Goal Cognition in K-12 Education AAAI 2026

Promoting Efficient Reasoning with Verifiable Stepwise Reward AAAI 2026

Predicting Emergent Tool Use in LLMs Before It Emerges: A Proxy Perspective AAAI 2026

LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction AAAI 2026

How Does Alignment Enhance LLMs’ Multilingual Capabilities? A Language Neurons Perspective AAAI 2026

The Avengers: A Routing Recipe for Collective Intelligence in Language Models AAAI 2026

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning AAAI 2026

MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation AAAI 2026

Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities AAAI 2026

Learning from Guidelines: Structured Prompt Optimization for Expert Annotation Tasks AAAI 2026

Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory AAAI 2026

LLM Collaborative Filtering: User-Item Graph as New Language AAAI 2026

What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles AAAI 2026

AncientBench: Towards Comprehensive Evaluation on Excavated and Transmitted Chinese Corpora AAAI 2026

SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models AAAI 2026

LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models AAAI 2026

Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios AAAI 2026

ProFuser: Progressive Fusion of Large Language Models AAAI 2026

qa-FLoRA: Data-free query-adaptive Fusion of LoRAs for LLMs AAAI 2026

CP-Router: An Uncertainty-Aware Router Between LLM and LRM AAAI 2026

Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning AAAI 2026

Improving Value-based Process Verifier via Low-Cost Variance Reduction AAAI 2026

GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning AAAI 2026

ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents AAAI 2026

A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models AAAI 2026