Large Language Models

9067 directly classified papers

Papers

Semantic Volume: Quantifying and Detecting Both External and Internal Uncertainty in LLMs AAAI 2026

Selection of LLM Fine-Tuning Data Based on Orthogonal Rules AAAI 2026

Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning AAAI 2026

VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains AAAI 2026

Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning AAAI 2026

SafeNLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces AAAI 2026

TruthfulRAG: Resolving Factual-level Conflicts in Retrieval-Augmented Generation with Knowledge Graphs AAAI 2026

From Detection to Diagnosis: Advancing Hallucination Analysis with Automated Data Synthesis AAAI 2026

Textual Self-Attention Network: Test-Time Preference Optimization Through Textual Gradient-Based Attention AAAI 2026

A Multi-Agent LLM Framework for Multi-Domain Low-Resource In-Context NER via Knowledge Retrieval, Disambiguation and Reflective Analysis AAAI 2026

Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models AAAI 2026

DSCodeBench: A Realistic Benchmark for Data Science Code Generation AAAI 2026

Enhancing Pre-training Data Detection in LLMs Through Discriminative and Symmetric Prefix Selection AAAI 2026

Efficient Hallucination Detection: Adaptive Bayesian Estimation of Semantic Entropy with Guided Semantic Exploration AAAI 2026

Incoherence as Oracle-less Measure of Error in LLM-Based Code Generation AAAI 2026

Deep Research Arena: The First Exam of LLMs’ Research Abilities via Seminar-Grounded Tasks AAAI 2026

Scaling and Transferability of Annealing Strategies in Large Language Model Training AAAI 2026

OmniBench: A Comprehensive Benchmark Integrating Real-World, Time-sensitive, and Multi-Hop Questions with a Multi-Dimensional Hybrid Evaluation Framework AAAI 2026

CP-Search: A Chain Progressive Search Training Framework Incentivizing the Cognitive Behaviors for Searching in LLMs AAAI 2026

Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning AAAI 2026

Benchmarking and Enhancing Rule Knowledge-Driven Reasoning of Large Language Models AAAI 2026

AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models AAAI 2026

OncoCoT: A Temporal-causal Chain-of-Thought Dataset for Oncologic Decision-Making AAAI 2026

Interpreting Fedspeak with Confidence: A LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths AAAI 2026

SpiderGen: Towards Procedure Generation for Carbon Life Cycle Assessments with Generative AI AAAI 2026