Large Language Models

9067 directly classified papers

Papers

CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge AAAI 2026

TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction AAAI 2026

Interpretable Reward Model via Sparse Autoencoder AAAI 2026

Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering AAAI 2026

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study AAAI 2026

Privacy Preserving In-Context-Learning Framework for Large Language Models AAAI 2026

The Silent Amplifier: In-Context Examples Fuel Bias in Large Language Models AAAI 2026

Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems AAAI 2026

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models AAAI 2026

ACID Test: A Benchmark for Cultural Safety and Alignment in LALMs AAAI 2026

EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing AAAI 2026

On the Alignment of Large Language Models with Global Human Opinion AAAI 2026

Polarity-Aware Probing for Quantifying Latent Alignment in Language Models AAAI 2026

Persistent Instability in LLM’s Personality Measurements: Effects of Scale, Reasoning, and Conversation History AAAI 2026

GEM: Generative Entropy-Guided Preference Modeling for Few-Shot Alignment of LLMs AAAI 2026

LocalBench: Benchmarking LLMs on County-Level Local Knowledge and Reasoning AAAI 2026

A Human-Centric Pipeline for Aligning Large Language Models with Chinese Medical Ethics AAAI 2026

CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation AAAI 2026

AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys AAAI 2026

ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation AAAI 2026

CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling AAAI 2026

HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models Through Curriculum Tuning AAAI 2026

Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs AAAI 2026

Towards Aligned and Efficient Large Language Models AAAI 2026

Obedience or Vigilance? How Large Language Models React to Malicious Multiple-Choice Options (Student Abstract) AAAI 2026