Responsible AI

1991 directly classified papers

Papers

Silenced Biases: The Dark Side LLMs Learned to Refuse AAAI 2026

Reducing the Scope of Language Models AAAI 2026

Steering Representations, Safeguarding Privacy: A Cross-Modal Privacy Protection Method for Generative AI AAAI 2026

ShadeEdit: A Utility-Preserving and Defense-Evasive Knowledge Manipulation Attack in Federated LLMs AAAI 2026

SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs AAAI 2026

ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs AAAI 2026

Fairness Perceptions of Large Language Models AAAI 2026

Beyond World Models: Rethinking Understanding in AI Models AAAI 2026

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning AAAI 2026

Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape AAAI 2026

From Chaos to Clarity: A Knowledge Graph-Driven Audit Dataset Generation Framework for LLM Unlearning AAAI 2026

Large Language Model Unlearning for Source Code AAAI 2026

“Yuki Gets Sushi, David Gets Steak?”: Uncovering Gender and Racial Biases in LLM-Based Meal Recommendations EACL 2026

Safe RAG by RAG: Untying the Bell That RAG Rang with the RAG Hand AAAI 2026

HEV Generative Sandbox: A Framework for Assessing Domain-Specific Social Risks Through Human-LLM Simulation AAAI 2026

Adaptive Hallucination Alleviation in Multimodal Large Language Models: From Strategic Data Selection to Severity-Guided Training AAAI 2026

Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models AAAI 2026

On the Misalignment Between Data Learnability and Forgettability in Machine Unlearning AAAI 2026

Simulated Rewards, Skewed Strategies: Tracing the Acquired Preference Bias in LLM-Based Dialogue Planners AAAI 2026

An Information Theoretic Evaluation Metric for Strong Unlearning AAAI 2026

OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination AAAI 2026

Reasoning Shapes Alignment: Investigating Cultural Alignment in Large Reasoning Models with Cultural Norms AAAI 2026

A Robust Unlearning Method with Adaptive Knowledge Guidance and Memory Preservation AAAI 2026

FGD-Align: Pluralistic Alignment for Large Language Models via Fuzzy Group Decision-Making AAAI 2026

WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking AAAI 2026