Responsible AI

1991 directly classified papers

Papers

Understanding Annotator Perception: Modeling Psychological Inference from First- and Third-Person Annotations (Student Abstract) AAAI 2025

Exploring and Mitigating Implicit Bias in Large Language Models: A Cross-Domain Evaluation Framework AAAI 2025

MABR: Multilayer Adversarial Bias Removal Without Prior Bias Knowledge AAAI 2025

Scaling Trends for Data Poisoning in LLMs AAAI 2025

LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets AAAI 2025

On the Consideration of AI Openness: Can Good Intent Be Abused? AAAI 2025

Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints AAAI 2025

Why AI Is WEIRD and Shouldn't Be This Way: Towards AI for Everyone, with Everyone, by Everyone AAAI 2025

Data Attribution: A Data-Centric Approach for Trustworthy AI Development AAAI 2025

The AI Race: Why Current Neural Network-based Architectures are a Poor Basis for Artificial General Intelligence AAAI 2025

Lessons for Editors of AI Incidents from the AI Incident Database AAAI 2025

The Mainstays of Trustworthy Machine Learning AAAI 2025

Trustworthy AI Meets Educational Assessment: Challenges and Opportunities AAAI 2025

Usage Governance Advisor: From Intent to AI Governance AAAI 2025

Using Case Studies to Teach Responsible AI to Industry Practitioners AAAI 2025

Supporting AI Literacy Teaching Through the Development of Assessments for Classroom Use AAAI 2025

Artificial Intelligence for Future Presidents: Teaching AI Literacy to Everyone AAAI 2025

A Vision for Reinventing Credible Elections with Artificial Intelligence AAAI 2025

TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images ICCV 2025

Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios ICCV 2025

Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding ICCV 2025

AlignGuard: Scalable Safety Alignment for Text-to-Image Generation ICCV 2025

Style Over Substance: Evaluation Biases for Large Language Models COLING 2025

PERSONA: A Reproducible Testbed for Pluralistic Alignment COLING 2025

CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor EMNLP 2025