Responsible AI

1991 directly classified papers

Papers

A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language Models ACL 2025

Visual Robustness Benchmark for Visual Question Answering (VQA) WACV 2025

iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss ACL 2025

COSMIC: Generalized Refusal Direction Identification in LLM Activations ACL 2025

SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models ACL 2025

BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla ACL 2025

MOSAIC: Multiple Observers Spotting AI Content ACL 2025

Are LLMs Rational Investors? A Study on the Financial Bias in LLMs ACL 2025

Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs ACL 2025

WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness NAACL 2025

Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election COLING 2025

DIESEL: A Lightweight Inference-Time Safety Enhancement for Language Models ACL 2025

Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning ACL 2025

HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation ACL 2025

System Prompt Hijacking via Permutation Triggers in LLM Supply Chains ACL 2025

R.R.: Unveiling LLM Training Privacy through Recollection and Ranking ACL 2025

Understanding Microtargeting Pattern on Social Media AAAI 2025

Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness ACL 2025

The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents ACL 2025

Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings ACL 2025

Stereotype Detection as a Catalyst for Enhanced Bias Detection: A Multi-Task Learning Approach ACL 2025

Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation ACL 2025

Decoupling Memories, Muting Neurons: Towards Practical Machine Unlearning for Large Language Models ACL 2025

Evaluating Index-based Treatment Allocation in Underresourced Communities AAAI 2025

SafeLawBench: Towards Safe Alignment of Large Language Models ACL 2025