Responsible AI

1991 directly classified papers

Papers

Style Over Substance: Evaluation Biases for Large Language Models COLING 2025

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models COLING 2025

Exploring Backdoor Vulnerabilities of Chat Models COLING 2025

Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs COLING 2025

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs COLING 2025

“Not Aligned” is Not “Malicious”: Being Careful about Hallucinations of Large Language Models’ Jailbreak COLING 2025

The Gaps between Fine Tuning and In-context Learning in Bias Evaluation and Debiasing COLING 2025

LLM Sensitivity Challenges in Abusive Language Detection: Instruction-Tuned vs. Human Feedback COLING 2025

SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration COLING 2025

Automated Progressive Red Teaming COLING 2025

MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models ACL 2025

From Complexity to Clarity: AI/NLP’s Role in Regulatory Compliance ACL 2025

Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language NAACL 2025

How to Make LLMs Forget: On Reversing In-Context Knowledge Edits NAACL 2025

Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering EMNLP 2025

Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases EMNLP 2025

Meta-Cultural Competence: Climbing the Right Hill of Cultural Awareness NAACL 2025

DAMON: A Dialogue-Aware MCTS Framework for Jailbreaking Large Language Models EMNLP 2025

R-TOFU: Unlearning in Large Reasoning Models EMNLP 2025

Biased LLMs can Influence Political Decision-Making ACL 2025

ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs NAACL 2025

Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where? EMNLP 2025

SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models EMNLP 2025

Anak Baik: A Low-Cost Approach to Curate Indonesian Ethical and Unethical Instructions COLING 2025

Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs EMNLP 2025