Responsible AI

1991 directly classified papers

Papers

Social, Legal, Ethical, Empathetic, and Cultural Rules: Compilation and Reasoning AAAI 2024

“Thinking” Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models EMNLP 2024

Systematic Biases in LLM Simulations of Debates EMNLP 2024

Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments EMNLP 2024

Towards Tool Use Alignment of Large Language Models EMNLP 2024

Glue pizza and eat rocks - Exploiting Vulnerabilities in Retrieval-Augmented Generative Models EMNLP 2024

SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation EMNLP 2024

CMD: a framework for Context-aware Model self-Detoxification EMNLP 2024

Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models EMNLP 2024

ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings EMNLP 2024

GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory EMNLP 2024

Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism EMNLP 2024

Dissecting Fine-Tuning Unlearning in Large Language Models EMNLP 2024

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models EMNLP 2024

Teaching LLMs to Abstain across Languages via Multilingual Feedback EMNLP 2024

Outcome-Constrained Large Language Models for Countering Hate Speech EMNLP 2024

How Does the Disclosure of AI Assistance Affect the Perceptions of Writing? EMNLP 2024

Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting EMNLP 2024

VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment EMNLP 2024

ChatGPT Doesn’t Trust Chargers Fans: Guardrail Sensitivity in Context EMNLP 2024

Aligning Large Language Models with Diverse Political Viewpoints EMNLP 2024

“You Gotta be a Doctor, Lin”: An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations EMNLP 2024

Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment EMNLP 2024

Humans or LLMs as the Judge? A Study on Judgement Bias EMNLP 2024

Walking in Others’ Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias EMNLP 2024