Responsible AI

1991 directly classified papers

Papers

CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models ACL 2024

The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness ACL 2024

ROSE Doesn’t Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding ACL 2024

Evaluating Large Language Models for Health-related Queries with Presuppositions ACL 2024

TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection ACL 2024

Whose Emotions and Moral Sentiments do Language Models Reflect? ACL 2024

Bias Bluff Busters at FIGNEWS 2024 Shared Task: Developing Guidelines to Make Bias Conscious ACL 2024

Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models ACL 2024

Ask LLMs Directly, “What shapes your bias?”: Measuring Social Bias in Large Language Models ACL 2024

On Shortcuts and Biases: How Finetuned Language Models Distinguish Audience-Specific Instructions in Italian and English ACL 2024

ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation ACL 2024

GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction ACL 2024

Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness ACL 2024

Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective EMNLP 2024

GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets EMNLP 2024

Rethinking the Role of Proxy Rewards in Language Model Alignment EMNLP 2024

Reward Modeling Requires Automatic Adjustment Based on Data Quality EMNLP 2024

Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation EMNLP 2024

Evaluating Psychological Safety of Large Language Models EMNLP 2024

On the Reliability of Psychological Scales on Large Language Models EMNLP 2024

Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models EMNLP 2024

ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models EMNLP 2024

DetoxLLM: A Framework for Detoxification with Explanations EMNLP 2024

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation CVPR 2024

MACE: Mass Concept Erasure in Diffusion Models CVPR 2024