Responsible AI

1991 directly classified papers

Papers

CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference EMNLP 2024

Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness EMNLP 2024

From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking EMNLP 2024

Moral Foundations of Large Language Models EMNLP 2024

From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment EMNLP 2024

What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study EMNLP 2024

Revisiting the Robustness of Watermarking to Paraphrasing Attacks EMNLP 2024

User Inference Attacks on Large Language Models EMNLP 2024

Please note that I’m just an AI: Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification EMNLP 2024

Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models EMNLP 2024

Adaptable Moral Stances of Large Language Models on Sexist Content: Implications for Society and Gender Discourse EMNLP 2024

SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization EMNLP 2024

Towards Aligning Language Models with Textual Feedback EMNLP 2024

“They are uncultured”: Unveiling Covert Harms and Social Threats in LLM Generated Conversations EMNLP 2024

Do LLMs Know to Respect Copyright Notice? EMNLP 2024

BiasWipe: Mitigating Unintended Bias in Text Classifiers through Model Interpretability EMNLP 2024

Local Contrastive Editing of Gender Stereotypes EMNLP 2024

STAR: SocioTechnical Approach to Red Teaming Language Models EMNLP 2024

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations EMNLP 2024

The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas EMNLP 2024

GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation EMNLP 2024

Context-aware Watermark with Semantic Balanced Green-red Lists for Large Language Models EMNLP 2024

MarkLLM: An Open-Source Toolkit for LLM Watermarking EMNLP 2024

CAVA: A Tool for Cultural Alignment Visualization & Analysis EMNLP 2024

Debiasing Text Safety Classifiers through a Fairness-Aware Ensemble EMNLP 2024