Responsible AI

1991 directly classified papers

Papers

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey EACL 2023

Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models EACL 2023

Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning EACL 2023

A Learning and Control Perspective for Microfinance L4DC 2023

Med-HALT: Medical Domain Hallucination Test for Large Language Models CONLL 2023

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models CVPR 2023

Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models CVPR 2023

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation CVPR 2023

Emptying the Ocean with a Spoon: Should We Edit Models? EMNLP 2023

Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs EMNLP 2023

InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning EMNLP 2023

Uncovering the Root of Hate Speech: A Dataset for Identifying Hate Instigating Speech EMNLP 2023

Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews EMNLP 2023

Towards Conceptualization of “Fair Explanation”: Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators EMNLP 2023

Gender Biases in Automatic Evaluation Metrics for Image Captioning EMNLP 2023

Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers EMNLP 2023

Stereotypes and Smut: The (Mis)representation of Non-cisgender Identities by Text-to-Image Models ACL 2023

A Multi-dimensional study on Bias in Vision-Language models ACL 2023

Analyzing Bias in Large Language Model Solutions for Assisted Writing Feedback Tools: Lessons from the Feedback Prize Competition Series ACL 2023

Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world ACL 2023

Harmful Language Datasets: An Assessment of Robustness ACL 2023

Distinguishing Fact from Fiction: A Benchmark Dataset for Identifying Machine-Generated Scientific Papers in the LLM Era ACL 2023

Non-Repeatable Experiments and Non-Reproducible Results: The Reproducibility Crisis in Human Evaluation in NLP ACL 2023

This prompt is measuring <mask>: evaluating bias evaluation in language models ACL 2023

Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting ACL 2023