Responsible AI

1991 directly classified papers

Papers

Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions EMNLP 2024

ORES: Open-Vocabulary Responsible Visual Synthesis AAAI 2024

Image Copy Detection for Diffusion Models NeurIPS 2024

Enhancing Healthcare Predictions with Deep Learning Models AAAI 2024

A Survey on Detection of LLMs-Generated Content EMNLP 2024

Gender Identity in Pretrained Language Models: An Inclusive Approach to Data Creation and Probing EMNLP 2024

Towards Robust Evaluation of Unlearning in LLMs via Data Transformations EMNLP 2024

A Sequentially Fair Mechanism for Multiple Sensitive Attributes AAAI 2024

Learning to Unlearn: Instance-Wise Unlearning for Pre-trained Classifiers AAAI 2024

T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models NeurIPS 2024

Evaluating Moral Beliefs across LLMs through a Pluralistic Framework EMNLP 2024

Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts EMNLP 2024

Evaluating Biases in Context-Dependent Sexual and Reproductive Health Questions EMNLP 2024

LLM Evaluators Recognize and Favor Their Own Generations NeurIPS 2024

JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models EMNLP 2024

Fostering Trustworthiness in Machine Learning Algorithms AAAI 2024

Irrelevant Alternatives Bias Large Language Model Hiring Decisions EMNLP 2024

Fine-tuning Language Models for AI vs Human Generated Text detection NAACL 2024

Halu-NLP at SemEval-2024 Task 6: MetaCheckGPT - A Multi-task Hallucination Detection using LLM uncertainty and meta-models NAACL 2024

HalluSafe at SemEval-2024 Task 6: An NLI-based Approach to Make LLMs Safer by Better Detecting Hallucinations and Overgeneration Mistakes NAACL 2024

CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models NAACL 2024

Addressing Healthcare-related Racial and LGBTQ+ Biases in Pretrained Language Models NAACL 2024

Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models in Court Decisions NAACL 2024

Ethos: Rectifying Language Models in Orthogonal Parameter Space NAACL 2024

MICo: Preventative Detoxification of Large Language Models through Inhibition Control NAACL 2024