Responsible AI

1991 directly classified papers

Papers

Balancing Social Impact, Opportunities, and Ethical Constraints of Using AI in the Documentation and Vitalization of Indigenous Languages IJCAI 2023

Pushing the Limits of Fairness in Algorithmic Decision-Making IJCAI 2023

Fairlearn: Assessing and Improving Fairness of AI Systems JMLR 2023

The Effects of AI Biases and Explanations on Human Decision Fairness: A Case Study of Bidding in Rental Housing Markets IJCAI 2023

Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression ICML 2023

When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks EMNLP 2023

“Are Your Explanations Reliable?” Investigating the Stability of LIME in Explaining Text Classifiers by Marrying XAI and Adversarial Attack EMNLP 2023

A Fine-Grained Taxonomy of Replies to Hate Speech EMNLP 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values EMNLP 2023

BiasX: “Thinking Slow” in Toxic Content Moderation with Explanations of Implied Social Biases EMNLP 2023

Toxicity in Multilingual Machine Translation at Scale EMNLP 2023

Geographical Erasure in Language Generation EMNLP 2023

Rehabilitating Homeless: Dataset and Key Insights AAAI 2023

Leveraging Domain Knowledge for Inclusive and Bias-aware Humanitarian Response Entry Classification IJCAI 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset NeurIPS 2023

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models NeurIPS 2023

Attribution-based Explanations that Provide Recourse Cannot be Robust JMLR 2023

Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond JMLR 2023

Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study EACL 2023

Understanding Ethics in NLP Authoring and Reviewing EACL 2023

Building Stereotype Repositories with Complementary Approaches for Scale and Depth EACL 2023

Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity EACL 2023

Measuring Gender Bias in West Slavic Language Models EACL 2023

Combining Psychological Theory with Language Models for Suicide Risk Detection EACL 2023

Performance and Risk Trade-offs for Multi-word Text Prediction at Scale EACL 2023