Research Explorer
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
EACL 2023
Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models
EACL 2023
Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning
EACL 2023
A Learning and Control Perspective for Microfinance
L4DC 2023
Med-HALT: Medical Domain Hallucination Test for Large Language Models
CONLL 2023
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
CVPR 2023
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
CVPR 2023
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
CVPR 2023
Emptying the Ocean with a Spoon: Should We Edit Models?
EMNLP 2023
Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs
EMNLP 2023
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning
EMNLP 2023
Uncovering the Root of Hate Speech: A Dataset for Identifying Hate Instigating Speech
EMNLP 2023
Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews
EMNLP 2023
Towards Conceptualization of “Fair Explanation”: Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators
EMNLP 2023
Gender Biases in Automatic Evaluation Metrics for Image Captioning
EMNLP 2023
Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers
EMNLP 2023
Stereotypes and Smut: The (Mis)representation of Non-cisgender Identities by Text-to-Image Models
ACL 2023
A Multi-dimensional study on Bias in Vision-Language models
ACL 2023
Analyzing Bias in Large Language Model Solutions for Assisted Writing Feedback Tools: Lessons from the Feedback Prize Competition Series
ACL 2023
Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world
ACL 2023
Harmful Language Datasets: An Assessment of Robustness
ACL 2023
Distinguishing Fact from Fiction: A Benchmark Dataset for Identifying Machine-Generated Scientific Papers in the LLM Era.
ACL 2023
Non-Repeatable Experiments and Non-Reproducible Results: The Reproducibility Crisis in Human Evaluation in NLP
ACL 2023
This prompt is measuring <mask>: evaluating bias evaluation in language models
ACL 2023
Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting
ACL 2023