Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions
EMNLP 2024
ORES: Open-Vocabulary Responsible Visual Synthesis
AAAI 2024
Image Copy Detection for Diffusion Models
NIPS 2024
Enhancing Healthcare Predictions with Deep Learning Models
AAAI 2024
A Survey on Detection of LLMs-Generated Content
EMNLP 2024
Gender Identity in Pretrained Language Models: An Inclusive Approach to Data Creation and Probing
EMNLP 2024
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
EMNLP 2024
A Sequentially Fair Mechanism for Multiple Sensitive Attributes
AAAI 2024
Learning to Unlearn: Instance-Wise Unlearning for Pre-trained Classifiers
AAAI 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
NIPS 2024
Evaluating Moral Beliefs across LLMs through a Pluralistic Framework
EMNLP 2024
Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts
EMNLP 2024
Evaluating Biases in Context-Dependent Sexual and Reproductive Health Questions
EMNLP 2024
LLM Evaluators Recognize and Favor Their Own Generations
NIPS 2024
JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models
EMNLP 2024
Fostering Trustworthiness in Machine Learning Algorithms
AAAI 2024
Irrelevant Alternatives Bias Large Language Model Hiring Decisions
EMNLP 2024
Fine-tuning Language Models for AI vs Human Generated Text detection
NAACL 2024
Halu-NLP at SemEval-2024 Task 6: MetaCheckGPT - A Multi-task Hallucination Detection using LLM uncertainty and meta-models
NAACL 2024
HalluSafe at SemEval-2024 Task 6: An NLI-based Approach to Make LLMs Safer by Better Detecting Hallucinations and Overgeneration Mistakes
NAACL 2024
CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models
NAACL 2024
Addressing Healthcare-related Racial and LGBTQ+ Biases in Pretrained Language Models
NAACL 2024
Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models in Court Decisions
NAACL 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space
NAACL 2024
MICo: Preventative Detoxification of Large Language Models through Inhibition Control
NAACL 2024