Research Explorer
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Can Language Model Moderators Improve the Health of Online Discourse?
NAACL 2024
MisgenderMender: A Community-Informed Approach to Interventions for Misgendering
NAACL 2024
Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
NAACL 2024
Investigating Data Contamination in Modern Benchmarks for Large Language Models
NAACL 2024
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Value
NAACL 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
NAACL 2024
Composite Backdoor Attacks Against Large Language Models
NAACL 2024
Adapting Fake News Detection to the Era of Large Language Models
NAACL 2024
TagDebias: Entity and Concept Tagging for Social Bias Mitigation in Pretrained Language Models
NAACL 2024
Discovering and Mitigating Indirect Bias in Attention-Based Model Explanations
NAACL 2024
MICo: Preventative Detoxification of Large Language Models through Inhibition Control
NAACL 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space
NAACL 2024
Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models in Court Decisions
NAACL 2024
Addressing Healthcare-related Racial and LGBTQ+ Biases in Pretrained Language Models
NAACL 2024
CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models
NAACL 2024
HalluSafe at SemEval-2024 Task 6: An NLI-based Approach to Make LLMs Safer by Better Detecting Hallucinations and Overgeneration Mistakes
NAACL 2024
Halu-NLP at SemEval-2024 Task 6: MetaCheckGPT - A Multi-task Hallucination Detection using LLM uncertainty and meta-models
NAACL 2024
Fine-tuning Language Models for AI vs Human Generated Text detection
NAACL 2024
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
NAACL 2024
On the Effectiveness of Adversarial Robustness for Abuse Mitigation with Counterspeech
NAACL 2024
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
NAACL 2024
Safer-Instruct: Aligning Language Models with Automated Preference Data
NAACL 2024
“One-Size-Fits-All”? Examining Expectations around What Constitute “Fair” or “Good” NLG System Behaviors
NAACL 2024
Enhancing Controlled Query Evaluation through Epistemic Policies
IJCAI 2024
Data Ownership and Privacy in Personalized AI Models in Assistive Healthcare
IJCAI 2024