Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness
NAACL 2025
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
NAACL 2025
Diversity Helps Jailbreak Large Language Models
NAACL 2025
My LLM might Mimic AAE - But When Should It?
NAACL 2025
High-Dimension Human Value Representation in Large Language Models
NAACL 2025
Interactive Human-Centric Bias Mitigation
AAAI 2024
Debiasing Text Safety Classifiers through a Fairness-Aware Ensemble
EMNLP 2024
Identifying, Mitigating, and Anticipating Bias in Algorithmic Decisions
AAAI 2024
Revisiting the Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems
COLING 2024
Fair and Optimal Prediction via Post-Processing
AAAI 2024
Aequitas Flow: Streamlining Fair ML Experimentation
JMLR 2024
Fairness with Censorship: Bridging the Gap between Fairness Research and Real-World Deployment
AAAI 2024
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
NAACL 2024
Finding ε and δ of Traditional Disclosure Control Systems
AAAI 2024
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
NeurIPS 2024
CAVA: A Tool for Cultural Alignment Visualization & Analysis
EMNLP 2024
KorSmishing Explainer: A Korean-centric LLM-based Framework for Smishing Detection and Explanation Generation
EMNLP 2024
U-trustworthy Models. Reliability, Competence, and Confidence in Decision-Making
AAAI 2024
Identifying Reasons for Bias: An Argumentation-Based Approach
AAAI 2024
Safer-Instruct: Aligning Language Models with Automated Preference Data
NAACL 2024
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
EMNLP 2024
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
EMNLP 2024
Efficient Lifelong Model Evaluation in an Era of Rapid Progress
NeurIPS 2024
The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas
EMNLP 2024
Context-aware Watermark with Semantic Balanced Green-red Lists for Large Language Models
EMNLP 2024