Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
NAACL 2025
On Weaponization-Resistant Large Language Models with Prospect Theoretic Alignment
COLING 2025
KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions
COLING 2025
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
NAACL 2025
SafeLawBench: Towards Safe Alignment of Large Language Models
ACL 2025
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
ACL 2025
Loki: An Open-Source Tool for Fact Verification
COLING 2025
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
NAACL 2025
Self-Pluralising Culture Alignment for Large Language Models
NAACL 2025
Iterative Multilingual Spectral Attribute Erasure
EMNLP 2025
Moral Compass: A Data-Driven Benchmark for Ethical Cognition in AI
IJCAI 2025
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
EMNLP 2025
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
ACL 2025
A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations
EMNLP 2025
TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention
ICCV 2025
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
EMNLP 2025
LLMs Do Not See Age: Assessing Demographic Bias in Automated Systematic Review Synthesis
IJCNLP 2025
IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization
IJCNLP 2025
Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics
IJCNLP 2025
Small Changes, Large Consequences: Analyzing the Allocational Fairness of LLMs in Hiring Contexts
IJCNLP 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
IJCNLP 2025
Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers
IJCNLP 2025
MPF: Aligning and Debiasing Language Models post Deployment via Multi-Perspective Fusion
IJCNLP 2025
Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization
NAACL 2025
Extracting and Understanding the Superficial Knowledge in Alignment
NAACL 2025