Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
AraSafe: Benchmarking Safety in Arabic LLMs
EMNLP 2025
MisinfoBench: A Multi-Dimensional Benchmark for Evaluating LLMs’ Resilience to Misinformation
EMNLP 2025
ISACL: Internal State Analyzer for Copyrighted Training Data Leakage
EMNLP 2025
Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases
EMNLP 2025
Acquiescence Bias in Large Language Models
EMNLP 2025
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
EMNLP 2025
PsyScam: A Benchmark for Psychological Techniques in Real-World Scams
EMNLP 2025
Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models
EMNLP 2025
Social Bias Evaluation for Large Language Models Requires Prompt Variations
EMNLP 2025
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models
EMNLP 2025
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
EMNLP 2025
Fine-tuning LLMs with Cross-Attention-based Weight Decay for Bias Mitigation
EMNLP 2025
Profiling LLM’s Copyright Infringement Risks under Adversarial Persuasive Prompting
EMNLP 2025
Benchmarking and Improving LLM Robustness for Personalized Generation
EMNLP 2025
Annotation-Efficient Language Model Alignment via Diverse and Representative Response Texts
EMNLP 2025
On Guardrail Models’ Robustness to Mutations and Adversarial Attacks
EMNLP 2025
Choosing a Model, Shaping a Future: Comparing LLM Perspectives on Sustainability and its Relationship with AI
EMNLP 2025
LLMs Reproduce Stereotypes of Sexual and Gender Minorities
EMNLP 2025
Beneath the Facade: Probing Safety Vulnerabilities in LLMs via Auto-Generated Jailbreak Prompts
EMNLP 2025
InFact: Informativeness Alignment for Improved LLM Factuality
EMNLP 2025
LlmFixer: Fix the Helpfulness of Defensive Large Language Models
EMNLP 2025
CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor
EMNLP 2025
SURE: Safety Understanding and Reasoning Enhancement for Multimodal Large Language Models
EMNLP 2025
On the Convergence of Moral Self-Correction in Large Language Models
AACL 2025
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
ACL 2025