Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models
AACL 2025
Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization
EMNLP 2025
‘Rich Dad, Poor Lad’: How do Large Language Models Contextualize Socioeconomic Factors in College Admission?
EMNLP 2025
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
AACL 2025
Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets
EMNLP 2025
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
ICCV 2025
DCR: Quantifying Data Contamination in LLMs Evaluation
EMNLP 2025
Modeling Motivated Reasoning in Law: Evaluating Strategic Role Conditioning in LLM Summarization
EMNLP 2025
Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency
EMNLP 2025
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy
ICCV 2025
PLLuM-Align: Polish Preference Dataset for Large Language Model Alignment
EMNLP 2025
ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction
EMNLP 2025
Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration
EMNLP 2025
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
ICCV 2025
Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis
EMNLP 2025
Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models
COLING 2025
DIESEL: A Lightweight Inference-Time Safety Enhancement for Language Models
ACL 2025
Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models
IJCNLP 2025
Are LLMs Rational Investors? A Study on the Financial Bias in LLMs
ACL 2025
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training
ACL 2025
Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
EMNLP 2025
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
ACL 2025
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
EMNLP 2025
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming
NAACL 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
ACL 2025