Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
I Know You Did Not Write That! A Sampling Based Watermarking Method for Identifying Machine Generated Text
COLING 2025
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
ACL 2025
Developing a Reliable, Fast, General-Purpose Hallucination Detection and Mitigation Service
NAACL 2025
Identifying and Mitigating Social Bias Knowledge in Language Models
NAACL 2025
MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics
NAACL 2025
DAMAGE: Detecting Adversarially Modified AI Generated Text
COLING 2025
Granite Guardian: Comprehensive LLM Safeguarding
NAACL 2025
Conformity in Large Language Models
ACL 2025
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
COLING 2025
Exploring Straightforward Methods for Automatic Conversational Red-Teaming
NAACL 2025
Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education
NAACL 2025
Profiling LLM’s Copyright Infringement Risks under Adversarial Persuasive Prompting
EMNLP 2025
LionGuard: A Contextualized Moderation Classifier to Tackle Localized Unsafe Content
COLING 2025
Annotation-Efficient Language Model Alignment via Diverse and Representative Response Texts
EMNLP 2025
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis
ACL 2025
A Recipe For Building a Compliant Real Estate Chatbot
COLING 2025
Multilingual Blending: Large Language Model Safety Alignment Evaluation with Language Mixture
NAACL 2025
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian
NAACL 2025
LLM Safety for Children
NAACL 2025
Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models
NAACL 2025
DAMAGeR: Deploying Automatic and Manual Approaches to GenAI Red-teaming
NAACL 2025
A Practical Analysis of Human Alignment with *PO
NAACL 2025
CultureInstruct: Curating Multi-Cultural Instructions at Scale
NAACL 2025
STAND-Guard: A Small Task-Adaptive Content Moderation Model
COLING 2025
Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment
NAACL 2025