Research Explorer
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
SafePersuasion: A Dataset, Taxonomy, and Baselines for Analysis of Rational Persuasion and Manipulation
IJCNLP 2025
Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance
IJCNLP 2025
UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases
IJCNLP 2025
To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs
IJCNLP 2025
GeoSAFE - A Novel Geospatial Artificial Intelligence Safety Assurance Framework and Evaluation for LLM Moderation
IJCNLP 2025
Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models
IJCNLP 2025
Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios
NAACL 2025
Intersectional Bias in Japanese Large Language Models from a Contextualized Perspective
ACL 2025
Certified Mitigation of Worst-Case LLM Copyright Infringement
EMNLP 2025
An Ethical Dataset from Real-World Interactions Between Users and Large Language Models
IJCAI 2025
The Threat of PROMPTS in Large Language Models: A System and User Prompt Perspective
ACL 2025
BanHateME: Understanding Hate in Bangla Memes through Detection, Categorization, and Target Profiling
IJCNLP 2025
EMBRACE: Shaping Inclusive Opinion Representation by Aligning Implicit Conversations with Social Norms
IJCNLP 2025
Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews
EMNLP 2025
Instantly Learning Preference Alignment via In-context DPO
NAACL 2025
Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities
NAACL 2025
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains
EMNLP 2025
What is Behind Homelessness Bias? Using LLMs and NLP to Mitigate Homelessness by Acting on Social Stigma
IJCAI 2025
DeTAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
ACL 2025
Wanted: Personalised Bias Warnings for Gender Bias in Language Models
ACL 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
EMNLP 2025
Beemo: Benchmark of Expert-edited Machine-generated Outputs
NAACL 2025
Decoupling Memories, Muting Neurons: Towards Practical Machine Unlearning for Large Language Models
ACL 2025
Truth, Trust, and Trouble: Medical AI on the Edge
EMNLP 2025
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
ACL 2025