Research Explorer
Responsible AI (Artificial Intelligence › Core AI)
1,991 papers directly classified under Responsible AI
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
PL-Guard: Benchmarking Language Model Safety for Polish
ACL 2025
Power(ful) Associations: Rethinking “Stereotype” for NLP
ACL 2025
Are Bias Evaluation Methods Biased?
ACL 2025
ELAB: Extensive LLM Alignment Benchmark in Persian Language
ACL 2025
Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency
ACL 2025
Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective
ACL 2025
Analyzing the Evolution of Scientific Misconduct Based on the Language of Retracted Papers
ACL 2025
Hidden Persuasion: Detecting Manipulative Narratives on Social Media During the 2022 Russian Invasion of Ukraine
ACL 2025
Detecting Manipulation in Ukrainian Telegram: A Transformer-Based Approach to Technique Classification and Span Identification
ACL 2025
WETBench: A Benchmark for Detecting Task-Specific Machine-Generated Text on Wikipedia
ACL 2025
STAND-Guard: A Small Task-Adaptive Content Moderation Model
COLING 2025
iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss
ACL 2025
SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models
ACL 2025
A Recipe For Building a Compliant Real Estate Chatbot
COLING 2025
Multilingual Blending: Large Language Model Safety Alignment Evaluation with Language Mixture
NAACL 2025
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis
ACL 2025
UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation
EMNLP 2025
LionGuard: A Contextualized Moderation Classifier to Tackle Localized Unsafe Content
COLING 2025
System Prompt Hijacking via Permutation Triggers in LLM Supply Chains
ACL 2025
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
ACL 2025
DAPI: Domain Adaptive Toxicity Probe Vector Intervention, for Fine-Grained Detoxification
ACL 2025
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
COLING 2025
Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models
EMNLP 2025
Conformity in Large Language Models
ACL 2025
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
ACL 2025