Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Integrity Shield: A System for Ethical AI Use & Authorship Transparency in Assessments
EACL 2026
When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
EACL 2026
Medical Summarization in Practice: Design, Deployment, and Analysis of a Clinical Summarization System for a German Hospital
EACL 2026
VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy
EACL 2026
Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs
EACL 2026
Detection of Adversarial Prompts with Model Predictive Entropy
EACL 2026
Do Large Language Models Reflect Demographic Pluralism in Safety?
EACL 2026
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
EACL 2026
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage
EACL 2026
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
EACL 2026
Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
EACL 2026
When Do Language Models Endorse Limitations on Human Rights Principles?
EACL 2026
Code-Switching as a Safety Failure Mode in Large Language Models: An Empirical Study of Roman Urdu across English, Mixed, and Transliteration-Only Inputs
EACL 2026
Position: Biomedical NLP Demands Specialization, Not Generalization
EACL 2026
Antisocial Behavior Prediction: A Survey and Practical Guide
EACL 2026
A Simple and Efficient Learning-Style Prompting for LLM Jailbreaking
EACL 2026
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
EACL 2026
Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment
EACL 2026
Modulation-Based Backdoors: Leveraging Amplitude and Frequency Patterns to Attack Speaker Recognition
AAAI 2026
Poisoned Distillation: Injecting Backdoors into Distilled Datasets Without Raw Data Access
AAAI 2026
DIFT: Protecting Contrastive Learning Against Data Poisoning Backdoor Attacks
AAAI 2026
Clean-Label Physical Backdoor Attacks with Data Distillation
AAAI 2026
Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference
AAAI 2026
MTAttack: Multi-Target Backdoor Attacks Against Large Vision-Language Models
AAAI 2026
Mental Model-based Generation of Lies for Insider Threat Modeling
AAAI 2026