Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure
AAAI 2026
Good Gradients Poison Your Model: Evading Defenses in Federated Learning via Boundary-adaptive Perturbation
AAAI 2026
On Robustness of Linear Classifiers to Targeted Data Poisoning
AAAI 2026
Dormant Backdoor: Weaponizing Model Finetuning for Feasible Backdoor Attacks Against Pretrained Models
AAAI 2026
Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models
AAAI 2026
On Stealing Graph Neural Network Models
AAAI 2026
Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization
AAAI 2026
Yours or Mine? Overwriting Attacks Against Neural Audio Watermarking
AAAI 2026
Vulnerability-Aware Robust Multimodal Adversarial Training
AAAI 2026
DUP: Detection-guided Unlearning for Backdoor Purification in Language Models
AAAI 2026
FILTER: A Framework for Defending Against Backdoor Attacks in Vertical Federated Learning
AAAI 2026
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
AAAI 2026
Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
COLING 2025
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
ACL 2025
Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings
COLING 2025
Understanding the Dark Side of LLMs’ Intrinsic Self-Correction
ACL 2025
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
ACL 2025
Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models
COLING 2025
AutoCVSS: Assessing the Performance of LLMs for Automated Software Vulnerability Scoring
EMNLP 2025
Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language Models
ACL 2025
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
ACL 2025
ROBOTO2: An Interactive System and Dataset for LLM-assisted Clinical Trial Risk of Bias Assessment
EMNLP 2025
Can GPTZero’s AI Vocabulary Distinguish Between LLM-Generated and Student-Written Essays?
ACL 2025
Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
ACL 2025
Tongue-Tied: Breaking LLMs Safety Through New Language Learning
NAACL 2025