Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Defending LLMs against Jailbreaking Attacks via Backtranslation
ACL 2024
Chain-of-Verification Reduces Hallucination in Large Language Models
ACL 2024
Red Teaming Visual Language Models
ACL 2024
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
ACL 2024
Pseudo-Private Data Guided Model Inversion Attacks
NIPS 2024
CROWD: Certified Robustness via Weight Distribution for Smoothed Classifiers against Backdoor Attack
EMNLP 2024
BAN: Detecting Backdoors Activated by Adversarial Neuron Noise
NIPS 2024
Provable Editing of Deep Neural Networks using Parametric Linear Relaxation
NIPS 2024
Supporting Upper Elementary Students in Learning AI Concepts with Story-Driven Game-Based Learning
AAAI 2024
Occlusion Sensitivity Analysis With Augmentation Subspace Perturbation in Deep Feature Space
WACV 2024
Hard-Label Based Small Query Black-Box Adversarial Attack
WACV 2024
Dynamic Adversarial Attacks on Autonomous Driving Systems
RSS 2024
Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences
NIPS 2024
Treatment of Statistical Estimation Problems in Randomized Smoothing for Adversarial Robustness
NIPS 2024
RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes
RSS 2024
Fooling Polarization-Based Vision using Locally Controllable Polarizing Projection
CVPR 2024
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
CVPR 2024
Scanning Trojaned Models Using Out-of-Distribution Samples
NIPS 2024
Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay
CVPR 2024
LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
CVPR 2024
1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness
CVPR 2024
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
CVPR 2024
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
CVPR 2024
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
CVPR 2024
BrainWash: A Poisoning Attack to Forget in Continual Learning
CVPR 2024