Research Explorer
AI Safety (Artificial Intelligence › Core AI)
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Detecting Argumentative Fallacies in the Wild: Problems and Limitations of Large Language Models
EMNLP 2023
RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts
CVPR 2023
SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples
CVPR 2023
Architectural Backdoors in Neural Networks
CVPR 2023
Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features
NeurIPS 2023
CBD: A Certified Backdoor Detector Based on Local Dominant Probability
NeurIPS 2023
Robust Contrastive Language-Image Pretraining against Data Poisoning and Backdoor Attacks
NeurIPS 2023
Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings
NeurIPS 2023
Improving Adversarial Robustness via Information Bottleneck Distillation
NeurIPS 2023
Understanding and Mitigating Copying in Diffusion Models
NeurIPS 2023
Addressing Chest Radiograph Projection Bias in Deep Classification Models
MIDL 2023
Analysis and Detectability of Offline Data Poisoning Attacks on Linear Dynamical Systems
L4DC 2023
Distributionally Robust Lyapunov Function Search Under Uncertainty
L4DC 2023
Failing with Grace: Learning Neural Network Controllers that are Boundedly Unsafe
L4DC 2023
Probabilistic Safeguard for Reinforcement Learning Using Safety Index Guided Gaussian Process Models
L4DC 2023
Detection of Man-in-the-Middle Attacks in Model-Free Reinforcement Learning
L4DC 2023
CellDAM: User-Space, Rootless Detection and Mitigation for 5G Data Plane
NSDI 2023
Exploring Practical Vulnerabilities of Machine Learning-based Wireless Systems
NSDI 2023
Deception Game: Closing the Safety-Learning Loop in Interactive Robot Autonomy
CoRL 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
CoRL 2023
Birds of an odd feather: guaranteed out-of-distribution (OOD) novel category detection
UAI 2023
Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks
EACL 2023
When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization
EACL 2023
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
EACL 2023
Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns
EACL 2023