Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Anthropomorphization of AI: Opportunities and Risks
EMNLP 2023
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
CVPR 2023
Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration
CVPR 2023
Privacy-Preserving Representations Are Not Enough: Recovering Scene Content From Camera Poses
CVPR 2023
Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection
CVPR 2023
Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning
ICML 2023
Automatically Auditing Large Language Models via Discrete Optimization
ICML 2023
Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
ICML 2023
Eliminating Adversarial Noise via Information Discard and Robust Representation Restoration
ICML 2023
Phase-aware Adversarial Defense for Improving Adversarial Robustness
ICML 2023
TAILCHECK: A Lightweight Heap Overflow Detection Mechanism with Page Protection and Tagged Pointers
OSDI 2023
Safety Verification and Universal Invariants for Relational Action Bases
IJCAI 2023
Corrupting Neuron Explanations of Deep Visual Features
ICCV 2023
Multi-Metrics Adaptively Identifies Backdoors in Federated Learning
ICCV 2023
CDTA: A Cross-Domain Transfer-Based Attack with Contrastive Learning
AAAI 2023
DeFL: Defending against Model Poisoning Attacks in Federated Learning via Critical Learning Periods Awareness
AAAI 2023
Distinguishing Fact from Fiction: A Benchmark Dataset for Identifying Machine-Generated Scientific Papers in the LLM Era
ACL 2023
Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
EMNLP 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
EMNLP 2023
Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models
EMNLP 2023
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
EMNLP 2023
A Diachronic Perspective on User Trust in AI under Uncertainty
EMNLP 2023
Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition
EMNLP 2023
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
EMNLP 2023
Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing
EMNLP 2023