Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Influence-Based Fair Selection for Sample-Discriminative Backdoor Attack
AAAI 2025
Neurosymbolic Reinforcement Learning: Playing MiniHack with Probabilistic Logic Shields
AAAI 2025
Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures
ICCV 2025
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
ICCV 2025
Backdoor Defense via Enhanced Splitting and Trap Isolation
ICCV 2025
Backdoor Attacks on Neural Networks via One-Bit Flip
ICCV 2025
Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient Competition and Spatial Distance Stretching
ICCV 2025
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning
ICCV 2025
SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers
ICCV 2025
SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection
ICCV 2025
SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures
ICCV 2025
Adversarial Robustness of Discriminative Self-Supervised Learning in Vision
ICCV 2025
Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception
ICCV 2025
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
ICCV 2025
AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking
ACL 2025
SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
ACL 2025
Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea
COLING 2025
SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models
ACL 2025
M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
ACL 2025
PROTECT: Policy-Related Organizational Value Taxonomy for Ethical Compliance and Trust
ACL 2025
Language Models Resist Alignment: Evidence From Data Compression
ACL 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
ACL 2025
CRAFT: Class Ranking Aware Fine-Tuning for Enhanced Out-of-Distribution Detection
WACV 2025
QGuard:Question-based Zero-shot Guard for Multi-modal LLM Safety
ACL 2025
Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMs
ACL 2025