Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Exploiting the Shadows: Unveiling Privacy Leaks through Lower-Ranked Tokens in Large Language Models
ACL 2025
Adversarial Robust Memory-Based Continual Learner
ICCV 2025
Gradient-Reweighted Adversarial Camouflage for Physical Object Detection Evasion
ICCV 2025
VLSBench: Unveiling Visual Leakage in Multimodal Safety
ACL 2025
Are Stereotypes Leading LLMs’ Zero-Shot Stance Detection ?
EMNLP 2025
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
ICCV 2025
Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment
ICCV 2025
AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
ACL 2025
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
ICCV 2025
Backdoor Mitigation by Distance-Driven Detoxification
ICCV 2025
LLM as a Broken Telephone: Iterative Generation Distorts Information
ACL 2025
LLMScan: Causal Scan for LLM Misbehavior Detection
ICML 2025
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
AAAI 2025
LlmFixer: Fix the Helpfulness of Defensive Large Language Models
EMNLP 2025
The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas
EMNLP 2025
Certified Mitigation of Worst-Case LLM Copyright Infringement
EMNLP 2025
The Confidence Paradox: Can LLM Know When It’s Wrong?
IJCNLP 2025
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
ICCV 2025
Biased LLMs can Influence Political Decision-Making
ACL 2025
Profiling LLM’s Copyright Infringement Risks under Adversarial Persuasive Prompting
EMNLP 2025
SimVBG: Simulating Individual Values by Backstory Generation
EMNLP 2025
Learning to Rewrite: Generalized LLM-Generated Text Detection
ACL 2025
Neutral Is Not Unbiased: Evaluating Implicit and Intersectional Identity Bias in LLMs Through Structured Narrative Scenarios
EMNLP 2025
Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios
EMNLP 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
NAACL 2025