conftrace
Safety
414 papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 99
Papers
ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation
NIPS 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
NIPS 2024
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
NIPS 2024
Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models
NIPS 2024
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
NIPS 2024
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
NIPS 2024
OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning
NIPS 2024
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
NIPS 2024
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
NIPS 2024
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
NIPS 2024
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
NIPS 2024
NN4SysBench: Characterizing Neural Network Verification for Computer Systems
NIPS 2024
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
NIPS 2024
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
NIPS 2024
Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually
AAAI 2024
Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise
AAAI 2024
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
AAAI 2024
Constrained Meta-Reinforcement Learning for Adaptable Safety Guarantee with Differentiable Convex Programming
AAAI 2024
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation
AAAI 2024
DeepBern-Nets: Taming the Complexity of Certifying Neural Networks Using Bernstein Polynomial Activations and Precise Bound Propagation
AAAI 2024
Accelerating Adversarially Robust Model Selection for Deep Neural Networks via Racing
AAAI 2024
Reward Certification for Policy Smoothed Reinforcement Learning
AAAI 2024
Pure-Past Action Masking
AAAI 2024
Long-Term Safe Reinforcement Learning with Binary Feedback
AAAI 2024
Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration
AAAI 2024