conftrace
Safety
414 papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 99
Papers
Enhancing Robustness in Incremental Learning with Adversarial Training
AAAI 2025
PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks
AAAI 2025
Rethinking Byzantine Robustness in Federated Recommendation from Sparse Aggregation Perspective
AAAI 2025
Shield Synthesis for LTL Modulo Theories
AAAI 2025
Leveraging Constraint Violation Signals for Action Constrained Reinforcement Learning
AAAI 2025
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning
AAAI 2025
Probabilistic Shielding for Safe Reinforcement Learning
AAAI 2025
Offline Safe Reinforcement Learning Using Trajectory Classification
AAAI 2025
COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks
AAAI 2025
Certification of Speaker Recognition Models to Additive Perturbations
AAAI 2025
Investigating the Security Threat Arising from “Yes-No” Implicit Bias in Large Language Models
AAAI 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI 2025
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
AAAI 2025
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
AAAI 2025
Quantitative Predictive Monitoring and Control for Safe Human-Machine Interaction
AAAI 2025
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
AAAI 2025
Scaling Trends for Data Poisoning in LLMs
AAAI 2025
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels
AAAI 2025
LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
AAAI 2025
SMLE: Safe Machine Learning via Embedded Overapproximation
AAAI 2025
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
AAAI 2025
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
AAAI 2025
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
AAAI 2025
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
ACL 2025
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
ACL 2025