conftrace
Safety
414 papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 99
Papers
GaLileo: General Linear Relaxation Framework for Tightening Robustness Certification of Transformers
AAAI 2024
A Huber Loss Minimization Approach to Byzantine Robust Federated Learning
AAAI 2024
Towards Trustworthy Deep Learning
AAAI 2024
Monitoring of Perception Systems: Deterministic, Probabilistic, and Learning-Based Fault Detection and Identification (Abstract Reprint)
AAAI 2024
Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees (Abstract Reprint)
AAAI 2024
Reward (Mis)design for Autonomous Driving (Abstract Reprint)
AAAI 2024
VeriCompress: A Tool to Streamline the Synthesis of Verified Robust Compressed Neural Networks from Scratch
AAAI 2024
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
ACL 2024
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
ACL 2024
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
ACL 2024
On the Hallucination in Simultaneous Machine Translation
ACL 2024
Realistic Evaluation of Toxicity in Large Language Models
ACL 2024
UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations
ACL 2024
A Chinese Dataset for Evaluating the Safeguards in Large Language Models
ACL 2024
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
ACL 2024
Subtle Signatures, Strong Shields: Advancing Robust and Imperceptible Watermarking in Large Language Models
ACL 2024
All Languages Matter: On the Multilingual Safety of LLMs
ACL 2024
SpeechGuard: Exploring the Adversarial Robustness of Multi-modal Large Language Models
ACL 2024
Making Harmful Behaviors Unlearnable for Large Language Models
ACL 2024
Evaluating Robustness of Generative Search Engine on Adversarial Factoid Questions
ACL 2024
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
ACL 2024
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
ACL 2024
Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders
ACL 2024
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
CVPR 2024
Backdoor Defense via Test-Time Detecting and Repairing
CVPR 2024