Safety

414 papers

Papers per year (chart; year labels not preserved), in order: 1, 1, 4, 8, 11, 21, 29, 36, 87, 117, 99

Papers

GaLileo: General Linear Relaxation Framework for Tightening Robustness Certification of Transformers AAAI 2024

A Huber Loss Minimization Approach to Byzantine Robust Federated Learning AAAI 2024

Towards Trustworthy Deep Learning AAAI 2024

Monitoring of Perception Systems: Deterministic, Probabilistic, and Learning-Based Fault Detection and Identification (Abstract Reprint) AAAI 2024

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees (Abstract Reprint) AAAI 2024

Reward (Mis)design for Autonomous Driving (Abstract Reprint) AAAI 2024

VeriCompress: A Tool to Streamline the Synthesis of Verified Robust Compressed Neural Networks from Scratch AAAI 2024

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs ACL 2024

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety ACL 2024

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ACL 2024

On the Hallucination in Simultaneous Machine Translation ACL 2024

Realistic Evaluation of Toxicity in Large Language Models ACL 2024

UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations ACL 2024

A Chinese Dataset for Evaluating the Safeguards in Large Language Models ACL 2024

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models ACL 2024

Subtle Signatures, Strong Shields: Advancing Robust and Imperceptible Watermarking in Large Language Models ACL 2024

All Languages Matter: On the Multilingual Safety of LLMs ACL 2024

SpeechGuard: Exploring the Adversarial Robustness of Multi-modal Large Language Models ACL 2024

Making Harmful Behaviors Unlearnable for Large Language Models ACL 2024

Evaluating Robustness of Generative Search Engine on Adversarial Factoid Questions ACL 2024

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ACL 2024

TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification ACL 2024

Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders ACL 2024

Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion? CVPR 2024

Backdoor Defense via Test-Time Detecting and Repairing CVPR 2024