AI Safety

2972 directly classified papers

Papers

Influence-Based Fair Selection for Sample-Discriminative Backdoor Attack AAAI 2025

Neurosymbolic Reinforcement Learning: Playing MiniHack with Probabilistic Logic Shields AAAI 2025

Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures ICCV 2025

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves ICCV 2025

Backdoor Defense via Enhanced Splitting and Trap Isolation ICCV 2025

Backdoor Attacks on Neural Networks via One-Bit Flip ICCV 2025

Enhancing Transferability of Targeted Adversarial Examples via Inverse Target Gradient Competition and Spatial Distance Stretching ICCV 2025

Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning ICCV 2025

SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers ICCV 2025

SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection ICCV 2025

SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures ICCV 2025

Adversarial Robustness of Discriminative Self-Supervised Learning in Vision ICCV 2025

Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception ICCV 2025

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models ICCV 2025

AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking ACL 2025

SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes ACL 2025

Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea COLING 2025

SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models ACL 2025

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs ACL 2025

PROTECT: Policy-Related Organizational Value Taxonomy for Ethical Compliance and Trust ACL 2025

Language Models Resist Alignment: Evidence From Data Compression ACL 2025

Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet ACL 2025

CRAFT: Class Ranking Aware Fine-Tuning for Enhanced Out-of-Distribution Detection WACV 2025

QGuard:Question-based Zero-shot Guard for Multi-modal LLM Safety ACL 2025

Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMs ACL 2025