Adversarial Learning

1235 directly classified papers

Papers

SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs EMNLP 2025

COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks AAAI 2025

Nullspace Disentanglement for Red Teaming Language Models EMNLP 2025

Integrating Argumentation Features for Enhanced Propaganda Detection in Arabic Narratives on the Israeli War on Gaza COLING 2025

GRADA: Graph-based Reranking against Adversarial Documents Attack EMNLP 2025

FaceShield: Defending Facial Image against Deepfake Threats ICCV 2025

Enhancing LLM-Based Social Bot via an Adversarial Learning Framework EMNLP 2025

Vulnerability of Large Language Models to Output Prefix Jailbreaks: Impact of Positions on Safety NAACL 2025

NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks EMNLP 2025

Augmented Adversarial Trigger Learning NAACL 2025

Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding EMNLP 2025

Query-Based and Unnoticeable Graph Injection Attack from Neighborhood Perspective IJCAI 2025

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness EMNLP 2025

TRNAS: A Training-Free Robust Neural Architecture Search ICCV 2025

TempParaphraser: “Heating Up” Text to Evade AI-Text Detection through Paraphrasing EMNLP 2025

Rethinking Backdoor Detection Evaluation for Language Models EMNLP 2025

Detection Defenses: An Empty Promise Against Adversarial Patch Attacks on Optical Flow WACV 2024

Natural Light Can Also Be Dangerous: Traffic Sign Misinterpretation Under Adversarial Natural Light Attacks WACV 2024

Context-aware Adversarial Attack on Named Entity Recognition EACL 2024

Linguistic Obfuscation Attacks and Large Language Model Uncertainty EACL 2024

Diffusion Models Meet Image Counter-Forensics WACV 2024

ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users NeurIPS 2024

Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes NeurIPS 2024

On the Adversarial Robustness of Benjamini Hochberg NeurIPS 2024

Transferable Adversarial Attacks on SAM and Its Downstream Models NeurIPS 2024