Security

95 papers

Papers per year (chart; year labels not preserved): 1, 2, 1, 4, 4, 83

Papers

False Friends in the Shell: Unveiling the Emoticon Semantic Confusion in Large Language Models ACL 2026

SoundBreak: A Systematic Study of Audio-Only Adversarial Attacks on Trimodal Models ACL 2026

You Can Have a Second Chance: Unbiased and Multi-bit Watermarking for Diffusion Language Models with Regret-based Remasking ACL 2026

VerilogLAVD: LLM-Aided Pattern Generation for Verilog CWE Detection ACL 2026

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference ACL 2026

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation ACL 2026

ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments ACL 2026

Frankentext: Stitching random text fragments into long-form narratives ACL 2026

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward ACL 2026

A Multi-Agent Framework for High-Interaction Terminal Simulation ACL 2026

RedCoder: Automated Multi-Turn Red Teaming for Code LLMs ACL 2026

PIArena: A Platform for Prompt Injection Evaluation ACL 2026

Conjunctive Prompt Attacks in Multi-Agent LLM Systems ACL 2026

SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking ACL 2026

From TDMA to CDMA: A Multi-bit Watermark for Diffusion Language Models ACL 2026

When Efficiency Becomes a Vulnerability: Computational Cost Attacks on WebAgents ACL 2026

CodeRipple: Wavelet-Based Detection of LLM-Generated Code ACL 2026

BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks ACL 2026

ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs ACL 2026

Activation Decomposition and Steering for LLM Backdoor Remediation ACL 2026

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization ACL 2026

Don’t Corrupt the Fact: A Trustworthy RAG Watermarking Framework based on Dual Factual Shield ACL 2026

JARVIS or Ultron? A Survey on the Safety and Security Threats of Computer-Using Agents ACL 2026

ReasMark: A Robust Watermark for Attributing LLM Reasoning Under Knowledge Distillation Attacks ACL 2026

TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards ACL 2026