conftrace
Safety
414 papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 99
Papers
TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models
AAAI 2026
Runtime Safety and Reach-avoid Prediction of Stochastic Systems via Observation-aware Barrier Functions
AAAI 2026
Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety
ACL 2026
A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains
ACL 2026
CachePrune: Teaching LLMs What Not to Follow via KV-Cache Editing
ACL 2026
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
ACL 2026
Beyond Surface-Level Detection: Towards Cognitive-Driven Defense Against Jailbreak Attacks via Meta-Operations Reasoning
ACL 2026
Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs
ACL 2026
RADO: Reasoning Audit-Driven Optimization for Rigorous Reasoning in High-Stakes Domains
ACL 2026
Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation
ACL 2026
Reasoning Structure Matters for Safety Alignment of Reasoning Models
ACL 2026
FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
ACL 2026
ContextLens: Modeling Imperfect Privacy and Safety Context for Legal Compliance
ACL 2026
StealthGraph: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation
ACL 2026
Calibrating Inference Time Alignment with Sequence-level Risk Accumulation
ACL 2026
Don’t Click That: Teaching Web Agents to Resist Deceptive Interfaces
ACL 2026
LAFaCT: Attribution-based Localization and Focused Sequential Analysis of Fact-Critical Tokens for Hallucination Detection
ACL 2026
Safety-Utility Conflicts Are Not Global: Surgical Alignment via Head-Level Diagnosis
ACL 2026
JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification
ACL 2026
Enhancing the Transferability of Jailbreak Attacks on Large Language Models via Exploiting Reparameterization Invariance
ACL 2026
GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models
ACL 2026
MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning
ACL 2026
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
ACL 2026
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
ACL 2026
Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images
ACL 2026