conftrace
Safety
414 papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 99
Papers
Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models
ACL 2026
DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs
ACL 2026
SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning
ACL 2026
HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment
ACL 2026
Provably Safe Offline-to-Online RL: Decoupling Learning from Data-Driven Safety Enforcement
ACL 2026
SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs
ACL 2026
Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning
ACL 2026
Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs
ACL 2026
Can Factual Opinions Be Edited (Manipulated) in Large Language Models?
ACL 2026
To Lie or Not to Lie? Investigating The Biased Spread of Global Lies by LLMs
ACL 2026
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs
ACL 2026
Accommodation and Epistemic Vigilance: A Pragmatic Account of Why LLMs Fail to Challenge Harmful Beliefs
ACL 2026
Knowing When Not to Answer: Lightweight KB-Aligned OOD Detection for Safe RAG
ACL 2026
In-Context Representation Hijacking
ACL 2026
SAGE: Synergistic Adaptive Gating of Experts for Hateful Video Detection
ACL 2026
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
ACL 2026
Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
ACL 2026
FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models
ACL 2026
ReFL: Reflective Feedback Learning for Hallucination Detection of Large Language Models
ACL 2026
DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping
ACL 2026
Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning
ACL 2026
Probing the Safety Robustness of LLMs in Latent Space
ACL 2026
USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models
ACL 2026
Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring
ACL 2026
Benchmarking Web Agent Safety under E-commerce Deceptive Interfaces
ACL 2026