conftrace_

Safety

414 papers

Papers per year (bar chart; year labels not preserved, counts in chart order): 1, 1, 4, 8, 11, 21, 29, 36, 87, 117, 99

Papers

CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging CVPR 2025

Where's the Liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content CVPR 2025

Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving CVPR 2025

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key CVPR 2025

Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model CVPR 2025

I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models CVPR 2025

Erasing Undesirable Influence in Diffusion Models CVPR 2025

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization CVPR 2025

Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models CVPR 2025

Hyperbolic Safety-Aware Vision-Language Models CVPR 2025

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? CVPR 2025

PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization EMNLP 2025

Automating Steering for Safe Multimodal Large Language Models EMNLP 2025

Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning EMNLP 2025

IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents EMNLP 2025

Anecdoctoring: Automated Red-Teaming Across Language and Place EMNLP 2025

Rescorla-Wagner Steering of LLMs for Undesired Behaviors over Disproportionate Inappropriate Context EMNLP 2025

SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs EMNLP 2025

MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models EMNLP 2025

Nullspace Disentanglement for Red Teaming Language Models EMNLP 2025

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety EMNLP 2025

Investigating How Pre-training Data Leakage Affects Models’ Reproduction and Detection Capabilities EMNLP 2025

Hallucination Detection in LLMs Using Spectral Features of Attention Maps EMNLP 2025

AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender EMNLP 2025

Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study EMNLP 2025