conftrace
Safety
414 papers
Papers per year
2016: 1
2017: 1
2018: 4
2019: 8
2020: 11
2021: 21
2022: 29
2023: 36
2024: 87
2025: 117
2026: 99
Papers
Language Detoxification with Attribute-Discriminative Latent Space
ACL 2023
TextVerifier: Robustness Verification for Textual Classifiers with Certifiable Guarantees
ACL 2023
Defending against Insertion-based Textual Backdoor Attacks via Attribution
ACL 2023
Can Large Language Models Safely Address Patient Questions Following Cataract Surgery?
ACL 2023
The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks
CVPR 2023
Unveiling the Implicit Toxicity in Large Language Models
EMNLP 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
EMNLP 2023
Self-Detoxifying Language Models via Toxification Reversal
EMNLP 2023
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition
EMNLP 2023
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
EMNLP 2023
ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
EMNLP 2023
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
EMNLP 2023
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning
EMNLP 2023
GTA: Gated Toxicity Avoidance for LM Performance Preservation
EMNLP 2023
Constrained Update Projection Approach to Safe Policy Optimization
NIPS 2022
On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach
NIPS 2022
Risk-Driven Design of Perception Systems
NIPS 2022
Toward Robust Spiking Neural Network Against Adversarial Perturbation
NIPS 2022
A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
NIPS 2022
Increasing Confidence in Adversarial Robustness Evaluations
NIPS 2022
Shield Decentralization for Safe Multi-Agent Reinforcement Learning
NIPS 2022
Provable Defense against Backdoor Policies in Reinforcement Learning
NIPS 2022
Enhancing Safe Exploration Using Safety State Augmentation
NIPS 2022
Counterfactual harm
NIPS 2022
Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning
AAAI 2022