AI Safety

2972 directly classified papers

Papers

Certified Trustworthiness in the Era of Large Language Models AAAI 2025

Harnessing Robust Statistics for Trustworthy AI AAAI 2025

Persuasion for Social Good: How to Build and Break AI AAAI 2025

To Err Is AI: A Case Study Informing LLM Flaw Reporting Practices AAAI 2025

Fostering Epistemic Insights into AI Ethics through a Constructionist Pedagogy: An Interdisciplinary Approach to AI Literacy AAAI 2025

Assessing Vulnerabilities in State-of-the-Art Large Language Models Through Hex Injection (Student Abstract) AAAI 2025

LLM Stinger: Jailbreaking LLMs Using RL Fine-Tuned LLMs (Student Abstract) AAAI 2025

Combating Phone Scams with LLM-based Detection: Where Do We Stand? (Student Abstract) AAAI 2025

ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection AAAI 2025

HyperDefender: A Robust Framework for Hyperbolic GNNs AAAI 2025

Attack on Prompt: Backdoor Attack in Prompt-Based Continual Learning AAAI 2025

Certified Causal Defense with Generalizable Robustness AAAI 2025

Clean-Label Graph Backdoor Attack in the Node Classification Task AAAI 2025

Speed Master: Quick or Slow Play to Attack Speaker Recognition AAAI 2025

Scaling Trends for Data Poisoning in LLMs AAAI 2025

Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels AAAI 2025

LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets AAAI 2025

SMLE: Safe Machine Learning via Embedded Overapproximation AAAI 2025

On the Consideration of AI Openness: Can Good Intent Be Abused? AAAI 2025

Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints AAAI 2025

Can Go AIs Be Adversarially Robust? AAAI 2025

Towards Trustworthy Machine Learning Under Distribution Shifts AAAI 2025

The AI Race: Why Current Neural Network-based Architectures are a Poor Basis for Artificial General Intelligence AAAI 2025

Lessons for Editors of AI Incidents from the AI Incident Database AAAI 2025

SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection NAACL 2025