Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Certified Trustworthiness in the Era of Large Language Models
AAAI 2025
Harnessing Robust Statistics for Trustworthy AI
AAAI 2025
Persuasion for Social Good: How to Build and Break AI
AAAI 2025
To Err Is AI: A Case Study Informing LLM Flaw Reporting Practices
AAAI 2025
Fostering Epistemic Insights into AI Ethics through a Constructionist Pedagogy: An Interdisciplinary Approach to AI Literacy
AAAI 2025
Assessing Vulnerabilities in State-of-the-Art Large Language Models Through Hex Injection (Student Abstract)
AAAI 2025
LLM Stinger: Jailbreaking LLMs Using RL Fine-Tuned LLMs (Student Abstract)
AAAI 2025
Combating Phone Scams with LLM-based Detection: Where Do We Stand? (Student Abstract)
AAAI 2025
ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection
AAAI 2025
HyperDefender: A Robust Framework for Hyperbolic GNNs
AAAI 2025
Attack on Prompt: Backdoor Attack in Prompt-Based Continual Learning
AAAI 2025
Certified Causal Defense with Generalizable Robustness
AAAI 2025
Clean-Label Graph Backdoor Attack in the Node Classification Task
AAAI 2025
Speed Master: Quick or Slow Play to Attack Speaker Recognition
AAAI 2025
Scaling Trends for Data Poisoning in LLMs
AAAI 2025
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels
AAAI 2025
LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
AAAI 2025
SMLE: Safe Machine Learning via Embedded Overapproximation
AAAI 2025
On the Consideration of AI Openness: Can Good Intent Be Abused?
AAAI 2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
AAAI 2025
Can Go AIs Be Adversarially Robust?
AAAI 2025
Towards Trustworthy Machine Learning Under Distribution Shifts
AAAI 2025
The AI Race: Why Current Neural Network-based Architectures are a Poor Basis for Artificial General Intelligence
AAAI 2025
Lessons for Editors of AI Incidents from the AI Incident Database
AAAI 2025
SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection
NAACL 2025