Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts
EMNLP 2025
Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation
ACL 2025
NLP_CIMAT at SemEval-2025 Task 3: Just Ask GPT or look Inside. A prompt and Neural Networks Approach to Hallucination Detection
SEMEVAL 2025
AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection
SEMEVAL 2025
One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems
EMNLP 2025
Internal Value Alignment in Large Language Models through Controlled Value Vector Activation
ACL 2025
LLaVA-Critic: Learning to Evaluate Multimodal Models
CVPR 2025
Team Cantharellus at SemEval-2025 Task 3: Hallucination Span Detection with Fine Tuning on Weakly Supervised Synthetic Data
SEMEVAL 2025
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
ACL 2025
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique
EMNLP 2025
Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU
SEMEVAL 2025
AI Governance and Lessons Learned as an AI Policy Advisor in the United States Senate
AAAI 2025
Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling
ACL 2025
Towards Statistical Factuality Guarantee for Large Vision-Language Models
EMNLP 2025
Computational Thinking with Computer Vision: Developing AI Competency in an Introductory Computer Science Course
AAAI 2025
Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs
SEMEVAL 2025
SDD: Self-Degraded Defense against Malicious Fine-tuning
ACL 2025
Test-Time Backdoor Detection for Object Detection Models
CVPR 2025
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models
EMNLP 2025
LORE: Continual Logit Rewriting Fosters Faithful Generation
EMNLP 2025
A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
ACL 2025
Tightening Robustness Verification of MaxPool-based Neural Networks via Minimizing the Over-Approximation Zone
CVPR 2025
AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
EMNLP 2025
ESF: Efficient Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models
ACL 2025