Research Explorer

Cross-Task Defense: Instruction-Tuning LLMs for Content Safety

Yu Fu, Wen Xiao, Jia Chen et al.

2024 NAACL

Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations

Lele Cao, Valentin Buchner, Zineb Senane et al.

2024 NAACL

Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs

Bibek Upadhayay, Vahid Behzadan

2024 NAACL

Can LLMs Handle Low-Resource Dialects? A Case Study on Translation and Common Sense Reasoning in Šariš

Viktória Ondrejová, Marek Šuppa

2024 NAACL

Data-Augmentation-Based Dialectal Adaptation for LLMs

Fahim Faisal, Antonios Anastasopoulos

2024 NAACL

Incorporating Dialect Understanding Into LLM Using RAG and Prompt Engineering Techniques for Causal Commonsense Reasoning

Benedikt Perak, Slobodan Beliga, Ana Meštrović

2024 NAACL

Simple LLM based Approach to Counter Algospeak

Jan Fillies, Adrian Paschke

2024 NAACL

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs

Nitay Calderon, Roi Reichart

2025 NAACL

Hybrid Graphs for Table-and-Text based Question Answering using LLMs

Ankush Agarwal, Chaitanya Devaguptapu, Ganesh S

2025 NAACL

SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters

Yan Yang, Zeguan Xiao, Xin Lu et al.

2025 NAACL

Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios

Bryan Chen Zhengyu Tan, Roy Ka-Wei Lee

2025 NAACL

Balancing Forget Quality and Model Utility: A Reverse KL-Divergence Knowledge Distillation Approach for Better Unlearning in LLMs

Bichen Wang, Yuzhe Zi, Yixin Sun et al.

2025 NAACL

Can LLMs Convert Graphs to Text-Attributed Graphs?

Zehong Wang, Sidney Liu, Zheyuan Zhang et al.

2025 NAACL

CompAct: Compressed Activations for Memory-Efficient LLM Training

Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki et al.

2025 NAACL

What Did I Do Wrong? Quantifying LLMs’ Sensitivity and Consistency to Prompt Engineering

Federico Errica, Davide Sanvito, Giuseppe Siracusano et al.

2025 NAACL

SafetyQuizzer: Timely and Dynamic Evaluation on the Safety of LLMs

Zhichao Shi, Shaoling Jing, Yi Cheng et al.

2025 NAACL

The Impact of Inference Acceleration on Bias of LLMs

Elisabeth Kirsten, Ivan Habernal, Vedant Nanda et al.

2025 NAACL

From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection

Rupeng Zhang, Haowei Wang, Junjie Wang et al.

2025 NAACL

Fine-Tuned LLMs are “Time Capsules” for Tracking Societal Bias Through Books

Sangmitra Madhusudan, Robert Morabito, Skye Reid et al.

2025 NAACL

SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection

Sindhu Padakandla, Sadbhavana Babar, Rathod Darshan D et al.

2025 NAACL

Have LLMs Reopened the Pandora’s Box of AI-Generated Fake News?

Xinyu Wang, Wenbo Zhang, Sai Koneru et al.

2025 NAACL

A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation

Bairu Hou, Yang Zhang, Jacob Andreas et al.

2025 NAACL

LLM The Genius Paradox: A Linguistic and Math Expert’s Struggle with Simple Word-based Counting Problems

Nan Xu, Xuezhe Ma

2025 NAACL

Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?

So Young Lee, Russell Scheinberg, Amber Shore et al.

2025 NAACL

Can Unconfident LLM Annotations Be Used for Confident Conclusions?

Kristina Gligoric, Tijana Zrnic, Cinoo Lee et al.

2025 NAACL
