Kevin Zhu
23 papers · 2024–2026 · 7 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (41)
Interdisciplinary Bridge
Renaissance Researcher (7)
Conference Polyglot (5)
Keyword Pioneer
Hot Topic Early Bird
Keyword Champion (2)
Dynamic Duo (10)
The Questioner
Prolific Year (15)
Century Club (18)
Keyword Collector (110)
Conferences
ACL (5)
EMNLP (5)
NAACL (4)
AACL (3)
IJCNLP (3)
COLING (2)
EACL (1)
Keywords
large language model (12)
benchmark evaluation (7)
prompt engineering (4)
internal state analysis (2)
clinical genomics (2)
clustering analysis (2)
catastrophic forgetting (2)
gradient projection (2)
bias detection (2)
in-context learning (2)
continual learning (2)
question answering (2)
low-resource language (2)
few-shot prompting (2)
hidden state vector (2)
multimodal learning (2)
hallucination detection (2)
chain-of-thought reasoning (1)
conversational AI (1)
sentiment analysis (1)
Papers
MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment
EACL 2026
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
ACL 2026
FrontierScience Bench: Evaluating AI Research Capabilities in LLMs
ACL 2025
DecepBench: Benchmarking Multimodal Deception Detection
ACL 2025
Pragmatic Metacognitive Prompting Improves LLM Performance on Sarcasm Detection
COLING 2025
Improving LLM Abilities in Idiomatic Translation
COLING 2025
NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts
EMNLP 2025
EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models
EMNLP 2025
ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models
EMNLP 2025
Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
IJCNLP 2025
Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models
ACL 2025
Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
AACL 2025
Mitigating Forgetting in Continual Learning with Selective Gradient Projection
AACL 2025
VariantBench: A Framework for Evaluating LLMs on Justifications for Genetic Variant Interpretation
AACL 2025
Mitigating Forgetting in Continual Learning with Selective Gradient Projection
IJCNLP 2025
VariantBench: A Framework for Evaluating LLMs on Justifications for Genetic Variant Interpretation
IJCNLP 2025
Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning
NAACL 2025
Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language
NAACL 2025
Self Knowledge-Tracing for Tool Use (SKT-Tool): Helping LLM Agents Understand Their Capabilities in Tool Use
NAACL 2025
Error Reflection Prompting: Can Large Language Models Successfully Understand Errors?
NAACL 2025
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks
ACL 2024
DiversityMedQA: A Benchmark for Assessing Demographic Biases in Medical Diagnosis using Large Language Models
EMNLP 2024
AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark
EMNLP 2024