conftrace_

Kevin Zhu

23 papers · 2024–2026 · 7 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (41) · 🌉 Interdisciplinary Bridge · 🌈 Renaissance Researcher (7) · 🌍 Conference Polyglot (5) · 🧭 Keyword Pioneer · 🐣 Hot Topic Early Bird · 🏆 Keyword Champion (2) · 🀝 Dynamic Duo (10) · ❓ The Questioner · ⚑ Prolific Year (15) · 💎 Century Club (18) · 🗃️ Keyword Collector (110)

Conferences

ACL (5) EMNLP (5) NAACL (4) AACL (3) IJCNLP (3) COLING (2) EACL (1)

Papers

MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment · EACL 2026
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs · ACL 2026
FrontierScience Bench: Evaluating AI Research Capabilities in LLMs · ACL 2025
DecepBench: Benchmarking Multimodal Deception Detection · ACL 2025
Pragmatic Metacognitive Prompting Improves LLM Performance on Sarcasm Detection · COLING 2025
Improving LLM Abilities in Idiomatic Translation · COLING 2025
NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts · EMNLP 2025
EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models · EMNLP 2025
ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models · EMNLP 2025
Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering · IJCNLP 2025
Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models · ACL 2025
Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering · AACL 2025
Mitigating Forgetting in Continual Learning with Selective Gradient Projection · AACL 2025
VariantBench: A Framework for Evaluating LLMs on Justifications for Genetic Variant Interpretation · AACL 2025
Mitigating Forgetting in Continual Learning with Selective Gradient Projection · IJCNLP 2025
VariantBench: A Framework for Evaluating LLMs on Justifications for Genetic Variant Interpretation · IJCNLP 2025
Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning · NAACL 2025
Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language · NAACL 2025
Self Knowledge-Tracing for Tool Use (SKT-Tool): Helping LLM Agents Understand Their Capabilities in Tool Use · NAACL 2025
Error Reflection Prompting: Can Large Language Models Successfully Understand Errors? · NAACL 2025
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks · ACL 2024
DiversityMedQA: A Benchmark for Assessing Demographic Biases in Medical Diagnosis using Large Language Models · EMNLP 2024
AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark · EMNLP 2024