Yuhui Zhang
27 papers · 2019–2026 · 10 conferences · across top CS/AI conferences
Achievements
Academic Marathon (6)
Conference Polyglot (9)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (13)
Renaissance Researcher (8)
Taxonomy Completionist (62)
Mega-Team (23)
Keyword Champion (2)
Deep Specialist (10)
Topic Evolution
Dynamic Duo (11)
Keyword Collector (121)
Century Club (26)
Unstoppable (5)
The Questioner (3)
Prolific Year (7)
Conferences
ACL (6)
CVPR (4)
EMNLP (4)
NIPS (4)
ICLR (3)
IJCAI (2)
EACL (1)
ECCV (1)
ICML (1)
INTERSPEECH (1)
Keywords
vision-language model (4)
vision language model (4)
benchmark evaluation (4)
visual question answering (3)
question answering (3)
large language model (3)
self-supervised learning (2)
catastrophic forgetting (2)
zero-shot classification (2)
biomedical imaging (2)
language model (2)
contrastive learning (2)
multimodal learning (2)
negation understanding (2)
multi-modal learning (2)
multimodal large language model (2)
chain-of-thought reasoning (2)
named entity recognition (2)
lexical semantics (1)
zero-shot learning (1)
Papers
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
EACL 2026
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
CVPR 2025
NegVQA: Can Vision Language Models Understand Negation?
ACL 2025
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
CVPR 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
CVPR 2025
MAKAR: a Multi-Agent framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition
EMNLP 2025
EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking
EMNLP 2025
Data or Language Supervision: What Makes CLIP Better than DINO?
EMNLP 2025
Video Action Differencing
ICLR 2025
CellFlux: Simulating Cellular Morphology Changes via Flow Matching
ICML 2025
AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing
IJCAI 2025
Describing Differences in Image Sets with Natural Language
CVPR 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
ICLR 2024
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
ECCV 2024
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation
EMNLP 2024
Micro-Bench: A Microscopy Benchmark for Vision-Language Understanding
NIPS 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
NIPS 2024
MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models
IJCAI 2024
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
NIPS 2023
Diagnosing and Rectifying Vision Models using Language
ICLR 2023
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models
ACL 2023
Deep Self-Supervised Learning of Speech Denoising from Noisy Speeches
INTERSPEECH 2022
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
NIPS 2022
Inducing Grammar from Long Short-Term Memory Networks by Shapley Decomposition
ACL 2020
Enhancing Transformer with Sememe Knowledge
ACL 2020
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
ACL 2020
Jiuge: A Human-Machine Collaborative Chinese Classical Poetry Generation System
ACL 2019