Serena Yeung-Levy

20 papers · 2024–2026 · 11 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🌍 Conference Polyglot (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6)

🤝 Dynamic Duo (11) 👥 Mega-Team (23) 💎 Century Club (19) ⚡ Prolific Year (6) ❓ The Questioner (3) 🗃️ Keyword Collector (62)

Conferences

CVPR (5) ECCV (3) ICLR (3) NIPS (2) ACL (1) EACL (1) EMNLP (1) ICCV (1) ICML (1) MLHC (1) WACV (1)

Top co-authors

Yuhui Zhang (12) Xiaohan Wang (10) James Burgess (9) Alejandro Lozano (7) Yuchang Su (5) Orr Zohar (4) Anita Rau (3) Josiah Aklilu (3) Jeffrey J Nirschl (3) Emma Lundberg (3)

Keywords

vision-language model (6) benchmark evaluation (3) visual question answering (3) vision language model (3) zero-shot learning (2) question answering (2) biomedical imaging (2) chain-of-thought reasoning (1) catastrophic forgetting (1) image classification (1) prototype learning (1) self-supervised learning (1) domain adaptation (1) reinforcement learning (1) image captioning (1) transfer learning (1) multimodal learning (1) test-time adaptation (1) multi-modal learning (1) information retrieval (1)

Papers

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR EACL 2026
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ICLR 2025
CellFlux: Simulating Cellular Morphology Changes via Flow Matching ICML 2025
The Impact of Image Resolution on Biomedical Multimodal Large Language Models MLHC 2025
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models WACV 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature CVPR 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation CVPR 2025
NegVQA: Can Vision Language Models Understand Negation? ACL 2025
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research CVPR 2025
Apollo: An Exploration of Video Understanding in Large Multimodal Models CVPR 2025
Data or Language Supervision: What Makes CLIP Better than DINO? EMNLP 2025
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration ICCV 2025
Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models ICLR 2025
Video Action Differencing ICLR 2025
VideoAgent: Long-form Video Understanding with Large Language Model as Agent ECCV 2024
Why are Visually-Grounded Language Models Bad at Image Classification? NIPS 2024
Describing Differences in Image Sets with Natural Language CVPR 2024
Depth-guided NeRF Training via Earth Mover’s Distance ECCV 2024
Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models ECCV 2024
Micro-Bench: A Microscopy Benchmark for Vision-Language Understanding NIPS 2024