Yiyi Zhou
27 papers · 2019–2026 · 8 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Renaissance Researcher (5)
Taxonomy Completionist (49)
Cross-Pollinator (15)
Conference Polyglot (8)
Academic Marathon (6)
Dynamic Duo (21)
Grand Slam
Topic Evolution
The Questioner
Century Club (26)
Conference Pioneer
Keyword Collector (101)
Prolific Year (5)
Unstoppable (7)
Conferences
CVPR (9)
AAAI (7)
ICLR (3)
NIPS (3)
ECCV (2)
COLING (1)
ICCV (1)
ICML (1)
Keywords
multimodal learning (7)
attention mechanism (7)
referring expression comprehension (4)
multi-modal learning (3)
visual question answering (3)
dynamic routing (3)
teacher-student learning (2)
multimodal large language model (2)
visual token pruning (2)
contrastive learning (2)
efficient computing (2)
vision-language model (2)
transfer learning (2)
image segmentation (2)
pseudo labeling (2)
knowledge distillation (2)
weakly supervised learning (2)
model compression (2)
image captioning (2)
visual language (2)
Papers
Vision-language Incremental Learning with Dual Class-individual Memory
AAAI 2026
SVFR: A Unified Framework for Generalized Video Face Restoration
CVPR 2025
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
CVPR 2025
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
ICLR 2025
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
AAAI 2025
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
AAAI 2025
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
ICLR 2025
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
ICLR 2025
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
CVPR 2025
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
AAAI 2024
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
ICML 2024
MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization
COLING 2024
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
AAAI 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
NIPS 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
NIPS 2023
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
CVPR 2023
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
CVPR 2023
Active Teacher for Semi-Supervised Object Detection
CVPR 2022
DIFNet: Boosting Visual Information Flow for Image Captioning
CVPR 2022
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
NIPS 2022
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
ECCV 2022
SeqTR: A Simple Yet Universal Network for Visual Grounding
ECCV 2022
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering
ICCV 2021
RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words
CVPR 2021
Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
CVPR 2020
Dynamic Capsule Attention for Visual Question Answering
AAAI 2019
Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning
AAAI 2019