Yixiao Ge

58 papers · 2018–2025 · 10 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🌍 Conference Polyglot (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (14)

🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (79) 🏠 Conference Loyalist (21) 🤝 Dynamic Duo (43) 🏆 Keyword Champion (2) 👑 Triple Crown 🧬 Topic Evolution 🔬 Deep Specialist (21) 🏆 Grand Slam 🔥 Unstoppable (6) 💎 Century Club (58) ⚡ Prolific Year (9) 🗃️ Keyword Collector (203)

Conferences

CVPR (21) ICCV (10) ECCV (6) ICLR (6) NIPS (6) AAAI (3) ICML (3) ACL (1) IJCAI (1) NAACL (1)

Top co-authors

Ying Shan (43) Yuying Ge (11) Ping Luo (10) Lin Song (9) Xiaohu Qie (8) Mike Zheng Shou (8) Hongsheng Li (7) Xintao Wang (6) Xiaogang Wang (5) Xiaotong Li (4)

Keywords

transfer learning (7) multimodal learning (7) contrastive learning (5) object detection (5) representation learning (5) large language model (4) model compression (4) image generation (3) vision-language model (3) vision transformer (3) multimodal large language model (3) diffusion model (3) video-text retrieval (3) zero-shot learning (2) multi-modal learning (2) few-shot learning (2) video understanding (2) image classification (2) code generation (2) benchmark evaluation (2)

Papers

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers ICCV 2025
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation CVPR 2025
VoCo-LLaMA: Towards Vision Compression with Large Language Models CVPR 2025
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models CVPR 2025
Scalable Image Tokenization with Index Backpropagation Quantization ICCV 2025
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots NAACL 2025
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ICCV 2025
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction ICCV 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding ICML 2025
LoRA-Gen: Specializing Large Language Model via Online LoRA Generation ICML 2025
ST-LLM: Large Language Models Are Effective Temporal Learners ECCV 2024
MambaTree: Tree Topology is All You Need in State Space Model NIPS 2024
Cached Transformers: Improving Transformers with Differentiable Memory Cache AAAI 2024
LLaMA Pro: Progressive LLaMA with Block Expansion ACL 2024
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis CVPR 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models CVPR 2024
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs CVPR 2024
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning CVPR 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities CVPR 2024
YOLO-World: Real-Time Open-Vocabulary Object Detection CVPR 2024
ViT-Lens: Towards Omni-modal Representations CVPR 2024
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition CVPR 2024
SEED-Bench: Benchmarking Multimodal Large Language Models CVPR 2024
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment ECCV 2024
Making LLaMA SEE and Draw with SEED Tokenizer ICLR 2024
$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation ICML 2023
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model NIPS 2023
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models NIPS 2023
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection ICCV 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation ICCV 2023
BoxSnake: Polygonal Instance Segmentation with Box Supervision ICCV 2023
Exploring Model Transferability through the Lens of Potential Energy ICCV 2023
Darwinian Model Upgrades: Model Evolving with Selective Compatibility AAAI 2023
Video-Text Pre-training with Learned Regions for Retrieval AAAI 2023
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction NIPS 2023
Accelerating Vision-Language Pretraining With Free Language Modeling CVPR 2023
All in One: Exploring Unified Video-Language Pre-Training CVPR 2023
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge CVPR 2023
RILS: Masked Visual Reconstruction in Language Semantic Space CVPR 2023
Masked Image Modeling with Denoising Contrast ICLR 2023
Object-Aware Video-Language Pre-Training for Retrieval CVPR 2022
Uncertainty Modeling for Out-of-Distribution Generalization ICLR 2022
Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training ECCV 2022
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space ECCV 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval ECCV 2022
Towards Universal Backward-Compatible Representation Learning IJCAI 2022
Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval ICLR 2022
Dynamic Token Normalization improves Vision Transformers ICLR 2022
Bridging Video-Text Retrieval With Multiple Choice Questions CVPR 2022
Progressive Correspondence Pruning by Consensus Learning ICCV 2021
Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-Identification ICCV 2021
Mutual CRF-GNN for Few-Shot Learning CVPR 2021
Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification CVPR 2021
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network CVPR 2021
Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification ICLR 2020
Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID NIPS 2020
Self-supervising Fine-grained Region Similarities for Large-scale Image Localization ECCV 2020
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification NIPS 2018