Yixiao Ge
58 papers · 2018–2025 · 10 conferences · across top CS/AI conferences
Achievements
Academic Marathon (7)
Conference Polyglot (10)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (14)
Renaissance Researcher (6)
Taxonomy Completionist (79)
Conference Loyalist (21)
Dynamic Duo (43)
Keyword Champion (2)
Triple Crown
Topic Evolution
Deep Specialist (21)
Grand Slam
Unstoppable (6)
Century Club (58)
Prolific Year (9)
Keyword Collector (203)
Conferences
CVPR (21)
ICCV (10)
ECCV (6)
ICLR (6)
NIPS (6)
AAAI (3)
ICML (3)
ACL (1)
IJCAI (1)
NAACL (1)
Keywords
transfer learning (7)
multimodal learning (7)
contrastive learning (5)
object detection (5)
representation learning (5)
large language model (4)
model compression (4)
image generation (3)
vision-language model (3)
vision transformer (3)
multimodal large language model (3)
diffusion model (3)
video-text retrieval (3)
zero-shot learning (2)
multi-modal learning (2)
few-shot learning (2)
video understanding (2)
image classification (2)
code generation (2)
benchmark evaluation (2)
Papers
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
ICCV 2025
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
VoCo-LLaMA: Towards Vision Compression with Large Language Models
CVPR 2025
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
CVPR 2025
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
NAACL 2025
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
ICCV 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
ICML 2025
LoRA-Gen: Specializing Large Language Model via Online LoRA Generation
ICML 2025
ST-LLM: Large Language Models Are Effective Temporal Learners
ECCV 2024
MambaTree: Tree Topology is All You Need in State Space Model
NIPS 2024
Cached Transformers: Improving Transformers with Differentiable Memory Cache
AAAI 2024
LLaMA Pro: Progressive LLaMA with Block Expansion
ACL 2024
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
CVPR 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
CVPR 2024
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
CVPR 2024
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
CVPR 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
YOLO-World: Real-Time Open-Vocabulary Object Detection
CVPR 2024
ViT-Lens: Towards Omni-modal Representations
CVPR 2024
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
CVPR 2024
SEED-Bench: Benchmarking Multimodal Large Language Models
CVPR 2024
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
ECCV 2024
Making LLaMA SEE and Draw with SEED Tokenizer
ICLR 2024
$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
ICML 2023
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
NIPS 2023
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
NIPS 2023
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
ICCV 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
ICCV 2023
BoxSnake: Polygonal Instance Segmentation with Box Supervision
ICCV 2023
Exploring Model Transferability through the Lens of Potential Energy
ICCV 2023
Darwinian Model Upgrades: Model Evolving with Selective Compatibility
AAAI 2023
Video-Text Pre-training with Learned Regions for Retrieval
AAAI 2023
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
NIPS 2023
Accelerating Vision-Language Pretraining With Free Language Modeling
CVPR 2023
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
CVPR 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
CVPR 2023
Masked Image Modeling with Denoising Contrast
ICLR 2023
Object-Aware Video-Language Pre-Training for Retrieval
CVPR 2022
Uncertainty Modeling for Out-of-Distribution Generalization
ICLR 2022
Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training
ECCV 2022
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
ECCV 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
Towards Universal Backward-Compatible Representation Learning
IJCAI 2022
Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval
ICLR 2022
Dynamic Token Normalization Improves Vision Transformers
ICLR 2022
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022
Progressive Correspondence Pruning by Consensus Learning
ICCV 2021
Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-Identification
ICCV 2021
Mutual CRF-GNN for Few-Shot Learning
CVPR 2021
Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification
CVPR 2021
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
CVPR 2021
Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
ICLR 2020
Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
NIPS 2020
Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
ECCV 2020
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
NIPS 2018