Chaoyou Fu

21 papers · 2019–2026 · 8 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6) 🏃 Academic Marathon (6) 🗺️ Taxonomy Completionist (42)

🧭 Keyword Pioneer 🐝 Cross-Pollinator (15) 🏆 Grand Slam 👥 Mega-Team (21) ❓ The Questioner 🔥 Unstoppable (7) 🗃️ Keyword Collector (81) 💎 Century Club (19) ⚡ Prolific Year (7)

Conferences

CVPR (8) NIPS (4) ICML (3) ICLR (2) AAAI (1) ACL (1) ICCV (1) IJCAI (1)

Top co-authors

Ran He (9) Ke Li (5) Xing Sun (5) Yunhang Shen (5) Yibo Hu (4) Peixian Chen (4) Mengdan Zhang (4) Rongrong Ji (3) Yifan Zhang (3) Rong Jin (3)

Keywords

image generation (3) semantic segmentation (2) disentangled representation (2) multi-modal learning (2) video understanding (2) domain adaptation (2) heterogeneous face recognition (2) multimodal large language model (2) identity swapping (2) object detection (2) latent space (2) unsupervised learning (2) noisy label learning (1) video captioning (1) optimal transport (1) open-vocabulary detection (1) transfer learning (1) information bottleneck (1) few-shot learning (1) face recognition (1)

Papers

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension · AAAI 2026
Scaling Law for Multimodal Large Language Model Supervised Fine-Tuning · ACL 2026
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption · CVPR 2025
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · CVPR 2025
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models · ICLR 2025
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? · ICLR 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency · ICML 2025
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM · ICML 2025
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment · ICML 2025
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation · CVPR 2024
Aligning and Prompting Everything All at Once for Universal Visual Perception · CVPR 2024
Multi-modal Queried Object Detection in the Wild · NIPS 2023
CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes · NIPS 2023
Rethinking Image Cropping: Exploring Diverse Compositions From Global Views · CVPR 2022
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification · ICCV 2021
Information Bottleneck Disentanglement for Identity Swapping · CVPR 2021
Pareidolia Face Reenactment · CVPR 2021
AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection · NIPS 2020
Cross-Spectral Face Hallucination via Disentangling Independent Factors · CVPR 2020
Dual Variational Generation for Low Shot Heterogeneous Face Recognition · NIPS 2019
Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing · IJCAI 2019