Chong Luo

41 papers · 2018–2026 · 10 top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🌍 Conference Polyglot (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13) 🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (77) 📛 The Namer 🤝 Dynamic Duo (14) 🧬 Topic Evolution 🏆 Grand Slam 🗃️ Keyword Collector (201) 💎 Century Club (39) ⚡ Prolific Year (8) 🔥 Unstoppable (9) ❓ The Questioner 📈 Trend Setter

Conferences

CVPR (16) AAAI (6) ICCV (4) INTERSPEECH (4) NIPS (3) ECCV (2) ICLR (2) IJCAI (2) ICML (1) WACV (1)

Top co-authors

Wenjun Zeng (14) Qi Dai (11) Yucheng Zhao (11) Chuanxin Tang (9) Zuxuan Wu (7) Zhiwei Xiong (7) Dongdong Chen (6) Dacheng Yin (6) Guangting Wang (6) Zhiyuan Zhao (6)

Keywords

video generation (6) diffusion model (6) video understanding (5) image classification (4) action recognition (4) image generation (4) vision transformer (4) speech enhancement (3) object tracking (3) transformer architecture (3) multimodal learning (3) contrastive learning (3) text-to-video generation (3) representation learning (2) reinforcement learning (2) attention mechanism (2) transfer learning (2) self-supervised learning (2) visual object tracking (2) visual representation (2)

Papers

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models · AAAI 2026
MageBench: Bridging Large Multimodal Models to Agents · WACV 2026
LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation · AAAI 2026
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting for Novel View Synthesis · ICML 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion · ICCV 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers · ICCV 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents · ICCV 2025
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis · CVPR 2025
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion · CVPR 2025
StableAnimator: High-Quality Identity-Preserving Human Image Animation · CVPR 2025
Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis · CVPR 2024
OmniViD: A Generative Framework for Universal Video Understanding · CVPR 2024
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering · ECCV 2024
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild · ECCV 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms · NIPS 2024
CCEdit: Creative and Controllable Video Editing via Diffusion Models · CVPR 2024
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation · CVPR 2024
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving · CVPR 2024
Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching · ICLR 2023
Look Before You Match: Instance Understanding Matters in Video Object Segmentation · CVPR 2023
Streaming Video Model · CVPR 2023
TridentSE: Guiding Speech Enhancement with 32 Global Tokens · INTERSPEECH 2023
Make It Move: Controllable Image-to-Video Generation With Text Descriptions · CVPR 2022
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism · AAAI 2022
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph · ICLR 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks · NIPS 2022
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? · AAAI 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion · INTERSPEECH 2022
An Anchor-Free Detector for Continuous Speech Keyword Spotting · INTERSPEECH 2022
Peripheral Vision Transformer · NIPS 2022
Unsupervised Visual Representation Learning by Tracking Patches in Video · CVPR 2021
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction · ICCV 2021
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration · INTERSPEECH 2021
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View · CVPR 2020
Tracking by Instance Detection: A Meta-Learning Approach · CVPR 2020
PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network · AAAI 2020
Multi-Scale Group Transformer for Long Sequence Modeling in Speech Separation · IJCAI 2020
Joint Time-Frequency and Time Domain Learning for Speech Enhancement · IJCAI 2020
Posterior-Guided Neural Architecture Search · AAAI 2020
SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking · CVPR 2019
A Twofold Siamese Network for Real-Time Object Tracking · CVPR 2018