Chong Luo
41 papers · 2018–2026 · 10 top CS/AI conferences
Achievements
Academic Marathon (8)
Conference Polyglot (10)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (13)
Renaissance Researcher (6)
Taxonomy Completionist (77)
The Namer
Dynamic Duo (14)
Topic Evolution
Grand Slam
Keyword Collector (201)
Century Club (39)
Prolific Year (8)
Unstoppable (9)
The Questioner
Trend Setter
Conferences
CVPR (16)
AAAI (6)
ICCV (4)
INTERSPEECH (4)
NeurIPS (3)
ECCV (2)
ICLR (2)
IJCAI (2)
ICML (1)
WACV (1)
Keywords
video generation (6)
diffusion model (6)
video understanding (5)
image classification (4)
action recognition (4)
image generation (4)
vision transformer (4)
speech enhancement (3)
object tracking (3)
transformer architecture (3)
multimodal learning (3)
contrastive learning (3)
text-to-video generation (3)
representation learning (2)
reinforcement learning (2)
attention mechanism (2)
transfer learning (2)
self-supervised learning (2)
visual object tracking (2)
visual representation (2)
Papers
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
AAAI 2026
MageBench: Bridging Large Multimodal Models to Agents
WACV 2026
LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
AAAI 2026
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting for Novel View Synthesis
ICML 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion
ICCV 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
ICCV 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
ICCV 2025
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis
CVPR 2025
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
CVPR 2025
StableAnimator: High-Quality Identity-Preserving Human Image Animation
CVPR 2025
Unifying Correspondence, Pose and NeRF for Generalized Pose-Free Novel View Synthesis
CVPR 2024
OmniViD: A Generative Framework for Universal Video Understanding
CVPR 2024
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
ECCV 2024
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
ECCV 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
NeurIPS 2024
CCEdit: Creative and Controllable Video Editing via Diffusion Models
CVPR 2024
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
CVPR 2024
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
CVPR 2024
Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching
ICLR 2023
Look Before You Match: Instance Understanding Matters in Video Object Segmentation
CVPR 2023
Streaming Video Model
CVPR 2023
TridentSE: Guiding Speech Enhancement with 32 Global Tokens
INTERSPEECH 2023
Make It Move: Controllable Image-to-Video Generation With Text Descriptions
CVPR 2022
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
AAAI 2022
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph
ICLR 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
NeurIPS 2022
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?
AAAI 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
INTERSPEECH 2022
An Anchor-Free Detector for Continuous Speech Keyword Spotting
INTERSPEECH 2022
Peripheral Vision Transformer
NeurIPS 2022
Unsupervised Visual Representation Learning by Tracking Patches in Video
CVPR 2021
Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
ICCV 2021
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
INTERSPEECH 2021
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View
CVPR 2020
Tracking by Instance Detection: A Meta-Learning Approach
CVPR 2020
PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network
AAAI 2020
Multi-Scale Group Transformer for Long Sequence Modeling in Speech Separation
IJCAI 2020
Joint Time-Frequency and Time Domain Learning for Speech Enhancement
IJCAI 2020
Posterior-Guided Neural Architecture Search
AAAI 2020
SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking
CVPR 2019
A Twofold Siamese Network for Real-Time Object Tracking
CVPR 2018