Jialong Zuo

19 papers · 2023–2026 · 9 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (11) 🌍 Conference Polyglot (9)

🧭 Keyword Pioneer 🐝 Cross-Pollinator (11) 🤝 Dynamic Duo (11) 🏆 Keyword Champion (2) ⚡ Prolific Year (6) 🗃️ Keyword Collector (100) 💎 Century Club (18)

Conferences

ACL (6) AAAI (3) CVPR (2) ICLR (2) NIPS (2) COLING (1) EMNLP (1) ICCV (1) INTERSPEECH (1)

Top co-authors

Zhou Zhao (11) Shengpeng Ji (10) Ziyue Jiang (9) Minghui Fang (9) Changxin Gao (7) Nong Sang (7) Xize Cheng (7) Xiaoda Yang (5) Qian Yang (4) Huaxin Zhang (4)

Keywords

zero-shot learning (4) speech synthesis (4) contrastive learning (2) vector quantization (2) person re-identification (2) cross-modal retrieval (2) speech generation (2) speaker cloning (2) video anomaly detection (2) generative model (2) image retrieval (1) multimodal learning (1) cross-modal learning (1) autoregressive generation (1) anomaly detection (1) speech recognition (1) flow matching (1) domain generalization (1) semantic alignment (1) instruction following (1)

Papers

Learning to Tell Apart: Weakly Supervised Video Anomaly Detection via Disentangled Semantic Alignment · AAAI 2026
L-Man: A Large Multi-modal Model Unifying Human-centric Tasks · AAAI 2025
Speech Watermarking with Discrete Intermediate Representations · AAAI 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching · ACL 2025
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models · ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling · ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation · COLING 2025
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity · CVPR 2025
Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration · ICCV 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup · ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis · INTERSPEECH 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech · ACL 2024
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity · CVPR 2024
PLIP: Language-Image Pre-training for Person Representation Learning · NIPS 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data · EMNLP 2024
Cross-video Identity Correlating for Person Re-identification Pre-training · NIPS 2024
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models · ACL 2023