Shengpeng Ji

24 papers · 2024–2026 · 8 top CS/AI conferences

Achievements


🐝 Cross-Pollinator (5) 🌍 Conference Polyglot (8) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6)

🤝 Dynamic Duo (19) 🗃️ Keyword Collector (101) ❓ The Questioner ⚡ Prolific Year (19) 💎 Century Club (22)

Conferences

ACL (12) ICLR (5) EMNLP (2) AAAI (1) COLING (1) CVPR (1) ICCV (1) ICML (1)

Top co-authors

Zhou Zhao (21) Minghui Fang (11) Xize Cheng (11) Jialong Zuo (10) Tao Jin (9) Ziyue Jiang (8) Zehan Wang (7) Xiaoda Yang (6) Hanting Wang (6) Yifu Chen (6)

Keywords

speech synthesis (4) zero-shot learning (3) vector quantization (3) generative model (2) spoken dialogue (2) multimodal representation (2) speech generation (2) discrete representation (2) speaker cloning (2) speech language model (2) contrastive learning (2) multimodal learning (2) speech recognition (2) speech processing (2) spoken dialogue system (2) spoken dialogue model (2) reinforcement learning (1) voice conversion (1) feature alignment (1) self-supervised learning (1)

Papers

VoxMind: An End-to-End Agentic Spoken Dialogue System · ACL 2026
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models · ACL 2026
WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models · ACL 2025
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models · ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling · ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching · ACL 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training · ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook · ACL 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback · ACL 2025
Enhancing Multimodal Unified Representations for Cross Modal Generalization · ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation · COLING 2025
Speech Watermarking with Discrete Intermediate Representations · AAAI 2025
InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model · EMNLP 2025
Open-set Cross Modal Generalization via Multimodal Unified Representation · ICCV 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language · CVPR 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup · ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models · ICML 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? · ICLR 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces · ICLR 2025
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis · ICLR 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech · ACL 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data · EMNLP 2024