Shengpeng Ji
24 papers · 2024–2026 · 8 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (5)
🌍 Conference Polyglot (8)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌈 Renaissance Researcher (6)
🤝 Dynamic Duo (19)
🗃️ Keyword Collector (101)
❓ The Questioner
⚡ Prolific Year (19)
💎 Century Club (22)
Conferences
ACL (12)
ICLR (5)
EMNLP (2)
AAAI (1)
COLING (1)
CVPR (1)
ICCV (1)
ICML (1)
Keywords
speech synthesis (4)
zero-shot learning (3)
vector quantization (3)
generative model (2)
spoken dialogue (2)
multimodal representation (2)
speech generation (2)
discrete representation (2)
speaker cloning (2)
speech language model (2)
contrastive learning (2)
multimodal learning (2)
speech recognition (2)
speech processing (2)
spoken dialogue system (2)
spoken dialogue model (2)
reinforcement learning (1)
voice conversion (1)
feature alignment (1)
self-supervised learning (1)
Papers
VoxMind: An End-to-End Agentic Spoken Dialogue System
ACL 2026
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models
ACL 2026
WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models
ACL 2025
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling
ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
ACL 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
ACL 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
ACL 2025
Enhancing Multimodal Unified Representations for Cross Modal Generalization
ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation
COLING 2025
Speech Watermarking with Discrete Intermediate Representations
AAAI 2025
InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model
EMNLP 2025
Open-set Cross Modal Generalization via Multimodal Unified Representation
ICCV 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
CVPR 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025
IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models
ICML 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
ICLR 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
ICLR 2025
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
ICLR 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
ACL 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data
EMNLP 2024