conftrace_

Shengpeng Ji

24 papers · 2024–2026 · 8 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (5) · 🌍 Conference Polyglot (8) · 🌉 Interdisciplinary Bridge · 🧭 Keyword Pioneer · 🌈 Renaissance Researcher (6) · 🤝 Dynamic Duo (19) · 🗃️ Keyword Collector (101) · The Questioner · Prolific Year (19) · 💎 Century Club (22)

Conferences

ACL (12) · ICLR (5) · EMNLP (2) · AAAI (1) · COLING (1) · CVPR (1) · ICCV (1) · ICML (1)

Papers

VoxMind: An End-to-End Agentic Spoken Dialogue System · ACL 2026
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models · ACL 2026
WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models · ACL 2025
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models · ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling · ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching · ACL 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training · ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook · ACL 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback · ACL 2025
Enhancing Multimodal Unified Representations for Cross Modal Generalization · ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation · COLING 2025
Speech Watermarking with Discrete Intermediate Representations · AAAI 2025
InteractSpeech: A Speech Dialogue Interaction Corpus for Spoken Dialogue Model · EMNLP 2025
Open-set Cross Modal Generalization via Multimodal Unified Representation · ICCV 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language · CVPR 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup · ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models · ICML 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? · ICLR 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces · ICLR 2025
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis · ICLR 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech · ACL 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data · EMNLP 2024