conftrace_

Minghui Fang

14 papers · 2024–2025 · 7 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (5) 🗺️ Taxonomy Completionist (33) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5)

🌍 Conference Polyglot (7) 🤝 Dynamic Duo (11) 💎 Century Club (14) ⚡ Prolific Year (12) 🗃️ Keyword Collector (69)

Conferences

ACL (5) AAAI (2) EMNLP (2) ICLR (2) COLING (1) ICCV (1) NIPS (1)

Top co-authors

Zhou Zhao (11) Shengpeng Ji (11) Jialong Zuo (9) Xize Cheng (7) Ziyue Jiang (6) Xiaoda Yang (5) Hai Huang (5) Tao Jin (4) Zehan Wang (3) Yan Xia (3)

Keywords

vector quantization (2) generative model (2) diffusion transformer (2) zero-shot learning (2) speech synthesis (2) contrastive learning (2) multimodal representation (2) speech generation (2) discrete representation (2) message passing (1) flow matching (1) cross-modal retrieval (1) voice conversion (1) multimodal learning (1) autoregressive generation (1) uncertainty quantification (1) feature fusion (1) variational autoencoder (1) hallucination mitigation (1) text generation (1)

Papers

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling · AAAI 2025
Speech Watermarking with Discrete Intermediate Representations · AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models · ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling · ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching · ACL 2025
Enhancing Multimodal Unified Representations for Cross Modal Generalization · ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation · COLING 2025
Open-set Cross Modal Generalization via Multimodal Unified Representation · ICCV 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup · ICLR 2025
Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets · EMNLP 2025
AudioVSR: Enhancing Video Speech Recognition with Audio Data · EMNLP 2024
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence · NIPS 2024