Minghui Fang
14 papers · 2024–2025 · 7 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (5)
Taxonomy Completionist (33)
Interdisciplinary Bridge
Keyword Pioneer
Renaissance Researcher (5)
Conference Polyglot (7)
Dynamic Duo (11)
Century Club (14)
Prolific Year (12)
Keyword Collector (69)
Conferences
ACL (5)
AAAI (2)
EMNLP (2)
ICLR (2)
COLING (1)
ICCV (1)
NeurIPS (1)
Keywords
vector quantization (2)
generative model (2)
diffusion transformer (2)
zero-shot learning (2)
speech synthesis (2)
contrastive learning (2)
multimodal representation (2)
speech generation (2)
discrete representation (2)
message passing (1)
flow matching (1)
cross-modal retrieval (1)
voice conversion (1)
multimodal learning (1)
autoregressive generation (1)
uncertainty quantification (1)
feature fusion (1)
variational autoencoder (1)
hallucination mitigation (1)
text generation (1)
Papers
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling
AAAI 2025
Speech Watermarking with Discrete Intermediate Representations
AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling
ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
ACL 2025
Enhancing Multimodal Unified Representations for Cross Modal Generalization
ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation
COLING 2025
Open-set Cross Modal Generalization via Multimodal Unified Representation
ICCV 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025
Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets
EMNLP 2025
AudioVSR: Enhancing Video Speech Recognition with Audio Data
EMNLP 2024
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
NeurIPS 2024