conftrace_

Xize Cheng

33 papers · 2023–2026 · 9 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (9) 🐝 Cross-Pollinator (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (7) 🤝 Dynamic Duo (29) 🏆 Grand Slam 🔬 Deep Specialist (17) Prolific Year (10) 🗃️ Keyword Collector (146) The Questioner 💎 Century Club (32)

Conferences

ACL (16) ICLR (4) EMNLP (3) ICCV (3) ICML (2) NIPS (2) AAAI (1) COLING (1) CVPR (1)

Papers

SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness · ACL 2026
A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter · AAAI 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching · ACL 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling · ACL 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback · ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation · COLING 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language · CVPR 2025
PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue · EMNLP 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? · ICLR 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces · ICLR 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup · ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation · ACL 2024
Extending Multi-modal Contrastive Representations · NIPS 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers · NIPS 2024
InstructSpeech: Following Speech Editing Instructions via Large Language Models · ICML 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data · EMNLP 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion · ICML 2024
Rethinking the Multimodal Correlation of Multimodal Sequential Learning via Generalizable Attentional Results Alignment · ACL 2024
Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment · ACL 2024
Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation · ACL 2024
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing · ACL 2024
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment · ACL 2023
Exploring Group Video Captioning with Efficient Relational Approximation · ICCV 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding · ICCV 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation · ACL 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition · ICCV 2023
Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning · ACL 2023
TAVT: Towards Transferable Audio-Visual Text Generation · ACL 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding · EMNLP 2023
Semantic-conditioned Dual Adaptation for Cross-domain Query-based Visual Segmentation · ACL 2023
Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation · ACL 2023