
Zehan Wang

30 papers · 2016–2025 · 8 top CS/AI conferences

Achievements

+10 more ↓ πŸƒ Academic Marathon (9) 🌍 Conference Polyglot (8) πŸŒ‰ Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (12)
🐝 Cross-Pollinator (12) 🌈 Renaissance Researcher (8) πŸ—ΊοΈ Taxonomy Completionist (47) 🀝 Dynamic Duo (24) πŸ‘₯ Mega-Team (20) πŸ”¬ Deep Specialist (11) ❓ The Questioner πŸ’Ž Century Club (30) ⚑ Prolific Year (10) πŸ—ƒοΈ Keyword Collector (127)

Conferences

NIPS (7) ICLR (6) ACL (5) CVPR (5) ICML (3) ICCV (2) EMNLP (1) NAACL (1)

Papers

Data-Efficiently Learn Large Language Model for Universal 3D Scene Perception · NAACL 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback · ACL 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors · CVPR 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language · CVPR 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? · ICLR 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces · ICLR 2025
Improving Long-Text Alignment for Text-to-Image Diffusion Models · ICLR 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup · ICLR 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision · ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models · ICML 2025
InstructSpeech: Following Speech Editing Instructions via Large Language Models · ICML 2024
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes · NIPS 2024
Action Imitation in Common Action Space for Customized Action Image Synthesis · NIPS 2024
Extending Multi-modal Contrastive Representations · NIPS 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers · NIPS 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching · NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT · NIPS 2024
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners · ACL 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation · ACL 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion · ICML 2024
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition · ICCV 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding · EMNLP 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding · ICCV 2023
Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations · ACL 2023
Connecting Multi-modal Contrastive Representations · NIPS 2023
Real-Time Video Super-Resolution With Spatio-Temporal Networks and Motion Compensation · CVPR 2017
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network · CVPR 2017
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network · CVPR 2016