Yuxuan Wang

73 papers · 2012–2026 · 16 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (16) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (16)

🐝 Cross-Pollinator (14) 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (3) 🏆 Grand Slam 👑 Triple Crown 🏆 Keyword Champion 🔬 Deep Specialist (17) 🧬 Topic Evolution 💎 Century Club (66) 📈 Trend Setter 🗃️ Keyword Collector (292) ⚡ Prolific Year (18) ❓ The Questioner 🔥 Unstoppable (9)

Conferences

AAAI (11) ACL (10) INTERSPEECH (10) ICML (8) EMNLP (6) NIPS (6) ICCV (5) CONLL (3) CVPR (3) ECCV (2) ICLR (2) IJCAI (2) IJCNLP (2) COLING (1) MICCAI (1) NAACL (1)

Top co-authors

Wanxiang Che (9) Dongyan Zhao (8) Yuping Wang (7) Ting Liu (7) Zilong Zheng (7) Lu Lu (6) Yueqian Wang (6) Hanwang Zhang (6) Chuanzeng Huang (5) Qingshan Xu (5)

Keywords

video understanding (7) large language model (5) multimodal learning (5) multi-modal learning (4) neural network (4) self-supervised learning (3) speech synthesis (3) dependency parsing (3) contextualized word embedding (3) diffusion model (3) speech generation (3) video question answering (3) source separation (2) automatic speech recognition (2) speech enhancement (2) scene graph generation (2) linear transformation (2) zero-shot learning (2) domain adaptation (2) universal dependencies (2)

Papers

Pushing Rendering Boundaries: Hard Gaussian Splatting AAAI 2026
PBR3DGen: A VLM-Guided Mesh Generation with High-Quality PBR Texture AAAI 2026
Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image AAAI 2026
DragNeXt: Rethinking Drag-Based Image Editing AAAI 2026
v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound ACL 2026
NeuSpring: Neural Spring Fields for Reconstruction and Simulation of Deformable Objects from Videos AAAI 2026
Temporal Leakage in Search-Engine Date-Filtered Web Retrieval: A Retrospective Forecasting Case Study ACL 2026
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding AAAI 2025
Clinical Prior Guided Cross-Modal Hierarchical Fusion for Histological Subtyping of Lung Cancer in CT Scans MICCAI 2025
TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ICML 2025
Bayesian Active Learning for Bivariate Causal Discovery ICML 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions ACL 2025
Towards Reliable Large Audio Language Model ACL 2025
Sounding that Object: Interactive Object-Aware Image to Audio Generation ICML 2025
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation ICML 2025
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models ICCV 2025
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation ICCV 2025
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts CVPR 2025
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation ICCV 2025
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding CVPR 2025
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding ICCV 2025
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges ICCV 2025
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format EMNLP 2025
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation AAAI 2025
Language Model Can Listen While Speaking AAAI 2025
Can Large Language Models Understand Spatial Audio? INTERSPEECH 2024
TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables NIPS 2024
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words NIPS 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering AAAI 2024
Medical Dialogue System: A Survey of Categories, Methods, Evaluation and Challenges ACL 2024
View-Consistent 3D Editing with Gaussian Splatting ECCV 2024
Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement EMNLP 2024
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge EMNLP 2024
PolyVoice: Language Models for Speech to Speech Translation ICLR 2024
TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling ICML 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models ICML 2024
InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models IJCAI 2024
A Swap Relaxation-Based Local Search for the Latin Square Completion Problem IJCAI 2024
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR INTERSPEECH 2024
LLaMA-Rider: Spurring Large Language Models to Explore the Open World NAACL 2024
Empowering Convolutional Neural Nets with MetaSin Activation NIPS 2023
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task AAAI 2023
Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency ACL 2023
Rethinking Dictionaries and Glyphs for Chinese Language Pre-training ACL 2023
Efficient Neural Music Generation NIPS 2023
Memory Augmented Lookup Dictionary Based Language Modeling for Automatic Speech Recognition INTERSPEECH 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition INTERSPEECH 2023
Language-universal Phonetic Encoder for Low-resource Speech Recognition INTERSPEECH 2023
Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network INTERSPEECH 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions ACL 2023
Non-intrusive Speech Quality Assessment with a Multi-Task Learning based Subband Adaptive Attention Temporal Convolutional Neural Network INTERSPEECH 2022
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration INTERSPEECH 2022
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation EMNLP 2022
SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation CVPR 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval ECCV 2022
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant EMNLP 2022
Simple and Effective Graph-to-Graph Annotation Conversion COLING 2022
A Closer Look into the Robustness of Neural Dependency Parsers Using Better Adversarial Examples ACL 2021
A Closer Look into the Robustness of Neural Dependency Parsers Using Better Adversarial Examples IJCNLP 2021
Neural Dubber: Dubbing for Videos According to Scripts NIPS 2021
Speech Enhancement with Weakly Labelled Data from AudioSet INTERSPEECH 2021
Modeling the Compatibility of Stem Tracks to Generate Music Mashups AAAI 2021
Xiaomingbot: A Multilingual Robot News Reporter ACL 2020
Hierarchical Generative Modeling for Controllable Speech Synthesis ICLR 2019
Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing IJCNLP 2019
Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing EMNLP 2019
HIT-SCIR at MRP 2019: A Unified Pipeline for Meaning Representation Parsing via Efficient Training and Effective Encoding CONLL 2019
Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation CONLL 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis ICML 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron ICML 2018
The HIT-SCIR System for End-to-End Parsing of Universal Dependencies CONLL 2017
Tacotron: Towards End-to-End Speech Synthesis INTERSPEECH 2017
Cocktail Party Processing via Structured Prediction NIPS 2012