Yuxuan Wang
73 papers · 2012–2026 · 16 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (16)
Renaissance Researcher (5)
Interdisciplinary Bridge
Conference Polyglot (16)
Cross-Pollinator (14)
Hot Topic Early Bird
Keyword Trendsetter Combo (3)
Grand Slam
Triple Crown
Keyword Champion
Deep Specialist (17)
Topic Evolution
Century Club (66)
Trend Setter
Keyword Collector (292)
Prolific Year (18)
The Questioner
Unstoppable (9)
Conferences
AAAI (11)
ACL (10)
INTERSPEECH (10)
ICML (8)
EMNLP (6)
NIPS (6)
ICCV (5)
CONLL (3)
CVPR (3)
ECCV (2)
ICLR (2)
IJCAI (2)
IJCNLP (2)
COLING (1)
MICCAI (1)
NAACL (1)
Keywords
video understanding (7)
large language model (5)
multimodal learning (5)
multi-modal learning (4)
neural network (4)
self-supervised learning (3)
speech synthesis (3)
dependency parsing (3)
contextualized word embedding (3)
diffusion model (3)
speech generation (3)
video question answering (3)
source separation (2)
automatic speech recognition (2)
speech enhancement (2)
scene graph generation (2)
linear transformation (2)
zero-shot learning (2)
domain adaptation (2)
universal dependencies (2)
Papers
Pushing Rendering Boundaries: Hard Gaussian Splatting
AAAI 2026
PBR3DGen: A VLM-Guided Mesh Generation with High-Quality PBR Texture
AAAI 2026
Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image
AAAI 2026
DragNeXt: Rethinking Drag-Based Image Editing
AAAI 2026
v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound
ACL 2026
NeuSpring: Neural Spring Fields for Reconstruction and Simulation of Deformable Objects from Videos
AAAI 2026
Temporal Leakage in Search-Engine Date-Filtered Web Retrieval: A Retrospective Forecasting Case Study
ACL 2026
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
AAAI 2025
Clinical Prior Guided Cross-Modal Hierarchical Fusion for Histological Subtyping of Lung Cancer in CT Scans
MICCAI 2025
TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
ICML 2025
Bayesian Active Learning for Bivariate Causal Discovery
ICML 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
ACL 2025
Towards Reliable Large Audio Language Model
ACL 2025
Sounding that Object: Interactive Object-Aware Image to Audio Generation
ICML 2025
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
ICML 2025
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
ICCV 2025
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
ICCV 2025
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
CVPR 2025
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
ICCV 2025
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
CVPR 2025
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
ICCV 2025
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
ICCV 2025
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
EMNLP 2025
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
AAAI 2025
Language Model Can Listen While Speaking
AAAI 2025
Can Large Language Models Understand Spatial Audio?
INTERSPEECH 2024
TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables
NIPS 2024
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
NIPS 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
AAAI 2024
Medical Dialogue System: A Survey of Categories, Methods, Evaluation and Challenges
ACL 2024
View-Consistent 3D Editing with Gaussian Splatting
ECCV 2024
Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement
EMNLP 2024
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
EMNLP 2024
PolyVoice: Language Models for Speech to Speech Translation
ICLR 2024
TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling
ICML 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
ICML 2024
InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models
IJCAI 2024
A Swap Relaxation-Based Local Search for the Latin Square Completion Problem
IJCAI 2024
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
INTERSPEECH 2024
LLaMA-Rider: Spurring Large Language Models to Explore the Open World
NAACL 2024
Empowering Convolutional Neural Nets with MetaSin Activation
NIPS 2023
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
AAAI 2023
Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency
ACL 2023
Rethinking Dictionaries and Glyphs for Chinese Language Pre-training
ACL 2023
Efficient Neural Music Generation
NIPS 2023
Memory Augmented Lookup Dictionary Based Language Modeling for Automatic Speech Recognition
INTERSPEECH 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
INTERSPEECH 2023
Language-universal Phonetic Encoder for Low-resource Speech Recognition
INTERSPEECH 2023
Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network
INTERSPEECH 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
ACL 2023
Non-intrusive Speech Quality Assessment with a Multi-Task Learning based Subband Adaptive Attention Temporal Convolutional Neural Network
INTERSPEECH 2022
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration
INTERSPEECH 2022
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
EMNLP 2022
SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation
CVPR 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
ECCV 2022
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
EMNLP 2022
Simple and Effective Graph-to-Graph Annotation Conversion
COLING 2022
A Closer Look into the Robustness of Neural Dependency Parsers Using Better Adversarial Examples
ACL 2021
A Closer Look into the Robustness of Neural Dependency Parsers Using Better Adversarial Examples
IJCNLP 2021
Neural Dubber: Dubbing for Videos According to Scripts
NIPS 2021
Speech Enhancement with Weakly Labelled Data from AudioSet
INTERSPEECH 2021
Modeling the Compatibility of Stem Tracks to Generate Music Mashups
AAAI 2021
Xiaomingbot: A Multilingual Robot News Reporter
ACL 2020
Hierarchical Generative Modeling for Controllable Speech Synthesis
ICLR 2019
Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
IJCNLP 2019
Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing
EMNLP 2019
HIT-SCIR at MRP 2019: A Unified Pipeline for Meaning Representation Parsing via Efficient Training and Effective Encoding
CONLL 2019
Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation
CONLL 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
ICML 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
ICML 2018
The HIT-SCIR System for End-to-End Parsing of Universal Dependencies
CONLL 2017
Tacotron: Towards End-to-End Speech Synthesis
INTERSPEECH 2017
Cocktail Party Processing via Structured Prediction
NIPS 2012