Xiaoda Yang
13 papers · 2024–2026 · 6 conferences · across top CS/AI conferences
Achievements
Renaissance Researcher (6)
Taxonomy Completionist (29)
Keyword Pioneer
Interdisciplinary Bridge
Conference Polyglot (6)
Cross-Pollinator (14)
Keyword Collector (64)
Prolific Year (9)
Century Club (11)
The Questioner
Conferences
AAAI (3)
EMNLP (3)
ICLR (3)
ACL (2)
COLING (1)
WACV (1)
Keywords
generative model (2)
benchmark evaluation (1)
human motion synthesis (1)
pose estimation (1)
voice conversion (1)
speech synthesis (1)
speech recognition (1)
autoregressive generation (1)
multimodal learning (1)
temporal information (1)
cross-modal retrieval (1)
flow matching (1)
deep learning (1)
speaker recognition (1)
image generation (1)
visual speech recognition (1)
zero-shot learning (1)
diffusion model (1)
feature fusion (1)
neural decoding (1)
Papers
SpatialLogic-Bench: A Diagnostic Benchmark for Task-Oriented Spatiotemporal Reasoning
AAAI 2026
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
WACV 2026
Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
AAAI 2026
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation
COLING 2025
PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue
EMNLP 2025
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
AAAI 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
ICLR 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025
BrainLoc: Brain Signal-Based Object Detection with Multi-modal Alignment
EMNLP 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling
ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
ACL 2025
AudioVSR: Enhancing Video Speech Recognition with Audio Data
EMNLP 2024