Zhiyong Wu
101 papers · 2015–2026 · 12 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (30)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Hot Topic Early Bird
Cross-Pollinator (11)
Conference Loyalist (47)
Dynamic Duo (43)
Topic Evolution
Grand Slam
Deep Specialist (17)
Keyword Champion (2)
The Questioner (3)
Conference Pioneer
Prolific Year (6)
Unstoppable (11)
Keyword Collector (89)
Century Club (96)
Trend Setter
Conferences
INTERSPEECH (47)
ACL (19)
AAAI (9)
EMNLP (7)
ICLR (6)
IJCAI (4)
COLING (2)
CVPR (2)
IJCNLP (2)
AACL (1)
ICML (1)
NIPS (1)
Keywords
large language model (13)
text-to-speech synthesis (8)
speech synthesis (7)
in-context learning (7)
recurrent neural network (6)
language model (6)
speaker embedding (5)
diffusion model (5)
contrastive learning (5)
few-shot learning (5)
multi-task learning (4)
convolutional neural network (4)
multimodal learning (4)
speech emotion recognition (4)
prompt engineering (4)
unsupervised learning (4)
voice conversion (4)
model interpretability (3)
transformer architecture (3)
zero-shot learning (3)
Papers
Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
AAAI 2026
Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies
ACL 2026
UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment
ACL 2026
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
ACL 2026
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
AAAI 2026
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
AAAI 2025
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
ICLR 2025
Implicit Search via Discrete Diffusion: A Study on Chess
ICLR 2025
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
ICLR 2025
VideoHumanMIB: Unlocking Appearance Decoupling for Video Human Motion In-betweening
IJCAI 2025
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
ACL 2025
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
ACL 2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
ACL 2025
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
ACL 2025
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
ACL 2025
SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes
AAAI 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
ACL 2024
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
ACL 2024
LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs
ACL 2024
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
ACL 2024
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
CVPR 2024
An End-to-End Approach for Chord-Conditioned Song Generation
INTERSPEECH 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
INTERSPEECH 2024
Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models
INTERSPEECH 2024
A Survey on In-context Learning
EMNLP 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
INTERSPEECH 2024
Comparing Discrete and Continuous Space LLMs for Speech Recognition
INTERSPEECH 2024
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling
ICLR 2024
SongCreator: Lyrics-based Universal Song Generation
NIPS 2024
SECap: Speech Emotion Captioning with Large Language Model
AAAI 2024
LoRA-MER: Low-Rank Adaptation of Pre-Trained Speech Models for Multimodal Emotion Recognition Using Mutual Information
INTERSPEECH 2024
Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis
EMNLP 2024
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
AAAI 2024
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
INTERSPEECH 2023
Unsupervised Explanation Generation via Correct Instantiations
AAAI 2023
What Does Your Face Sound Like? 3D Face Shape towards Voice
AAAI 2023
Self-Adaptive In-Context Learning: An Information Compression Perspective for In-Context Example Selection and Ordering
ACL 2023
OpenICL: An Open-Source Framework for In-context Learning
ACL 2023
Explanation Regeneration via Information Bottleneck
ACL 2023
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation
CVPR 2023
Can We Edit Factual Knowledge by In-Context Learning?
EMNLP 2023
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models
EMNLP 2023
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
ICLR 2023
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
ICLR 2023
Compositional Exemplars for In-context Learning
ICML 2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
IJCAI 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
INTERSPEECH 2023
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs
INTERSPEECH 2023
Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information
INTERSPEECH 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
INTERSPEECH 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis
INTERSPEECH 2023
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation
INTERSPEECH 2023
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction
INTERSPEECH 2023
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing
INTERSPEECH 2023
Lexical Knowledge Internalization for Neural Dialog Generation
ACL 2022
CoLo: A Contrastive Learning Based Re-ranking Framework for One-Stage Summarization
COLING 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification
INTERSPEECH 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
INTERSPEECH 2022
Speech Enhancement with Fullband-Subband Cross-Attention Network
INTERSPEECH 2022
Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis
COLING 2022
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
INTERSPEECH 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
INTERSPEECH 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
INTERSPEECH 2022
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
INTERSPEECH 2022
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information
INTERSPEECH 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
INTERSPEECH 2022
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion
INTERSPEECH 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
EMNLP 2022
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
EMNLP 2022
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
IJCNLP 2021
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion
INTERSPEECH 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis
INTERSPEECH 2021
Voting for the Right Answer: Adversarial Defense for Speaker Verification
INTERSPEECH 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
INTERSPEECH 2021
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
ACL 2021
Cascaded Head-colliding Attention
ACL 2021
Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach
AAAI 2021
Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding
EMNLP 2021
Cascaded Head-colliding Attention
IJCNLP 2021
FERNet: Fine-grained Extraction and Reasoning Network for Emotion Recognition in Dialogues
AACL 2020
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT
ACL 2020
SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition
INTERSPEECH 2020
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting
INTERSPEECH 2020
Enhancing Monotonicity for Robust Autoregressive Transformer TTS
INTERSPEECH 2020
Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks
INTERSPEECH 2020
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT
INTERSPEECH 2019
One-Shot Voice Conversion with Global Speaker Embeddings
INTERSPEECH 2019
Towards Discriminative Representation Learning for Speech Emotion Recognition
IJCAI 2019
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
INTERSPEECH 2019
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection
INTERSPEECH 2018
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method
INTERSPEECH 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
INTERSPEECH 2018
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
INTERSPEECH 2018
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space
INTERSPEECH 2017
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer
INTERSPEECH 2017
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion
INTERSPEECH 2017
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis
INTERSPEECH 2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data
INTERSPEECH 2016
Analysis on Gated Recurrent Unit Based Question Detection Approach
INTERSPEECH 2016
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition
INTERSPEECH 2016
Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation
IJCAI 2015