conftrace_

Zhiyong Wu

101 papers · 2015–2026 · 12 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (30) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🐣 Hot Topic Early Bird

🐝 Cross-Pollinator (11) 🏠 Conference Loyalist (47) 🤝 Dynamic Duo (43) 🧬 Topic Evolution 🏆 Grand Slam 🔬 Deep Specialist (17) 🏆 Keyword Champion (2) ❓ The Questioner (3) 🚀 Conference Pioneer ⚡ Prolific Year (6) 🔥 Unstoppable (11) 🗃️ Keyword Collector (89) 💎 Century Club (96) 📈 Trend Setter

Conferences

INTERSPEECH (47) ACL (19) AAAI (9) EMNLP (7) ICLR (6) IJCAI (4) COLING (2) CVPR (2) IJCNLP (2) AACL (1) ICML (1) NIPS (1)

Top co-authors

Helen Meng (44) Lingpeng Kong (16) Jia Jia (14) Shiyin Kang (11) Xixin Wu (9) Xiang Li (9) Lianhong Cai (9) Qiushi Sun (9) Fangzhi Xu (9) Runnan Li (7)

Keywords

large language model (13) text-to-speech synthesis (8) speech synthesis (7) in-context learning (7) recurrent neural network (6) language model (6) speaker embedding (5) diffusion model (5) contrastive learning (5) few-shot learning (5) multi-task learning (4) convolutional neural network (4) multimodal learning (4) speech emotion recognition (4) prompt engineering (4) unsupervised learning (4) voice conversion (4) model interpretability (3) transformer architecture (3) zero-shot learning (3)

Papers

Human-Centric Video Generation via Collaborative Multi-Modal Conditioning · AAAI 2026
Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies · ACL 2026
UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment · ACL 2026
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows · ACL 2026
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models · AAAI 2026
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement · AAAI 2025
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction · ICLR 2025
Implicit Search via Discrete Diffusion: A Study on Chess · ICLR 2025
OS-ATLAS: Foundation Action Model for Generalist GUI Agents · ICLR 2025
VideoHumanMIB: Unlocking Appearance Decoupling for Video Human Motion In-betweening · IJCAI 2025
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis · ACL 2025
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models · ACL 2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning · ACL 2025
𝜙-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation · ACL 2025
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant · ACL 2025
SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes · AAAI 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · ACL 2024
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models · ACL 2024
LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs · ACL 2024
How Vocabulary Sharing Facilitates Multilingualism in LLaMA? · ACL 2024
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model · CVPR 2024
An End-to-End Approach for Chord-Conditioned Song Generation · INTERSPEECH 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models · INTERSPEECH 2024
Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models · INTERSPEECH 2024
A Survey on In-context Learning · EMNLP 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction · INTERSPEECH 2024
Comparing Discrete and Continuous Space LLMs for Speech Recognition · INTERSPEECH 2024
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling · ICLR 2024
SongCreator: Lyrics-based Universal Song Generation · NIPS 2024
SECap: Speech Emotion Captioning with Large Language Model · AAAI 2024
LoRA-MER: Low-Rank Adaptation of Pre-Trained Speech Models for Multimodal Emotion Recognition Using Mutual Information · INTERSPEECH 2024
Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis · EMNLP 2024
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations · AAAI 2024
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model · INTERSPEECH 2023
Unsupervised Explanation Generation via Correct Instantiations · AAAI 2023
What Does Your Face Sound Like? 3D Face Shape towards Voice · AAAI 2023
Self-Adaptive In-Context Learning: An Information Compression Perspective for In-Context Example Selection and Ordering · ACL 2023
OpenICL: An Open-Source Framework for In-context Learning · ACL 2023
Explanation Regeneration via Information Bottleneck · ACL 2023
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation · CVPR 2023
Can We Edit Factual Knowledge by In-Context Learning? · EMNLP 2023
DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models · EMNLP 2023
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models · ICLR 2023
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning · ICLR 2023
Compositional Exemplars for In-context Learning · ICML 2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models · IJCAI 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation · INTERSPEECH 2023
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs · INTERSPEECH 2023
Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information · INTERSPEECH 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge · INTERSPEECH 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis · INTERSPEECH 2023
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation · INTERSPEECH 2023
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction · INTERSPEECH 2023
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing · INTERSPEECH 2023
Lexical Knowledge Internalization for Neural Dialog Generation · ACL 2022
CoLo: A Contrastive Learning Based Re-ranking Framework for One-Stage Summarization · COLING 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification · INTERSPEECH 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information · INTERSPEECH 2022
Speech Enhancement with Fullband-Subband Cross-Attention Network · INTERSPEECH 2022
Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis · COLING 2022
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis · INTERSPEECH 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset · INTERSPEECH 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis · INTERSPEECH 2022
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis · INTERSPEECH 2022
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information · INTERSPEECH 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis · INTERSPEECH 2022
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion · INTERSPEECH 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation · EMNLP 2022
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback · EMNLP 2022
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation · IJCNLP 2021
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion · INTERSPEECH 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis · INTERSPEECH 2021
Voting for the Right Answer: Adversarial Defense for Speaker Verification · INTERSPEECH 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis · INTERSPEECH 2021
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation · ACL 2021
Cascaded Head-colliding Attention · ACL 2021
Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach · AAAI 2021
Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding · EMNLP 2021
Cascaded Head-colliding Attention · IJCNLP 2021
FERNet: Fine-grained Extraction and Reasoning Network for Emotion Recognition in Dialogues · AACL 2020
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT · ACL 2020
SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition · INTERSPEECH 2020
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting · INTERSPEECH 2020
Enhancing Monotonicity for Robust Autoregressive Transformer TTS · INTERSPEECH 2020
Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks · INTERSPEECH 2020
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT · INTERSPEECH 2019
One-Shot Voice Conversion with Global Speaker Embeddings · INTERSPEECH 2019
Towards Discriminative Representation Learning for Speech Emotion Recognition · IJCAI 2019
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis · INTERSPEECH 2019
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection · INTERSPEECH 2018
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method · INTERSPEECH 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis · INTERSPEECH 2018
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms · INTERSPEECH 2018
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space · INTERSPEECH 2017
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer · INTERSPEECH 2017
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion · INTERSPEECH 2017
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis · INTERSPEECH 2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data · INTERSPEECH 2016
Analysis on Gated Recurrent Unit Based Question Detection Approach · INTERSPEECH 2016
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition · INTERSPEECH 2016
Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation · IJCAI 2015