Ziyang Ma

35 papers · 2013–2026 · 10 top CS/AI conferences

Achievements

🏃 Academic Marathon (12) 🌍 Conference Polyglot (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (10)

🌈 Renaissance Researcher (8) 👥 Mega-Team (32) 🤝 Dynamic Duo (22) 🔬 Deep Specialist (10) 🧬 Topic Evolution ⚡ Prolific Year (9) 🚀 Conference Pioneer 💎 Century Club (30) 🗃️ Keyword Collector (145)

Conferences

ACL (11) INTERSPEECH (8) AAAI (5) EMNLP (4) ICCV (2) COLING (1) CVPR (1) ICLR (1) ICML (1) IJCAI (1)

Top co-authors

Xie Chen (26) Wenxi Chen (8) Zhisheng Zheng (7) Yifan Yang (7) Kai Yu (7) Guanrou Yang (6) Xiquan Li (6) Zhikang Niu (5) Yakun Song (4) Zhuo Chen (4)

Keywords

automatic speech recognition (5) large language model (5) speech synthesis (4) self-supervised learning (4) contrastive learning (3) multimodal learning (3) zero-shot learning (2) music understanding (2) speech representation (2) end-to-end model (2) speech language model (2) image reconstruction (2) representation learning (2) low-resource language (2) speech emotion recognition (2) uncertainty quantification (1) curriculum learning (1) em algorithm (1) k-means clustering (1) reinforcement learning (1)

Papers

Evaluating the Expressive Appropriateness of Speech in Rich Contexts · ACL 2026
FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining · ACL 2026
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training · ACL 2026
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization · ACL 2026
Ghost in the Transformer: Detecting Model Reuse with Invariant Spectral Signatures · AAAI 2026
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching · ACL 2025
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning · ACL 2025
Towards Reliable Large Audio Language Model · ACL 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training · ACL 2025
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens · EMNLP 2025
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation · EMNLP 2025
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models · EMNLP 2025
CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models · EMNLP 2025
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization · AAAI 2025
Language Model Can Listen While Speaking · AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration · AAAI 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering · AAAI 2025
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement · ACL 2025
MuPT: A Generative Symbolic Music Pretrained Transformer · ICLR 2025
ChatMusician: Understanding and Generating Music Intrinsically with LLM · ACL 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models · ICML 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer · IJCAI 2024
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark · INTERSPEECH 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR · INTERSPEECH 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR · INTERSPEECH 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers · INTERSPEECH 2024
Source-free Domain Adaptation for Aspect-based Sentiment Analysis · COLING 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation · ACL 2024
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation · INTERSPEECH 2023
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets · INTERSPEECH 2023
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation · INTERSPEECH 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition · INTERSPEECH 2023
Video Super-Resolution via Deep Draft-Ensemble Learning · ICCV 2015
Handling Motion Blur in Multi-Frame Super-Resolution · CVPR 2015
Constant Time Weighted Median Filtering for Stereo Matching and Beyond · ICCV 2013