Ziyang Ma
35 papers · 2013–2026 · 10 top CS/AI conferences
Achievements
🏃 Academic Marathon (12)
🌍 Conference Polyglot (10)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (10)
🌈 Renaissance Researcher (8)
👥 Mega-Team (32)
🤝 Dynamic Duo (22)
🔬 Deep Specialist (10)
🧬 Topic Evolution
⚡ Prolific Year (9)
🚀 Conference Pioneer
💎 Century Club (30)
🗃️ Keyword Collector (145)
Conferences
ACL (11)
INTERSPEECH (8)
AAAI (5)
EMNLP (4)
ICCV (2)
COLING (1)
CVPR (1)
ICLR (1)
ICML (1)
IJCAI (1)
Keywords
automatic speech recognition (5)
large language model (5)
speech synthesis (4)
self-supervised learning (4)
contrastive learning (3)
multimodal learning (3)
zero-shot learning (2)
music understanding (2)
speech representation (2)
end-to-end model (2)
speech language model (2)
image reconstruction (2)
representation learning (2)
low-resource language (2)
speech emotion recognition (2)
uncertainty quantification (1)
curriculum learning (1)
em algorithm (1)
k-means clustering (1)
reinforcement learning (1)
Papers
Evaluating the Expressive Appropriateness of Speech in Rich Contexts
ACL 2026
FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining
ACL 2026
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
ACL 2026
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
ACL 2026
Ghost in the Transformer: Detecting Model Reuse with Invariant Spectral Signatures
AAAI 2026
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
ACL 2025
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
ACL 2025
Towards Reliable Large Audio Language Model
ACL 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
ACL 2025
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens
EMNLP 2025
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation
EMNLP 2025
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
EMNLP 2025
CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models
EMNLP 2025
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
AAAI 2025
Language Model Can Listen While Speaking
AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
AAAI 2025
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
ACL 2025
MuPT: A Generative Symbolic Music Pretrained Transformer
ICLR 2025
ChatMusician: Understanding and Generating Music Intrinsically with LLM
ACL 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
ICML 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
IJCAI 2024
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
INTERSPEECH 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
INTERSPEECH 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
INTERSPEECH 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
INTERSPEECH 2024
Source-free Domain Adaptation for Aspect-based Sentiment Analysis
COLING 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
ACL 2024
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
INTERSPEECH 2023
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
INTERSPEECH 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
INTERSPEECH 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
INTERSPEECH 2023
Video Super-Resolution via Deep Draft-Ensemble Learning
ICCV 2015
Handling Motion Blur in Multi-Frame Super-Resolution
CVPR 2015
Constant Time Weighted Median Filtering for Stereo Matching and Beyond
ICCV 2013