Puyuan Peng
14 papers · 2022–2025 · 8 conferences across top CS/AI venues
Achievements
🐝 Cross-Pollinator (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (8) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6)
👥 Mega-Team (76)
🤝 Dynamic Duo (14)
🏆 Keyword Champion (4)
📈 Trend Setter
⚡ Prolific Year (5)
🗃️ Keyword Collector (55)
💎 Century Club (14)
Conferences
INTERSPEECH (6)
ICLR (2)
ACL (1)
ECCV (1)
EMNLP (1)
ICCV (1)
ICML (1)
WACV (1)
Keywords
neural codec (4)
self-supervised learning (3)
zero-shot learning (3)
speech synthesis (3)
multimodal learning (2)
speech editing (2)
video understanding (2)
voice conversion (1)
autoregressive generation (1)
action recognition (1)
prompt engineering (1)
cross-lingual transfer (1)
word segmentation (1)
multilingual processing (1)
video captioning (1)
deep learning (1)
visual grounding (1)
model architecture (1)
weakly-supervised learning (1)
speech recognition (1)
Papers
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
ICLR 2025
VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
EMNLP 2025
Temporally Streaming Audio-Visual Synchronization for Real-World Videos
WACV 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
ICCV 2025
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
ICLR 2025
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
ACL 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
ECCV 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
ICML 2024
Neural Codec Language Models for Disentangled and Textless Voice Conversion
INTERSPEECH 2024
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
INTERSPEECH 2023
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
INTERSPEECH 2023
Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos
INTERSPEECH 2023
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
INTERSPEECH 2022
Word Discovery in Visually Grounded, Self-Supervised Speech Models
INTERSPEECH 2022