Guangzhi Sun

20 papers · 2022–2026 · 6 top CS/AI conferences

Achievements

🌍 Conference Polyglot (6) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (11) 🐣 Hot Topic Early Bird

🤝 Dynamic Duo (14) 🗃️ Keyword Collector (72) ❓ The Questioner (3) ⚡ Prolific Year (8) 💎 Century Club (19)

Conferences

INTERSPEECH (6) ACL (5) ICML (4) EMNLP (2) ICLR (2) NAACL (1)

Top co-authors

Chao Zhang (14) Zejun Ma (6) Phil Woodland (6) Wei Li (6) Changli Tang (6) Wenyi Yu (4) Xianzhao Chen (3) Yudong Yang (3) Jimin Zhuang (3) Yixuan Li (3)

Keywords

large language model (4) speech recognition (3) automatic speech recognition (3) low-rank adaptation (3) video understanding (2) pointer generator (2) speech synthesis (1) visual question answering (1) zero-shot learning (1) parameter-efficient fine-tuning (1) machine unlearning (1) speaker embedding (1) speaker verification (1) parallel processing (1) sound source localization (1) variational autoencoder (1) multimodal dataset (1) audio-visual learning (1) latent representation (1) speaker diarization (1)

Papers

Protecting Bystander Privacy via Selective Hearing in Audio LLMs ACL 2026
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model ICML 2025
SkillAggregation: Reference-free LLM-Dependent Aggregation ACL 2025
Audio-centric Video Understanding Benchmark without Text Shortcut EMNLP 2025
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? EMNLP 2025
Bayesian WeakS-to-Strong from Text Classification to Generation ICLR 2025
Improving LLM Video Understanding with 16 Frames Per Second ICML 2025
CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models ICML 2025
Wav2Prompt: End-to-End Speech Prompt Learning and Task-based Fine-tuning for Text-based LLMs NAACL 2025
Can Large Language Models Understand Spatial Audio? INTERSPEECH 2024
SALMONN: Towards Generic Hearing Abilities for Large Language Models ICLR 2024
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models INTERSPEECH 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models ICML 2024
M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset ACL 2024
Speech-based Slot Filling using Large Language Models ACL 2024
SOT Triggered Neural Clustering for Speaker Attributed ASR INTERSPEECH 2024
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR INTERSPEECH 2024
Can Contextual Biasing Remain Effective with Whisper and GPT-2? INTERSPEECH 2023
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech ACL 2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition INTERSPEECH 2022