Guangzhi Sun
20 papers · 2022–2026 · 6 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (6)
Taxonomy Completionist (11)
Hot Topic Early Bird
Keyword Pioneer
Interdisciplinary Bridge
Dynamic Duo (14)
Keyword Collector (72)
The Questioner (3)
Prolific Year (8)
Century Club (19)
Conferences
INTERSPEECH (6)
ACL (5)
ICML (4)
EMNLP (2)
ICLR (2)
NAACL (1)
Keywords
large language model (4)
speech recognition (3)
automatic speech recognition (3)
low-rank adaptation (3)
video understanding (2)
pointer generator (2)
speech synthesis (1)
visual question answering (1)
zero-shot learning (1)
parameter-efficient fine-tuning (1)
machine unlearning (1)
speaker embedding (1)
speaker verification (1)
parallel processing (1)
sound source localization (1)
variational autoencoder (1)
multimodal dataset (1)
audio-visual learning (1)
latent representation (1)
speaker diarization (1)
Papers
Protecting Bystander Privacy via Selective Hearing in Audio LLMs
ACL 2026
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
ICML 2025
SkillAggregation: Reference-free LLM-Dependent Aggregation
ACL 2025
Audio-centric Video Understanding Benchmark without Text Shortcut
EMNLP 2025
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?
EMNLP 2025
Bayesian WeakS-to-Strong from Text Classification to Generation
ICLR 2025
Improving LLM Video Understanding with 16 Frames Per Second
ICML 2025
CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models
ICML 2025
Wav2Prompt: End-to-End Speech Prompt Learning and Task-based Fine-tuning for Text-based LLMs
NAACL 2025
Can Large Language Models Understand Spatial Audio?
INTERSPEECH 2024
SALMONN: Towards Generic Hearing Abilities for Large Language Models
ICLR 2024
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
INTERSPEECH 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
ICML 2024
M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
ACL 2024
Speech-based Slot Filling using Large Language Models
ACL 2024
SOT Triggered Neural Clustering for Speaker Attributed ASR
INTERSPEECH 2024
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
INTERSPEECH 2024
Can Contextual Biasing Remain Effective with Whisper and GPT-2?
INTERSPEECH 2023
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
ACL 2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
INTERSPEECH 2022