Hung-yi Lee
142 papers · 2016–2026 · 11 top CS/AI conferences
Achievements
Taxonomy Completionist (31)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Hot Topic Early Bird
Conference Polyglot (11)
Conference Loyalist (23)
Keyword Trendsetter Combo (3)
Dynamic Duo (17)
Topic Evolution
Keyword Champion (2)
Grand Slam
Mega-Team (76)
Deep Specialist (26)
Trend Setter
Conference Pioneer
Unstoppable (10)
The Questioner (8)
Century Club (137)
Keyword Collector (84)
Prolific Year (32)
Conferences
INTERSPEECH (60)
ACL (27)
EMNLP (23)
IJCNLP (9)
NAACL (8)
NIPS (4)
AAAI (3)
ICML (3)
EACL (2)
ICLR (2)
AACL (1)
Keywords
self-supervised learning (20)
large language model (19)
transfer learning (14)
automatic speech recognition (12)
speech processing (10)
domain adaptation (9)
speech recognition (8)
generative adversarial network (8)
speech synthesis (7)
unsupervised learning (7)
speaker verification (6)
representation learning (6)
voice conversion (6)
model merging (5)
speech representation (5)
few-shot learning (5)
one-shot learning (4)
spoken language understanding (4)
question answering (4)
model compression (4)
Papers
Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner
ACL 2026
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
ACL 2026
An Exploration of Mamba for Speech Self-Supervised Models
ACL 2026
Shanks: Simultaneous Hearing and Thinking for Spoken Language Models
ACL 2026
BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation
EACL 2026
Hierarchical Speculative Decoding with Dynamic Window
NAACL 2025
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
EMNLP 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
NAACL 2025
Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction
ICML 2025
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
EMNLP 2025
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
ICLR 2025
Creativity in LLM-based Multi-Agent Systems: A Survey
EMNLP 2025
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
ICML 2025
TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge
ACL 2025
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
ACL 2025
Transferring Textual Preferences to Vision-Language Understanding through Model Merging
ACL 2025
InstructionCP: A Simple yet Effective Approach for Transferring Large Language Models to Target Languages
ACL 2025
Audio-Aware Large Language Models as Judges for Speaking Styles
EMNLP 2025
Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations
ACL 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
ACL 2024
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
ACL 2024
Over-Reasoning and Redundant Calculation of Large Language Models
EACL 2024
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
NIPS 2024
GSQA: An End-to-End Model for Generative Spoken Question Answering
INTERSPEECH 2024
I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation
EMNLP 2024
Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course
EMNLP 2024
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
EMNLP 2024
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
EMNLP 2024
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
EMNLP 2024
Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Large Language Model Performance.
EMNLP 2024
Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?
EMNLP 2024
Do Metadata and Appearance of the Retrieved Webpages Affect LLM's Reasoning in Retrieval-Augmented Generation?
EMNLP 2024
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
EMNLP 2024
Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
NIPS 2024
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
INTERSPEECH 2024
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
INTERSPEECH 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
INTERSPEECH 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
INTERSPEECH 2024
Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition
INTERSPEECH 2024
On the social bias of speech self-supervised models
INTERSPEECH 2024
Singing Voice Graph Modeling for SingFake Detection
INTERSPEECH 2024
Systematic Analysis for Pretrained Language Model Priming for Parameter-Efficient Fine-tuning
NAACL 2024
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
NIPS 2024
Neural Codec-based Adversarial Sample Detection for Speaker Verification
INTERSPEECH 2024
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
INTERSPEECH 2024
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
INTERSPEECH 2024
Dataset-Distillation Generative Model for Speech Emotion Recognition
INTERSPEECH 2024
Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification
INTERSPEECH 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
ACL 2024
Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
ACL 2024
How to Estimate Model Transferability of Pre-Trained Speech Models?
INTERSPEECH 2023
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
ACL 2023
Introducing Semantics into Speech Encoders
ACL 2023
Can Large Language Models Be an Alternative to Human Evaluations?
ACL 2023
Are Synonym Substitution Attacks Really Synonym Substitution Attacks?
ACL 2023
Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue
ACL 2023
Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS
ACL 2023
A Closer Look into Using Large Language Models for Automatic Evaluation
EMNLP 2023
Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
ICML 2023
Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously
INTERSPEECH 2023
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
INTERSPEECH 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
INTERSPEECH 2023
Anticipation-Free Training for Simultaneous Machine Translation
ACL 2022
Self-supervised Representation Learning for Speech Processing
NAACL 2022
Recent Advances in Pre-trained Language Models: Why Do They Work and How Do They Work
IJCNLP 2022
Meta Learning for Natural Language Processing: A Survey
NAACL 2022
XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding
ACL 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
ACL 2022
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
INTERSPEECH 2022
DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering
INTERSPEECH 2022
Membership Inference Attacks Against Self-supervised Speech Models
INTERSPEECH 2022
An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
INTERSPEECH 2022
Few Shot Cross-Lingual TTS Using Transferable Phoneme Embedding
INTERSPEECH 2022
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores
INTERSPEECH 2022
Spoofing-Aware Speaker Verification by Multi-Level Fusion
INTERSPEECH 2022
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition
INTERSPEECH 2022
Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation
INTERSPEECH 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification
INTERSPEECH 2022
On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets
AAAI 2022
Recent Advances in Pre-trained Language Models: Why Do They Work and How Do They Work
AACL 2022
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
NAACL 2022
Multi-accent Speech Separation with One Shot Learning
ACL 2021
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
INTERSPEECH 2021
Meta Learning and Its Applications to Natural Language Processing
IJCNLP 2021
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation
IJCNLP 2021
Multi-accent Speech Separation with One Shot Learning
IJCNLP 2021
Mitigating Biases in Toxic Language Detection through Invariant Rationalization
IJCNLP 2021
SUPERB: Speech Processing Universal PERformance Benchmark
INTERSPEECH 2021
Towards Lifelong Learning of End-to-End ASR
INTERSPEECH 2021
Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention
NAACL 2021
Utilizing Self-Supervised Representations for MOS Prediction
INTERSPEECH 2021
Stabilizing Label Assignment for Speech Separation by Self-Supervised Pre-Training
INTERSPEECH 2021
Mitigating Biases in Toxic Language Detection through Invariant Rationalization
ACL 2021
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
INTERSPEECH 2021
Multi-modal User Intent Classification Under the Scenario of Smart Factory (Student Abstract)
AAAI 2021
Unsupervised Multiple Choices Question Answering: Start Learning from Basic Knowledge
EMNLP 2021
Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability
EMNLP 2021
Voting for the Right Answer: Adversarial Defense for Speaker Verification
INTERSPEECH 2021
Meta Learning and Its Applications to Natural Language Processing
ACL 2021
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation
ACL 2021
WG-WaveNet: Real-Time High-Fidelity Speech Synthesis Without GPU
INTERSPEECH 2020
Order-Free Learning Alleviating Exposure Bias in Multi-Label Classification
AAAI 2020
Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation
ACL 2020
Pretrained Language Model Embryology: The Birth of ALBERT
EMNLP 2020
LAMOL: LAnguage MOdeling for Lifelong Language Learning
ICLR 2020
TaylorGAN: Neighbor-Augmented Policy Update Towards Sample-Efficient Natural Language Generation
NIPS 2020
DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation
INTERSPEECH 2020
Semi-Supervised Learning for Multi-Speaker Text-to-Speech Synthesis Using Discrete Speech Representation
INTERSPEECH 2020
Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning
INTERSPEECH 2020
Understanding Self-Attention of Self-Supervised Audio Transformers
INTERSPEECH 2020
SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering
INTERSPEECH 2020
VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net Architecture
INTERSPEECH 2020
Tree Transformer: Integrating Tree Structures into Self-Attention
IJCNLP 2019
Polly Want a Cracker: Analyzing Performance of Parroting on Paraphrase Generation Datasets
EMNLP 2019
Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
EMNLP 2019
Completely Unsupervised Phoneme Recognition by a Generative Adversarial Network Harmonized with Iteratively Refined Hidden Markov Models
INTERSPEECH 2019
Improved Speech Separation with Time-and-Frequency Cross-Domain Joint Embedding and Clustering
INTERSPEECH 2019
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion
INTERSPEECH 2019
Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech
INTERSPEECH 2019
One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
INTERSPEECH 2019
Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation
INTERSPEECH 2019
Personalized Dialogue Response Generation Learned from Monologues
INTERSPEECH 2019
Tree Transformer: Integrating Tree Structures into Self-Attention
EMNLP 2019
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs
EMNLP 2019
Polly Want a Cracker: Analyzing Performance of Parroting on Paraphrase Generation Datasets
IJCNLP 2019
Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model
IJCNLP 2019
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs
IJCNLP 2019
Noise Adaptive Speech Enhancement Using Domain Adversarial Training
INTERSPEECH 2019
End-to-End Text-to-Speech for Low-Resource Languages by Cross-Lingual Transfer Learning
INTERSPEECH 2019
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
INTERSPEECH 2018
Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension
INTERSPEECH 2018
Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator
INTERSPEECH 2018
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
INTERSPEECH 2018
Supervised and Unsupervised Transfer Learning for Question Answering
NAACL 2018
Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks
EMNLP 2018
Learning Chinese Word Representations From Glyphs Of Characters
EMNLP 2017
Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries
INTERSPEECH 2017
Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification
INTERSPEECH 2017
Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder
INTERSPEECH 2016
Interactive Spoken Content Retrieval by Deep Reinforcement Learning
INTERSPEECH 2016
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection
INTERSPEECH 2016
Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine
INTERSPEECH 2016