Kai Yu
120 papers · 2006–2026 · 13 top CS/AI conferences
Achievements
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (9)
Taxonomy Completionist (38)
Hot Topic Early Bird
Cross-Pollinator (7)
Conference Loyalist (36)
Keyword Trendsetter Combo (5)
Deep Specialist (20)
Topic Pioneer
Keyword Champion
Topic Evolution
Mega-Team (23)
Dynamic Duo (48)
Trend Setter
Conference Pioneer
Unstoppable (14)
The Questioner (4)
Prolific Year (17)
Century Club (116)
Keyword Collector (116)
Conferences
INTERSPEECH (36)
ACL (18)
EMNLP (17)
NIPS (12)
AAAI (9)
COLING (7)
NAACL (5)
ICCV (4)
ICML (4)
IJCNLP (3)
CVPR (2)
EACL (2)
MICCAI (1)
Keywords
large language model (15)
semantic parsing (9)
speech synthesis (7)
domain adaptation (6)
data augmentation (6)
speaker verification (5)
automatic speech recognition (5)
speaker embedding (5)
knowledge distillation (5)
graph neural network (5)
long short-term memory (4)
model compression (4)
vector quantization (4)
connectionist temporal classification (4)
text-to-speech synthesis (4)
speech recognition (4)
dialogue state tracking (4)
transfer learning (4)
unsupervised learning (4)
semi-supervised learning (4)
Papers
MergeDNA: Context-Aware Genome Modeling with Dynamic Tokenization Through Token Merging
AAAI 2026
BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction
EACL 2026
AHAMask: Reliable Task Specification for Large Audio Language Models Without Instructions
AAAI 2026
Phased One-Step Adversarial Equilibrium for Video Diffusion Models
AAAI 2026
Alignment for Efficient Tool Calling of Large Language Models
EMNLP 2025
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
EMNLP 2025
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
NAACL 2025
Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models
COLING 2025
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
ACL 2025
NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering
ACL 2025
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
ACL 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
ACL 2025
From Generalist to Specialist: A Survey of Large Language Models for Chemistry
COLING 2025
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
AAAI 2025
Reducing Tool Hallucination via Reliability Alignment
ICML 2025
ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary
COLING 2025
Heads up! Large Language Models Can Perform Tasks Without Your Instruction via Selective Attention Head Masking
ICML 2025
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video
ICCV 2025
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
EMNLP 2025
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation
EMNLP 2025
AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference
EMNLP 2024
Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind
COLING 2024
SPEADO: Segmentation and Punctuation for Ancient Chinese Texts via Example Augmentation and Decoding Optimization
COLING 2024
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
CVPR 2024
UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling
MICCAI 2024
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation
INTERSPEECH 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
NIPS 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
INTERSPEECH 2024
Text-aware Speech Separation for Multi-talker Keyword Spotting
INTERSPEECH 2024
FakeSound: Deepfake General Audio Detection
INTERSPEECH 2024
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
AAAI 2024
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
AAAI 2024
Evolving Subnetwork Training for Large Language Models
ICML 2024
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions
NAACL 2024
IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation
ACL 2024
Sparsity-Accelerated Training for Large Language Models
ACL 2024
Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks
COLING 2024
UnSE: Unsupervised Speech Enhancement Using Optimal Transport
INTERSPEECH 2023
PointGPT: Auto-regressively Generative Pre-training from Point Clouds
NIPS 2023
Large Language Models Are Semi-Parametric Reinforcement Learning Agents
NIPS 2023
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation
ACL 2023
SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling
ACL 2023
Exploring Schema Generalizability of Text-to-SQL
ACL 2023
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset
ACL 2023
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought
EMNLP 2023
Towards Instance-adaptive Inference for Federated Learning
ICCV 2023
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
ICCV 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
INTERSPEECH 2023
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation
INTERSPEECH 2023
How ChatGPT is Robust for Spoken Language Understanding?
INTERSPEECH 2023
ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection
INTERSPEECH 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection
INTERSPEECH 2023
AdapterShare: Task Correlation Modeling with Adapter Differentiation
EMNLP 2022
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages
NAACL 2022
The AISP-SJTU Translation System for WMT 2022
EMNLP 2022
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
INTERSPEECH 2022
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild
INTERSPEECH 2022
The AISP-SJTU Simultaneous Translation System for IWSLT 2022
ACL 2022
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat
EMNLP 2022
Efficient Speech Enhancement with Neural Homomorphic Synthesis
INTERSPEECH 2022
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
EMNLP 2022
WebSRC: A Dataset for Web-Based Structural Reading Comprehension
EMNLP 2021
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching
AAAI 2021
Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction
EMNLP 2021
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL
IJCNLP 2021
Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network
INTERSPEECH 2021
Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR
INTERSPEECH 2021
A Lightweight Framework for Online Voice Activity Detection in the Wild
INTERSPEECH 2021
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL
ACL 2021
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
ACL 2021
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
IJCNLP 2021
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser
NAACL 2021
Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection
INTERSPEECH 2020
Neural Homomorphic Vocoder
INTERSPEECH 2020
Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing
ACL 2020
Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking
EMNLP 2020
Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks
AAAI 2020
Neural Graph Matching Networks for Chinese Short Text Matching
ACL 2020
Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders
AAAI 2020
Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks
ACL 2020
Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection
INTERSPEECH 2020
Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding
INTERSPEECH 2020
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction
INTERSPEECH 2019
Semantic Parsing with Dual Learning
ACL 2019
Data Augmentation with Atomic Templates for Spoken Language Understanding
IJCNLP 2019
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge
INTERSPEECH 2019
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification
INTERSPEECH 2019
Joint Decoding of CTC Based Systems for Speech Recognition
INTERSPEECH 2019
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training
INTERSPEECH 2019
Data Augmentation with Atomic Templates for Spoken Language Understanding
EMNLP 2019
Binarized LSTM Language Model
NAACL 2018
Knowledge Distillation for Sequence Model
INTERSPEECH 2018
Towards Universal Dialogue State Tracking
EMNLP 2018
Angular Softmax for Short-Duration Text-independent Speaker Verification
INTERSPEECH 2018
Structured Dialogue Policy with Graph Neural Networks
COLING 2018
High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
INTERSPEECH 2018
Structured Word Embedding for Low Memory Neural Network Language Model
INTERSPEECH 2018
On-line Dialogue Policy Learning with Companion Teaching
EACL 2017
What Does the Speaker Embedding Encode?
INTERSPEECH 2017
Comparison of Modeling Target in LSTM-RNN Duration Model
INTERSPEECH 2017
Discrete Duration Model for Speech Synthesis
INTERSPEECH 2017
Binary Deep Neural Networks for Speech Recognition
INTERSPEECH 2017
Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning
EMNLP 2017
Affordable On-line Dialogue Policy Learning
EMNLP 2017
Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues
INTERSPEECH 2016
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC
INTERSPEECH 2016
Phone Synchronous Decoding with CTC Lattice
INTERSPEECH 2016
Text Flow: A Unified Text Detection System in Natural Scene Images
ICCV 2015
Deep Multiple Instance Learning for Image Classification and Auto-Annotation
CVPR 2015
Communication Efficient Distributed Machine Learning with the Parameter Server
NIPS 2014
Smooth Sparse Coding via Marginal Regression for Learning Sparse Representations
ICML 2013
Deep Learning of Invariant Features via Simulated Fixations in Video
NIPS 2012
Deep Coding Network
NIPS 2010
Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
ACL 2010
Nonlinear Learning using Local Coordinate Coding
NIPS 2009
Stochastic Relational Models for Large-scale Dyadic Data using MCMC
NIPS 2008
Deep Learning with Kernel Regularization for Visual Recognition
NIPS 2008
Predictive Matrix-Variate t Models
NIPS 2007
Gaussian Process Models for Link Analysis and Transfer Learning
NIPS 2007
Stochastic Relational Models for Discriminative Link Prediction
NIPS 2006