Yuexian Zou
89 papers · 2018–2026 · 12 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (17)
Renaissance Researcher (5)
Interdisciplinary Bridge
Conference Polyglot (12)
Academic Marathon (7)
Conference Loyalist (29)
Dynamic Duo (32)
Deep Specialist (21)
Keyword Champion (5)
Prolific Year (8)
The Questioner
Keyword Collector (358)
Trend Setter
Century Club (87)
Unstoppable (8)
Conferences
INTERSPEECH (29)
AAAI (14)
ACL (11)
EMNLP (10)
COLING (5)
CVPR (5)
IJCAI (5)
ECCV (2)
ICCV (2)
ICLR (2)
NAACL (2)
NIPS (2)
Keywords
spoken language understanding (15)
contrastive learning (14)
attention mechanism (12)
slot filling (11)
multimodal learning (11)
intent detection (11)
neural network (7)
knowledge distillation (6)
audio-text retrieval (5)
automatic speech recognition (5)
representation learning (5)
vision-language model (5)
self-supervised learning (4)
zero-shot learning (4)
weakly supervised learning (4)
pre-trained language model (4)
video grounding (4)
cross-modal learning (4)
metric learning (4)
speech recognition (4)
Papers
Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation
AAAI 2026
WhisperDiari: A Whisper-Based Speaker Diarization Framework in Token Space Leveraging Semantic and Speaker Information for Better Text Adaptability
AAAI 2026
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
CVPR 2025
Image Conductor: Precision Control for Interactive Video Synthesis
AAAI 2025
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
ACL 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
ACL 2025
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation
ICLR 2025
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI 2024
MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts
ACL 2024
On the Worst Prompt Performance of Large Language Models
NIPS 2024
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
AAAI 2024
Towards Multi-Intent Spoken Language Understanding via Hierarchical Attention and Optimal Transport
AAAI 2024
Exploiting Auxiliary Caption for Video Grounding
AAAI 2024
Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling via Adjustive and Forced Cross-Task Alignment
AAAI 2024
Towards Explainable Joint Models via Information Theory for Multiple Intent Detection and Slot Filling
AAAI 2024
PCAD: Towards ASR-Robust Spoken Language Understanding via Prototype Calibration and Asymmetric Decoupling
ACL 2024
Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup
ACL 2024
Code-Switching Can be Better Aligners: Advancing Cross-Lingual SLU through Representation-Level and Prediction-Level Alignment
ACL 2024
Cyclical Contrastive Learning Based on Geodesic for Zero-shot Cross-lingual Spoken Language Understanding
ACL 2024
Knowledge-enhanced Prompt Tuning for Dialogue-based Relation Extraction with Trigger and Label Semantic
COLING 2024
Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling
COLING 2024
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
ECCV 2024
Relevance Is a Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System
EMNLP 2024
What are the Generator Preferences for End-to-end Task-Oriented Dialog System?
EMNLP 2024
Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection
EMNLP 2024
Game on Tree: Visual Hallucination Mitigation via Coarse-to-Fine View Tree and Game Theory
EMNLP 2024
Learning to Match Representations is Better for End-to-End Task-Oriented Dialog System
EMNLP 2024
Retrieval is Accurate Generation
ICLR 2024
Generating More Audios for End-to-End Spoken Language Understanding
IJCAI 2024
AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild
INTERSPEECH 2024
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
INTERSPEECH 2024
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
INTERSPEECH 2024
GPA: Global and Prototype Alignment for Audio-Text Retrieval
INTERSPEECH 2024
MaCSC: Towards Multimodal-augmented Pre-trained Language Models via Conceptual Prototypes and Self-balancing Calibration
NAACL 2024
Towards Unified Spoken Language Understanding Decoding via Label-aware Compact Linguistics Representations
ACL 2023
Iterative Proposal Refinement for Weakly-Supervised Video Grounding
CVPR 2023
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding
ACL 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
ACL 2023
FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding
INTERSPEECH 2023
Enhancing Code-Switching for Cross-lingual SLU: A Unified View of Semantic and Grammatical Coherence
EMNLP 2023
Accelerating Multiple Intent Detection and Slot Filling via Targeted Knowledge Distillation
EMNLP 2023
MRRL: Modifying the Reference via Reinforcement Learning for Non-Autoregressive Joint Multiple Intent Detection and Slot Filling
EMNLP 2023
GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering
INTERSPEECH 2023
Background-aware Modeling for Weakly Supervised Sound Event Detection
INTERSPEECH 2023
Mix before Align: Towards Zero-shot Cross-lingual Sentiment Analysis via Soft-Mix and Multi-View Learning
INTERSPEECH 2023
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
INTERSPEECH 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
INTERSPEECH 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
ICCV 2023
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
ICCV 2023
C²A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding
INTERSPEECH 2023
FTM: A Frame-Level Timeline Modeling Method for Temporal Graph Representation Learning
AAAI 2023
FiTs: Fine-Grained Two-Stage Training for Knowledge-Aware Question Answering
AAAI 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
ACL 2023
A Transformer-based Threshold-Free Framework for Multi-Intent NLU
COLING 2022
LocVTP: Video-Text Pre-training for Temporal Localization
ECCV 2022
End-to-end Spoken Conversational Question Answering: Task, Dataset and Model
NAACL 2022
Towards Joint Intent Detection and Slot Filling via Higher-order Attention
IJCAI 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection
INTERSPEECH 2022
Improving Target Sound Extraction with Timestamp Information
INTERSPEECH 2022
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification
INTERSPEECH 2022
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
INTERSPEECH 2022
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction
INTERSPEECH 2022
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches
INTERSPEECH 2022
Unsupervised Pre-Training for Temporal Action Localization Tasks
CVPR 2022
Semantic Transportation Prototypical Network for Few-Shot Intent Detection
INTERSPEECH 2021
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification
INTERSPEECH 2021
Contextualized Attention-Based Knowledge Transfer for Spoken Conversational Question Answering
INTERSPEECH 2021
Text Anchor Based Metric Learning for Small-Footprint Keyword Spotting
INTERSPEECH 2021
Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention
AAAI 2021
Non-Autoregressive Coarse-to-Fine Video Captioning
AAAI 2021
MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering
IJCAI 2021
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
EMNLP 2021
CoLA: Weakly-Supervised Temporal Action Localization With Snippet Contrastive Learning
CVPR 2021
Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation
CVPR 2021
On Pursuit of Designing Multi-modal Transformer for Video Grounding
EMNLP 2021
RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection
IJCAI 2021
Self-Supervised Dialogue Learning for Spoken Conversational Question Answering
INTERSPEECH 2021
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
INTERSPEECH 2021
Prophet Attention: Predicting Attention with Future Attention
NIPS 2020
A Graph-based Interactive Reasoning for Human-Object Interaction Detection
IJCAI 2020
Federated Learning for Spoken Language Understanding
COLING 2020
Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging
INTERSPEECH 2020
Environmental Sound Classification with Parallel Temporal-Spectral Attention
INTERSPEECH 2020
Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification
INTERSPEECH 2020
Federated Learning for Vision-and-Language Grounding Problems
AAAI 2020
Rethinking Skip Connection with Layer Normalization
COLING 2020
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information
INTERSPEECH 2019
Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition
INTERSPEECH 2018
Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with an Acoustic Vector Sensor
INTERSPEECH 2018