conftrace_

Yuexian Zou

89 papers · 2018–2026 · 12 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (17) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (12)

🏃 Academic Marathon (7) 🏠 Conference Loyalist (29) 🤝 Dynamic Duo (32) 🔬 Deep Specialist (21) 🏆 Keyword Champion (5) ⚡ Prolific Year (8) ❓ The Questioner 🗃️ Keyword Collector (358) 📈 Trend Setter 💎 Century Club (87) 🔥 Unstoppable (8)

Conferences

INTERSPEECH (29) AAAI (14) ACL (11) EMNLP (10) COLING (5) CVPR (5) IJCAI (5) ECCV (2) ICCV (2) ICLR (2) NAACL (2) NIPS (2)

Top co-authors

Xuxin Cheng (32) Zhihong Zhu (30) Xianwei Zhuang (17) Hongxiang Li (15) Yaowei Li (13) Zhiqi Huang (12) Fenglin Liu (10) Dongchao Yang (10) Nuo Chen (8) Meng Cao (8)

Keywords

spoken language understanding (15) contrastive learning (14) attention mechanism (12) slot filling (11) multimodal learning (11) intent detection (11) neural network (7) knowledge distillation (6) audio-text retrieval (5) automatic speech recognition (5) representation learning (5) vision-language model (5) self-supervised learning (4) zero-shot learning (4) weakly supervised learning (4) pre-trained language model (4) video grounding (4) cross-modal learning (4) metric learning (4) speech recognition (4)

Papers

Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation AAAI 2026
WhisperDiari: A Whisper-Based Speaker Diarization Framework in Token Space Leveraging Semantic and Speaker Information for Better Text Adaptability AAAI 2026
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification CVPR 2025
Image Conductor: Precision Control for Interactive Video Synthesis AAAI 2025
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors ACL 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding ACL 2025
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation ICLR 2025
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head AAAI 2024
MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts ACL 2024
On the Worst Prompt Performance of Large Language Models NIPS 2024
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning AAAI 2024
Towards Multi-Intent Spoken Language Understanding via Hierarchical Attention and Optimal Transport AAAI 2024
Exploiting Auxiliary Caption for Video Grounding AAAI 2024
Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling via Adjustive and Forced Cross-Task Alignment AAAI 2024
Towards Explainable Joint Models via Information Theory for Multiple Intent Detection and Slot Filling AAAI 2024
PCAD: Towards ASR-Robust Spoken Language Understanding via Prototype Calibration and Asymmetric Decoupling ACL 2024
Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup ACL 2024
Code-Switching Can be Better Aligners: Advancing Cross-Lingual SLU through Representation-Level and Prediction-Level Alignment ACL 2024
Cyclical Contrastive Learning Based on Geodesic for Zero-shot Cross-lingual Spoken Language Understanding ACL 2024
Knowledge-enhanced Prompt Tuning for Dialogue-based Relation Extraction with Trigger and Label Semantic COLING 2024
Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling COLING 2024
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval ECCV 2024
Relevance Is a Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System EMNLP 2024
What are the Generator Preferences for End-to-end Task-Oriented Dialog System? EMNLP 2024
Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection EMNLP 2024
Game on Tree: Visual Hallucination Mitigation via Coarse-to-Fine View Tree and Game Theory EMNLP 2024
Learning to Match Representations is Better for End-to-End Task-Oriented Dialog System EMNLP 2024
Retrieval is Accurate Generation ICLR 2024
Generating More Audios for End-to-End Spoken Language Understanding IJCAI 2024
AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild INTERSPEECH 2024
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation INTERSPEECH 2024
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval INTERSPEECH 2024
GPA: Global and Prototype Alignment for Audio-Text Retrieval INTERSPEECH 2024
MaCSC: Towards Multimodal-augmented Pre-trained Language Models via Conceptual Prototypes and Self-balancing Calibration NAACL 2024
Towards Unified Spoken Language Understanding Decoding via Label-aware Compact Linguistics Representations ACL 2023
Iterative Proposal Refinement for Weakly-Supervised Video Grounding CVPR 2023
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding ACL 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels ACL 2023
FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding INTERSPEECH 2023
Enhancing Code-Switching for Cross-lingual SLU: A Unified View of Semantic and Grammatical Coherence EMNLP 2023
Accelerating Multiple Intent Detection and Slot Filling via Targeted Knowledge Distillation EMNLP 2023
MRRL: Modifying the Reference via Reinforcement Learning for Non-Autoregressive Joint Multiple Intent Detection and Slot Filling EMNLP 2023
GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering INTERSPEECH 2023
Background-aware Modeling for Weakly Supervised Sound Event Detection INTERSPEECH 2023
Mix before Align: Towards Zero-shot Cross-lingual Sentiment Analysis via Soft-Mix and Multi-View Learning INTERSPEECH 2023
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS INTERSPEECH 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions INTERSPEECH 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation ICCV 2023
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory ICCV 2023
C²A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding INTERSPEECH 2023
FTM: A Frame-Level Timeline Modeling Method for Temporal Graph Representation Learning AAAI 2023
FiTs: Fine-Grained Two-Stage Training for Knowledge-Aware Question Answering AAAI 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ACL 2023
A Transformer-based Threshold-Free Framework for Multi-Intent NLU COLING 2022
LocVTP: Video-Text Pre-training for Temporal Localization ECCV 2022
End-to-end Spoken Conversational Question Answering: Task, Dataset and Model NAACL 2022
Towards Joint Intent Detection and Slot Filling via Higher-order Attention IJCAI 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection INTERSPEECH 2022
Improving Target Sound Extraction with Timestamp Information INTERSPEECH 2022
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification INTERSPEECH 2022
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR INTERSPEECH 2022
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction INTERSPEECH 2022
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches INTERSPEECH 2022
Unsupervised Pre-Training for Temporal Action Localization Tasks CVPR 2022
Semantic Transportation Prototypical Network for Few-Shot Intent Detection INTERSPEECH 2021
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification INTERSPEECH 2021
Contextualized Attention-Based Knowledge Transfer for Spoken Conversational Question Answering INTERSPEECH 2021
Text Anchor Based Metric Learning for Small-Footprint Keyword Spotting INTERSPEECH 2021
Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention AAAI 2021
Non-Autoregressive Coarse-to-Fine Video Captioning AAAI 2021
MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering IJCAI 2021
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering EMNLP 2021
CoLA: Weakly-Supervised Temporal Action Localization With Snippet Contrastive Learning CVPR 2021
Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation CVPR 2021
On Pursuit of Designing Multi-modal Transformer for Video Grounding EMNLP 2021
RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection IJCAI 2021
Self-Supervised Dialogue Learning for Spoken Conversational Question Answering INTERSPEECH 2021
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification INTERSPEECH 2021
Prophet Attention: Predicting Attention with Future Attention NIPS 2020
A Graph-based Interactive Reasoning for Human-Object Interaction Detection IJCAI 2020
Federated Learning for Spoken Language Understanding COLING 2020
Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging INTERSPEECH 2020
Environmental Sound Classification with Parallel Temporal-Spectral Attention INTERSPEECH 2020
Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification INTERSPEECH 2020
Federated Learning for Vision-and-Language Grounding Problems AAAI 2020
Rethinking Skip Connection with Layer Normalization COLING 2020
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information INTERSPEECH 2019
Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition INTERSPEECH 2018
Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with an Acoustic Vector Sensor INTERSPEECH 2018