Yu Wu

105 papers · 2013–2026 · across 16 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (18) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🌍 Conference Polyglot (15) 🌟 Keyword Trendsetter Combo (5) 🤝 Dynamic Duo (27) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (13) 🏆 Keyword Champion 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (11) ⚡ Prolific Year (12) 🗃️ Keyword Collector (55) 💎 Century Club (102) ❓ The Questioner

Conferences

CVPR (19) INTERSPEECH (18) ACL (10) AAAI (9) NIPS (8) EMNLP (7) ICCV (7) ECCV (6) ICML (6) IJCNLP (5) ICLR (3) COLING (2) IJCAI (2) EACL (1) MLHC (1) SEMEVAL (1)

Top co-authors

Shujie LIU (27) Jinyu Li (16) Ming Zhou (13) Yi Yang (13) Chengyi Wang (11) Zhoujun Li (9) Furu Wei (9) Zhuo Chen (7) Ye Zhu (7) Yutian Lin (7)

Research topics

Privacy (1)

Keywords

automatic speech recognition (11) self-supervised learning (7) end-to-end speech recognition (5) speech recognition (4) unsupervised learning (4) semantic segmentation (4) transformer transducer (4) speech translation (4) person re-identification (4) video understanding (4) end-to-end model (4) diffusion model (4) object detection (4) text generation (4) image segmentation (3) representation learning (3) weakly supervised learning (3) generative model (3) model compression (3) reinforcement learning (3)

Papers

Text-based Aerial-Ground Person Retrieval · AAAI 2026
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark · EACL 2026
Breaking the Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection · ACL 2026
Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation · IJCAI 2025
CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction · ICML 2025
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy · CVPR 2025
Learning to Help in Multi-Class Settings · ICLR 2025
Rethinking Query-based Transformer for Continual Image Segmentation · CVPR 2025
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement · CVPR 2025
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation · CVPR 2025
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model · CVPR 2025
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration · AAAI 2025
Efficient Robustness Evaluation via Constraint Relaxation · AAAI 2025
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models · CVPR 2025
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation · ICCV 2025
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning · CVPR 2024
Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models · ECCV 2024
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization · NIPS 2024
RobIR: Robust Inverse Rendering for High-Illumination Scenes · NIPS 2024
Toward Real Ultra Image Segmentation: Leveraging Surrounding Context to Cultivate General Segmentation Model · NIPS 2024
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding · ECCV 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning · ICML 2024
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations · ACL 2024
Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation · ICLR 2024
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models · EMNLP 2024
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding · CVPR 2024
Improving Bird's Eye View Semantic Segmentation by Task Decomposition · CVPR 2024
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation · ICLR 2023
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers · INTERSPEECH 2023
Grounded Image Text Matching with Mismatched Relation Reasoning · ICCV 2023
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World · ICCV 2023
DVIS: Decoupled Video Instance Segmentation Framework · ICCV 2023
GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos · INTERSPEECH 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments · NIPS 2023
Boundary Guided Learning-Free Semantic Control with Diffusion Models · NIPS 2023
Learning To Segment Every Referring Object Point by Point · CVPR 2023
Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification · CVPR 2023
Accelerating Transducers through Adjacent Token Merging · INTERSPEECH 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers · ICML 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective · NIPS 2023
Accurate and Structured Pruning for Efficient Automatic Speech Recognition · INTERSPEECH 2023
Magneto: A Foundation Transformer · ICML 2023
UDAMA: Unsupervised Domain Adaptation through Multi-discriminator Adversarial Training with Noisy Labels Improves Cardio-fitness Prediction · MLHC 2023
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition · INTERSPEECH 2022
Enabling Detailed Action Recognition Evaluation Through Video Dataset Augmentation · NIPS 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing · ACL 2022
Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark · CVPR 2022
Learning To Learn by Jointly Optimizing Neural Architecture and Weights · CVPR 2022
Multi-Query Video Retrieval · ECCV 2022
SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding · ECCV 2022
Quantized GAN for Complex Music Generation from Dance Videos · ECCV 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings · INTERSPEECH 2022
Two-Stream Network for Sign Language Recognition and Translation · NIPS 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training · INTERSPEECH 2022
Speech Pre-training with Acoustic Piece · INTERSPEECH 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? · INTERSPEECH 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training · INTERSPEECH 2022
Template-Based Named Entity Recognition Using BART · ACL 2021
On Commonsense Cues in BERT for Solving Commonsense Tasks · ACL 2021
Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing · CVPR 2021
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild · CVPR 2021
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition · INTERSPEECH 2021
Detecting Speaker Personas from Conversational Texts · EMNLP 2021
Template-Based Named Entity Recognition Using BART · IJCNLP 2021
On Commonsense Cues in BERT for Solving Commonsense Tasks · IJCNLP 2021
Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation · EMNLP 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone · INTERSPEECH 2021
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions · INTERSPEECH 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR · INTERSPEECH 2021
Ultra Fast Speech Separation Model with Teacher Student Learning · INTERSPEECH 2021
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data · ICML 2021
Symbiotic Attention with Privileged Information for Egocentric Action Recognition · AAAI 2020
Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation · CVPR 2020
Gated Channel Transformation for Visual Recognition · CVPR 2020
Unsupervised Person Re-Identification via Softened Similarity Learning · CVPR 2020
Formality Style Transfer with Shared Latent Space · COLING 2020
Curriculum Pre-training for End-to-End Speech Translation · ACL 2020
A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation · ACL 2020
MuTual: A Dataset for Multi-Turn Dialogue Reasoning · ACL 2020
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition · INTERSPEECH 2020
Semantic Mask for Transformer Based End-to-End Speech Recognition · INTERSPEECH 2020
Low Latency End-to-End Streaming Speech Recognition with a Scout Network · INTERSPEECH 2020
A Dataset for Low-Resource Stylized Sequence-to-Sequence Generation · AAAI 2020
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation · AAAI 2020
RobuTrans: A Robust Transformer-Based Text-to-Speech Model · AAAI 2020
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents · ECCV 2020
Pose-Guided Feature Alignment for Occluded Person Re-Identification · ICCV 2019
Response Generation by Context-Aware Prototype Editing · AAAI 2019
Dictionary-Guided Editing Networks for Paraphrase Generation · AAAI 2019
Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer · IJCNLP 2019
Unsupervised Context Rewriting for Open Domain Conversation · IJCNLP 2019
Explicit Cross-lingual Pre-training for Unsupervised Machine Translation · IJCNLP 2019
Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification · ICCV 2019
Dual Attention Matching for Audio-Visual Event Localization · ICCV 2019
Explicit Cross-lingual Pre-training for Unsupervised Machine Translation · EMNLP 2019
Unsupervised Context Rewriting for Open Domain Conversation · EMNLP 2019
Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer · EMNLP 2019
Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning · CVPR 2018
Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots · ACL 2018
Keyphrase Generation with Correlation Constraints · EMNLP 2018
Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots · ACL 2017
Beihang-MSRA at SemEval-2017 Task 3: A Ranking System with Neural Matching Features for Community Question Answering · SEMEVAL 2017
Detecting Context Dependent Messages in a Conversational Environment · COLING 2016
Inapproximability of Treewidth and Related Problems (Extended Abstract) · IJCAI 2015
Learning Fair Representations · ICML 2013