Yu Wu
105 papers · 2013–2026 · 16 top CS/AI conferences
Achievements
Taxonomy Completionist (18)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Conference Polyglot (15)
Keyword Trendsetter Combo (5)
Dynamic Duo (27)
Triple Crown
Grand Slam
Deep Specialist (13)
Keyword Champion
Trend Setter
Conference Pioneer
Unstoppable (11)
Prolific Year (12)
Keyword Collector (55)
Century Club (102)
The Questioner
Conferences
CVPR (19)
INTERSPEECH (18)
ACL (10)
AAAI (9)
NIPS (8)
EMNLP (7)
ICCV (7)
ECCV (6)
ICML (6)
IJCNLP (5)
ICLR (3)
COLING (2)
IJCAI (2)
EACL (1)
MLHC (1)
SEMEVAL (1)
Keywords
automatic speech recognition (11)
self-supervised learning (7)
end-to-end speech recognition (5)
speech recognition (4)
unsupervised learning (4)
semantic segmentation (4)
transformer transducer (4)
speech translation (4)
person re-identification (4)
video understanding (4)
end-to-end model (4)
diffusion model (4)
object detection (4)
text generation (4)
image segmentation (3)
representation learning (3)
weakly supervised learning (3)
generative model (3)
model compression (3)
reinforcement learning (3)
Papers
Text-based Aerial-Ground Person Retrieval
AAAI 2026
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
EACL 2026
Breaking the Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection
ACL 2026
Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation
IJCAI 2025
CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction
ICML 2025
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
CVPR 2025
Learning to Help in Multi-Class Settings
ICLR 2025
Rethinking Query-based Transformer for Continual Image Segmentation
CVPR 2025
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
CVPR 2025
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CVPR 2025
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
CVPR 2025
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
AAAI 2025
Efficient Robustness Evaluation via Constraint Relaxation
AAAI 2025
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
CVPR 2025
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
ICCV 2025
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
CVPR 2024
Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
ECCV 2024
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
NIPS 2024
RobIR: Robust Inverse Rendering for High-Illumination Scenes
NIPS 2024
Toward Real Ultra Image Segmentation: Leveraging Surrounding Context to Cultivate General Segmentation Model
NIPS 2024
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
ECCV 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
ICML 2024
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
ACL 2024
Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation
ICLR 2024
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
EMNLP 2024
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding
CVPR 2024
Improving Bird's Eye View Semantic Segmentation by Task Decomposition
CVPR 2024
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
ICLR 2023
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers
INTERSPEECH 2023
Grounded Image Text Matching with Mismatched Relation Reasoning
ICCV 2023
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
ICCV 2023
DVIS: Decoupled Video Instance Segmentation Framework
ICCV 2023
GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos
INTERSPEECH 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
NIPS 2023
Boundary Guided Learning-Free Semantic Control with Diffusion Models
NIPS 2023
Learning To Segment Every Referring Object Point by Point
CVPR 2023
Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification
CVPR 2023
Accelerating Transducers through Adjacent Token Merging
INTERSPEECH 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
ICML 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
NIPS 2023
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
INTERSPEECH 2023
Magneto: A Foundation Transformer
ICML 2023
UDAMA: Unsupervised Domain Adaptation through Multi-discriminator Adversarial Training with Noisy Labels Improves Cardio-fitness Prediction
MLHC 2023
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition
INTERSPEECH 2022
Enabling Detailed Action Recognition Evaluation Through Video Dataset Augmentation
NIPS 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
ACL 2022
Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark
CVPR 2022
Learning To Learn by Jointly Optimizing Neural Architecture and Weights
CVPR 2022
Multi-Query Video Retrieval
ECCV 2022
SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding
ECCV 2022
Quantized GAN for Complex Music Generation from Dance Videos
ECCV 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
INTERSPEECH 2022
Two-Stream Network for Sign Language Recognition and Translation
NIPS 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
INTERSPEECH 2022
Speech Pre-training with Acoustic Piece
INTERSPEECH 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
INTERSPEECH 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
INTERSPEECH 2022
Template-Based Named Entity Recognition Using BART
ACL 2021
On Commonsense Cues in BERT for Solving Commonsense Tasks
ACL 2021
Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
CVPR 2021
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
CVPR 2021
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition
INTERSPEECH 2021
Detecting Speaker Personas from Conversational Texts
EMNLP 2021
Template-Based Named Entity Recognition Using BART
IJCNLP 2021
On Commonsense Cues in BERT for Solving Commonsense Tasks
IJCNLP 2021
Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation
EMNLP 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone
INTERSPEECH 2021
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions
INTERSPEECH 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR
INTERSPEECH 2021
Ultra Fast Speech Separation Model with Teacher Student Learning
INTERSPEECH 2021
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
ICML 2021
Symbiotic Attention with Privileged Information for Egocentric Action Recognition
AAAI 2020
Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation
CVPR 2020
Gated Channel Transformation for Visual Recognition
CVPR 2020
Unsupervised Person Re-Identification via Softened Similarity Learning
CVPR 2020
Formality Style Transfer with Shared Latent Space
COLING 2020
Curriculum Pre-training for End-to-End Speech Translation
ACL 2020
A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation
ACL 2020
MuTual: A Dataset for Multi-Turn Dialogue Reasoning
ACL 2020
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
INTERSPEECH 2020
Semantic Mask for Transformer Based End-to-End Speech Recognition
INTERSPEECH 2020
Low Latency End-to-End Streaming Speech Recognition with a Scout Network
INTERSPEECH 2020
A Dataset for Low-Resource Stylized Sequence-to-Sequence Generation
AAAI 2020
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
AAAI 2020
RobuTrans: A Robust Transformer-Based Text-to-Speech Model
AAAI 2020
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents
ECCV 2020
Pose-Guided Feature Alignment for Occluded Person Re-Identification
ICCV 2019
Response Generation by Context-Aware Prototype Editing
AAAI 2019
Dictionary-Guided Editing Networks for Paraphrase Generation
AAAI 2019
Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer
IJCNLP 2019
Unsupervised Context Rewriting for Open Domain Conversation
IJCNLP 2019
Explicit Cross-lingual Pre-training for Unsupervised Machine Translation
IJCNLP 2019
Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification
ICCV 2019
Dual Attention Matching for Audio-Visual Event Localization
ICCV 2019
Explicit Cross-lingual Pre-training for Unsupervised Machine Translation
EMNLP 2019
Unsupervised Context Rewriting for Open Domain Conversation
EMNLP 2019
Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer
EMNLP 2019
Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
CVPR 2018
Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots
ACL 2018
Keyphrase Generation with Correlation Constraints
EMNLP 2018
Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots
ACL 2017
Beihang-MSRA at SemEval-2017 Task 3: A Ranking System with Neural Matching Features for Community Question Answering
SEMEVAL 2017
Detecting Context Dependent Messages in a Conversational Environment
COLING 2016
Inapproximability of Treewidth and Related Problems (Extended Abstract)
IJCAI 2015
Learning Fair Representations
ICML 2013