Yi Zhu

66 papers · 2017–2025 · 18 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (18) 🏃 Academic Marathon (8) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (7)

🌈 Renaissance Researcher (10) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (14) 🏆 Grand Slam 👥 Mega-Team (30) 🔬 Deep Specialist (11) 🧬 Topic Evolution 🗃️ Keyword Collector (317) 📈 Trend Setter ⚡ Prolific Year (13) 🚀 Conference Pioneer 🔥 Unstoppable (9) 💎 Century Club (66)

Conferences

CVPR (12) NIPS (10) ICCV (9) WACV (5) ICLR (4) EMNLP (4) ACL (4) AAAI (4) NAACL (3) COLING (2) ICML (2) ECCV (1) EACL (1) CONLL (1) IJCNLP (1) INTERSPEECH (1) JMLR (1) OSDI (1)

Top co-authors

Xiaodan Liang (14) Mu Li (10) Jipeng Qiang (8) Yun Li (7) Hang Xu (6) Jianbin Jiao (6) Anna Korhonen (6) Jianzhuang Liu (6) Yunhao Yuan (5) Qixiang Ye (5)

Keywords

semantic segmentation (9) large language model (6) contrastive learning (6) vision-language navigation (5) domain adaptation (4) action recognition (3) convolutional neural network (3) reinforcement learning (3) multi-modal learning (3) cross-modal learning (3) object localization (3) speech synthesis (3) transfer learning (3) self-supervised learning (3) text generation (3) instance segmentation (3) zero-shot learning (2) few-shot learning (2) representation learning (2) text classification (2)

Papers

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image · CVPR 2025
DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception · WACV 2025
Post-Hoc Watermarking for Robust Detection in Text Generated by Large Language Models · COLING 2025
AI4Reading: Chinese Audiobook Interpretation System Based on Multi-Agent Collaboration · ACL 2025
Collaborative Document Simplification Using Multi-Agent Systems · COLING 2025
Differential Transformer · ICLR 2025
Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance · EMNLP 2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions · CVPR 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking · ICML 2025
UNIT: Unifying Image and Text Recognition in One Vision Encoder · NIPS 2024
nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training · OSDI 2024
You Only Cache Once: Decoder-Decoder Architectures for Language Models · NIPS 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation · NIPS 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection · NIPS 2024
ParaLS: Lexical Substitution via Pretrained Paraphraser · ACL 2023
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition · NIPS 2023
PreDiff: Precipitation Nowcasting with Latent Diffusion Models · NIPS 2023
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation · AAAI 2023
Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation · ACL 2023
Chinese Lexical Substitution: Dataset and Method · EMNLP 2023
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation · ICCV 2023
Motion-Guided Masking for Spatiotemporal Representation Learning · ICCV 2023
Towards Geospatial Foundation Models via Continual Pretraining · ICCV 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency · ICLR 2023
AIM: Adapting Image Models for Efficient Video Action Recognition · ICLR 2023
Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations · ICLR 2023
ImpDet: Exploring Implicit Fields for 3D Object Detection · WACV 2023
RelCLIP: Adapting Language-Image Pretraining for Visual Relationship Detection via Relational Contrastive Learning · EMNLP 2022
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting · NIPS 2022
Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition · ICML 2022
NUTA: Non-Uniform Temporal Aggregation for Action Recognition · WACV 2022
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation · NIPS 2022
Contrastive Instruction-Trajectory Learning for Vision-Language Navigation · AAAI 2022
ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts · CVPR 2022
Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding · INTERSPEECH 2022
Learning Canonical F-Correlation Projection for Compact Multiview Representation · CVPR 2022
Domain Consensus Clustering for Universal Domain Adaptation · CVPR 2021
SOON: Scenario Oriented Object Navigation With Graph-Based Exploration · CVPR 2021
A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters · ACL 2021
Combining Deep Generative Models and Multi-lingual Pretraining for Semi-supervised Document Classification · EACL 2021
An Unsupervised Method for Building Sentence Simplification Corpora in Multiple Languages · EMNLP 2021
CrossCLR: Cross-Modal Contrastive Learning for Multi-Modal Video Representations · ICCV 2021
VidTr: Video Transformer Without Convolutions · ICCV 2021
Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation · ICCV 2021
CrossNorm and SelfNorm for Generalization Under Distribution Shifts · ICCV 2021
Scale Aware Adaptation for Land-Cover Classification in Remote Sensing Imagery · WACV 2021
Progressive Coordinate Transforms for Monocular 3D Object Detection · NIPS 2021
A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters · IJCNLP 2021
Blending Anti-Aliasing into Vision Transformer · NIPS 2021
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing · JMLR 2020
Lexical Simplification with Pretrained Encoders · AAAI 2020
Vision-Dialog Navigation by Exploring Cross-Modal Memory · CVPR 2020
Cross-Time and Orientation-Invariant Overhead Image Geolocalization Using Deep Local Features · WACV 2020
Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks · CVPR 2020
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior · ECCV 2020
Bayesian Learning for Neural Dependency Parsing · NAACL 2019
Tensor Decomposition for Multilayer Networks Clustering · AAAI 2019
Selective Sparse Sampling for Fine-Grained Image Recognition · ICCV 2019
A Systematic Study of Leveraging Subword Information for Learning Word Representations · NAACL 2019
Learning Instance Activation Maps for Weakly Supervised Instance Segmentation · CVPR 2019
On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages · CONLL 2019
Improving Semantic Segmentation via Video Propagation and Label Relaxation · CVPR 2019
Parsing Tweets into Universal Dependencies · NAACL 2018
Weakly Supervised Instance Segmentation Using Class Peak Response · CVPR 2018
Towards Universal Representation for Unseen Action Recognition · CVPR 2018
Soft Proposal Networks for Weakly Supervised Object Localization · ICCV 2017