Linchao Zhu

62 papers · 2017–2026 · 10 top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (10) 🏃 Academic Marathon (8) 🌈 Renaissance Researcher (12) 🗺️ Taxonomy Completionist (98)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (21) 🔬 Deep Specialist (17) 🏆 Grand Slam 🤝 Dynamic Duo (53) 🧬 Topic Evolution 📈 Trend Setter 🚀 Conference Pioneer 🗃️ Keyword Collector (260) ⚡ Prolific Year (10) 🔥 Unstoppable (9) 💎 Century Club (60)

Conferences

CVPR (21) ICCV (13) AAAI (7) ACL (6) ECCV (4) ICLR (4) NIPS (3) EMNLP (2) ICML (1) IJCAI (1)

Top co-authors

Yi Yang (53) Xiaohan Wang (9) Fan Ma (7) Yuanzhi Liang (4) Yu Wu (4) Hehe Fan (4) Yucheng Suo (4) Zhongwen Xu (3) Aming Wu (3) Heng Wang (3)

Keywords

multimodal learning (6) domain adaptation (4) convolutional neural network (4) transfer learning (3) contrastive learning (3) few-shot learning (3) recurrent neural network (3) video understanding (3) video classification (3) multi-modal learning (3) visual grounding (3) reinforcement learning (3) feature learning (2) question answering (2) representation learning (2) zero-shot learning (2) temporal reasoning (2) image captioning (2) prototype learning (2) object detection (2)

Papers

Attention as Selector: Unlocking VLM Attention for Long Document Page Retrieval · ACL 2026
How to Improve LLMs’ Performance on Specific Languages: A Perspective on LLM-Derived Language Similarity · ACL 2026
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection · ICLR 2025
VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing · ICLR 2025
Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space · ICML 2025
H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction · ICCV 2025
MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning · EMNLP 2025
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs · ICCV 2025
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback · AAAI 2025
Scalable Vision-Language Understanding and Generation · AAAI 2025
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization · ICCV 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment · ICCV 2025
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval · CVPR 2024
Stitching Segments and Sentences towards Generalization in Video-Text Pre-training · AAAI 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval · AAAI 2024
VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft · ACL 2024
FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models · ACL 2024
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models · ICLR 2024
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention · NIPS 2024
CapHuman: Capture Your Moments in Parallel Universes · CVPR 2024
WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings · ACL 2023
Text Augmented Spatial Aware Zero-shot Referring Image Segmentation · EMNLP 2023
MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering · CVPR 2023
MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects · ICCV 2023
Efficient Multimodal Fusion via Interactive Prompting · CVPR 2023
PointListNet: Deep Learning on 3D Point Lists · CVPR 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training · ICLR 2023
Gloss-Free End-to-End Sign Language Translation · ACL 2023
Fine-Grained Semantically Aligned Vision-Language Pre-Training · NIPS 2022
Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning · CVPR 2022
Complex Video Action Reasoning via Learnable Markov Logic Network · CVPR 2022
Unified Transformer Tracker for Object Tracking · CVPR 2022
SEEG: Semantic Energized Co-Speech Gesture Generation · CVPR 2022
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild · CVPR 2022
OpenMix: Reviving Known Knowledge for Discovering Novel Visual Categories in an Open World · CVPR 2021
Faster Meta Update Strategy for Noise-Robust Deep Learning · CVPR 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval · CVPR 2021
Interactive Prototype Learning for Egocentric Action Recognition · ICCV 2021
Universal-Prototype Enhancing for Few-Shot Object Detection · ICCV 2021
A Multi-Mode Modulator for Multi-Domain Few-Shot Classification · ICCV 2021
Vector-Decomposed Disentanglement for Domain-Invariant Object Detection · ICCV 2021
Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference · ICCV 2021
Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration · CVPR 2020
Semantic Correspondence as an Optimal Transport Problem · CVPR 2020
Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition · CVPR 2020
Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning · ECCV 2020
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior · ECCV 2020
Symbiotic Attention with Privileged Information for Egocentric Action Recognition · AAAI 2020
SF-Net: Single-Frame Supervision for Temporal Action Localization · ECCV 2020
FASTER Recurrent Networks for Efficient Video Classification · AAAI 2020
Gated Channel Transformation for Visual Recognition · CVPR 2020
ActBERT: Learning Global-Local Video-Text Representations · CVPR 2020
Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification · ICCV 2019
Entangled Transformer for Image Captioning · ICCV 2019
Cubic LSTMs for Video Prediction · AAAI 2019
Connective Cognition Network for Directional Visual Commonsense Reasoning · NIPS 2019
Sim-Real Joint Reinforcement Transfer for 3D Indoor Navigation · CVPR 2019
Dual Attention Matching for Audio-Visual Event Localization · ICCV 2019
Watching a Small Portion could be as Good as Watching All: Towards Efficient Video Classification · IJCAI 2018
Compound Memory Networks for Few-shot Video Classification · ECCV 2018
Few-Shot Object Recognition From Machine-Labeled Web Images · CVPR 2017
Bidirectional Multirate Reconstruction for Temporal Modeling in Videos · CVPR 2017