Yan Lu

86 papers · 2015–2026 · 11 top CS/AI conferences

Achievements

🏃 Academic Marathon (10) 🌍 Conference Polyglot (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (6)

🌈 Renaissance Researcher (10) 🗺️ Taxonomy Completionist (120) 🏠 Conference Loyalist (32) 🤝 Dynamic Duo (15) 🏆 Grand Slam 🏆 Keyword Champion (3) 👑 Triple Crown 🔬 Deep Specialist (16) ⚡ Prolific Year (15) 🚀 Conference Pioneer 🗃️ Keyword Collector (384) 🔥 Unstoppable (8) 📈 Trend Setter 💎 Century Club (81)

Conferences

CVPR (32) AAAI (13) ICCV (11) NIPS (8) ACL (7) INTERSPEECH (5) ECCV (4) ICLR (3) EMNLP (1) ICML (1) IJCAI (1)

Top co-authors

Jiahao Li (16) Jinglu Wang (14) Bin Li (13) Xiao Li (10) Cuiling Lan (9) Zhizheng Zhang (9) Xiulian Peng (8) Xun Guo (8) Qi Chu (7) Houqiang Li (6)

Keywords

representation learning (7) diffusion model (6) multimodal learning (6) feature extraction (6) video understanding (5) self-supervised learning (5) reinforcement learning (5) neural network (5) contrastive learning (4) temporal context (4) person re-identification (4) video compression (4) attention mechanism (4) video generation (4) transformer network (4) metric learning (3) uncertainty modeling (3) object detection (3) generative model (3) semantic segmentation (3)

Papers

Closing the Modality Reasoning Gap for Speech Large Language Models ACL 2026
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation ACL 2026
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training ACL 2026
CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation ACL 2026
From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation ACL 2026
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models CVPR 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation CVPR 2025
PICD: Versatile Perceptual Image Compression with Diffusion Rendering CVPR 2025
Towards Anytime Retrieval: A Benchmark for Anytime Person Re-Identification IJCAI 2025
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video ICCV 2025
DLF: Extreme Image Compression with Dual-generative Latent Fusion ICCV 2025
StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams ICCV 2025
TrInk: Ink Generation with Transformer Network EMNLP 2025
Towards Practical Real-Time Neural Video Compression CVPR 2025
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping CVPR 2025
MEATRD: Multimodal Anomalous Tissue Region Detection Enhanced with Spatial Transcriptomics AAAI 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training ACL 2025
UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis ACL 2025
Implicit Motion Function CVPR 2024
Slot-VLM: Object-Event Slots for Video-Language Modeling NIPS 2024
Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement NIPS 2024
Arbitrary-Scale Video Super-resolution Guided by Dynamic Context AAAI 2024
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators AAAI 2024
Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification AAAI 2024
Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation CVPR 2024
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding CVPR 2024
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis CVPR 2024
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition CVPR 2024
Generative Latent Coding for Ultra-Low Bitrate Image Compression CVPR 2024
Neural Video Compression with Feature Modulation CVPR 2024
Long-term Temporal Context Gathering for Neural Video Compression ECCV 2024
Mask-Based Modeling for Neural Radiance Fields ICLR 2024
Breaking through the learning plateaus of in-context learning in Transformer ICML 2024
Unifying Layout Generation With a Decoupled Diffusion Model CVPR 2023
Efficient View Synthesis with Neural Radiance Distribution Field ICCV 2023
Neural Video Compression With Diverse Contexts CVPR 2023
VideoTrack: Learning To Track Objects via Video Transformer CVPR 2023
Crossing the Gap: Domain Generalization for Image Captioning CVPR 2023
StableVideo: Text-driven Consistency-aware Diffusion Video Editing ICCV 2023
Robust Referring Video Object Segmentation with Cyclic Structural Consensus ICCV 2023
Adaptive Frequency Filters As Efficient Global Token Mixers ICCV 2023
Two-Shot Video Object Segmentation CVPR 2023
Motion Information Propagation for Neural Video Compression CVPR 2023
EVC: Towards Real-Time Neural Image Compression with Mask Decay ICLR 2023
Deep Frequency Filtering for Domain Generalization CVPR 2023
High-Fidelity and Freely Controllable Talking Head Video Generation CVPR 2023
ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression INTERSPEECH 2023
Masked Audio Modeling with CLAP and Multi-Objective Learning INTERSPEECH 2023
Learning Trajectories are Generalization Indicators NIPS 2023
DisDiff: Unsupervised Disentanglement of Diffusion Probabilistic Models NIPS 2023
Versatile Neural Processes for Learning Implicit Neural Representations ICLR 2023
Multi-View Domain Adaptive Object Detection on Camera Networks AAAI 2023
Active Token Mixer AAAI 2023
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction CVPR 2023
Mask-based Latent Reconstruction for Reinforcement Learning NIPS 2022
Towards Error-Resilient Neural Speech Coding INTERSPEECH 2022
Cross-Scale Vector Quantization for Scalable Neural Speech Coding INTERSPEECH 2022
Reliable Propagation-Correction Modulation for Video Object Segmentation AAAI 2022
Hybrid Instance-Aware Temporal Fusion for Online Video Instance Segmentation AAAI 2022
Neural Capture of Animatable 3D Human from Monocular Video ECCV 2022
Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification ECCV 2022
Visual Concepts Tokenization NIPS 2022
Semantic-Aligned Fusion Transformer for One-Shot Object Detection CVPR 2022
Alignment-guided Temporal Attention for Video Action Recognition NIPS 2022
Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation INTERSPEECH 2022
Neural Compression-Based Feature Learning for Video Restoration CVPR 2022
Self-Supervised Image Representation Learning With Geometric Set Consistency CVPR 2022
Rethinking Minimal Sufficient Representation in Contrastive Learning CVPR 2022
T-Net: Effective Permutation-Equivariant Network for Two-View Correspondence Learning ICCV 2021
Joint Color-irrelevant Consistency Learning and Identity-aware Modality Adaptation for Visible-infrared Cross Modality Person Re-identification AAAI 2021
Interactive Speech and Noise Modeling for Speech Enhancement AAAI 2021
Weakly-supervised Temporal Action Localization by Uncertainty Modeling AAAI 2021
Deep Contextual Video Compression NIPS 2021
SSAN: Separable Self-Attention Network for Video Representation Learning CVPR 2021
Geometry Uncertainty Projection Network for Monocular 3D Object Detection ICCV 2021
Self-Supervised Video Representation Learning With Meta-Contrastive Network ICCV 2021
Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking CVPR 2020
Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer CVPR 2020
Triangulation Learning Network: From Monocular to Stereo 3D Object Detection CVPR 2019
MVPNet: Multi-View Point Regression Networks for 3D Object Reconstruction from A Single Image AAAI 2019
Relational Knowledge Distillation CVPR 2019
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization AAAI 2019
Affinity Derivation and Graph Merge for Instance Segmentation ECCV 2018
Local Descriptors Optimized for Average Precision CVPR 2018
Feature Selective Networks for Object Detection CVPR 2018
Robust RGB-D Odometry Using Point and Line Features ICCV 2015