Pan Zhang

39 papers · 2016–2026 · 8 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (10) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (8)

🏃 Academic Marathon (9) 🐝 Cross-Pollinator (5) 🧬 Topic Evolution 👥 Mega-Team (24) 🤝 Dynamic Duo (30) 🔬 Deep Specialist (13) 🏆 Grand Slam 💎 Century Club (38) 🗃️ Keyword Collector (202) ❓ The Questioner (3) ⚡ Prolific Year (12) 🔥 Unstoppable (6)

Conferences

CVPR (13) ICCV (7) NIPS (7) ACL (3) ECCV (3) AAAI (2) ICLR (2) ICML (2)

Top co-authors

Jiaqi Wang (30) Xiaoyi Dong (27) Yuhang Zang (24) Dahua Lin (21) Yuhang Cao (15) Haodong Duan (9) Conghui He (9) Tong Wu (7) Dong Chen (6) Shuangrui Ding (6)

Keywords

vision-language model (6) multimodal learning (6) video understanding (4) large language model (4) multimodal large language model (3) multi-modal learning (3) large vision-language model (3) diffusion model (3) exemplar-based image translation (2) benchmark evaluation (2) video language model (2) semantic segmentation (2) temporal consistency (2) vision language model (2) image translation (2) instruction tuning (2) hallucination mitigation (2) instruction following (2) style transfer (1) computer vision (1)

Papers

Linguistic Steganography via Self-Adjusting Asymmetric Number System (Abstract Reprint) · AAAI 2026
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree · ICCV 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? · CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction · CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way · CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models · CVPR 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting · ICCV 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings · ACL 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data · ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following · ICCV 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate · ICCV 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding? · ICML 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation · ICML 2025
MotionClone: Training-Free Motion Cloning for Controllable Video Generation · ICLR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · ICLR 2025
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion · ICCV 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · ACL 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition · ACL 2025
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation · CVPR 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs · NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions · NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models? · NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations · NIPS 2024
Streaming Long Video Understanding with Large Language Models · NIPS 2024
VIGC: Visual Instruction Generation and Correction · AAAI 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want · CVPR 2024
FreeDrag: Feature Dragging for Reliable Point-based Image Editing · CVPR 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions · ECCV 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP · ECCV 2024
V3Det: Vast Vocabulary Visual Detection Dataset · ICCV 2023
MetaPortrait: Identity-Preserving Talking Head Generation With Fast Personalized Adaptation · CVPR 2023
BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image · CVPR 2023
Real-Time Neural Character Rendering with Pose-Guided Multiplane Images · ECCV 2022
CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation · CVPR 2021
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation · CVPR 2021
Cross-Domain Correspondence Learning for Exemplar-Based Image Translation · CVPR 2020
Bringing Old Photos Back to Life · CVPR 2020
Robust Spectral Detection of Global Structures in the Data by Learning a Regularization · NIPS 2016