Pan Zhang
39 papers · 2016–2026 · 8 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (10)
Renaissance Researcher (5)
Interdisciplinary Bridge
Conference Polyglot (8)
Academic Marathon (9)
Cross-Pollinator (5)
Topic Evolution
Mega-Team (24)
Dynamic Duo (30)
Deep Specialist (13)
Grand Slam
Century Club (38)
Keyword Collector (202)
The Questioner (3)
Prolific Year (12)
Unstoppable (6)
Conferences
CVPR (13)
ICCV (7)
NIPS (7)
ACL (3)
ECCV (3)
AAAI (2)
ICLR (2)
ICML (2)
Keywords
vision-language model (6)
multimodal learning (6)
video understanding (4)
large language model (4)
multimodal large language model (3)
multi-modal learning (3)
large vision-language model (3)
diffusion model (3)
exemplar-based image translation (2)
benchmark evaluation (2)
video language model (2)
semantic segmentation (2)
temporal consistency (2)
vision language model (2)
image translation (2)
instruction tuning (2)
hallucination mitigation (2)
instruction following (2)
style transfer (1)
computer vision (1)
Papers
Linguistic Steganography via Self-Adjusting Asymmetric Number System (Abstract Reprint)
AAAI 2026
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
ICCV 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings
ACL 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
ICCV 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
ICML 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
ICLR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
ICCV 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
ACL 2025
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
NIPS 2024
Streaming Long Video Understanding with Large Language Models
NIPS 2024
VIGC: Visual Instruction Generation and Correction
AAAI 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CVPR 2024
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
CVPR 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ECCV 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
ECCV 2024
V3Det: Vast Vocabulary Visual Detection Dataset
ICCV 2023
MetaPortrait: Identity-Preserving Talking Head Generation With Fast Personalized Adaptation
CVPR 2023
BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image
CVPR 2023
Real-Time Neural Character Rendering with Pose-Guided Multiplane Images
ECCV 2022
CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation
CVPR 2021
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
CVPR 2021
Cross-Domain Correspondence Learning for Exemplar-Based Image Translation
CVPR 2020
Bringing Old Photos Back to Life
CVPR 2020
Robust Spectral Detection of Global Structures in the Data by Learning a Regularization
NIPS 2016