Weidi Xie
57 papers · 2018–2026 · 10 top CS/AI conferences
Achievements
🌍 Conference Polyglot (10)
🏃 Academic Marathon (7)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (5)
🌈 Renaissance Researcher (6)
🐣 Hot Topic Early Bird
🤝 Dynamic Duo (21)
🏆 Grand Slam
🔬 Deep Specialist (16)
🧬 Topic Evolution
🗃️ Keyword Collector (207)
📈 Trend Setter
⚡ Prolific Year (12)
🚀 Conference Pioneer
🔥 Unstoppable (6)
💎 Century Club (56)
Conferences
CVPR (18)
ICCV (11)
ECCV (10)
NIPS (6)
ICLR (4)
EMNLP (3)
AAAI (2)
ICML (1)
MICCAI (1)
WACV (1)
Keywords
video understanding (11)
semantic segmentation (6)
self-supervised learning (6)
vision-language model (6)
multimodal learning (5)
zero-shot learning (5)
video question answering (4)
synthetic data (4)
video segmentation (3)
image segmentation (3)
depth estimation (3)
egocentric video (3)
instruction tuning (3)
multi-modal learning (3)
text generation (3)
synthetic data generation (3)
audio description (3)
diffusion model (3)
contrastive learning (2)
action recognition (2)
Papers
Versatile Vision-Language Model for 3D Computed Tomography
AAAI 2026
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
CVPR 2025
RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining
MICCAI 2025
A Sanity Check for AI-generated Image Detection
ICLR 2025
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
ICLR 2025
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
AAAI 2025
Track-On: Transformer-based Online Point Tracking with Memory
ICLR 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
ICCV 2025
Object-centric Video Question Answering with Visual Grounding and Referring
ICCV 2025
Learning Streaming Video Representation via Multitask Training
ICCV 2025
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities
ICCV 2025
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
CVPR 2025
Towards Universal Soccer Video Understanding
CVPR 2025
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
CVPR 2024
A General Protocol to Probe Large Vision Models for 3D Physical Understanding
NIPS 2024
Grounded Question-Answering in Long Egocentric Videos
CVPR 2024
Retrieval-Augmented Egocentric Video Captioning
CVPR 2024
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
CVPR 2024
Amodal Ground Truth and Completion in the Wild
CVPR 2024
AutoAD III: The Prequel - Back to the Pixels
CVPR 2024
VISA: Reasoning Video Object Segmentation via Large Language Model
ECCV 2024
Appearance-based Refinement for Object-Centric Motion Segmentation
ECCV 2024
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
ECCV 2024
Multi-Sentence Grounding for Long-term Instructional Video
ECCV 2024
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
ECCV 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
EMNLP 2024
RaTEScore: A Metric for Radiology Report Generation
EMNLP 2024
EchoSight: Advancing Visual-Language Models with Wiki Knowledge
EMNLP 2024
Annotation-Free Audio-Visual Segmentation
WACV 2024
The Making and Breaking of Camouflage
ICCV 2023
Towards Open-Vocabulary Video Instance Segmentation
ICCV 2023
Open-vocabulary Object Segmentation with Diffusion Models
ICCV 2023
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
CVPR 2023
Collaboration Helps Camera Overtake LiDAR in 3D Detection
CVPR 2023
Multi-Modal Classifiers for Open-Vocabulary Object Detection
ICML 2023
OvarNet: Towards Open-Vocabulary Object Attribute Recognition
CVPR 2023
Self-supervised Object-Centric Learning for Videos
NIPS 2023
AutoAD: Movie Description in Context
CVPR 2023
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis
ICCV 2023
AutoAD II: The Sequel - Who, When, and What in Movie Audio Description
ICCV 2023
Joint-Relation Transformer for Multi-Person Motion Prediction
ICCV 2023
Prompting Visual-Language Models for Efficient Video Understanding
ECCV 2022
ReCo: Retrieve and Co-segment for Zero-shot Transfer
NIPS 2022
Segmenting Moving Objects via an Object-Centric Layered Representation
NIPS 2022
Associating Objects and Their Effects in Video through Coordination Games
NIPS 2022
Label, Verify, Correct: A Simple Few Shot Object Detection Method
CVPR 2022
It's About Time: Analog Clock Reading in the Wild
CVPR 2022
PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images
ECCV 2022
Temporal Alignment Networks for Long-Term Video
CVPR 2022
Self-Supervised Video Object Segmentation by Motion Grouping
ICCV 2021
Localizing Visual Sounds the Hard Way
CVPR 2021
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
ECCV 2020
Self-supervised Co-Training for Video Representation Learning
NIPS 2020
Memory-augmented Dense Predictive Coding for Video Representation Learning
ECCV 2020
MAST: A Memory-Augmented Self-Supervised Tracker
CVPR 2020
Comparator Networks
ECCV 2018