Xiaohan Wang

44 papers · 2020–2026 · 12 top CS/AI conferences

Achievements

🌍 Conference Polyglot (12) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (5)

🐝 Cross-Pollinator (11) 🗺️ Taxonomy Completionist (81) 🤝 Dynamic Duo (20) 🔬 Deep Specialist (11) 🧬 Topic Evolution 🔥 Unstoppable (6) 🗃️ Keyword Collector (168) 📈 Trend Setter 💎 Century Club (43) ❓ The Questioner (2) ⚡ Prolific Year (9)

Conferences

CVPR (14) ICCV (8) AAAI (5) ACL (4) ICLR (4) IJCAI (2) NIPS (2) ECCV (1) EMNLP (1) IJCNLP (1) UAI (1) WACV (1)

Top co-authors

Yi Yang (20) Serena Yeung-Levy (10) Linchao Zhu (9) Yuhui Zhang (6) Ningyu Zhang (5) Wenguan Wang (4) Xinhang Song (4) Shuqiang Jiang (4) Alejandro Lozano (3) Orr Zohar (3)

Keywords

contrastive learning (6) vision-language model (5) video understanding (5) prototype learning (4) vision language model (3) scene understanding (3) multimodal learning (3) representation learning (3) point cloud (3) zero-shot learning (3) multi-modal learning (2) transformer architecture (2) semantic segmentation (2) cross-modal learning (2) video recognition (2) knowledge editing (2) reinforcement learning (2) action recognition (2) transfer learning (2) domain adaptation (2)

Papers

Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization · AAAI 2026
Video Action Differencing · ICLR 2025
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration · ICCV 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation · CVPR 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature · CVPR 2025
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps · ICLR 2025
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models · WACV 2025
Targeted Learning for Variable Importance · UAI 2025
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision · ICLR 2025
Apollo: An Exploration of Video Understanding in Large Multimodal Models · CVPR 2025
A Category Agnostic Model for Visual Rearrangement · CVPR 2024
Why are Visually-Grounded Language Models Bad at Image Classification? · NIPS 2024
Interpretable3D: An Ad-Hoc Interpretable Classifier for 3D Point Clouds · AAAI 2024
Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition · AAAI 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval · AAAI 2024
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models · ACL 2024
Describing Differences in Image Sets with Natural Language · CVPR 2024
An Interactive Navigation Method with Effect-oriented Affordance · CVPR 2024
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation · CVPR 2024
VideoAgent: Long-form Video Understanding with Large Language Model as Agent · ECCV 2024
Editing Conceptual Knowledge for Large Language Models · EMNLP 2024
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models · ICLR 2024
Continual Multimodal Knowledge Graph Construction · IJCAI 2024
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models · CVPR 2023
LambdaKG: A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings · IJCNLP 2023
Gloss-Free End-to-End Sign Language Translation · ACL 2023
Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation · CVPR 2023
CaMP: Causal Multi-policy Planning for Interactive Navigation in Multi-room Scenes · NIPS 2023
How to Unleash the Power of Large Language Models for Few-shot Relation Extraction? · ACL 2023
Open Anomalous Trajectory Recognition via Probabilistic Metric Learning · IJCAI 2023
LANA: A Language-Capable Navigator for Instruction Following and Generation · CVPR 2023
Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation · CVPR 2023
Bird's-Eye-View Scene Graph for Vision-Language Navigation · ICCV 2023
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery · ICCV 2023
Action Sensitivity Learning for Temporal Action Localization · ICCV 2023
MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects · ICCV 2023
Clustering based Point Cloud Representation Learning for 3D Analysis · ICCV 2023
WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings · ACL 2023
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild · CVPR 2022
Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark · CVPR 2022
PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-Rigid Structure-From-Motion · ICCV 2021
Interactive Prototype Learning for Egocentric Action Recognition · ICCV 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval · CVPR 2021
Symbiotic Attention with Privileged Information for Egocentric Action Recognition · AAAI 2020