Xiaojun Chang

80 papers · 2015–2026 · 13 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (13) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (13)

🏠 Conference Loyalist (23) 🤝 Dynamic Duo (27) 🏆 Grand Slam 👥 Mega-Team (21) 🔬 Deep Specialist (16) 🏆 Keyword Champion (2) 🔥 Unstoppable (11) ⚡ Prolific Year (10) 🗃️ Keyword Collector (334) 💎 Century Club (75) ❓ The Questioner 📈 Trend Setter 🚀 Conference Pioneer

Conferences

CVPR (23) AAAI (12) IJCAI (11) ICCV (7) ICLR (7) ECCV (6) ICML (4) NIPS (3) ACL (2) EMNLP (2) EACL (1) IJCNLP (1) JMLR (1)

Top co-authors

Xiaodan Liang (27) Mingfei Han (11) Minnan Luo (8) Changlin Li (7) Yi Yang (7) Guangrun Wang (6) Fengda Zhu (6) Lina Yao (6) Sihao Lin (5) Mingjie Li (5)

Keywords

neural architecture search (9) feature extraction (5) self-supervised learning (5) model compression (4) video understanding (4) zero-shot learning (4) semi-supervised learning (4) multimodal learning (4) knowledge distillation (4) video classification (4) vision-language navigation (4) action recognition (4) domain adaptation (3) person re-identification (3) cross-modal retrieval (3) contrastive learning (3) data augmentation (3) vision transformer (3) attention mechanism (3) event detection (3)

Papers

Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos · ACL 2026
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models · AAAI 2026
Correspondence Coverage Matters for Multi-Modal Dataset Distillation · AAAI 2026
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation · AAAI 2026
Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models · AAAI 2026
Towards Efficient General Feature Prediction in Masked Skeleton Modeling · ICCV 2025
Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration · AAAI 2025
Towards Open-Vocabulary Audio-Visual Event Localization · CVPR 2025
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation · CVPR 2025
HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation · AAAI 2025
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos · ICLR 2025
Let LLM Tell What to Prune and How Much to Prune · ICML 2025
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes · ICLR 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation · CVPR 2025
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing · ECCV 2024
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation · AAAI 2024
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation · AAAI 2024
ProAgent: Building Proactive Cooperative Agents with Large Language Models · AAAI 2024
Video Recognition in Portrait Mode · CVPR 2024
MLP Can Be A Good Transformer Learner · CVPR 2024
Masked Distillation Advances Self-Supervised Transformer Architecture Search · ICLR 2024
SWAP-NAS: Sample-Wise Activation Patterns for Ultra-fast NAS · ICLR 2024
LongVLM: Efficient Long Video Understanding via Large Language Models · ECCV 2024
Learning with Counterfactual Explanations for Radiology Report Generation · ECCV 2024
Maximum Entropy Heterogeneous-Agent Reinforcement Learning · ICLR 2024
ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities · EACL 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency · ICLR 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation · CVPR 2023
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation · AAAI 2023
Vision Language Navigation with Knowledge-driven Environmental Dreamer · IJCAI 2023
MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library · JMLR 2023
Mask Propagation for Efficient Video Semantic Segmentation · NIPS 2023
HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation · ICCV 2023
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration · ICCV 2023
Erratum to: 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation · AAAI 2023
Knowledge Distillation via the Target-Aware Transformer · CVPR 2022
Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition · CVPR 2022
Automated Progressive Learning for Efficient Training of Vision Transformers · CVPR 2022
Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels · CVPR 2022
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection · ECCV 2022
PAR: Political Actor Representation Learning with Social Context and Expert Knowledge · EMNLP 2022
Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL · ICML 2022
Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation · CVPR 2022
BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule · CVPR 2022
Beyond Fixation: Dynamic Window Visual Transformer · CVPR 2022
iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients · ICML 2021
Dynamic Slimmable Network · CVPR 2021
SOON: Scenario Oriented Object Navigation With Graph-Based Exploration · CVPR 2021
Vision-Language Navigation With Random Environmental Mixup · ICCV 2021
BossNAS: Exploring Hybrid CNN-Transformers With Block-Wisely Self-Supervised Neural Architecture Search · ICCV 2021
Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation · ICCV 2021
UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers · ICLR 2021
Person Search Challenges and Solutions: A Survey · IJCAI 2021
Hierarchical Neural Architecture Search for Deep Stereo Matching · NIPS 2020
Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation · CVPR 2020
Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement · NIPS 2020
Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization · CVPR 2020
Unity Style Transfer for Person Re-Identification · CVPR 2020
Mining Inter-Video Proposal Relations for Video Object Detection · ECCV 2020
Vision-Dialog Navigation by Exploring Cross-Modal Memory · CVPR 2020
Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks · CVPR 2020
Quadratic Sparse Gaussian Graphical Model Estimation Method for Massive Variables · IJCAI 2020
ZSTAD: Zero-Shot Temporal Activity Detection · CVPR 2020
Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting · ACL 2020
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations · EMNLP 2019
Distributionally Robust Semi-Supervised Learning for People-Centric Sensing · AAAI 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations · IJCNLP 2019
Teaching Semi-Supervised Classifier via Generalized Distillation · IJCAI 2018
RCAA: Relational Context-Aware Agents for Person Search · ECCV 2018
Uncertainty Sampling for Action Recognition via Maximizing Expected Average Precision · IJCAI 2018
Reinforcement Cutting-Agent Learning for Video Object Segmentation · CVPR 2018
Complex Event Detection by Identifying Reliable Shots From Untrimmed Videos · ICCV 2017
Discriminative Dictionary Learning With Ranking Metric Embedded for Person Re-Identification · IJCAI 2017
Top-k Supervise Feature Selection via ADMM for Integer Programming · IJCAI 2017
Self-paced Mixture of Regressions · IJCAI 2017
Adaptive Semi-Supervised Learning with Discriminative Least Squares Regression · IJCAI 2017
How Unlabeled Web Videos Help Complex Event Detection? · IJCAI 2017
They Are Not Equally Reliable: Semantic Event Search Using Differentiated Concept Classifiers · CVPR 2016
Semantic Concept Discovery for Large-Scale Zero-Shot Event Detection · IJCAI 2015
Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM · ICML 2015