Xinlei Chen
55 papers · 2013–2026 · 12 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Renaissance Researcher (11)
Academic Marathon (12)
Conference Polyglot (11)
Taxonomy Completionist (91)
Keyword Pioneer
Keyword Trendsetter Combo (5)
Deep Specialist (12)
Grand Slam
Topic Evolution
Prolific Year (5)
The Questioner
Unstoppable (13)
Keyword Collector (219)
Century Club (53)
Conference Pioneer
Trend Setter
Conferences
CVPR (18)
ICCV (13)
ICLR (5)
ICML (5)
ACL (4)
NIPS (3)
AAAI (2)
ECCV (1)
EMNLP (1)
IJCAI (1)
JMLR (1)
NAACL (1)
Research topics
Keywords
self-supervised learning (9)
multimodal learning (8)
representation learning (6)
visual question answering (6)
object detection (5)
contrastive learning (4)
large language model (4)
convolutional neural network (4)
vision transformer (3)
visual representation (3)
knowledge distillation (3)
image captioning (3)
masked autoencoder (3)
transfer learning (3)
video understanding (2)
visual grounding (2)
point cloud (2)
domain adaptation (2)
transformer architecture (2)
computer vision (2)
Papers
DIMM: Decoupled Multi-hierarchy Kalman Filter via Reinforcement Learning
AAAI 2026
AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning
AAAI 2026
Analyzing and Modeling LLM Response Lengths with Extreme Value Theory: Anchoring Effects and Hybrid Distributions
EMNLP 2025
Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents
ACL 2025
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
ACL 2025
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
ACL 2025
Test-Time Training on Video Streams
JMLR 2025
How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM
IJCAI 2025
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
ICML 2025
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
ICML 2025
Highly Compressed Tokenizer Can Generate Without Training
ICML 2025
LLMs can see and hear without any training
ICML 2025
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
ICLR 2025
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
ICLR 2025
Transformers without Normalization
CVPR 2025
PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining
ICCV 2025
Scaling Language-Free Visual Representation Learning
ICCV 2025
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
ICCV 2025
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
NIPS 2024
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
NIPS 2024
R-MAE: Regions Meet Masked Autoencoders
ICLR 2024
Improving Selective Visual Question Answering by Learning From Your Peers
CVPR 2023
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
ICCV 2023
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders
CVPR 2023
Test-Time Training with Masked Autoencoders
NIPS 2022
Masked Autoencoders Are Scalable Vision Learners
CVPR 2022
On the Importance of Asymmetry for Siamese Representation Learning
CVPR 2022
Point-Level Region Contrast for Object Detection Pre-Training
CVPR 2022
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training
ICLR 2022
Understanding self-supervised learning dynamics without contrastive pairs
ICML 2021
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
CVPR 2021
Exploring Simple Siamese Representation Learning
CVPR 2021
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
ICLR 2021
An Empirical Study of Training Self-Supervised Vision Transformers
ICCV 2021
In Defense of Grid Features for Visual Question Answering
CVPR 2020
ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes
CVPR 2020
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
ECCV 2020
Grounded Video Description
CVPR 2019
Order-Aware Generative Modeling Using the 3D-Craft Dataset
ICCV 2019
Embodied Amodal Recognition: Learning to Move to Perceive Objects
ICCV 2019
TensorMask: A Foundation for Dense Object Segmentation
ICCV 2019
nocaps: novel object captioning at scale
ICCV 2019
Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation
ICCV 2019
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
ACL 2019
Towards VQA Models That Can Read
CVPR 2019
Multi-Target Embodied Question Answering
CVPR 2019
Cycle-Consistency for Robust Visual Question Answering
CVPR 2019
Iterative Visual Reasoning Beyond Convolutions
CVPR 2018
Spatial Memory for Context Reasoning in Object Detection
ICCV 2017
Visualizing and Understanding Neural Models in NLP
NAACL 2016
Sense Discovery via Co-Clustering on Images and Text
CVPR 2015
Mind's Eye: A Recurrent Visual Representation for Image Caption Generation
CVPR 2015
Webly Supervised Learning of Convolutional Networks
ICCV 2015
Enriching Visual Knowledge Bases via Object Discovery and Segmentation
CVPR 2014
NEIL: Extracting Visual Knowledge from Web Data
ICCV 2013