conftrace_

Xizhou Zhu

48 papers · 2017–2025 · 6 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (8) · 🌉 Interdisciplinary Bridge · 🌍 Conference Polyglot (6) · 🧭 Keyword Pioneer · 🐝 Cross-Pollinator (12) · 🌈 Renaissance Researcher (6) · 🏠 Conference Loyalist (20) · 🤝 Dynamic Duo (46) · 👥 Mega-Team (38) · 🔬 Deep Specialist (11) · 🧬 Topic Evolution · 🏆 Keyword Champion (4) · 🗃️ Keyword Collector (157) · 📈 Trend Setter · 🔥 Unstoppable (9) · 💎 Century Club (48) · Prolific Year (12)

Conferences

CVPR (20) · ICLR (9) · NIPS (8) · ECCV (5) · ICCV (5) · ICML (1)

Papers

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training · CVPR 2025
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · ICCV 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings · ICCV 2025
CoMemo: LVLMs Need Image Context with Image Memory · ICML 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models · CVPR 2025
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding · CVPR 2025
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding · ICCV 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models · ICLR 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures · ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text · ICLR 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding · CVPR 2025
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft · CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications · CVPR 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World · ECCV 2024
ControlLLM: Augment Language Models with Tools by Searching on Graphs · ECCV 2024
Parameter-Inverted Image Pyramid Networks · NIPS 2024
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process · ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World · ICLR 2024
Needle In A Multimodal Haystack · NIPS 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning · NIPS 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks · NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks · NIPS 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks · CVPR 2024
Siamese Image Modeling for Self-Supervised Vision Representation Learning · CVPR 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks · NIPS 2023
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information · CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks · CVPR 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision · CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions · CVPR 2023
Planning-Oriented Autonomous Driving · CVPR 2023
VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition · ECCV 2022
DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation · ECCV 2022
Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework · CVPR 2022
AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks · CVPR 2022
Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks · CVPR 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs · NIPS 2022
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation · ICLR 2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection · ICLR 2021
Searching Parameterized AP Loss for Object Detection · NIPS 2021
Unsupervised Object Detection With LIDAR Clues · CVPR 2021
VL-BERT: Pre-training of Generic Visual-Linguistic Representations · ICLR 2020
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation · ICLR 2020
Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation · ECCV 2020
Deformable ConvNets V2: More Deformable, Better Results · CVPR 2019
An Empirical Study of Spatial Attention Mechanisms in Deep Networks · ICCV 2019
Towards High Performance Video Object Detection · CVPR 2018
Flow-Guided Feature Aggregation for Video Object Detection · ICCV 2017
Deep Feature Flow for Video Recognition · CVPR 2017