Jianfei Cai

105 papers · 2014–2026 · 11 top CS/AI conferences

Achievements


🧭 Keyword Pioneer 🌍 Conference Polyglot (11) 🗺️ Taxonomy Completionist (14) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (11) 🏠 Conference Loyalist (35) 🤝 Dynamic Duo (17) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (14) 🔥 Unstoppable (12) 🚀 Conference Pioneer ⚡ Prolific Year (20) ❓ The Questioner (2) 🗃️ Keyword Collector (352) 📈 Trend Setter 💎 Century Club (101)

Conferences

CVPR (35) ECCV (23) ICCV (16) NIPS (8) AAAI (5) ICLR (5) IJCAI (4) ICML (3) MICCAI (3) WACV (2) AISTATS (1)

Top co-authors

Bohan Zhuang (18) Tat-Jen Cham (14) Jianmin Zheng (13) Qianyi Wu (13) Chuanxia Zheng (11) Zizheng Pan (10) Jing Liu (8) Hamid Rezatofighi (8) Xu Yang (8) Dinh Phung (8)

Keywords

attention mechanism (7) image generation (6) object detection (6) vision transformer (5) point cloud (4) semantic segmentation (4) image classification (4) convolutional neural network (4) representation learning (4) neural network (4) zero-shot learning (3) self-supervised learning (3) unsupervised learning (3) image captioning (3) depth estimation (3) 3d reconstruction (3) domain adaptation (3) bayesian inference (3) transfer learning (3) model compression (3)

Papers

Marginalized Generalized IoU (MGIoU): A Unified Objective Function for Optimizing Convex Parametric Shapes · AAAI 2026
PanFlow: Decoupled Motion Control for Panoramic Video Generation · AAAI 2026
PCGS: Progressive Compression of 3D Gaussian Splatting · AAAI 2026
Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning · AAAI 2026
DrVideo: Document Retrieval Based Long Video Understanding · CVPR 2025
McCaD: Multi-Contrast MRI Conditioned Adaptive Adversarial Diffusion Model for High-Fidelity MRI Synthesis · WACV 2025
New Multiple Sclerosis Lesion Segmentation via Calibrated Inter-patch Blending · MICCAI 2025
FPN-in-FPN: A Nested Multi-Scale Aggregation Network for Polyp Segmentation · MICCAI 2025
Fast Feedforward 3D Gaussian Splatting Compression · ICLR 2025
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching · ICLR 2025
PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction · ICLR 2025
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior · ICCV 2025
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis · CVPR 2025
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting · CVPR 2025
Sharpness-Aware Data Generation for Zero-shot Quantization · ICML 2024
Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering · NIPS 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI · NIPS 2024
MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views · NIPS 2024
Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis · NIPS 2024
How Far Can We Compress Instant-NGP-Based NeRF? · CVPR 2024
Diversified and Personalized Multi-rater Medical Image Segmentation · CVPR 2024
Taming Stable Diffusion for Text to 360 Panorama Image Generation · CVPR 2024
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments · CVPR 2024
Efficient Stitchable Task Adaptation · CVPR 2024
Generative Region-Language Pretraining for Open-Ended Object Detection · CVPR 2024
Surface Reconstruction for 3D Gaussian Splatting via Local Structural Hints · ECCV 2024
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression · ECCV 2024
Differentiable Convex Polyhedra Optimization from Multi-view Images · ECCV 2024
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images · ECCV 2024
Stitched ViTs are Flexible Vision Backbones · ECCV 2024
McGrids: Monte Carlo-Driven Adaptive Grids for Iso-Surface Extraction · ECCV 2024
Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation · ECCV 2024
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models · ICLR 2024
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation · MICCAI 2024
ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces · ICCV 2023
Dynamic Focus-Aware Positional Queries for Semantic Segmentation · CVPR 2023
JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking · CVPR 2023
Vector Quantized Wasserstein Auto-Encoder · ICML 2023
Adversarial Local Distribution Regularization for Knowledge Distillation · WACV 2023
Transformer Scale Gate for Semantic Segmentation · CVPR 2023
Stitchable Neural Networks · CVPR 2023
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg · CVPR 2023
Learning Object-Language Alignments for Open-Vocabulary Object Detection · ICLR 2023
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning · ICCV 2023
Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation · ECCV 2022
Object-Compositional Neural Implicit Surfaces · ECCV 2022
Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields · ECCV 2022
ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing · ECCV 2022
ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues · CVPR 2022
Bridging Global Context Interactions for High-Fidelity Image Completion · CVPR 2022
GMFlow: Learning Optical Flow via Global Matching · CVPR 2022
Fast Vision Transformers with HiLo Attention · NIPS 2022
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation · NIPS 2022
Less Is More: Pay Less Attention in Vision Transformers · AAAI 2022
Particle-based Adversarial Local Distribution Regularization · AISTATS 2022
EcoFormer: Energy-Saving Attention with Linear Complexity · NIPS 2022
Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation · ECCV 2022
The Spatially-Correlative Loss for Various Image Translation Tasks · CVPR 2021
CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing · ICCV 2021
Domain-Invariant Disentangled Network for Generalizable Object Detection · ICCV 2021
High-Resolution Optical Flow From 1D Attention and Correlation · ICCV 2021
Learning Meta-Class Memory for Few-Shot Semantic Segmentation · ICCV 2021
A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder · ICCV 2021
Scalable Vision Transformers With Hierarchical Pooling · ICCV 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering · ICCV 2021
RSG: A Simple but Effective Module for Learning Imbalanced Datasets · CVPR 2021
Causal Attention for Vision-Language Tasks · CVPR 2021
Exploring Bottom-Up and Top-Down Cues With Attentive Learning for Webly Supervised Object Detection · CVPR 2020
Learning Progressive Joint Propagation for Human Motion Prediction · ECCV 2020
Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation · ECCV 2020
Self-Supervised Relationship Probing · NIPS 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning · ECCV 2020
Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation · IJCAI 2020
End-to-End 3D Point Cloud Instance Segmentation Without Detection · CVPR 2020
Learning to Collocate Neural Modules for Image Captioning · ICCV 2019
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks · ICCV 2019
Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos · IJCAI 2019
Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds · ICCV 2019
3D Hand Shape and Pose Estimation From a Single RGB Image · CVPR 2019
Auto-Encoding Scene Graphs for Image Captioning · CVPR 2019
Scene Graph Generation With External Knowledge and Image Reconstruction · CVPR 2019
Pluralistic Image Completion · CVPR 2019
Unpaired Image Captioning via Scene Graph Alignments · ICCV 2019
Unpaired Image Captioning by Language Pivoting · ECCV 2018
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models · CVPR 2018
Alive Caricature From 2D to 3D · CVPR 2018
T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks · ECCV 2018
Zero-Annotation Object Detection with Web Knowledge Transfer · ECCV 2018
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions · ECCV 2018
Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression · ICML 2018
Quadtree Convolutional Neural Networks · ECCV 2018
Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment · ECCV 2018
Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images · ECCV 2018
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features · ECCV 2018
A Generative Model for Depth-Based Robust 3D Facial Pose Tracking · CVPR 2017
Robust Survey Aggregation with Student-t Distribution and Sparse Representation · IJCAI 2017
Student-t Process Regression with Student-t Likelihood · IJCAI 2017
An Empirical Study of Language CNN for Image Captioning · ICCV 2017
MIML-FCN+: Multi-Instance Multi-Label Learning via Fully Convolutional Networks With Privileged Information · CVPR 2017
Object Co-Skeletonization With Co-Segmentation · CVPR 2017
Exploit Bounding Box Annotations for Multi-Label Object Recognition · CVPR 2016
Modality and Component Aware Feature Fusion For RGB-D Scene Classification · CVPR 2016
MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition · ICCV 2015
Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo · CVPR 2014
Compact Representation for Image Classification: To Choose or to Compress? · CVPR 2014