Jan Kautz

153 papers · 2013–2025 · 12 venues across top CS/AI conferences and journals

Achievements

🗺️ Taxonomy Completionist (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (70) 🌟 Keyword Trendsetter Combo (3) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (38) 👥 Mega-Team (28) 🔬 Deep Specialist (26) 🧬 Topic Evolution 🏆 Keyword Champion ⚡ Prolific Year (13) ❓ The Questioner 📈 Trend Setter 💎 Century Club (153) 🚀 Conference Pioneer 🔥 Unstoppable (13) 🗃️ Keyword Collector (554)

Venues

CVPR (70) ECCV (20) NIPS (19) ICCV (18) ICLR (14) ICML (5) CORL (2) AAAI (1) ACL (1) JMLR (1) RSS (1) WACV (1)

Top co-authors

Pavlo Molchanov (38) Sifei Liu (27) Shalini De Mello (23) Hongxu Yin (22) Ming-Yu Liu (20) Umar Iqbal (20) Kihwan Kim (17) Ming-Hsuan Yang (17) Arash Vahdat (16) Xueting Li (15)

Research topics

Models (1) Privacy (1)

Keywords

3d reconstruction (12) semantic segmentation (12) convolutional neural network (9) depth estimation (9) self-supervised learning (9) object detection (8) generative model (8) instance segmentation (6) contrastive learning (6) model compression (6) vision transformer (6) unsupervised learning (6) neural network (6) video generation (5) neural rendering (5) representation learning (5) knowledge distillation (5) semi-supervised learning (5) image generation (5) diffusion model (5)

Papers

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders · ICLR 2025
LongVILA: Scaling Long-Context Visual Language Models for Long Videos · ICLR 2025
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement · ICLR 2025
NVILA: Efficient Frontier Visual Language Models · CVPR 2025
Scaling Vision Pre-Training to 4K Resolution · CVPR 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network · CVPR 2025
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation · CVPR 2025
NaVILA: Legged Robot Vision-Language-Action Model for Navigation · RSS 2025
GENMO: A GENeralist Model for Human MOtion · ICCV 2025
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion · ICCV 2025
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis · ICCV 2025
GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion · ICCV 2025
MambaVision: A Hybrid Mamba-Transformer Vision Backbone · CVPR 2025
One-Minute Video Generation with Test-Time Training · CVPR 2025
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models · ICML 2025
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning · CVPR 2025
FoundationStereo: Zero-Shot Stereo Matching · CVPR 2025
SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing · CVPR 2025
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models · CVPR 2025
FLARE: Robot Learning with Implicit World Modeling · CORL 2025
DreamGen: Unlocking Generalization in Robot Learning through Video World Models · CORL 2025
Score-Based Diffusion Models in Function Space · JMLR 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought · CVPR 2025
Gated Delta Networks: Improving Mamba2 with Delta Rule · ICLR 2025
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing · ICLR 2025
Hymba: A Hybrid-head Architecture for Small Language Models · ICLR 2025
SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models · NIPS 2024
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? · CVPR 2024
COLMAP-Free 3D Gaussian Splatting · CVPR 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models · NIPS 2024
CosAE: Learnable Fourier Series for Image Restoration · NIPS 2024
Compact Language Models via Pruning and Knowledge Distillation · NIPS 2024
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning · CVPR 2024
Flextron: Many-in-One Flexible Large Language Model · ICML 2024
FasterViT: Fast Vision Transformers with Hierarchical Attention · ICLR 2024
3D Reconstruction with Generalizable Neural Fields using Scene Priors · ICLR 2024
Learning to Jointly Understand Visual and Tactile Signals · ICLR 2024
A Variational Perspective on Solving Inverse Problems with Diffusion Models · ICLR 2024
LITA: Language Instructed Temporal-Localization Assistant · ECCV 2024
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation · ECCV 2024
DiffiT: Diffusion Vision Transformers for Image Generation · ECCV 2024
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects · CVPR 2024
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One · CVPR 2024
Heterogeneous Continual Learning · CVPR 2023
The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks · CVPR 2023
Generalizable One-shot 3D Neural Head Avatar · NIPS 2023
Global Vision Transformer Pruning With Hessian-Aware Saliency · CVPR 2023
Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models · CVPR 2023
Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation · ICML 2023
Global Context Vision Transformers · ICML 2023
Pseudoinverse-Guided Diffusion Models for Inverse Problems · ICLR 2023
Convolutional State Space Models for Long-Range Spatiotemporal Modeling · NIPS 2023
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects · CVPR 2023
Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters · CVPR 2023
PhysDiff: Physics-Guided Human Motion Diffusion Model · ICCV 2023
RANA: Relightable Articulated Neural Avatars · ICCV 2023
FreeSOLO: Learning To Segment Objects Without Annotations · CVPR 2022
GradViT: Gradient Inversion of Vision Transformers · CVPR 2022
GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras · CVPR 2022
GroupViT: Semantic Segmentation Emerges From Text Supervision · CVPR 2022
A-ViT: Adaptive Tokens for Efficient Vision Transformer · CVPR 2022
Learning Continuous Environment Fields via Implicit Functions · ICLR 2022
LANA: Latency Aware Network Acceleration · ECCV 2022
Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion · ECCV 2022
Neural Interferometry: Image Reconstruction from Astronomical Interferometers Using Transformer-Conditioned Neural Fields · AAAI 2022
CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs · CVPR 2022
Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models · CVPR 2021
Learning Indoor Inverse Rendering With 3D Spatially-Varying Lighting · ICCV 2021
A Contrastive Learning Approach for Training Variational Autoencoder Priors · NIPS 2021
Coupled Segmentation and Edge Learning via Dynamic Graph Propagation · NIPS 2021
Score-based Generative Modeling in Latent Space · NIPS 2021
Binary TTC: A Temporal Geofence for Autonomous Navigation · CVPR 2021
Learning to Track Instances without Video Annotations · CVPR 2021
Self-Supervised Object Detection via Generative Image Synthesis · ICCV 2021
Weakly-Supervised Physically Unconstrained Gaze Estimation · CVPR 2021
See Through Gradients: Image Batch Recovery via GradInversion · CVPR 2021
DexYCB: A Benchmark for Capturing Hand Grasping of Objects · CVPR 2021
Parameter Efficient Multimodal Transformers for Video Representation Learning · ICLR 2021
VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models · ICLR 2021
NRMVS: Non-Rigid Multi-view Stereo · WACV 2020
Convolutional Tensor-Train LSTM for Spatio-Temporal Learning · NIPS 2020
Online Adaptation for Consistent Mesh Reconstruction in the Wild · NIPS 2020
NVAE: A Deep Hierarchical Variational Autoencoder · NIPS 2020
Learning to Generate Multiple Style Transfer Outputs for an Input Sentence · ACL 2020
Bi3D: Stereo Depth Estimation via Binary Classifications · CVPR 2020
Meshlet Priors for 3D Mesh Reconstruction · CVPR 2020
Self-Supervised Viewpoint Learning From Image Collections · CVPR 2020
Two-Shot Spatially-Varying BRDF and Shape Estimation · CVPR 2020
Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera · CVPR 2020
Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild · CVPR 2020
Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion · CVPR 2020
UNAS: Differentiable Architecture Search Meets Reinforcement Learning · CVPR 2020
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection · CVPR 2020
Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification · ECCV 2020
Contrastive Learning for Weakly Supervised Phrase Grounding · ECCV 2020
DeepGMR: Learning Latent Gaussian Mixture Models for Registration · ECCV 2020
Self-supervised Single-view 3D Reconstruction via Semantic Consistency · ECCV 2020
Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints · ECCV 2020
UFO²: A Unified Framework towards Omni-supervised Object Detection · ECCV 2020
Angular Visual Hardness · ICML 2020
Extreme View Synthesis · ICCV 2019
Neural Inverse Rendering of an Indoor Scene From a Single Image · ICCV 2019
Few-Shot Adaptive Gaze Estimation · ICCV 2019
Few-Shot Unsupervised Image-to-Image Translation · ICCV 2019
Joint-task Self-supervised Learning for Temporal Correspondence · NIPS 2019
Dancing to Music · NIPS 2019
Few-shot Video-to-Video Synthesis · NIPS 2019
STEP: Spatio-Temporal Progressive Learning for Video Action Detection · CVPR 2019
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments · CVPR 2019
Importance Estimation for Neural Network Pruning · CVPR 2019
Pixel-Adaptive Convolutional Neural Networks · CVPR 2019
Neural RGB→D Sensing: Depth and Uncertainty From a Video Camera · CVPR 2019
PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image · CVPR 2019
SCOPS: Self-Supervised Co-Part Segmentation · CVPR 2019
Joint Discriminative and Generative Learning for Person Re-Identification · CVPR 2019
Learning Linear Transformations for Fast Image and Video Style Transfer · CVPR 2019
Learning Propagation for Arbitrarily-Structured Data · ICCV 2019
Unsupervised Video Interpolation Using Cycle Consistency · ICCV 2019
SENSE: A Shared Encoder Network for Scene-Flow Estimation · ICCV 2019
A Closed-form Solution to Photorealistic Image Stylization · ECCV 2018
Hand Pose Estimation via Latent 2.5D Heatmap Regression · ECCV 2018
Video-to-Video Synthesis · NIPS 2018
Context-aware Synthesis and Placement of Object Instances · NIPS 2018
Geometry-Aware Learning of Maps for Camera Localization · CVPR 2018
SPLATNet: Sparse Lattice Networks for Point Cloud Processing · CVPR 2018
Improving Landmark Localization With Semi-Supervised Learning · CVPR 2018
MoCoGAN: Decomposing Motion and Content for Video Generation · CVPR 2018
Learning Superpixels With Segmentation-Aware Affinity Loss · CVPR 2018
Switchable Temporal Propagation Network · ECCV 2018
Separating Reflection and Transmission Images in the Wild · ECCV 2018
Multimodal Unsupervised Image-to-image Translation · ECCV 2018
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation · ECCV 2018
Tackling 3D ToF Artifacts Through Learning and the FLAT Dataset · ECCV 2018
Superpixel Sampling Networks · ECCV 2018
Simultaneous Edge Alignment and Learning · ECCV 2018
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation · CVPR 2018
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume · CVPR 2018
High-Resolution Image Synthesis and Semantic Manipulation With Conditional GANs · CVPR 2018
Deep Semantic Face Deblurring · CVPR 2018
Making Convolutional Networks Recurrent for Visual Sequence Learning · CVPR 2018
Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals · CVPR 2018
Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization With Spatially-Varying Lighting · ICCV 2017
A Lightweight Approach for On-The-Fly Reflectance Estimation · ICCV 2017
Unsupervised Image-to-Image Translation Networks · NIPS 2017
Learning Affinity via Spatial Propagation Networks · NIPS 2017
Polarimetric Multi-View Stereo · CVPR 2017
Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network · CVPR 2017
Accelerated Generative Models for 3D Point Cloud Data · CVPR 2016
Online Detection and Classification of Dynamic Hand Gestures With Recurrent 3D Convolutional Neural Network · CVPR 2016
Robust Model-Based 3D Head Pose Estimation · ICCV 2015
Modeling Object Appearance Using Context-Conditioned Component Analysis · CVPR 2015
Hierarchical Subquery Evaluation for Active Learning on a Graph · CVPR 2014
Fully-Connected CRFs with Non-Parametric Pairwise Potential · CVPR 2013