Jan Kautz
153 papers · 2013–2025 · 12 conferences · across top CS/AI conferences
Achievements
Renaissance Researcher (5)
Interdisciplinary Bridge
Taxonomy Completionist (10)
Conference Loyalist (70)
Keyword Trendsetter Combo (3)
Grand Slam
Triple Crown
Dynamic Duo (38)
Mega-Team (28)
Deep Specialist (26)
Topic Evolution
Keyword Champion
Prolific Year (13)
The Questioner
Trend Setter
Century Club (153)
Conference Pioneer
Unstoppable (13)
Keyword Collector (554)
Keyword Pioneer
Hot Topic Early Bird
Conferences
CVPR (70)
ECCV (20)
NIPS (19)
ICCV (18)
ICLR (14)
ICML (5)
CORL (2)
AAAI (1)
ACL (1)
JMLR (1)
RSS (1)
WACV (1)
Research topics
Keywords
3d reconstruction (12)
semantic segmentation (12)
convolutional neural network (9)
depth estimation (9)
self-supervised learning (9)
object detection (8)
generative model (8)
instance segmentation (6)
contrastive learning (6)
model compression (6)
vision transformer (6)
unsupervised learning (6)
neural network (6)
video generation (5)
neural rendering (5)
representation learning (5)
knowledge distillation (5)
semi-supervised learning (5)
image generation (5)
diffusion model (5)
Papers
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
ICLR 2025
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
ICLR 2025
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement
ICLR 2025
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
Scaling Vision Pre-Training to 4K Resolution
CVPR 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
CVPR 2025
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
CVPR 2025
NaVILA: Legged Robot Vision-Language-Action Model for Navigation
RSS 2025
GENMO: A GENeralist Model for Human MOtion
ICCV 2025
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
ICCV 2025
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
ICCV 2025
GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion
ICCV 2025
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
CVPR 2025
One-Minute Video Generation with Test-Time Training
CVPR 2025
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
ICML 2025
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
CVPR 2025
FoundationStereo: Zero-Shot Stereo Matching
CVPR 2025
SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
CVPR 2025
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
CVPR 2025
FLARE: Robot Learning with Implicit World Modeling
CORL 2025
DreamGen: Unlocking Generalization in Robot Learning through Video World Models
CORL 2025
Score-Based Diffusion Models in Function Space
JMLR 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
CVPR 2025
Gated Delta Networks: Improving Mamba2 with Delta Rule
ICLR 2025
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
ICLR 2025
Hymba: A Hybrid-head Architecture for Small Language Models
ICLR 2025
SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models
NIPS 2024
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
CVPR 2024
COLMAP-Free 3D Gaussian Splatting
CVPR 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
NIPS 2024
CosAE: Learnable Fourier Series for Image Restoration
NIPS 2024
Compact Language Models via Pruning and Knowledge Distillation
NIPS 2024
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
CVPR 2024
Flextron: Many-in-One Flexible Large Language Model
ICML 2024
FasterViT: Fast Vision Transformers with Hierarchical Attention
ICLR 2024
3D Reconstruction with Generalizable Neural Fields using Scene Priors
ICLR 2024
Learning to Jointly Understand Visual and Tactile Signals
ICLR 2024
A Variational Perspective on Solving Inverse Problems with Diffusion Models
ICLR 2024
LITA: Language Instructed Temporal-Localization Assistant
ECCV 2024
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation
ECCV 2024
DiffiT: Diffusion Vision Transformers for Image Generation
ECCV 2024
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
CVPR 2024
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One
CVPR 2024
Heterogeneous Continual Learning
CVPR 2023
The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks
CVPR 2023
Generalizable One-shot 3D Neural Head Avatar
NIPS 2023
Global Vision Transformer Pruning With Hessian-Aware Saliency
CVPR 2023
Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models
CVPR 2023
Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation
ICML 2023
Global Context Vision Transformers
ICML 2023
Pseudoinverse-Guided Diffusion Models for Inverse Problems
ICLR 2023
Convolutional State Space Models for Long-Range Spatiotemporal Modeling
NIPS 2023
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
CVPR 2023
Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters
CVPR 2023
PhysDiff: Physics-Guided Human Motion Diffusion Model
ICCV 2023
RANA: Relightable Articulated Neural Avatars
ICCV 2023
FreeSOLO: Learning To Segment Objects Without Annotations
CVPR 2022
GradViT: Gradient Inversion of Vision Transformers
CVPR 2022
GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras
CVPR 2022
GroupViT: Semantic Segmentation Emerges From Text Supervision
CVPR 2022
A-ViT: Adaptive Tokens for Efficient Vision Transformer
CVPR 2022
Learning Continuous Environment Fields via Implicit Functions
ICLR 2022
LANA: Latency Aware Network Acceleration
ECCV 2022
Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion
ECCV 2022
Neural Interferometry: Image Reconstruction from Astronomical Interferometers Using Transformer-Conditioned Neural Fields
AAAI 2022
CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs
CVPR 2022
Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models
CVPR 2021
Learning Indoor Inverse Rendering With 3D Spatially-Varying Lighting
ICCV 2021
A Contrastive Learning Approach for Training Variational Autoencoder Priors
NIPS 2021
Coupled Segmentation and Edge Learning via Dynamic Graph Propagation
NIPS 2021
Score-based Generative Modeling in Latent Space
NIPS 2021
Binary TTC: A Temporal Geofence for Autonomous Navigation
CVPR 2021
Learning to Track Instances without Video Annotations
CVPR 2021
Self-Supervised Object Detection via Generative Image Synthesis
ICCV 2021
Weakly-Supervised Physically Unconstrained Gaze Estimation
CVPR 2021
See Through Gradients: Image Batch Recovery via GradInversion
CVPR 2021
DexYCB: A Benchmark for Capturing Hand Grasping of Objects
CVPR 2021
Parameter Efficient Multimodal Transformers for Video Representation Learning
ICLR 2021
VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models
ICLR 2021
NRMVS: Non-Rigid Multi-view Stereo
WACV 2020
Convolutional Tensor-Train LSTM for Spatio-Temporal Learning
NIPS 2020
Online Adaptation for Consistent Mesh Reconstruction in the Wild
NIPS 2020
NVAE: A Deep Hierarchical Variational Autoencoder
NIPS 2020
Learning to Generate Multiple Style Transfer Outputs for an Input Sentence
ACL 2020
Bi3D: Stereo Depth Estimation via Binary Classifications
CVPR 2020
Meshlet Priors for 3D Mesh Reconstruction
CVPR 2020
Self-Supervised Viewpoint Learning From Image Collections
CVPR 2020
Two-Shot Spatially-Varying BRDF and Shape Estimation
CVPR 2020
Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera
CVPR 2020
Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild
CVPR 2020
Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion
CVPR 2020
UNAS: Differentiable Architecture Search Meets Reinforcement Learning
CVPR 2020
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection
CVPR 2020
Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification
ECCV 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
ECCV 2020
DeepGMR: Learning Latent Gaussian Mixture Models for Registration
ECCV 2020
Self-supervised Single-view 3D Reconstruction via Semantic Consistency
ECCV 2020
Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints
ECCV 2020
UFO²: A Unified Framework towards Omni-supervised Object Detection
ECCV 2020
Angular Visual Hardness
ICML 2020
Extreme View Synthesis
ICCV 2019
Neural Inverse Rendering of an Indoor Scene From a Single Image
ICCV 2019
Few-Shot Adaptive Gaze Estimation
ICCV 2019
Few-Shot Unsupervised Image-to-Image Translation
ICCV 2019
Joint-task Self-supervised Learning for Temporal Correspondence
NIPS 2019
Dancing to Music
NIPS 2019
Few-shot Video-to-Video Synthesis
NIPS 2019
STEP: Spatio-Temporal Progressive Learning for Video Action Detection
CVPR 2019
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments
CVPR 2019
Importance Estimation for Neural Network Pruning
CVPR 2019
Pixel-Adaptive Convolutional Neural Networks
CVPR 2019
Neural RGB→D Sensing: Depth and Uncertainty From a Video Camera
CVPR 2019
PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image
CVPR 2019
SCOPS: Self-Supervised Co-Part Segmentation
CVPR 2019
Joint Discriminative and Generative Learning for Person Re-Identification
CVPR 2019
Learning Linear Transformations for Fast Image and Video Style Transfer
CVPR 2019
Learning Propagation for Arbitrarily-Structured Data
ICCV 2019
Unsupervised Video Interpolation Using Cycle Consistency
ICCV 2019
SENSE: A Shared Encoder Network for Scene-Flow Estimation
ICCV 2019
A Closed-form Solution to Photorealistic Image Stylization
ECCV 2018
Hand Pose Estimation via Latent 2.5D Heatmap Regression
ECCV 2018
Video-to-Video Synthesis
NIPS 2018
Context-aware Synthesis and Placement of Object Instances
NIPS 2018
Geometry-Aware Learning of Maps for Camera Localization
CVPR 2018
SPLATNet: Sparse Lattice Networks for Point Cloud Processing
CVPR 2018
Improving Landmark Localization With Semi-Supervised Learning
CVPR 2018
MoCoGAN: Decomposing Motion and Content for Video Generation
CVPR 2018
Learning Superpixels With Segmentation-Aware Affinity Loss
CVPR 2018
Switchable Temporal Propagation Network
ECCV 2018
Separating Reflection and Transmission Images in the Wild
ECCV 2018
Multimodal Unsupervised Image-to-image Translation
ECCV 2018
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation
ECCV 2018
Tackling 3D ToF Artifacts Through Learning and the FLAT Dataset
ECCV 2018
Superpixel Sampling Networks
ECCV 2018
Simultaneous Edge Alignment and Learning
ECCV 2018
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation
CVPR 2018
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
CVPR 2018
High-Resolution Image Synthesis and Semantic Manipulation With Conditional GANs
CVPR 2018
Deep Semantic Face Deblurring
CVPR 2018
Making Convolutional Networks Recurrent for Visual Sequence Learning
CVPR 2018
Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals
CVPR 2018
Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization With Spatially-Varying Lighting
ICCV 2017
A Lightweight Approach for On-The-Fly Reflectance Estimation
ICCV 2017
Unsupervised Image-to-Image Translation Networks
NIPS 2017
Learning Affinity via Spatial Propagation Networks
NIPS 2017
Polarimetric Multi-View Stereo
CVPR 2017
Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network
CVPR 2017
Accelerated Generative Models for 3D Point Cloud Data
CVPR 2016
Online Detection and Classification of Dynamic Hand Gestures With Recurrent 3D Convolutional Neural Network
CVPR 2016
Robust Model-Based 3D Head Pose Estimation
ICCV 2015
Modeling Object Appearance Using Context-Conditioned Component Analysis
CVPR 2015
Hierarchical Subquery Evaluation for Active Learning on a Graph
CVPR 2014
Fully-Connected CRFs with Non-Parametric Pairwise Potential
CVPR 2013