Juan Carlos Niebles
72 papers · 2014–2025 · 14 top CS/AI conferences
Achievements
Conference Polyglot (14)
Academic Marathon (11)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (7)
Renaissance Researcher (8)
Taxonomy Completionist (105)
Conference Loyalist (27)
Topic Evolution
Dynamic Duo (20)
Keyword Champion (4)
Grand Slam
Deep Specialist (11)
The Questioner
Prolific Year (7)
Conference Pioneer
Unstoppable (12)
Trend Setter
Century Club (72)
Keyword Collector (302)
Conferences
CVPR (27)
ECCV (10)
ICCV (9)
NIPS (8)
EMNLP (5)
ICML (3)
WACV (3)
AAAI (1)
ACL (1)
CLEAR (1)
CONLL (1)
ICLR (1)
NAACL (1)
PGM (1)
Research topics
Keywords
video understanding (14)
action recognition (7)
vision-language model (5)
activity recognition (5)
few-shot learning (5)
multimodal learning (5)
temporal alignment (4)
reinforcement learning (3)
instructional video (3)
weakly supervised learning (3)
graph neural network (3)
language model (3)
human pose estimation (3)
video analysis (3)
feature learning (2)
trajectory prediction (2)
contrastive learning (2)
multi-modal learning (2)
knowledge distillation (2)
self-supervised learning (2)
Papers
ViUniT: Visual Unit Tests for More Robust Visual Programming
CVPR 2025
xLAM: A Family of Large Action Models to Empower AI Agent Systems
NAACL 2025
Understanding Complexity in VideoQA via Visual Program Generation
ICML 2025
Unifying Specialized Visual Encoders for Video Language Models
ICML 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
ICML 2025
LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback
ACL 2025
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
ICCV 2025
Re-thinking Temporal Search for Long-Form Video Understanding
CVPR 2025
Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D
EMNLP 2025
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
EMNLP 2025
LATTE: Learning to Think with Vision Specialists
EMNLP 2025
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
CONLL 2024
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
ECCV 2024
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
ECCV 2024
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
CVPR 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
NIPS 2024
Streaming Detection of Queried Event Start
NIPS 2024
On the Unlikelihood of D-Separation
PGM 2024
Causal Layering via Conditional Entropy
CLEAR 2024
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
EMNLP 2024
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
ICLR 2024
APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets
NIPS 2024
Mask-Free OVIS: Open-Vocabulary Instance Segmentation Without Manual Mask Annotations
CVPR 2023
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
CVPR 2023
PreViTS: Contrastive Pretraining With Video Tracking Supervision
WACV 2023
Temporally Disentangled Representation Learning under Unknown Nonstationarity
NIPS 2023
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
NIPS 2023
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation
ICCV 2023
Procedure-Aware Pretraining for Instructional Video Understanding
CVPR 2023
PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens
ECCV 2022
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing
NIPS 2022
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
CVPR 2022
Revisiting the "Video" in Video-Language Understanding
CVPR 2022
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
ECCV 2022
Metadata Normalization
CVPR 2021
MOMA: Multi-Object Multi-Actor Activity Parsing
NIPS 2021
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
ICCV 2021
Detecting Human-Object Relationships in Videos
ICCV 2021
Learning Privacy-Preserving Optics for Human Pose Estimation
ICCV 2021
Representation Learning With Statistical Independence to Mitigate Bias
WACV 2021
Home Action Genome: Cooperative Compositional Action Understanding
CVPR 2021
Spatio-Temporal Graph for Video Captioning With Knowledge Distillation
CVPR 2020
Few-Shot Video Classification via Temporal Alignment
CVPR 2020
Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision
WACV 2020
Adversarial Cross-Domain Action Recognition with Co-Attention
AAAI 2020
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
ECCV 2020
Procedure Planning in Instructional Videos
ECCV 2020
Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs
CVPR 2020
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
CVPR 2019
Learning Temporal Action Proposals With Fewer Labels
ICCV 2019
Imitation Learning for Human Pose Prediction
ICCV 2019
Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration
CVPR 2019
Peeking Into the Future: Predicting Future Person Activities and Locations in Videos
CVPR 2019
Liquid Pouring Monitoring via Rich Sensory Inputs
ECCV 2018
Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos
ECCV 2018
Learning to Decompose and Disentangle Representations for Video Prediction
NIPS 2018
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation
EMNLP 2018
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
CVPR 2018
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
CVPR 2018
Graph Distillation for Action Detection with Privileged Modalities
ECCV 2018
End-to-End Joint Semantic Segmentation of Actors and Actions in Video
ECCV 2018
Visual Forecasting by Imitating Dynamics in Natural Sequences
ICCV 2017
Dense-Captioning Events in Videos
ICCV 2017
Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization
CVPR 2017
SST: Single-Stream Temporal Action Proposals
CVPR 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
CVPR 2017
A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets
CVPR 2016
Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos
CVPR 2016
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
CVPR 2015
Robust Manhattan Frame Estimation From a Single RGB-D Image
CVPR 2015
On the Relationship Between Visual Attributes and Convolutional Networks
CVPR 2015
Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities
CVPR 2014