Lorenzo Torresani

60 papers · 2006–2026 · 9 conferences (top CS/AI venues)

Achievements


🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (11) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (9)

🏃 Academic Marathon (20) 🏠 Conference Loyalist (25) 🌟 Keyword Trendsetter Combo (9) 🤝 Dynamic Duo (19) 🌱 Topic Pioneer 🏆 Keyword Champion 👥 Mega-Team (100) 🔬 Deep Specialist (19) 🧬 Topic Evolution ❓ The Questioner 🚀 Conference Pioneer ⚡ Prolific Year (7) 🗃️ Keyword Collector (267) 💎 Century Club (60) 📈 Trend Setter 🔥 Unstoppable (12)

Conferences

CVPR (25) NIPS (12) ICCV (9) ECCV (4) WACV (4) AISTATS (2) ICML (2) AAAI (1) EMNLP (1)

Top co-authors

Gedas Bertasius (19) Du Tran (13) Jianbo Shi (9) Huiyu Wang (8) Tushar Nagarajan (8) Kristen Grauman (6) Matt Feiszli (6) Heng Wang (6) Yale Song (5) Manohar Paluri (5)

Research topics

Core AI (1)

Keywords

video understanding (25) action recognition (14) multimodal learning (9) video classification (7) egocentric video (6) temporal modeling (4) convolutional neural network (4) image classification (4) instructional video (4) semantic segmentation (3) zero-shot learning (3) video question answering (3) temporal localization (3) activity recognition (3) contrastive learning (3) video captioning (3) transfer learning (3) self-supervised learning (3) multi-modal learning (3) 3d convolutional network (3)

Papers

TimeRefine: Temporal Grounding with Time Refining Video LLM · WACV 2026
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering · CVPR 2025
VITED: Video Temporal Evidence Distillation · CVPR 2025
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs · ICCV 2025
Learning to Segment Referred Objects from Narrated Egocentric Videos · CVPR 2024
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation · ECCV 2024
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives · CVPR 2024
UNICORN: A Unified Causal Video-Oriented Language-Modeling Framework for Temporal Video-Language Tasks · EMNLP 2024
Video ReCap: Recursive Captioning of Hour-Long Videos · CVPR 2024
Step Differences in Instructional Video · CVPR 2024
Egocentric Video Task Translation · CVPR 2023
HierVL: Learning Hierarchical Video-Language Embeddings · CVPR 2023
Ego-Only: Egocentric Action Detection without Exocentric Transferring · ICCV 2023
Learning to Ground Instructional Articles in Videos through Narrations · ICCV 2023
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities · NIPS 2023
HT-Step: Aligning Instructional Articles with How-To Videos · NIPS 2023
Relational Space-Time Query in Long-Form Videos · CVPR 2023
Ego4D: Around the World in 3,000 Hours of Egocentric Video · CVPR 2022
Label Hallucination for Few-Shot Classification · AAAI 2022
Long-Short Temporal Contrastive Learning of Video Transformers · CVPR 2022
Learning To Recognize Procedural Activities With Distant Supervision · CVPR 2022
Deformable Video Transformer · CVPR 2022
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories · CVPR 2021
Supervoxel Attention Graphs for Long-Range Video Modeling · WACV 2021
Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks · ICML 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs · CVPR 2021
Is Space-Time Attention All You Need for Video Understanding? · ICML 2021
Learn Like a Pathologist: Curriculum Learning by Annotator Agreement for Histopathology Image Classification · WACV 2021
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling · WACV 2021
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation · CVPR 2020
Video Modeling With Correlation Networks · CVPR 2020
Stein Variational Inference for Discrete Distributions · AISTATS 2020
COBE: Contextualized Object Embeddings from Narrated Instructional Video · NIPS 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering · NIPS 2020
Listen to Look: Action Recognition by Previewing Audio · CVPR 2020
STAR-Caps: Capsule Networks with Straight-Through Attentive Routing · NIPS 2019
Video Classification With Channel-Separated Convolutional Networks · ICCV 2019
DistInit: Learning Video Representations Without a Single Labeled Video · ICCV 2019
Learning Temporal Pose Estimation from Sparsely-Labeled Videos · NIPS 2019
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization · ICCV 2019
SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition · ICCV 2019
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets · CVPR 2018
Object Detection in Video with Spatiotemporal Sampling Networks · ECCV 2018
MaskConnect: Connectivity Learning by Gradient Descent · ECCV 2018
Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset · ECCV 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization · NIPS 2018
Detect-and-Track: Efficient Pose Estimation in Videos · CVPR 2018
A Closer Look at Spatiotemporal Convolutions for Action Recognition · CVPR 2018
Learning to Inpaint for Image Compression · NIPS 2017
Convolutional Random Walk Networks for Semantic Image Segmentation · CVPR 2017
Local Perturb-and-MAP for Structured Prediction · AISTATS 2017
Semantic Segmentation With Boundary Neural Fields · CVPR 2016
Learning Spatiotemporal Features With 3D Convolutional Networks · ICCV 2015
High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision · ICCV 2015
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection · CVPR 2015
Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification · CVPR 2013
PiCoDes: Learning a Compact Code for Novel-Category Recognition · NIPS 2011
Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach · NIPS 2010
Large Margin Component Analysis · NIPS 2006
Learning Motion Style Synthesis from Perceptual Observations · NIPS 2006