Gedas Bertasius

45 papers · 2015–2026 · 10 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (11) 🌍 Conference Polyglot (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (7)

🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (66) 🔬 Deep Specialist (13) 🤝 Dynamic Duo (19) 👥 Mega-Team (100) ⚡ Prolific Year (5) 🚀 Conference Pioneer 📈 Trend Setter 🗃️ Keyword Collector (184) 💎 Century Club (45) 🔥 Unstoppable (12) ❓ The Questioner (2)

Conferences

CVPR (18) ECCV (8) WACV (6) ICCV (5) EMNLP (2) NIPS (2) ACL (1) AISTATS (1) ICML (1) RSS (1)

Top co-authors

Lorenzo Torresani (19) Jianbo Shi (12) Md Mohaiminul Islam (11) Mohit Bansal (10) Feng Cheng (9) Yan-Bo Lin (6) Ziyang Wang (6) Huiyu Wang (5) Hyun Soo Park (4) Tushar Nagarajan (4)

Keywords

video understanding (11) multimodal learning (7) convolutional neural network (5) video question answering (5) semantic segmentation (4) action recognition (3) egocentric vision (3) large language model (3) temporal modeling (3) state-space model (2) zero-shot learning (2) temporal grounding (2) hand pose estimation (2) pose estimation (2) video analysis (2) multi-modal learning (2) video captioning (2) contrastive learning (2) vision transformer (2) video classification (2)

Papers

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising WACV 2026
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction WACV 2026
TimeRefine: Temporal Grounding with Time Refining Video LLM WACV 2026
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos CVPR 2025
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos WACV 2025
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning EMNLP 2025
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos CVPR 2025
DAM: Dynamic Adapter Merging for Continual Video QA Learning WACV 2025
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation CVPR 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering CVPR 2025
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation ECCV 2024
A Simple LLM Framework for Long-Range Video Question-Answering EMNLP 2024
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences ACL 2024
LoCoNet: Long-Short Context Network for Active Speaker Detection CVPR 2024
Video ReCap: Recursive Captioning of Hour-Long Videos CVPR 2024
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives CVPR 2024
Siamese Vision Transformers are Scalable Audio-visual Learners ECCV 2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos ECCV 2024
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos ECCV 2024
Vision Transformers Are Parameter-Efficient Audio-Visual Learners CVPR 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval ICCV 2023
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers ICCV 2023
Efficient Movie Scene Detection Using State-Space Transformers CVPR 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining CVPR 2023
Learning To Recognize Procedural Activities With Distant Supervision CVPR 2022
Long-Short Temporal Contrastive Learning of Video Transformers CVPR 2022
ECLIPSE: Efficient Long-Range Video Retrieval Using Sight and Sound ECCV 2022
TALLFormer: Temporal Action Localization with a Long-Memory Transformer ECCV 2022
Long Movie Clip Classification with State-Space Video Models ECCV 2022
Supervoxel Attention Graphs for Long-Range Video Modeling WACV 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs CVPR 2021
Is Space-Time Attention All You Need for Video Understanding? ICML 2021
COBE: Contextualized Object Embeddings from Narrated Instructional Video NIPS 2020
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation CVPR 2020
Learning Temporal Pose Estimation from Sparsely-Labeled Videos NIPS 2019
Object Detection in Video with Spatiotemporal Sampling Networks ECCV 2018
Egocentric Basketball Motion Planning From a Single First-Person Image CVPR 2018
First-Person Action-Object Detection with EgoNet RSS 2017
Unsupervised Learning of Important Objects From First-Person Videos ICCV 2017
Am I a Baller? Basketball Performance Assessment From First-Person Videos ICCV 2017
Convolutional Random Walk Networks for Semantic Image Segmentation CVPR 2017
Local Perturb-and-MAP for Structured Prediction AISTATS 2017
Semantic Segmentation With Boundary Neural Fields CVPR 2016
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection CVPR 2015
High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision ICCV 2015