Gedas Bertasius
45 papers · 2015–2026 · 10 top CS/AI conferences
Achievements
Academic Marathon (11)
Conference Polyglot (10)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (7)
Renaissance Researcher (6)
Taxonomy Completionist (66)
Deep Specialist (13)
Dynamic Duo (19)
Mega-Team (100)
Prolific Year (5)
Conference Pioneer
Trend Setter
Keyword Collector (184)
Century Club (45)
Unstoppable (12)
The Questioner (2)
Conferences
CVPR (18)
ECCV (8)
WACV (6)
ICCV (5)
EMNLP (2)
NIPS (2)
ACL (1)
AISTATS (1)
ICML (1)
RSS (1)
Keywords
video understanding (11)
multimodal learning (7)
convolutional neural network (5)
video question answering (5)
semantic segmentation (4)
action recognition (3)
egocentric vision (3)
large language model (3)
temporal modeling (3)
state-space model (2)
zero-shot learning (2)
temporal grounding (2)
hand pose estimation (2)
pose estimation (2)
video analysis (2)
multi-modal learning (2)
video captioning (2)
contrastive learning (2)
vision transformer (2)
video classification (2)
Papers
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising
WACV 2026
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
WACV 2026
TimeRefine: Temporal Grounding with Time Refining Video LLM
WACV 2026
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
CVPR 2025
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
WACV 2025
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
EMNLP 2025
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
CVPR 2025
DAM: Dynamic Adapter Merging for Continual Video QA Learning
WACV 2025
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
CVPR 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
CVPR 2025
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
ECCV 2024
A Simple LLM Framework for Long-Range Video Question-Answering
EMNLP 2024
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
ACL 2024
LoCoNet: Long-Short Context Network for Active Speaker Detection
CVPR 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
CVPR 2024
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
ECCV 2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
ECCV 2024
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
ECCV 2024
Vision Transformers Are Parameter-Efficient Audio-Visual Learners
CVPR 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
ICCV 2023
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
ICCV 2023
Efficient Movie Scene Detection Using State-Space Transformers
CVPR 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining
CVPR 2023
Learning To Recognize Procedural Activities With Distant Supervision
CVPR 2022
Long-Short Temporal Contrastive Learning of Video Transformers
CVPR 2022
ECLIPSE: Efficient Long-Range Video Retrieval Using Sight and Sound
ECCV 2022
TALLFormer: Temporal Action Localization with a Long-Memory Transformer
ECCV 2022
Long Movie Clip Classification with State-Space Video Models
ECCV 2022
Supervoxel Attention Graphs for Long-Range Video Modeling
WACV 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
CVPR 2021
Is Space-Time Attention All You Need for Video Understanding?
ICML 2021
COBE: Contextualized Object Embeddings from Narrated Instructional Video
NIPS 2020
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation
CVPR 2020
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
NIPS 2019
Object Detection in Video with Spatiotemporal Sampling Networks
ECCV 2018
Egocentric Basketball Motion Planning From a Single First-Person Image
CVPR 2018
First-Person Action-Object Detection with EgoNet
RSS 2017
Unsupervised Learning of Important Objects From First-Person Videos
ICCV 2017
Am I a Baller? Basketball Performance Assessment From First-Person Videos
ICCV 2017
Convolutional Random Walk Networks for Semantic Image Segmentation
CVPR 2017
Local Perturb-and-MAP for Structured Prediction
AISTATS 2017
Semantic Segmentation With Boundary Neural Fields
CVPR 2016
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
CVPR 2015
High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision
ICCV 2015