Andrew Zisserman
144 papers · 2006–2026 · 12 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (25)
Interdisciplinary Bridge
Renaissance Researcher (5)
Hot Topic Early Bird
Keyword Trendsetter Combo (23)
Conference Loyalist (28)
The Namer
Topic Pioneer
Keyword Champion (2)
Topic Evolution
Dynamic Duo (21)
Triple Crown
Mega-Team (27)
Grand Slam
Deep Specialist (23)
The Questioner (2)
Unstoppable (13)
Keyword Collector (80)
Prolific Year (13)
Century Club (142)
Trend Setter
Conference Pioneer
Conferences
CVPR (52)
NIPS (28)
ICCV (24)
ECCV (17)
INTERSPEECH (9)
ICLR (5)
ICML (3)
MICCAI (2)
AAAI (1)
ACL (1)
MIDL (1)
WACV (1)
Keywords
video understanding (22)
self-supervised learning (15)
action recognition (14)
multimodal learning (11)
contrastive learning (10)
representation learning (9)
object detection (9)
optical flow (9)
convolutional neural network (9)
video representation (8)
zero-shot learning (7)
transformer architecture (7)
weakly supervised learning (7)
semantic segmentation (6)
depth estimation (5)
transfer learning (5)
video segmentation (5)
image segmentation (5)
cross-modal learning (5)
feature extraction (4)
Papers
Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing
ACL 2026
Open-World Object Counting in Videos
AAAI 2026
Understanding Co-speech Gestures in-the-wild
ICCV 2025
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
ICCV 2025
From Panels to Prose: Generating Literary Narratives from Comics
ICCV 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
ICCV 2025
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
CVPR 2025
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
ICCV 2025
Learning from Streaming Video with Orthogonal Gradients
CVPR 2025
Amodal Ground Truth and Completion in the Wild
CVPR 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
CVPR 2024
Appearance-based Refinement for Object-Centric Motion Segmentation
ECCV 2024
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
CVPR 2024
AutoAD III: The Prequel - Back to the Pixels
CVPR 2024
Learning from One Continuous Video Stream
CVPR 2024
A General Protocol to Probe Large Vision Models for 3D Physical Understanding
NIPS 2024
CountGD: Multi-Modal Open-World Counting
NIPS 2024
FlexCap: Describe Anything in Images in Controllable Detail
NIPS 2024
TAPVid-3D: A Benchmark for Tracking Any Point in 3D
NIPS 2024
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
CVPR 2024
The Manga Whisperer: Automatically Generating Transcriptions for Comics
CVPR 2024
Speech Recognition Models are Strong Lip-readers
INTERSPEECH 2024
3D Spine Shape Estimation from Single 2D DXA
MICCAI 2024
Automated Spinal MRI Labelling from Reports Using a Large Language Model
MICCAI 2024
N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
ECCV 2024
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
ECCV 2024
Text-Conditioned Resampler For Long Form Video Understanding
ECCV 2024
The Change You Want To See
WACV 2023
AutoAD: Movie Description in Context
CVPR 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
INTERSPEECH 2023
Multi-Modal Classifiers for Open-Vocabulary Object Detection
ICML 2023
A Light Touch Approach to Teaching Transformers Multi-View Geometry
CVPR 2023
Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion
NIPS 2023
No Representation Rules Them All in Category Discovery
NIPS 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
NIPS 2023
TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement
ICCV 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
ICCV 2023
The Making and Breaking of Camouflage
ICCV 2023
Verbs in Action: Improving Verb Understanding in Video-Language Models
ICCV 2023
AutoAD II: The Sequel - Who, When, and What in Movie Audio Description
ICCV 2023
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime
MIDL 2023
Automatic Dense Annotation of Large-Vocabulary Sign Language Videos
ECCV 2022
Object Discovery and Representation Networks
ECCV 2022
Associating Objects and Their Effects in Video through Coordination Games
NIPS 2022
Flamingo: a Visual Language Model for Few-Shot Learning
NIPS 2022
TAP-Vid: A Benchmark for Tracking Any Point in a Video
NIPS 2022
Input-Level Inductive Biases for 3D Reconstruction
CVPR 2022
Sub-Word Level Lip Reading With Visual Attention
CVPR 2022
Temporal Alignment Networks for Long-Term Video
CVPR 2022
Generalized Category Discovery
CVPR 2022
It's About Time: Analog Clock Reading in the Wild
CVPR 2022
Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation
CVPR 2022
Label, Verify, Correct: A Simple Few Shot Object Detection Method
CVPR 2022
Open-Set Recognition: A Good Closed-Set Classifier is All You Need
ICLR 2022
Perceiver IO: A General Architecture for Structured Inputs & Outputs
ICLR 2022
Segmenting Moving Objects via an Object-Centric Layered Representation
NIPS 2022
TeachText: CrossModal Generalized Distillation for Text-Video Retrieval
ICCV 2021
With a Little Help From My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
ICCV 2021
Aligning Subtitles in Sign Language Videos
ICCV 2021
Broaden Your Views for Self-Supervised Video Learning
ICCV 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
ICCV 2021
Perceiver: General Perception with Iterative Attention
ICML 2021
Localizing Visual Sounds the Hard Way
CVPR 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers
CVPR 2021
Temporal Query Networks for Fine-Grained Video Understanding
CVPR 2021
Co-Attention for Conditioned Image Matching
CVPR 2021
Omnimatte: Associating Objects and Their Effects in Video
CVPR 2021
Read and Attend: Temporal Localisation in Sign Language Videos
CVPR 2021
Self-Supervised Video Object Segmentation by Motion Grouping
ICCV 2021
Self-Supervised Learning of Audio-Visual Objects from Video
ECCV 2020
Self-Supervised MultiModal Versatile Networks
NIPS 2020
Self-supervised Co-Training for Video Representation Learning
NIPS 2020
CrossTransformers: spatially-aware few-shot transfer
NIPS 2020
Visual Grounding in Video for Unsupervised Word Translation
CVPR 2020
Speech2Action: Cross-Modal Supervision for Action Recognition
CVPR 2020
End-to-End Learning of Visual Representations From Uncurated Instructional Videos
CVPR 2020
Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
CVPR 2020
Memory-augmented Dense Predictive Coding for Video Representation Learning
ECCV 2020
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
ECCV 2020
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
ECCV 2020
Amplifying Key Cues for Human-Object-Interaction Detection
ECCV 2020
Adaptive Text Recognition through Visual Matching
ECCV 2020
Automatically Discovering and Learning New Visual Categories with Ranking Statistics
ICLR 2020
Training Neural Networks for and by Interpolation
ICML 2020
Spot the Conversation: Speaker Diarisation in the Wild
INTERSPEECH 2020
Now You're Speaking My Language: Visual Language Identification
INTERSPEECH 2020
Deep Frank-Wolfe For Neural Network Optimization
ICLR 2019
Exploiting Temporal Context for 3D Human Pose Estimation in the Wild
CVPR 2019
Sim2real transfer learning for 3D human pose estimation: motion to the rescue
NIPS 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
ICCV 2019
Controllable Attention for Structured Layered Video Decomposition
ICCV 2019
Learning to Discover Novel Visual Categories via Deep Transfer Clustering
ICCV 2019
Unsupervised Learning of Object Keypoints for Perception and Control
NIPS 2019
LAEO-Net: Revisiting People Looking at Each Other in Videos
CVPR 2019
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions
INTERSPEECH 2019
Temporal Cycle-Consistency Learning
CVPR 2019
Video Action Transformer Network
CVPR 2019
The Visual Centrifuge: Model-Free Layered Video Representations
CVPR 2019
Learning to Navigate in Cities Without a Map
NIPS 2018
VoxCeleb2: Deep Speaker Recognition
INTERSPEECH 2018
Smooth Loss Functions for Deep Top-k Classification
ICLR 2018
Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching
CVPR 2018
Learning and Using the Arrow of Time
CVPR 2018
What Have We Learned From Deep Representations for Action Recognition?
CVPR 2018
Massively Parallel Video Networks
ECCV 2018
Comparator Networks
ECCV 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
ECCV 2018
Objects that Sound
ECCV 2018
X2Face: A network for controlling face generation using images, audio, and pose codes
ECCV 2018
Deep Lip Reading: A Comparison of Models and an Online Application
INTERSPEECH 2018
The Conversation: Deep Audio-Visual Speech Enhancement
INTERSPEECH 2018
Multi-Task Self-Supervised Visual Learning
ICCV 2017
Lip Reading Sentences in the Wild
CVPR 2017
Look, Listen and Learn
ICCV 2017
Detect to Track and Track to Detect
ICCV 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
CVPR 2017
VoxCeleb: A Large-Scale Speaker Identification Dataset
INTERSPEECH 2017
Synthetic Data for Text Localisation in Natural Images
CVPR 2016
Personalizing Human Video Pose Estimation
CVPR 2016
3D Shape Attributes
CVPR 2016
Convolutional Two-Stream Network Fusion for Video Action Recognition
CVPR 2016
Flowing ConvNets for Human Pose Estimation in Videos
ICCV 2015
Spatial Transformer Networks
NIPS 2015
Talking Heads: Detecting Humans and Recognizing Their Interactions
CVPR 2014
Seeing the Arrow of Time
CVPR 2014
Immediate, Scalable Object Category Detection
CVPR 2014
Triangulation Embedding and Democratic Aggregation for Image Search
CVPR 2014
Two-Stream Convolutional Networks for Action Recognition in Videos
NIPS 2014
A Compact and Discriminative Face Track Descriptor
CVPR 2014
Blocks That Shout: Distinctive Parts for Scene Classification
CVPR 2013
All About VLAD
CVPR 2013
Learning to Detect Partially Overlapping Instances
CVPR 2013
Discriminative Sub-categorization
CVPR 2013
Deep Fisher Networks for Large-Scale Image Classification
NIPS 2013
Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
ICCV 2013
Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
CVPR 2013
Pylon Model for Semantic Segmentation
NIPS 2011
Learning To Count Objects in Images
NIPS 2010
Simultaneous Object Detection and Ranking with Weak Supervision
NIPS 2010
Structured output regression for detection with partial truncation
NIPS 2009
Segmenting Scenes by Matching Image Composites
NIPS 2009
Supervised Dictionary Learning
NIPS 2008
Learning Visual Attributes
NIPS 2007
Bayesian Image Super-resolution, Continued
NIPS 2006