Triantafyllos Afouras
21 papers · 2017–2025 · 6 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Conference Polyglot (6)
Keyword Pioneer
Hot Topic Early Bird
Academic Marathon (8)
Renaissance Researcher (9)
Dynamic Duo (13)
Mega-Team (100)
Topic Evolution
Century Club (21)
Keyword Collector (91)
Trend Setter
Unstoppable (9)
Conferences
CVPR (6)
INTERSPEECH (6)
ICCV (3)
ECCV (2)
ICML (2)
NeurIPS (2)
Keywords
multimodal learning (3)
instructional video (3)
lip reading (3)
video understanding (3)
transformer model (2)
contrastive learning (2)
object detection (2)
word error rate (2)
visual speech recognition (2)
zero-shot learning (2)
unsupervised learning (2)
weakly supervised learning (2)
visual speech (2)
temporal localization (2)
speech separation (2)
transformer architecture (2)
speech recognition (1)
video segmentation (1)
attention mechanism (1)
pose estimation (1)
Papers
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
ICCV 2025
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
Speech Recognition Models are Strong Lip-readers
INTERSPEECH 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
ICML 2024
HT-Step: Aligning Instructional Articles with How-To Videos
NeurIPS 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
NeurIPS 2023
Learning to Ground Instructional Articles in Videos through Narrations
ICCV 2023
Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation
CVPR 2022
Sub-Word Level Lip Reading With Visual Attention
CVPR 2022
Self-Supervised Object Detection From Audio-Visual Correspondence
CVPR 2022
Aligning Subtitles in Sign Language Videos
ICCV 2021
Read and Attend: Temporal Localisation in Sign Language Videos
CVPR 2021
Localizing Visual Sounds the Hard Way
CVPR 2021
Self-Supervised Learning of Audio-Visual Objects from Video
ECCV 2020
Now You're Speaking My Language: Visual Language Identification
INTERSPEECH 2020
Spot the Conversation: Speaker Diarisation in the Wild
INTERSPEECH 2020
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
ECCV 2020
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions
INTERSPEECH 2019
The Conversation: Deep Audio-Visual Speech Enhancement
INTERSPEECH 2018
Deep Lip Reading: A Comparison of Models and an Online Application
INTERSPEECH 2018
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
ICML 2017