Triantafyllos Afouras

21 papers · 2017–2025 · 6 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏃 Academic Marathon (8) 🌈 Renaissance Researcher (9) 🤝 Dynamic Duo (13) 👥 Mega-Team (100) 🧬 Topic Evolution 💎 Century Club (21) 🗃️ Keyword Collector (91) 📈 Trend Setter 🔥 Unstoppable (9)

Conferences

CVPR (6) INTERSPEECH (6) ICCV (3) ECCV (2) ICML (2) NIPS (2)

Top co-authors

Andrew Zisserman (13) Joon Son Chung (7) Effrosyni Mavroudi (4) Lorenzo Torresani (4) Liliane Momeni (3) K R Prajwal (3) Gül Varol (3) Huiyu Wang (3) Samuel Albanie (3) Shraman Pramanick (2)

Keywords

multimodal learning (3) instructional video (3) lip reading (3) video understanding (3) transformer model (2) contrastive learning (2) object detection (2) word error rate (2) visual speech recognition (2) zero-shot learning (2) unsupervised learning (2) weakly supervised learning (2) visual speech (2) temporal localization (2) speech separation (2) transformer architecture (2) speech recognition (1) video segmentation (1) attention mechanism (1) pose estimation (1)

Papers

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs ICCV 2025
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives CVPR 2024
Speech Recognition Models are Strong Lip-readers INTERSPEECH 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation ICML 2024
HT-Step: Aligning Instructional Articles with How-To Videos NIPS 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos NIPS 2023
Learning to Ground Instructional Articles in Videos through Narrations ICCV 2023
Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation CVPR 2022
Sub-Word Level Lip Reading With Visual Attention CVPR 2022
Self-Supervised Object Detection From Audio-Visual Correspondence CVPR 2022
Aligning Subtitles in Sign Language Videos ICCV 2021
Read and Attend: Temporal Localisation in Sign Language Videos CVPR 2021
Localizing Visual Sounds the Hard Way CVPR 2021
Self-Supervised Learning of Audio-Visual Objects from Video ECCV 2020
Now You’re Speaking My Language: Visual Language Identification INTERSPEECH 2020
Spot the Conversation: Speaker Diarisation in the Wild INTERSPEECH 2020
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues ECCV 2020
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions INTERSPEECH 2019
The Conversation: Deep Audio-Visual Speech Enhancement INTERSPEECH 2018
Deep Lip Reading: A Comparison of Models and an Online Application INTERSPEECH 2018
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning ICML 2017