Florian Metze
55 papers · 2007–2026 · 11 top CS/AI conferences
Achievements
Taxonomy Completionist (12)
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Hot Topic Early Bird
Cross-Pollinator (8)
Conference Loyalist (22)
Keyword Trendsetter Combo (5)
Dynamic Duo (12)
Topic Evolution
Keyword Champion
Deep Specialist (12)
Unstoppable (8)
Conference Pioneer
Prolific Year (9)
Trend Setter
Keyword Collector (209)
Century Club (54)
Conferences
INTERSPEECH (22)
ACL (7)
EMNLP (7)
NAACL (5)
EACL (4)
NIPS (3)
CVPR (2)
IJCNLP (2)
AAAI (1)
ICCV (1)
ICLR (1)
Keywords
automatic speech recognition (10)
multimodal learning (7)
zero-shot learning (5)
attention mechanism (5)
connectionist temporal classification (4)
end-to-end speech recognition (4)
low-resource language (4)
speech recognition (4)
acoustic model (4)
end-to-end model (4)
word error rate (3)
self-supervised learning (3)
contrastive learning (3)
transfer learning (3)
word embedding (3)
video understanding (3)
deep neural network (3)
conversational context (3)
deep learning (2)
spoken language understanding (2)
Papers
Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning
EACL 2026
CTC Alignments Improve Autoregressive Translation
EACL 2023
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
INTERSPEECH 2022
Masked Autoencoders that Listen
NIPS 2022
Normalized Contrastive Learning for Text-Video Retrieval
EMNLP 2022
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models
EMNLP 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
EMNLP 2022
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble
ACL 2022
ASR2K: Speech Recognition for Around 2000 Languages without Audio
INTERSPEECH 2022
Self-Supervised Object Detection From Audio-Visual Correspondence
CVPR 2022
Hierarchical Phone Recognition with Compositional Phonetics
INTERSPEECH 2021
Multimodal Speech Summarization Through Semantic Concept Learning
INTERSPEECH 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
NIPS 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
ACL 2021
How2Sign: A Large-Scale Multimodal Dataset for Continuous American Sign Language
CVPR 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
NAACL 2021
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
EACL 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
EMNLP 2021
Space-Time Crop & Attend: Improving Cross-Modal Video Representation Learning
ICCV 2021
Support-set bottlenecks for video-text representation learning
ICLR 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
NAACL 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
IJCNLP 2021
Differentiable Allophone Graphs for Language-Universal Speech Recognition
INTERSPEECH 2021
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding
INTERSPEECH 2021
On Long-Tailed Phenomena in Neural Machine Translation
EMNLP 2020
Multimodal Speech Recognition with Unstructured Audio Masking
EMNLP 2020
Towards Context-Aware End-to-End Code-Switching Speech Recognition
INTERSPEECH 2020
Fine-Grained Grounding for Multimodal Speech Recognition
EMNLP 2020
On Dimensional Linguistic Properties of the Word Embedding Space
ACL 2020
Towards Zero-Shot Learning for Automatic Phonemic Transcription
AAAI 2020
Contextual RNN-T for Open Domain ASR
INTERSPEECH 2020
Multimodal Abstractive Summarization for How2 Videos
ACL 2019
Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion
ACL 2019
Adversarial Music: Real world Audio Adversary against Wake-word Detection System
NIPS 2019
Effective Dimensionality Reduction for Word Embeddings
ACL 2019
Multilingual Speech Recognition with Corpus Relatedness Sampling
INTERSPEECH 2019
Survey Talk: Multimodal Processing of Speech and Language
INTERSPEECH 2019
SANTLR: Speech Annotation Toolkit for Low Resource Languages
INTERSPEECH 2019
Cross-Attention End-to-End ASR for Two-Party Conversations
INTERSPEECH 2019
Acoustic-to-Word Models with Conversational Context Information
NAACL 2019
The ACLEW DiViMe: An Easy-to-use Diarization Tool
INTERSPEECH 2018
Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks
INTERSPEECH 2018
Subword and Crossword Units for CTC Acoustic Models
INTERSPEECH 2018
Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection
INTERSPEECH 2018
Comparison of Decoding Strategies for CTC Acoustic Models
INTERSPEECH 2017
A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification
INTERSPEECH 2017
Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
INTERSPEECH 2016
Manipulating Word Lattices to Incorporate Human Corrections
INTERSPEECH 2016
Experiences with Shared Resources for Research and Education in Speech and Language Processing
INTERSPEECH 2016
Virtual Machines and Containers as a Platform for Experimentation
INTERSPEECH 2016
Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation
EACL 2014
Semantics for Large-Scale Multimedia: New Challenges for NLP
ACL 2014
Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk
IJCNLP 2013
Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk
NAACL 2012
On using Articulatory Features for Discriminative Speaker Adaptation
NAACL 2007