James Glass

120 papers · 2004–2025 · 15 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (28) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🐣 Hot Topic Early Bird

🌍 Conference Polyglot (15) 🏠 Conference Loyalist (42) 🌟 Keyword Trendsetter Combo (10) 🤝 Dynamic Duo (15) 🧬 Topic Evolution 🏆 Keyword Champion 🌱 Topic Pioneer 🔬 Deep Specialist (22) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (11) ⚡ Prolific Year (24) ❓ The Questioner (5) 💎 Century Club (120) 🗃️ Keyword Collector (96)

Conferences

INTERSPEECH (42) ACL (22) EMNLP (13) NAACL (12) IJCNLP (5) CVPR (4) EACL (4) NIPS (4) AAAI (3) ICLR (3) COLING (2) ICCV (2) SEMEVAL (2) AACL (1) ECCV (1)

Top co-authors

David Harwath (15) Hongyin Luo (15) Yonatan Belinkov (14) Wei-Ning Hsu (14) Preslav Nakov (12) Mitra Mohtarami (10) Ramy Baly (9) Yung-Sung Chuang (9) Rogerio Feris (9) Yu-An Chung (9)

Research topics

Applications (1)

Keywords

self-supervised learning (13) representation learning (12) unsupervised learning (10) speech recognition (9) text classification (8) multimodal learning (8) transfer learning (8) automatic speech recognition (8) stance detection (6) speaker verification (6) convolutional neural network (6) video retrieval (6) deep neural network (5) language model (5) contrastive learning (5) recurrent neural network (4) zero-shot learning (4) domain adaptation (4) attention mechanism (4) neural machine translation (4)

Papers

Teaching VLMs to Localize Specific Objects from In-context Examples ICCV 2025
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions CVPR 2024
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning NAACL 2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation INTERSPEECH 2024
Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer INTERSPEECH 2024
Self-Specialization: Uncovering Latent Expertise within Large Language Models ACL 2024
Joint Inference of Retrieval and Generation for Passage Re-ranking EACL 2024
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces NAACL 2024
Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization ACL 2024
Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS ACL 2023
Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering ACL 2023
ConvRGX: Recognition, Generation, and Extraction for Self-trained Conversational Question Answering ACL 2023
Search Augmented Instruction Learning EMNLP 2023
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation ACL 2023
Entailment as Robust Self-Learner ACL 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages INTERSPEECH 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers INTERSPEECH 2023
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering INTERSPEECH 2023
Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning EACL 2023
PCFG-Based Natural Language Interface Improves Generalization for Controlled Text Generation ACL 2023
Cross-Modal Discrete Representation Learning ACL 2022
DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings NAACL 2022
Cooperative Self-training of Machine Reading Comprehension NAACL 2022
Controlling the Focus of Pretrained Language Generation Models ACL 2022
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval CVPR 2022
Simple and Effective Unsupervised Speech Synthesis INTERSPEECH 2022
Detecting Dementia from Long Neuropsychological Interviews EMNLP 2022
SSAST: Self-Supervised Audio Spectrogram Transformer AAAI 2022
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units IJCNLP 2021
Mitigating Biases in Toxic Language Detection through Invariant Rationalization IJCNLP 2021
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset INTERSPEECH 2021
Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection INTERSPEECH 2021
Cascaded Multilingual Audio-Visual Learning from Videos INTERSPEECH 2021
Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation? EMNLP 2021
Mitigating Biases in Toxic Language Detection through Invariant Rationalization ACL 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ACL 2021
CLAC: A Speech Corpus of Healthy English Speakers INTERSPEECH 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos INTERSPEECH 2021
AST: Audio Spectrogram Transformer INTERSPEECH 2021
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos ICCV 2021
Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions CVPR 2021
Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models EACL 2021
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies INTERSPEECH 2021
Negative Training for Neural Dialogue Response Generation ACL 2020
What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context ACL 2020
Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption INTERSPEECH 2020
Vector-Quantized Autoregressive Predictive Coding INTERSPEECH 2020
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation AACL 2020
Multimodal Association for Speaker Verification INTERSPEECH 2020
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning INTERSPEECH 2020
We Can Detect Your Bias: Predicting the Political Ideology of News Articles EMNLP 2020
Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks EMNLP 2020
Similarity Analysis of Contextual Word Representation Models ACL 2020
Improved Speech Representations with Multi-Target Autoregressive Predictive Coding ACL 2020
Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets INTERSPEECH 2020
What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information? INTERSPEECH 2020
Unsupervised Methods for Evaluating Speech Representations INTERSPEECH 2020
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech ICLR 2020
Tanbih: Get To Know What You Are Reading IJCNLP 2019
Improving Neural Language Models by Segmenting, Attending, and Predicting the Future ACL 2019
Learning Words by Drawing Images CVPR 2019
Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio INTERSPEECH 2019
Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media NAACL 2019
FAKTA: An Automatic End-to-End Fact Checking System NAACL 2019
Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection SEMEVAL 2019
MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation INTERSPEECH 2019
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models AAAI 2019
Contrastive Language Adaptation for Cross-Lingual Stance Detection EMNLP 2019
Tanbih: Get To Know What You Are Reading EMNLP 2019
Neural Multi-Task Learning for Stance Prediction EMNLP 2019
Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering INTERSPEECH 2019
Transfer Learning from Audio-Visual Grounding to Speech Recognition INTERSPEECH 2019
VoiceID Loss: Speech Enhancement for Speaker Verification INTERSPEECH 2019
Detecting Egregious Responses in Neural Sequence-to-sequence Models ICLR 2019
Identifying and Controlling Important Neurons in Neural Machine Translation ICLR 2019
Contrastive Language Adaptation for Cross-Lingual Stance Detection IJCNLP 2019
NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks AAAI 2019
A Comparison of Deep Learning Methods for Language Understanding INTERSPEECH 2019
Multiple Sound Source Localization with SVD-PHAT INTERSPEECH 2019
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition INTERSPEECH 2019
An Unsupervised Autoregressive Model for Speech Representation Learning INTERSPEECH 2019
A Deep Residual Network for Large-Scale Acoustic Scene Analysis INTERSPEECH 2019
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition INTERSPEECH 2018
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces NIPS 2018
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign COLING 2018
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input ECCV 2018
Predicting Factuality of Reporting and Bias of News Media Sources EMNLP 2018
Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech INTERSPEECH 2018
Scalable Factorized Hierarchical Variational Autoencoder Training INTERSPEECH 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition INTERSPEECH 2018
Detecting Depression with Audio/Text Sequence Modeling of Interviews INTERSPEECH 2018
Automatic Stance Detection Using End-to-End Memory Networks NAACL 2018
Supervised and Unsupervised Transfer Learning for Question Answering NAACL 2018
Integrating Stance Detection and Fact Checking in a Unified Corpus NAACL 2018
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference NAACL 2018
Role-specific Language Models for Processing Recorded Neuropsychological Exams NAACL 2018
Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks IJCNLP 2017
What do Neural Machine Translation Models Learn about Morphology? ACL 2017
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data NIPS 2017
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems NIPS 2017
Learning Latent Representations for Speech Generation and Transformation INTERSPEECH 2017
QMDIS: QCRI-MIT Advanced Dialect Identification System INTERSPEECH 2017
An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification INTERSPEECH 2017
Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions INTERSPEECH 2017
Learning Word-Like Units from Joint Audio-Visual Analysis ACL 2017
Automatic Dialect Detection in Arabic Broadcast Speech INTERSPEECH 2016
Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders INTERSPEECH 2016
Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition INTERSPEECH 2016
Neural Attention for Learning to Rank Questions in Community Question Answering COLING 2016
Unsupervised Learning of Spoken Language with Visual Context NIPS 2016
VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems SEMEVAL 2015
Arabic Diacritization with Recurrent Neural Networks EMNLP 2015
Joint Learning of Phonetic Units and Word Pronunciations for ASR EMNLP 2013
A Nonparametric Bayesian Approach to Acoustic Model Discovery ACL 2012
Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation EACL 2009
Segmentation for English-to-Arabic Statistical Machine Translation ACL 2008
N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation EMNLP 2008
Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input ACL 2007
Style & Topic Language Model Adaptation Using HMM-LDA EMNLP 2006
Feature-based Pronunciation Modeling for Speech Recognition NAACL 2004