Brian Roark

52 papers · 2000–2025 · 7 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (25)

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (34) 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (23) 🧬 Topic Evolution 👥 Mega-Team (27) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (78) ⚡ Prolific Year (6) 🚀 Conference Pioneer 💎 Century Club (52) 🔥 Unstoppable (13) ❓ The Questioner (2)

Conferences

ACL (23) NAACL (10) EMNLP (9) COLING (4) EACL (3) INTERSPEECH (2) IJCNLP (1)

Top co-authors

Ryan Cotterell (6) Richard Sproat (5) Alexander Gutkin (5) Christo Kirov (5) Kristy Hollingshead (5) Michael Riley (4) Aaron Dunlop (4) Tiago Pimentel (4) Izhak Shafran (4) Murat Saraclar (3)

Research topics

Linguistics (1) Statistics (1)

Keywords

language modeling (3) cross-linguistic analysis (3) text classification (2) large language model (2) speech recognition (2) information theory (2) morphological complexity (2) recurrent neural network (2) text normalization (2) abbreviation expansion (2) finite-state transducer (2) multilingual nlp (2) script normalization (2) language identification (2) natural language processing (1) speech processing (1) natural language generation (1) transfer learning (1) benchmark evaluation (1) language model adaptation (1)

Papers

Improving Informally Romanized Language Identification EMNLP 2025
Abbreviation Across the World’s Languages and Scripts COLING 2024
Distinguishing Romanized Hindi from Romanized Urdu ACL 2023
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages EMNLP 2023
Spelling convention sensitivity in neural language models EACL 2023
Beyond Arabic: Software for Perso-Arabic Script Manipulation EMNLP 2022
Design principles of an open-source language modeling microservice package for AAC text-entry applications ACL 2022
Finite-state script normalization and processing utilities: The Nisaba Brahmic library EACL 2021
Finding Concept-specific Biases in Form–Meaning Associations NAACL 2021
Disambiguatory Signals are Stronger in Word-initial Positions EACL 2021
Structured abbreviation expansion in context EMNLP 2021
Rethinking Phonotactic Complexity ACL 2019
Meaning to Form: Measuring Systematicity as Information ACL 2019
What Kind of Language Is Hard to Language-Model? ACL 2019
Are All Languages Equally Hard to Language-Model? NAACL 2018
Learning N-Gram Language Models from Uncertain Data INTERSPEECH 2016
Contextual Prediction Models for Speech Recognition INTERSPEECH 2016
Hippocratic Abbreviation Expansion ACL 2014
Data Driven Grammatical Error Detection in Transcripts of Children’s Speech EMNLP 2014
Transforming trees into hedges and parsing with “hedgebank” grammars ACL 2014
Smoothed marginal distribution constraints for language modeling ACL 2013
Pair Language Models for Deriving Alternative Pronunciations and Spellings from Pronunciation Dictionaries EMNLP 2013
Discriminative Joint Modeling of Lexical Variation and Acoustic Confusion for Automated Narrative Retelling Assessment NAACL 2013
Distributional semantic models for the evaluation of disordered language NAACL 2013
The OpenGrm open-source finite-state grammar software libraries ACL 2012
Beam-Width Prediction for Efficient Context-Free Parsing ACL 2011
Lexicographic Semirings for Exact Automata Encoding of Sequence Models ACL 2011
An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling ACL 2011
Unary Constraints for Efficient Context-Free Parsing ACL 2011
Semi-Supervised Modeling for Prenominal Modifier Ordering ACL 2011
Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation EMNLP 2011
Prenominal Modifier Ordering via Multiple Sequence Alignment NAACL 2010
Linear Complexity Context-Free Parsing Pipelines via Chart Constraints NAACL 2009
Proceedings of the ACL-IJCNLP 2009 Student Research Workshop ACL 2009
Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing EMNLP 2009
Proceedings of the ACL-IJCNLP 2009 Student Research Workshop IJCNLP 2009
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts NAACL 2009
Classifying Chart Cells for Quadratic Complexity Context-Free Inference COLING 2008
The utility of parse-derived features for automatic discourse segmentation ACL 2007
Pipeline Iteration ACL 2007
Probabilistic Context-Free Grammar Induction Based on Structural Zeros NAACL 2006
PCFGs with Syntactic and Prosodic Indicators of Speech Repairs COLING 2006
PCFGs with Syntactic and Prosodic Indicators of Speech Repairs ACL 2006
Discriminative Syntactic Language Modeling for Speech Recognition ACL 2005
Comparing and Combining Finite-State and Context-Free Parsers EMNLP 2005
Incremental Parsing with the Perceptron Algorithm ACL 2004
Language Model Adaptation with MAP Estimation and the Perceptron Algorithm NAACL 2004
Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm ACL 2004
Supervised and unsupervised PCFG adaptation to novel domains NAACL 2003
Generalized Algorithms for Constructing Statistical Language Models ACL 2003
Markov Parsing: Lattice Rescoring with a Statistical Parser ACL 2002
Compact non-left-recursive grammars using the selective left-corner transform and factoring COLING 2000