Byung-Doh Oh

18 papers · 2019–2026 · 7 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🏃 Academic Marathon (6) 🌍 Conference Polyglot (7) 🗺️ Taxonomy Completionist (35)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏆 Keyword Champion (2) 🧬 Topic Evolution 🤝 Dynamic Duo (15) 💎 Century Club (17) 🔥 Unstoppable (5) 🗃️ Keyword Collector (65) ❓ The Questioner (2) ⚡ Prolific Year (6)

Conferences

ACL (6) EMNLP (5) IJCNLP (2) NAACL (2) AACL (1) COLING (1) EACL (1)

Top co-authors

William Schuler (15) Christian Clark (5) Nanjiang Jiang (1) Lifeng Jin (1) Evan Jaffe (1) Hongao Zhu (1) Pranav Maneriker (1) Shisen Yue (1) Sathvik Nair (1)

Keywords

reading time (8) language model (6) psycholinguistic modeling (4) cognitive modeling (2) word entropy (2) language modeling (2) language model surprisal (2) monte carlo estimation (2) self-paced reading (2) reading time prediction (2) sentence processing (2) word frequency (2) character model (2) surprisal estimation (2) first-token entropy (2) hidden state (1) grammar induction (1) semantic processing (1) probabilistic modeling (1) incremental parsing (1)

Papers

Clozing the Gap: Exploring Why Language Model Surprisal Outperforms Cloze Surprisal · ACL 2026
How Well Does First-Token Entropy Approximate Word Entropy as a Psycholinguistic Predictor? · AACL 2025
The Impact of Token Granularity on the Predictive Power of Language Model Surprisal · ACL 2025
The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage · ACL 2025
Linear Recency Bias During Training Improves Transformers’ Fit to Reading Times · COLING 2025
How Well Does First-Token Entropy Approximate Word Entropy as a Psycholinguistic Predictor? · IJCNLP 2025
Leading Whitespaces of Language Models’ Subword Vocabulary Pose a Confound for Calculating Word Probabilities · EMNLP 2024
Frequency Explains the Inverse Correlation of Large Language Models’ Size, Training Data Amount, and Surprisal’s Fit to Reading Times · EACL 2024
Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens · EMNLP 2023
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions · ACL 2023
Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal · EMNLP 2022
Team Ohio State at CMCL 2021 Shared Task: Fine-Tuned RoBERTa for Eye-Tracking Data Prediction · NAACL 2021
Coreference-aware Surprisal Predicts Brain Response · EMNLP 2021
Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages · EMNLP 2021
Surprisal Estimators for Human Reading Times Need Character Models · IJCNLP 2021
Surprisal Estimators for Human Reading Times Need Character Models · ACL 2021
Contributions of Propositional Content and Syntactic Category Information in Sentence Processing · NAACL 2021
THOMAS: The Hegemonic OSU Morphological Analyzer using Seq2seq · ACL 2019