David Mimno

33 papers · 2009–2026 · 9 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (10) 🧭 Keyword Pioneer 🏃 Academic Marathon (16) 🌍 Conference Polyglot (9) 🌈 Renaissance Researcher (8)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (10) 🧬 Topic Evolution 🏆 Keyword Champion (3) ❓ The Questioner 🚀 Conference Pioneer 💎 Century Club (31) 🔥 Unstoppable (9) ⚡ Prolific Year (5) 📈 Trend Setter 🗃️ Keyword Collector (129)

Conferences

EMNLP (15) EACL (5) IJCNLP (3) NAACL (3) ICML (2) NIPS (2) ACL (1) AISTATS (1) COLING (1)

Top co-authors

Moontae Lee (7) David Bindel (5) Jack Hessel (4) Gregory Yauney (4) Maria Antoniak (3) Rebecca Hicke (3) Sungjun Cho (3) Lillian Lee (3) Laure Thompson (3) Matthew Wilkens (3)

Research topics

Digital Humanities (3)

Keywords

topic model (4) language model (4) topic modeling (4) unsupervised learning (4) social bias (3) topic inference (3) multimodal learning (3) pretraining data (2) large language model (2) spectral topic model (2) literary text (2) digital humanities (2) seed lexicon (2) anchor word (2) probabilistic modeling (2) bias measurement (2) matrix factorization (2) latent topic analysis (2) spectral algorithm (2) image-sentence matching (2)

Papers

Show or Tell? Modeling the evolution of request-making in Human-LLM conversations EACL 2026
Too Long, Didn’t Model: Decomposing LLM Long Context Understanding With Novels EACL 2026
A City of Millions: Mapping Literary Social Networks At Scale NAACL 2025
A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity NAACL 2024
Contextualized Topic Coherence Metrics EACL 2024
[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs EACL 2024
Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings EMNLP 2023
Data Similarity is Not Enough to Explain Language Model Performance EMNLP 2023
Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement EMNLP 2023
On-the-fly Rectification for Robust Large-Vocabulary Topic Inference ICML 2021
Bad Seeds: Evaluating Lexical Methods for Bias Measurement ACL 2021
Comparing Text Representations: A Theory-Driven Approach EMNLP 2021
Bad Seeds: Evaluating Lexical Methods for Bias Measurement IJCNLP 2021
‘Tecnologica cosa’: Modeling Storyteller Personalities in Boccaccio’s ‘Decameron’ EMNLP 2021
Prior-aware Composition Inference for Spectral Topic Models AISTATS 2020
Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents EMNLP 2020
Practical Correlated Topic Modeling and Analysis via the Rectified Anchor Word Algorithm EMNLP 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents EMNLP 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents IJCNLP 2019
Practical Correlated Topic Modeling and Analysis via the Rectified Anchor Word Algorithm IJCNLP 2019
Authorless Topic Models: Biasing Models Away from Known Structure COLING 2018
Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets NAACL 2018
Quantifying the Effects of Text Duplication on Semantic Models EMNLP 2017
Pulling Out the Stops: Rethinking Stopword Removal for Topic Models EACL 2017
The strange geometry of skip-gram with negative sampling EMNLP 2017
Beyond Exchangeability: The Chinese Voting Process NIPS 2016
Robust Spectral Inference for Joint Stochastic Matrix Factorization NIPS 2015
Evaluation methods for unsupervised word embeddings EMNLP 2015
Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference EMNLP 2014
A Practical Algorithm for Topic Modeling with Provable Guarantees ICML 2013
Optimizing Semantic Coherence in Topic Models EMNLP 2011
Bayesian Checking for Topic Models EMNLP 2011
Polylingual Topic Models EMNLP 2009