Shane Bergsma

29 papers · 2005–2025 · 8 conferences across top CS/AI venues

Achievements

🗺️ Taxonomy Completionist (20) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (20) 🐝 Cross-Pollinator (5)

🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (8) 💎 Century Club (29) 🔥 Unstoppable (10) ⚡ Prolific Year (5)

Conferences

ACL (10) CONLL (4) NIPS (4) COLING (3) EMNLP (3) NAACL (3) ICLR (1) IJCNLP (1)

Top co-authors

Dekang Lin (8) Grzegorz Kondrak (4) Randy Goebel (3) Joel Hestness (3) Colin Cherry (3) Benjamin Van Durme (3) David Yarowsky (3) Emily Pitler (2) Tim Zeyl (2) Gavia Gray (2)

Keywords

neural network (2) probabilistic forecasting (2) neural network optimization (2) time series (2) autoregressive model (2) training dynamics (1) sparse training (1) sparse neural network (1) sparsity level (1) model scaling (1) transformer training (1) model sparsity (1) gradient noise scale (1) per-example gradient (1) layer normalization (1) batch size schedule (1) batch size (1) gradient propagation (1) error accumulation (1) unstructured sparsity (1)

Papers

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs · ICLR 2025
Sparse maximal update parameterization: A holistic approach to sparse training dynamics · NIPS 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers · NIPS 2024
SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting · NIPS 2023
C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting · NIPS 2022
I’m a Belieber: Social Roles via Self-identification and Conceptual Attributes · ACL 2014
Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter · NAACL 2013
Using Conceptual Class Attributes to Characterize Social Media Users · ACL 2013
Explicit and Implicit Syntactic Features for Text Classification · ACL 2013
Stylometric Analysis of Scientific Articles · NAACL 2012
Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation · ACL 2011
Joint Training of Dependency Parsing Filters through Latent Support Vector Machines · ACL 2011
Creating Robust Supervised Classifiers via Web-Scale N-Gram Data · ACL 2010
Fast and Accurate Arc Filtering for Dependency Parsing · COLING 2010
Using Web-scale N-grams to Improve Base NP Parsing Performance · COLING 2010
Improved Natural Language Learning via Variance-Regularization Support Vector Machines · CONLL 2010
Predicting the Semantic Compositionality of Prefix Verbs · EMNLP 2010
Glen, Glenda or Glendale: Unsupervised and Semi-supervised Learning of English Noun Gender · CONLL 2009
A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion · ACL 2009
A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion · IJCNLP 2009
Discriminative Learning of Selectional Preference from Unlabeled Text · EMNLP 2008
Distributional Identification of Non-Referential Pronouns · ACL 2008
Learning Noun Phrase Query Segmentation · EMNLP 2007
Learning Noun Phrase Query Segmentation · CONLL 2007
Automatic Answer Typing for How-Questions · NAACL 2007
Alignment-Based Discriminative String Similarity · ACL 2007
Bootstrapping Path-Based Pronoun Resolution · ACL 2006
Bootstrapping Path-Based Pronoun Resolution · COLING 2006
An Expectation Maximization Approach to Pronoun Resolution · CONLL 2005