Shane Bergsma
29 papers · 2005–2025 · 8 top CS/AI conferences
Achievements
Taxonomy Completionist (20)
Interdisciplinary Bridge
Keyword Pioneer
Academic Marathon (20)
Cross-Pollinator (5)
Renaissance Researcher (5)
Conference Polyglot (8)
Century Club (29)
Unstoppable (10)
Prolific Year (5)
Conferences
ACL (10)
CONLL (4)
NIPS (4)
COLING (3)
EMNLP (3)
NAACL (3)
ICLR (1)
IJCNLP (1)
Top co-authors
Keywords
neural network (2)
probabilistic forecasting (2)
neural network optimization (2)
time series (2)
autoregressive model (2)
training dynamics (1)
sparse training (1)
sparse neural network (1)
sparsity level (1)
model scaling (1)
transformer training (1)
model sparsity (1)
gradient noise scale (1)
per-example gradient (1)
layer normalization (1)
batch size schedule (1)
batch size (1)
gradient propagation (1)
error accumulation (1)
unstructured sparsity (1)
Papers
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
ICLR 2025
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
NIPS 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
NIPS 2024
SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting
NIPS 2023
C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting
NIPS 2022
I'm a Belieber: Social Roles via Self-identification and Conceptual Attributes
ACL 2014
Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter
NAACL 2013
Using Conceptual Class Attributes to Characterize Social Media Users
ACL 2013
Explicit and Implicit Syntactic Features for Text Classification
ACL 2013
Stylometric Analysis of Scientific Articles
NAACL 2012
Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
ACL 2011
Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
ACL 2011
Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
ACL 2010
Fast and Accurate Arc Filtering for Dependency Parsing
COLING 2010
Using Web-scale N-grams to Improve Base NP Parsing Performance
COLING 2010
Improved Natural Language Learning via Variance-Regularization Support Vector Machines
CONLL 2010
Predicting the Semantic Compositionality of Prefix Verbs
EMNLP 2010
Glen, Glenda or Glendale: Unsupervised and Semi-supervised Learning of English Noun Gender
CONLL 2009
A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion
ACL 2009
A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion
IJCNLP 2009
Discriminative Learning of Selectional Preference from Unlabeled Text
EMNLP 2008
Distributional Identification of Non-Referential Pronouns
ACL 2008
Learning Noun Phrase Query Segmentation
EMNLP 2007
Learning Noun Phrase Query Segmentation
CONLL 2007
Automatic Answer Typing for How-Questions
NAACL 2007
Alignment-Based Discriminative String Similarity
ACL 2007
Bootstrapping Path-Based Pronoun Resolution
ACL 2006
Bootstrapping Path-Based Pronoun Resolution
COLING 2006
An Expectation Maximization Approach to Pronoun Resolution
CONLL 2005