Martin Jaggi
104 papers · 2012–2026 · 15 venues across top CS/AI conferences
Achievements
Keyword Pioneer
Hot Topic Early Bird
Taxonomy Completionist (11)
Interdisciplinary Bridge
Conference Polyglot (15)
Keyword Trendsetter Combo (4)
Conference Loyalist (28)
Mega-Team (24)
Keyword Champion (2)
Triple Crown
Deep Specialist (31)
Dynamic Duo (12)
Grand Slam
Keyword Collector (60)
Conference Pioneer
Unstoppable (14)
Prolific Year (13)
Trend Setter
Century Club (103)
Conferences
NIPS (28)
ICML (26)
ICLR (14)
AISTATS (12)
ACL (6)
NAACL (3)
SEMEVAL (3)
AAAI (2)
EMNLP (2)
IJCNLP (2)
JMLR (2)
COLT (1)
CONLL (1)
ICCV (1)
INTERSPEECH (1)
Keywords
distributed optimization (12)
communication efficiency (10)
stochastic gradient descent (10)
convex optimization (9)
federated learning (8)
distributed learning (7)
representation learning (6)
coordinate descent (6)
decentralized learning (6)
model compression (6)
sentence embedding (5)
neural network (5)
knowledge distillation (5)
gradient compression (5)
neural network training (4)
unsupervised learning (4)
word embedding (4)
neural network optimization (4)
stochastic optimization (3)
decentralized optimization (3)
Papers
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
ACL 2026
Attention with Markov: A Curious Case of Single-layer Transformers
ICLR 2025
Effective Interplay between Sparsity and Quantization: From Theory to Practice
ICLR 2025
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
ICLR 2025
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
ICML 2025
Improving Stochastic Cubic Newton with Momentum
AISTATS 2025
Intrinsic User-Centric Interpretability through Global Mixture of Experts
ICLR 2025
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
NIPS 2024
Ghost Noise for Regularizing Deep Neural Networks
AAAI 2024
Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions
ICML 2024
The Privacy Power of Correlated Noise in Decentralized Learning
ICML 2024
Layer-wise linear mode connectivity
ICLR 2024
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
ICML 2024
LASER: Linear Compression in Wireless Distributed Optimization
ICML 2024
On Convergence of Incremental Gradient for Non-convex Smooth Functions
ICML 2024
DOGE: Domain Reweighting with Generalization Estimation
ICML 2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
NIPS 2024
CoBo: Collaborative Learning via Bilevel Optimization
NIPS 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
NIPS 2024
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
NIPS 2024
Agree to Disagree: Diversity through Disagreement for Better Transferability
ICLR 2023
Linearization Algorithms for Fully Composite Optimization
COLT 2023
Special Properties of Gradient Descent with Large Learning Rates
ICML 2023
Second-Order Optimization with Lazy Hessians
ICML 2023
Collaborative Learning via Prediction Consensus
NIPS 2023
Multiplication-Free Transformer Training via Piecewise Affine Operations
NIPS 2023
MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks
NIPS 2023
Random-Access Infinite Context Length for Transformers
NIPS 2023
Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention
NIPS 2023
Beyond Spectral Gap: The Role of the Topology in Decentralized Learning
JMLR 2023
SIMSUM: Document-level Text Simplification via Simultaneous Summarization
ACL 2023
Beyond spectral gap: the role of the topology in decentralized learning
NIPS 2022
SKILL: Structured Knowledge Infusion for Large Language Models
NAACL 2022
Masked Training of Neural Networks with Partial Gradients
AISTATS 2022
Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing
ICLR 2022
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
NIPS 2022
Implicit Gradient Alignment in Distributed and Federated Learning
AAAI 2022
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning
NIPS 2022
RelaySum for Decentralized Deep Learning on Heterogeneous Data
NIPS 2021
Breaking the centralized barrier for cross-device federated learning
NIPS 2021
Lightweight Cross-Lingual Sentence Representation Learning
ACL 2021
Obtaining Better Static Word Embeddings Using Contextual Embedding Models
ACL 2021
LENA: Communication-Efficient Distributed Learning with Self-Triggered Gradient Uploads
AISTATS 2021
Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates
AISTATS 2021
A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
AISTATS 2021
Self-Supervised Neural Topic Modeling
EMNLP 2021
Semantic Perturbations With Normalizing Flows for Improved Generalization
ICCV 2021
Taming GANs with Lookahead-Minmax
ICLR 2021
Understanding the effects of data parallelism and sparsity on neural network training
ICLR 2021
Exact Optimization of Conformal Predictors via Incremental and Decremental Learning
ICML 2021
Learning from History for Byzantine Robust Optimization
ICML 2021
Consensus Control for Decentralized Deep Learning
ICML 2021
Quasi-global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
ICML 2021
Lightweight Cross-Lingual Sentence Representation Learning
IJCNLP 2021
Obtaining Better Static Word Embeddings Using Contextual Embedding Models
IJCNLP 2021
Optimizer Benchmarking Needs to Account for Hyperparameter Tuning
ICML 2020
Dynamic Model Pruning with Feedback
ICLR 2020
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
EMNLP 2020
Don't Use Large Mini-batches, Use Local SGD
ICLR 2020
Linearly Convergent Frank-Wolfe with Backtracking Line-Search
AISTATS 2020
Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations
AISTATS 2020
Evaluating The Search Phase of Neural Architecture Search
ICLR 2020
Decentralized Deep Learning with Arbitrary Communication Compression
ICLR 2020
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
ICML 2020
Extrapolation for Large-batch Training in Deep Learning
ICML 2020
Ensemble Distillation for Robust Model Fusion in Federated Learning
NIPS 2020
Practical Low-Rank Communication Compression in Decentralized Deep Learning
NIPS 2020
Model Fusion via Optimal Transport
NIPS 2020
On the Relationship between Self-Attention and Convolutional Layers
ICLR 2020
Better Word Embeddings by Disentangling Contextual n-Gram Information
NAACL 2019
Error Feedback Fixes SignSGD and other Gradient Compression Schemes
ICML 2019
Efficient Greedy Coordinate Descent for Composite Problems
AISTATS 2019
Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
ICML 2019
Overcoming Multi-model Forgetting
ICML 2019
Unsupervised Scalable Representation Learning for Multivariate Time Series
NIPS 2019
Open-Vocabulary Keyword Spotting with Audio and Text Embeddings
INTERSPEECH 2019
Correlating Twitter Language with Community-Level Health Outcomes
ACL 2019
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
NIPS 2019
Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
CONLL 2018
Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features
NAACL 2018
Adaptive balancing of gradient and update computation times using global geometry and approximate subproblems
AISTATS 2018
A Distributed Second-Order Algorithm You Can Trust
ICML 2018
On Matching Pursuit and Coordinate Descent
ICML 2018
Sparsified SGD with Memory
NIPS 2018
Training DNNs with Hybrid Block Floating Point
NIPS 2018
COLA: Decentralized Linear Learning
NIPS 2018
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
JMLR 2018
Safe Adaptive Importance Sampling
NIPS 2017
Generating Steganographic Text with LSTMs
ACL 2017
A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe
AISTATS 2017
Faster Coordinate Descent via Adaptive Importance Sampling
AISTATS 2017
Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
NIPS 2017
Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
NIPS 2017
Approximate Steepest Coordinate Descent
ICML 2017
Primal-Dual Rates and Certificates
ICML 2016
SwissCheese at SemEval-2016 Task 4: Sentiment Classification Using an Ensemble of Convolutional Neural Networks with Distant Supervision
SEMEVAL 2016
On the Global Linear Convergence of Frank-Wolfe Optimization Variants
NIPS 2015
Swiss-Chocolate: Combining Flipout Regularization and Random Forests with Artificially Built Subsystems to Boost Text-Classification for Sentiment
SEMEVAL 2015
Adding vs. Averaging in Distributed Primal-Dual Optimization
ICML 2015
Swiss-Chocolate: Sentiment Detection using Sparse SVMs and Part-Of-Speech n-Grams
SEMEVAL 2014
Communication-Efficient Distributed Dual Coordinate Ascent
NIPS 2014
Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
ICML 2013
Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization
ICML 2013
Regularization Paths with Guarantees for Convex Semidefinite Optimization
AISTATS 2012