Martin Jaggi

104 papers · 2012–2026 · across 15 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (11) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (15)

🌟 Keyword Trendsetter Combo (4) 🏠 Conference Loyalist (28) 👥 Mega-Team (24) 🏆 Keyword Champion (2) 👑 Triple Crown 🔬 Deep Specialist (31) 🤝 Dynamic Duo (12) 🏆 Grand Slam 🗃️ Keyword Collector (60) 🚀 Conference Pioneer 🔥 Unstoppable (14) ⚡ Prolific Year (13) 📈 Trend Setter 💎 Century Club (103)

Conferences

NIPS (28) ICML (26) ICLR (14) AISTATS (12) ACL (6) NAACL (3) SEMEVAL (3) AAAI (2) EMNLP (2) IJCNLP (2) JMLR (2) COLT (1) CONLL (1) ICCV (1) INTERSPEECH (1)

Top co-authors

Sebastian Stich (12) Sai Praneeth Karimireddy (12) Sebastian U Stich (11) Matteo Pagliardini (9) Tao Lin (9) Thijs Vogels (8) Anastasia Koloskova (7) Amirkeivan Mohtashami (7) Prakhar Gupta (6) Nikita Doikov (6)

Keywords

distributed optimization (12) communication efficiency (10) stochastic gradient descent (10) convex optimization (9) federated learning (8) distributed learning (7) representation learning (6) coordinate descent (6) decentralized learning (6) model compression (6) sentence embedding (5) neural network (5) knowledge distillation (5) gradient compression (5) neural network training (4) unsupervised learning (4) word embedding (4) neural network optimization (4) stochastic optimization (3) decentralized optimization (3)

Papers

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments ACL 2026
Attention with Markov: A Curious Case of Single-layer Transformers ICLR 2025
Effective Interplay between Sparsity and Quantization: From Theory to Practice ICLR 2025
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference ICLR 2025
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists ICML 2025
Improving Stochastic Cubic Newton with Momentum AISTATS 2025
Intrinsic User-Centric Interpretability through Global Mixture of Experts ICLR 2025
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging NIPS 2024
Ghost Noise for Regularizing Deep Neural Networks AAAI 2024
Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions ICML 2024
The Privacy Power of Correlated Noise in Decentralized Learning ICML 2024
Layer-wise linear mode connectivity ICLR 2024
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks ICML 2024
LASER: Linear Compression in Wireless Distributed Optimization ICML 2024
On Convergence of Incremental Gradient for Non-convex Smooth Functions ICML 2024
DOGE: Domain Reweighting with Generalization Estimation ICML 2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training NIPS 2024
CoBo: Collaborative Learning via Bilevel Optimization NIPS 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations NIPS 2024
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs NIPS 2024
Agree to Disagree: Diversity through Disagreement for Better Transferability ICLR 2023
Linearization Algorithms for Fully Composite Optimization COLT 2023
Special Properties of Gradient Descent with Large Learning Rates ICML 2023
Second-Order Optimization with Lazy Hessians ICML 2023
Collaborative Learning via Prediction Consensus NIPS 2023
Multiplication-Free Transformer Training via Piecewise Affine Operations NIPS 2023
MultiMoDN—Multimodal, Multi-Task, Interpretable Modular Networks NIPS 2023
Random-Access Infinite Context Length for Transformers NIPS 2023
Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention NIPS 2023
Beyond Spectral Gap: The Role of the Topology in Decentralized Learning JMLR 2023
SIMSUM: Document-level Text Simplification via Simultaneous Summarization ACL 2023
Beyond spectral gap: the role of the topology in decentralized learning NIPS 2022
SKILL: Structured Knowledge Infusion for Large Language Models NAACL 2022
Masked Training of Neural Networks with Partial Gradients AISTATS 2022
Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing ICLR 2022
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings NIPS 2022
Implicit Gradient Alignment in Distributed and Federated Learning AAAI 2022
Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning NIPS 2022
RelaySum for Decentralized Deep Learning on Heterogeneous Data NIPS 2021
Breaking the centralized barrier for cross-device federated learning NIPS 2021
Lightweight Cross-Lingual Sentence Representation Learning ACL 2021
Obtaining Better Static Word Embeddings Using Contextual Embedding Models ACL 2021
LENA: Communication-Efficient Distributed Learning with Self-Triggered Gradient Uploads AISTATS 2021
Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates AISTATS 2021
A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! AISTATS 2021
Self-Supervised Neural Topic Modeling EMNLP 2021
Semantic Perturbations With Normalizing Flows for Improved Generalization ICCV 2021
Taming GANs with Lookahead-Minmax ICLR 2021
Understanding the effects of data parallelism and sparsity on neural network training ICLR 2021
Exact Optimization of Conformal Predictors via Incremental and Decremental Learning ICML 2021
Learning from History for Byzantine Robust Optimization ICML 2021
Consensus Control for Decentralized Deep Learning ICML 2021
Quasi-global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data ICML 2021
Lightweight Cross-Lingual Sentence Representation Learning IJCNLP 2021
Obtaining Better Static Word Embeddings Using Contextual Embedding Models IJCNLP 2021
Optimizer Benchmarking Needs to Account for Hyperparameter Tuning ICML 2020
Dynamic Model Pruning with Feedback ICLR 2020
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models EMNLP 2020
Don't Use Large Mini-batches, Use Local SGD ICLR 2020
Linearly Convergent Frank-Wolfe with Backtracking Line-Search AISTATS 2020
Context Mover’s Distance & Barycenters: Optimal Transport of Contexts for Building Representations AISTATS 2020
Evaluating The Search Phase of Neural Architecture Search ICLR 2020
Decentralized Deep Learning with Arbitrary Communication Compression ICLR 2020
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates ICML 2020
Extrapolation for Large-batch Training in Deep Learning ICML 2020
Ensemble Distillation for Robust Model Fusion in Federated Learning NIPS 2020
Practical Low-Rank Communication Compression in Decentralized Deep Learning NIPS 2020
Model Fusion via Optimal Transport NIPS 2020
On the Relationship between Self-Attention and Convolutional Layers ICLR 2020
Better Word Embeddings by Disentangling Contextual n-Gram Information NAACL 2019
Error Feedback Fixes SignSGD and other Gradient Compression Schemes ICML 2019
Efficient Greedy Coordinate Descent for Composite Problems AISTATS 2019
Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication ICML 2019
Overcoming Multi-model Forgetting ICML 2019
Unsupervised Scalable Representation Learning for Multivariate Time Series NIPS 2019
Open-Vocabulary Keyword Spotting with Audio and Text Embeddings INTERSPEECH 2019
Correlating Twitter Language with Community-Level Health Outcomes ACL 2019
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization NIPS 2019
Simple Unsupervised Keyphrase Extraction using Sentence Embeddings CONLL 2018
Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features NAACL 2018
Adaptive balancing of gradient and update computation times using global geometry and approximate subproblems AISTATS 2018
A Distributed Second-Order Algorithm You Can Trust ICML 2018
On Matching Pursuit and Coordinate Descent ICML 2018
Sparsified SGD with Memory NIPS 2018
Training DNNs with Hybrid Block Floating Point NIPS 2018
COLA: Decentralized Linear Learning NIPS 2018
CoCoA: A General Framework for Communication-Efficient Distributed Optimization JMLR 2018
Safe Adaptive Importance Sampling NIPS 2017
Generating Steganographic Text with LSTMs ACL 2017
A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe AISTATS 2017
Faster Coordinate Descent via Adaptive Importance Sampling AISTATS 2017
Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems NIPS 2017
Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees NIPS 2017
Approximate Steepest Coordinate Descent ICML 2017
Primal-Dual Rates and Certificates ICML 2016
SwissCheese at SemEval-2016 Task 4: Sentiment Classification Using an Ensemble of Convolutional Neural Networks with Distant Supervision SEMEVAL 2016
On the Global Linear Convergence of Frank-Wolfe Optimization Variants NIPS 2015
Swiss-Chocolate: Combining Flipout Regularization and Random Forests with Artificially Built Subsystems to Boost Text-Classification for Sentiment SEMEVAL 2015
Adding vs. Averaging in Distributed Primal-Dual Optimization ICML 2015
Swiss-Chocolate: Sentiment Detection using Sparse SVMs and Part-Of-Speech n-Grams SEMEVAL 2014
Communication-Efficient Distributed Dual Coordinate Ascent NIPS 2014
Block-Coordinate Frank-Wolfe Optimization for Structural SVMs ICML 2013
Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization ICML 2013
Regularization Paths with Guarantees for Convex Semidefinite Optimization AISTATS 2012