Sanjiv Kumar

103 papers · 2009–2025 · 10 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (18) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🌍 Conference Polyglot (10) 🏠 Conference Loyalist (28) 🌟 Keyword Trendsetter Combo (7) 🤝 Dynamic Duo (24) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (16) 🏆 Keyword Champion (3) 🚀 Conference Pioneer 🔥 Unstoppable (14) ⚡ Prolific Year (10) 🗃️ Keyword Collector (277) 💎 Century Club (103) ❓ The Questioner (10) 📈 Trend Setter

Conferences

ICLR (31) NIPS (28) ICML (24) AISTATS (6) CVPR (4) ICCV (3) JMLR (3) EMNLP (2) AAAI (1) ACL (1)

Top co-authors

Ankit Singh Rawat (24) Felix Yu (17) Sashank Reddi (16) Aditya Krishna Menon (15) Ruiqi Guo (14) Sashank J. Reddi (12) Srinadh Bhojanapalli (11) Seungyeon Kim (11) Wittawat Jitkrittum (10) Manzil Zaheer (9)

Keywords

knowledge distillation (5) nearest neighbor search (5) dimensionality reduction (5) random fourier feature (4) model compression (4) stochastic gradient descent (4) binary embedding (4) kernel approximation (3) adaptive gradient method (3) negative sampling (3) differential privacy (3) language model (3) representation learning (3) fast fourier transform (3) information retrieval (3) image retrieval (3) low-rank approximation (3) vector quantization (3) approximate nearest neighbor (3) binary code (3)

Papers

Structured Preconditioners in Adaptive Optimization: A Unified Analysis · ICML 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers · ICLR 2025
Better autoregressive regression with LLMs via regression-aware fine-tuning · ICLR 2025
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization · ICLR 2025
Efficient stagewise pretraining via progressive subnetworks · ICLR 2025
Faster Cascades via Speculative Decoding · ICLR 2025
Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation · ICML 2025
LAuReL: Learned Augmented Residual Layer · ICML 2025
Rethinking FID: Towards a Better Evaluation Metric for Image Generation · CVPR 2024
Two-stage LLM Fine-tuning with Less Specialization and More Generalization · ICLR 2024
On Bias-Variance Alignment in Deep Models · ICLR 2024
Tandem Transformers for Inference Efficient LLMs · ICML 2024
Accelerating Blockwise Parallel Language Models with Draft Refinement · NIPS 2024
On the Inductive Bias of Stacking Towards Improving Reasoning · NIPS 2024
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines · ICML 2024
DistillSpec: Improving Speculative Decoding via Knowledge Distillation · ICLR 2024
USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval · ICML 2024
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation · CVPR 2024
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? · ICML 2024
Regression Aware Inference with LLMs · EMNLP 2024
Functional Interpolation for Relative Positions improves Long Context Transformers · ICLR 2024
Think before you speak: Training Language Models With Pause Tokens · ICLR 2024
Language Model Cascades: Token-Level Uncertainty And Beyond · ICLR 2024
Plugin estimators for selective classification with out-of-distribution detection · ICLR 2024
Learning to Reject Meets Long-tail Learning · ICLR 2024
When Does Confidence-Based Cascade Deferral Suffice? · NIPS 2023
Supervision Complexity and its Role in Knowledge Distillation · ICLR 2023
Large Language Models with Controllable Working Memory · ACL 2023
Automating Nearest Neighbor Search Configuration with Constrained Optimization · ICLR 2023
Serving Graph Compression for Graph Neural Networks · ICLR 2023
Efficient Training of Language Models using Few-Shot Learning · ICML 2023
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers · ICLR 2023
Teacher Guided Training: An Efficient Framework for Knowledge Transfer · ICLR 2023
Leveraging Importance Weights in Subset Selection · ICLR 2023
SOAR: Improved Indexing for Approximate Nearest Neighbor Search · NIPS 2023
On student-teacher deviations in distillation: does it pay to disobey? · NIPS 2023
ResMem: Learn what you can and memorize the rest · NIPS 2023
Robust Training of Neural Networks Using Scale Invariant Architectures · ICML 2022
In defense of dual-encoders for neural ranking · ICML 2022
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s · NIPS 2022
Decoupled Context Processing for Context Augmented Language Modeling · NIPS 2022
Post-hoc estimators for learning to defer to an expert · NIPS 2022
Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces · ICML 2021
RankDistil: Knowledge Distillation for Ranking · AISTATS 2021
Efficient Training of Retrieval Models using Negative Cache · NIPS 2021
Long-tail learning via logit adjustment · ICLR 2021
Coping with Label Shift via Distributionally Robust Optimisation · ICLR 2021
Evaluations and Methods for Explanation through Robustness Analysis · ICLR 2021
Batch Active Learning at Scale · NIPS 2021
Overparameterisation and worst-case generalisation: friend or foe? · ICLR 2021
A statistical perspective on distillation · ICML 2021
Adaptive Federated Optimization · ICLR 2021
Multi-Stage Influence Function · NIPS 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers · NIPS 2020
Why are Adaptive Methods Good for Attention Models? · NIPS 2020
Robust large-margin learning in hyperbolic space · NIPS 2020
Learning discrete distributions: user vs item-level privacy · NIPS 2020
Tight Analysis of Privacy and Utility Tradeoff in Approximate Differential Privacy · AISTATS 2020
How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework · CVPR 2020
Semantic Label Smoothing for Sequence to Sequence Problems · EMNLP 2020
Learning to Learn by Zeroth-Order Oracle · ICLR 2020
Pre-training Tasks for Embedding-based Large-scale Retrieval · ICLR 2020
Are Transformers universal approximators of sequence-to-sequence functions? · ICLR 2020
Can gradient clipping mitigate label noise? · ICLR 2020
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes · ICLR 2020
Low-Rank Bottleneck in Multi-head Attention Models · ICML 2020
Accelerating Large-Scale Inference with Anisotropic Vector Quantization · ICML 2020
Does label smoothing mitigate label noise? · ICML 2020
Federated Learning with Only Positive Labels · ICML 2020
Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces · NIPS 2019
Stochastic Negative Mining for Learning with Large Output Spaces · AISTATS 2019
Optimal Noise-Adding Mechanism in Additive Differential Privacy · AISTATS 2019
Learning Adaptive Random Features · AAAI 2019
Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks · ICLR 2019
Escaping Saddle Points with Adaptive Gradient Methods · ICML 2019
Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling · ICML 2019
Sampled Softmax with Random Fourier Features · NIPS 2019
Multilabel reductions: what is my loss optimising? · NIPS 2019
cpSGD: Communication-efficient and differentially-private distributed SGD · NIPS 2018
Loss Decomposition for Fast Learning in Large Output Spaces · ICML 2018
On Binary Embedding using Circulant Matrices · JMLR 2018
Adaptive Methods for Nonconvex Optimization · NIPS 2018
On the Convergence of Adam and Beyond · ICLR 2018
Multiscale Quantization for Fast Similarity Search · NIPS 2017
Stochastic Generative Hashing · ICML 2017
Distributed Mean Estimation with Limited Communication · ICML 2017
Learning Spread-Out Local Feature Descriptors · ICCV 2017
Fast Classification with Binary Prototypes · AISTATS 2017
Binary embeddings with structured hashed projections · ICML 2016
Orthogonal Random Features · NIPS 2016
Quantization based Fast Inner Product Search · AISTATS 2016
An Exploration of Parameter Redundancy in Deep Networks With Circulant Projections · ICCV 2015
Spherical Random Features for Polynomial Kernels · NIPS 2015
Structured Transforms for Small-Footprint Deep Learning · NIPS 2015
Fast Orthogonal Projection Based on Kronecker Product · ICCV 2015
Discrete Graph Hashing · NIPS 2014
Circulant Binary Embedding · ICML 2014
Large-scale SVD and Manifold Learning · JMLR 2013
∝SVM for Learning with Label Proportions · ICML 2013
Learning Binary Codes for High-Dimensional Data Using Bilinear Projections · CVPR 2013
Sampling Methods for the Nyström Method · JMLR 2012
Angular Quantization-based Binary Codes for Fast Similarity Search · NIPS 2012
Ensemble Nystrom Method · NIPS 2009