Sanjiv Kumar
103 papers · 2009–2025 · 10 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist
(18)
Keyword Pioneer
Hot Topic Early Bird
Renaissance Researcher
(6)
Interdisciplinary Bridge
Conference Polyglot
(10)
Conference Loyalist
(28)
Keyword Trendsetter Combo
(7)
Dynamic Duo
(24)
Triple Crown
Grand Slam
Deep Specialist
(16)
Keyword Champion
(3)
Conference Pioneer
Unstoppable
(14)
Prolific Year
(10)
Keyword Collector
(277)
Century Club
(103)
The Questioner
(10)
Trend Setter
Conferences
ICLR (31)
NIPS (28)
ICML (24)
AISTATS (6)
CVPR (4)
ICCV (3)
JMLR (3)
EMNLP (2)
AAAI (1)
ACL (1)
Keywords
knowledge distillation
(5)
nearest neighbor search
(5)
dimensionality reduction
(5)
random fourier feature
(4)
model compression
(4)
stochastic gradient descent
(4)
binary embedding
(4)
kernel approximation
(3)
adaptive gradient method
(3)
negative sampling
(3)
differential privacy
(3)
language model
(3)
representation learning
(3)
fast fourier transform
(3)
information retrieval
(3)
image retrieval
(3)
low-rank approximation
(3)
vector quantization
(3)
approximate nearest neighbor
(3)
binary code
(3)
Papers
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
ICML 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers
ICLR 2025
Better autoregressive regression with LLMs via regression-aware fine-tuning
ICLR 2025
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
ICLR 2025
Efficient stagewise pretraining via progressive subnetworks
ICLR 2025
Faster Cascades via Speculative Decoding
ICLR 2025
Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation
ICML 2025
LAuReL: Learned Augmented Residual Layer
ICML 2025
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
CVPR 2024
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
ICLR 2024
On Bias-Variance Alignment in Deep Models
ICLR 2024
Tandem Transformers for Inference Efficient LLMs
ICML 2024
Accelerating Blockwise Parallel Language Models with Draft Refinement
NIPS 2024
On the Inductive Bias of Stacking Towards Improving Reasoning
NIPS 2024
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
ICML 2024
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
ICLR 2024
USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval
ICML 2024
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
CVPR 2024
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
ICML 2024
Regression Aware Inference with LLMs
EMNLP 2024
Functional Interpolation for Relative Positions improves Long Context Transformers
ICLR 2024
Think before you speak: Training Language Models With Pause Tokens
ICLR 2024
Language Model Cascades: Token-Level Uncertainty And Beyond
ICLR 2024
Plugin estimators for selective classification with out-of-distribution detection
ICLR 2024
Learning to Reject Meets Long-tail Learning
ICLR 2024
When Does Confidence-Based Cascade Deferral Suffice?
NIPS 2023
Supervision Complexity and its Role in Knowledge Distillation
ICLR 2023
Large Language Models with Controllable Working Memory
ACL 2023
Automating Nearest Neighbor Search Configuration with Constrained Optimization
ICLR 2023
Serving Graph Compression for Graph Neural Networks
ICLR 2023
Efficient Training of Language Models using Few-Shot Learning
ICML 2023
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
ICLR 2023
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
ICLR 2023
Leveraging Importance Weights in Subset Selection
ICLR 2023
SOAR: Improved Indexing for Approximate Nearest Neighbor Search
NIPS 2023
On student-teacher deviations in distillation: does it pay to disobey?
NIPS 2023
ResMem: Learn what you can and memorize the rest
NIPS 2023
Robust Training of Neural Networks Using Scale Invariant Architectures
ICML 2022
In defense of dual-encoders for neural ranking
ICML 2022
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
NIPS 2022
Decoupled Context Processing for Context Augmented Language Modeling
NIPS 2022
Post-hoc estimators for learning to defer to an expert
NIPS 2022
Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces
ICML 2021
RankDistil: Knowledge Distillation for Ranking
AISTATS 2021
Efficient Training of Retrieval Models using Negative Cache
NIPS 2021
Long-tail learning via logit adjustment
ICLR 2021
Coping with Label Shift via Distributionally Robust Optimisation
ICLR 2021
Evaluations and Methods for Explanation through Robustness Analysis
ICLR 2021
Batch Active Learning at Scale
NIPS 2021
Overparameterisation and worst-case generalisation: friend or foe?
ICLR 2021
A statistical perspective on distillation
ICML 2021
Adaptive Federated Optimization
ICLR 2021
Multi-Stage Influence Function
NIPS 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
NIPS 2020
Why are Adaptive Methods Good for Attention Models?
NIPS 2020
Robust large-margin learning in hyperbolic space
NIPS 2020
Learning discrete distributions: user vs item-level privacy
NIPS 2020
Tight Analysis of Privacy and Utility Tradeoff in Approximate Differential Privacy
AISTATS 2020
How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework
CVPR 2020
Semantic Label Smoothing for Sequence to Sequence Problems
EMNLP 2020
Learning to Learn by Zeroth-Order Oracle
ICLR 2020
Pre-training Tasks for Embedding-based Large-scale Retrieval
ICLR 2020
Are Transformers universal approximators of sequence-to-sequence functions?
ICLR 2020
Can gradient clipping mitigate label noise?
ICLR 2020
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
ICLR 2020
Low-Rank Bottleneck in Multi-head Attention Models
ICML 2020
Accelerating Large-Scale Inference with Anisotropic Vector Quantization
ICML 2020
Does label smoothing mitigate label noise?
ICML 2020
Federated Learning with Only Positive Labels
ICML 2020
Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces
NIPS 2019
Stochastic Negative Mining for Learning with Large Output Spaces
AISTATS 2019
Optimal Noise-Adding Mechanism in Additive Differential Privacy
AISTATS 2019
Learning Adaptive Random Features
AAAI 2019
Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks
ICLR 2019
Escaping Saddle Points with Adaptive Gradient Methods
ICML 2019
Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
ICML 2019
Sampled Softmax with Random Fourier Features
NIPS 2019
Multilabel reductions: what is my loss optimising?
NIPS 2019
cpSGD: Communication-efficient and differentially-private distributed SGD
NIPS 2018
Loss Decomposition for Fast Learning in Large Output Spaces
ICML 2018
On Binary Embedding using Circulant Matrices
JMLR 2018
Adaptive Methods for Nonconvex Optimization
NIPS 2018
On the Convergence of Adam and Beyond
ICLR 2018
Multiscale Quantization for Fast Similarity Search
NIPS 2017
Stochastic Generative Hashing
ICML 2017
Distributed Mean Estimation with Limited Communication
ICML 2017
Learning Spread-Out Local Feature Descriptors
ICCV 2017
Fast Classification with Binary Prototypes
AISTATS 2017
Binary embeddings with structured hashed projections
ICML 2016
Orthogonal Random Features
NIPS 2016
Quantization based Fast Inner Product Search
AISTATS 2016
An Exploration of Parameter Redundancy in Deep Networks With Circulant Projections
ICCV 2015
Spherical Random Features for Polynomial Kernels
NIPS 2015
Structured Transforms for Small-Footprint Deep Learning
NIPS 2015
Fast Orthogonal Projection Based on Kronecker Product
ICCV 2015
Discrete Graph Hashing
NIPS 2014
Circulant Binary Embedding
ICML 2014
Large-scale SVD and Manifold Learning
JMLR 2013
∝SVM for Learning with Label Proportions
ICML 2013
Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
CVPR 2013
Sampling Methods for the Nyström Method
JMLR 2012
Angular Quantization-based Binary Codes for Fast Similarity Search
NIPS 2012
Ensemble Nystrom Method
NIPS 2009