Dale Schuurmans
124 papers · 2002–2025 · 15 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (43)
Interdisciplinary Bridge
Renaissance Researcher (7)
Hot Topic Early Bird
Academic Marathon (23)
Conference Loyalist (45)
Keyword Trendsetter Combo (9)
The Namer
Keyword Champion
Triple Crown
Topic Pioneer
Deep Specialist (17)
Dynamic Duo (39)
Grand Slam
Keyword Collector (204)
The Questioner
Trend Setter
Conference Pioneer
Unstoppable (14)
Prolific Year (14)
Century Club (124)
Conferences
NIPS (45)
ICML (29)
ICLR (21)
AISTATS (11)
IJCAI (5)
ACL (2)
COLING (2)
CONLL (2)
AAAI (1)
ACML (1)
EACL (1)
ICCV (1)
JMLR (1)
NAACL (1)
UAI (1)
Research topics
Keywords
convex optimization (10)
reinforcement learning (9)
neural network (6)
representation learning (5)
variational inference (5)
policy learning (5)
stochastic optimization (4)
markov chain monte carlo (4)
function approximation (4)
policy optimization (4)
value iteration (4)
stationary distribution (4)
markov decision process (4)
sample efficiency (4)
policy gradient (4)
value function (4)
off-policy evaluation (4)
generative model (4)
neural network optimization (3)
deep learning (3)
Papers
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
ICML 2025
Plastic Learning with Deep Fourier Features
ICLR 2025
Toward Understanding In-context vs. In-weight Learning
ICLR 2025
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
AISTATS 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
ICLR 2025
Improving Large Language Model Planning with Action Sequence Similarity
ICLR 2025
Learning Continually by Spectral Regularization
ICLR 2025
Position: Video as the New Language for Real-World Decision Making
ICML 2024
UQE: A Query Engine for Unstructured Databases
NIPS 2024
Generative Hierarchical Materials Search
NIPS 2024
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
NIPS 2024
Learning Interactive Real-World Simulators
ICLR 2024
Scalable Diffusion for Materials Generation
ICLR 2024
Probabilistic Adaptation of Black-Box Text-to-Video Models
ICLR 2024
Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning
ICML 2024
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
ICML 2024
Energy-based Predictive Representations for Partially Observed Reinforcement Learning
UAI 2023
Any-scale Balanced Samplers for Discrete Space
ICLR 2023
Latent Variable Representation for Reinforcement Learning
ICLR 2023
Revisiting Sampling for Combinatorial Optimization
ICML 2023
TEMPERA: Test-Time Prompt Editing via Reinforcement Learning
ICLR 2023
Dichotomy of Control: Separating What You Can Control from What You Cannot
ICLR 2023
What learning algorithm is in-context learning? Investigations with linear models
ICLR 2023
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
ICLR 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models
ICLR 2023
DISCS: A Benchmark for Discrete Sampling
NIPS 2023
Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off
NIPS 2023
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
NIPS 2023
Learning Universal Policies via Text-Guided Video Generation
NIPS 2023
Discrete Langevin Samplers via Wasserstein Gradient Flow
AISTATS 2023
Learning to Optimize with Stochastic Dominance Constraints
AISTATS 2023
Gradient-Free Structured Pruning with Unlabeled Data
ICML 2023
Stochastic Gradient Succeeds for Bandits
ICML 2023
Spectral Decomposition Representation for Reinforcement Learning
ICLR 2023
Score-based Continuous-time Discrete Diffusion Models
ICLR 2023
Offline Policy Selection under Uncertainty
AISTATS 2022
Neural Stochastic Dual Dynamic Programming
ICLR 2022
Understanding and Leveraging Overparameterization in Recursive Value Estimation
ICLR 2022
Making Linear MDPs Practical via Contrastive Representation Learning
ICML 2022
A Parametric Class of Approximate Gradient Updates for Policy Optimization
ICML 2022
On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games
NIPS 2022
The Role of Baselines in Policy Gradient Optimization
NIPS 2022
Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
NIPS 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
NIPS 2022
Chain of Thought Imitation with Procedure Cloning
NIPS 2022
A Simple Decentralized Cross-Entropy Method
NIPS 2022
Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization
ICML 2022
The Curse of Passive Data Collection in Batch Reinforcement Learning
AISTATS 2022
Characterizing the Gap Between Actor-Critic and Policy Gradient
ICML 2021
LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs
ICML 2021
Leveraging Non-uniformity in First-order Non-convex Optimization
ICML 2021
Deep Probabilistic Canonical Correlation Analysis
AAAI 2021
Combiner: Full Attention Transformer with Sparse Computation Cost
NIPS 2021
Understanding the Effect of Stochasticity in Policy Optimization
NIPS 2021
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
ICML 2021
On the Optimality of Batch Policy Optimization Algorithms
ICML 2021
GenDICE: Generalized Offline Estimation of Stationary Values
ICLR 2020
An Optimistic Perspective on Offline Reinforcement Learning
ICML 2020
Off-Policy Evaluation via the Regularized Lagrangian
NIPS 2020
CoinDICE: Off-Policy Confidence Interval Estimation
NIPS 2020
Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration
NIPS 2020
A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
NIPS 2020
Escaping the Gravitational Pull of Softmax
NIPS 2020
Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
ICML 2020
Energy-Based Processes for Exchangeable Data
ICML 2020
Domain Aggregation Networks for Multi-Source Domain Adaptation
ICML 2020
Batch Stationary Distribution Estimation
ICML 2020
ConQUR: Mitigating Delusional Bias in Deep Q-Learning
ICML 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods
ICML 2020
Scalable Deep Generative Modeling for Sparse Graphs
ICML 2020
Understanding the Impact of Entropy on Policy Optimization
ICML 2019
The Value Function Polytope in Reinforcement Learning
ICML 2019
Kernel Exponential Family Estimation via Doubly Dual Embedding
AISTATS 2019
On Principled Entropy Exploration in Policy Optimization
IJCAI 2019
Invertible Convolutional Flow
NIPS 2019
Surrogate Objectives for Batch Policy Optimization in One-step Decision Making
NIPS 2019
Maximum Entropy Monte-Carlo Planning
NIPS 2019
Exponential Family Estimation via Adversarial Dynamics Embedding
NIPS 2019
A Geometric Perspective on Optimal Representations for Reinforcement Learning
NIPS 2019
Advantage Amplification in Slowly Evolving Latent-State Environments
IJCAI 2019
Learning to Generalize from Sparse and Underspecified Rewards
ICML 2019
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
ICLR 2018
Variational Rejection Sampling
AISTATS 2018
Non-delusional Q-learning and value-iteration
NIPS 2018
Planning and Learning with Stochastic Action Sets
IJCAI 2018
Smoothed Action Value Functions for Learning Gaussian Policies
ICML 2018
Multi-view Matrix Factorization for Linear Dynamical System Estimation
NIPS 2017
Generalized Conditional Gradient for Sparse Estimation
JMLR 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning
NIPS 2017
Logistic Markov Decision Processes
IJCAI 2017
Reward Augmented Maximum Likelihood for Neural Structured Prediction
NIPS 2016
Deep Learning Games
NIPS 2016
Stochastic Neural Networks with Monotonic Activation Functions
AISTATS 2016
Scalable and Sound Low-Rank Tensor Learning
AISTATS 2016
Variance Reduction via Antithetic Markov Chains
AISTATS 2015
Correcting Covariate Shift with the Frank-Wolfe Algorithm
IJCAI 2015
Semi-Supervised Zero-Shot Classification With Label Representation Learning
ICCV 2015
Embedding Inference for Structured Multilabel Prediction
NIPS 2015
Adaptive Monte Carlo via Bandit Allocation
ICML 2014
Convex Deep Learning via Normalized Kernels
NIPS 2014
Characterizing the Representer Theorem
ICML 2013
Learning a Metric Space for Neighbourhood Topology Estimation: Application to Manifold Learning
ACML 2013
Polar Operators for Structured Sparse Estimation
NIPS 2013
Convex Two-Layer Modeling
NIPS 2013
A Polynomial-time Form of Robust Regression
NIPS 2012
Generalized Optimal Reverse Prediction
AISTATS 2012
Convex Multi-view Subspace Learning
NIPS 2012
Accelerated Training for Matrix-norm Regularization: A Boosting Approach
NIPS 2012
Relaxed Clipping: A Global Training Method for Robust Regression and Classification
NIPS 2010
Improved Natural Language Learning via Variance-Regularization Support Vector Machines
CONLL 2010
Convex Relaxation of Mixture Regression with Efficient Algorithms
NIPS 2009
A General Projection Property for Distribution Families
NIPS 2009
Semi-Supervised Convex Training for Dependency Parsing
ACL 2008
Convex Relaxations of Latent Variable Training
NIPS 2007
Stable Dual Dynamic Programming
NIPS 2007
Discriminative Batch Mode Active Learning
NIPS 2007
Improved Large Margin Dependency Parsing via Local Constraints and Laplacian Regularization
CONLL 2006
Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling
ACL 2006
Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
NIPS 2006
Implicit Online Learning with Kernels
NIPS 2006
Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling
COLING 2006
Language and Task Independent Text Categorization with Simple Language Models
NAACL 2003
Language Independent Authorship Attribution with Character Level N-Grams
EACL 2003
Investigating the Relationship between Word Segmentation Performance and Retrieval Performance in Chinese IR
COLING 2002