Aviral Kumar
58 papers · 2018–2025 · 6 top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (6)
Taxonomy Completionist (12)
Interdisciplinary Bridge
Academic Marathon (7)
Cross-Pollinator (8)
Renaissance Researcher (7)
Deep Specialist (15)
Dynamic Duo (45)
Mega-Team (25)
Keyword Champion (2)
Triple Crown
Keyword Collector (131)
The Questioner (3)
Prolific Year (5)
Unstoppable (8)
Century Club (58)
Conferences
ICLR (19)
NIPS (19)
ICML (12)
CORL (6)
NAACL (1)
RSS (1)
Research topics
Keywords
offline reinforcement learning (16)
reinforcement learning (7)
model-based optimization (5)
distributional shift (4)
conservative q-learning (4)
value function (4)
policy learning (4)
robot manipulation (3)
robotic manipulation (3)
large language model (3)
distribution shift (3)
imitation learning (2)
multi-task learning (2)
offline optimization (2)
model-free learning (2)
deep reinforcement learning (2)
few-shot learning (2)
dynamic programming (2)
black-box optimization (2)
continuous control (2)
Papers
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
ICLR 2025
What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning?
ICML 2025
RRM: Robust Reward Model Training Mitigates Reward Hacking
ICLR 2025
Training Language Models to Self-Correct via Reinforcement Learning
ICLR 2025
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
NAACL 2025
Scaling Test-Time Compute Without Verification or RL is Suboptimal
ICML 2025
Value-Based Deep RL Scales Predictably
ICML 2025
Generative Verifiers: Reward Modeling as Next-Token Prediction
ICLR 2025
Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents
ICLR 2025
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
ICLR 2025
Optimizing Test-Time Compute via Meta Reinforcement Finetuning
ICML 2025
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
ICLR 2025
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
ICLR 2025
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
CORL 2024
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
NIPS 2024
Is Value Learning Really the Main Bottleneck in Offline RL?
NIPS 2024
Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models
ICLR 2024
Designing Cell-Type-Specific Promoter Sequences Using Conservative Model-Based Optimization
NIPS 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
ICML 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
ICML 2024
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
ICML 2024
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
NIPS 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
NIPS 2024
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning
CORL 2023
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
NIPS 2023
ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints
NIPS 2023
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
NIPS 2023
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
CORL 2023
Efficient Deep Reinforcement Learning Requires Regulating Overfitting
ICLR 2023
Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes
ICLR 2023
Confidence-Conditioned Value Functions for Offline Reinforcement Learning
ICLR 2023
Pre-Training for Robots: Offline RL Enables Learning New Tasks in a Handful of Trials
RSS 2023
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
ICLR 2022
DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning
NIPS 2022
Data-Driven Offline Decision-Making via Invariant Representation Learning
NIPS 2022
Data-Driven Offline Optimization for Architecting Hardware Accelerators
ICLR 2022
Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning
CORL 2022
Should I Run Offline Reinforcement Learning or Behavioral Cloning?
ICLR 2022
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization
ICML 2022
How to Leverage Unlabeled Data in Offline Reinforcement Learning
ICML 2022
Conservative Objective Models for Effective Offline Model-Based Optimization
ICML 2021
A Workflow for Offline Model-Free Robotic Reinforcement Learning
CORL 2021
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
NIPS 2021
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
NIPS 2021
Conservative Safety Critics for Exploration
ICLR 2021
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
ICLR 2021
Benchmarks for Deep Off-Policy Evaluation
ICLR 2021
COMBO: Conservative Offline Model-Based Policy Optimization
NIPS 2021
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
ICLR 2021
Model Inversion Networks for Model-Based Optimization
NIPS 2020
Conservative Q-Learning for Offline Reinforcement Learning
NIPS 2020
Chaining Behaviors from Data with Model-Free Reinforcement Learning
CORL 2020
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
NIPS 2020
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
NIPS 2020
Diagnosing Bottlenecks in Deep Q-learning Algorithms
ICML 2019
Graph Normalizing Flows
NIPS 2019
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
NIPS 2019
Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings
ICML 2018