conftrace_

Aviral Kumar

58 papers · 2018–2025 · 6 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (6) 🗺️ Taxonomy Completionist (12) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🐝 Cross-Pollinator (8) 🌈 Renaissance Researcher (7) 🔬 Deep Specialist (15) 🤝 Dynamic Duo (45) 👥 Mega-Team (25) 🏆 Keyword Champion (2) 👑 Triple Crown 🗃️ Keyword Collector (131) ❓ The Questioner (3) ⚡ Prolific Year (5) 🔥 Unstoppable (8) 💎 Century Club (58)

Conferences

ICLR (19) NIPS (19) ICML (12) CORL (6) NAACL (1) RSS (1)

Papers

Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning · ICLR 2025
What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning? · ICML 2025
RRM: Robust Reward Model Training Mitigates Reward Hacking · ICLR 2025
Training Language Models to Self-Correct via Reinforcement Learning · ICLR 2025
Unfamiliar Finetuning Examples Control How Language Models Hallucinate · NAACL 2025
Scaling Test-Time Compute Without Verification or RL is Suboptimal · ICML 2025
Value-Based Deep RL Scales Predictably · ICML 2025
Generative Verifiers: Reward Modeling as Next-Token Prediction · ICLR 2025
Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents · ICLR 2025
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models · ICLR 2025
Optimizing Test-Time Compute via Meta Reinforcement Finetuning · ICML 2025
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning · ICLR 2025
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data · ICLR 2025
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance · CORL 2024
Recursive Introspection: Teaching Language Model Agents How to Self-Improve · NIPS 2024
Is Value Learning Really the Main Bottleneck in Offline RL? · NIPS 2024
Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models · ICLR 2024
Designing Cell-Type-Specific Promoter Sequences Using Conservative Model-Based Optimization · NIPS 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL · ICML 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data · ICML 2024
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL · ICML 2024
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning · NIPS 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold · NIPS 2024
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning · CORL 2023
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets · NIPS 2023
ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints · NIPS 2023
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning · NIPS 2023
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions · CORL 2023
Efficient Deep Reinforcement Learning Requires Regulating Overfitting · ICLR 2023
Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes · ICLR 2023
Confidence-Conditioned Value Functions for Offline Reinforcement Learning · ICLR 2023
Pre-Training for Robots: Offline RL Enables Learning New Tasks in a Handful of Trials · RSS 2023
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization · ICLR 2022
DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning · NIPS 2022
Data-Driven Offline Decision-Making via Invariant Representation Learning · NIPS 2022
Data-Driven Offline Optimization for Architecting Hardware Accelerators · ICLR 2022
Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning · CORL 2022
Should I Run Offline Reinforcement Learning or Behavioral Cloning? · ICLR 2022
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization · ICML 2022
How to Leverage Unlabeled Data in Offline Reinforcement Learning · ICML 2022
Conservative Objective Models for Effective Offline Model-Based Optimization · ICML 2021
A Workflow for Offline Model-Free Robotic Reinforcement Learning · CORL 2021
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability · NIPS 2021
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning · NIPS 2021
Conservative Safety Critics for Exploration · ICLR 2021
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning · ICLR 2021
Benchmarks for Deep Off-Policy Evaluation · ICLR 2021
COMBO: Conservative Offline Model-Based Policy Optimization · NIPS 2021
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning · ICLR 2021
Model Inversion Networks for Model-Based Optimization · NIPS 2020
Conservative Q-Learning for Offline Reinforcement Learning · NIPS 2020
Chaining Behaviors from Data with Model-Free Reinforcement Learning · CORL 2020
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction · NIPS 2020
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL · NIPS 2020
Diagnosing Bottlenecks in Deep Q-learning Algorithms · ICML 2019
Graph Normalizing Flows · NIPS 2019
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction · NIPS 2019
Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings · ICML 2018