Policy Learning

2068 directly classified papers

Papers

Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards EMNLP 2024

Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics RSS 2024

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data EMNLP 2024

Mitigating Open-Vocabulary Caption Hallucinations EMNLP 2024

Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code EMNLP 2024

A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies EMNLP 2024

Reward Modeling Requires Automatic Adjustment Based on Data Quality EMNLP 2024

E2CL: Exploration-based Error Correction Learning for Embodied Agents EMNLP 2024

Exploiting Careful Design of SVM Solution for Aspect-term Sentiment Analysis EMNLP 2024

POLICEd RL: Learning Closed-Loop Robot Control Policies with Provable Satisfaction of Hard Constraints RSS 2024

Rating-Based Reinforcement Learning AAAI 2024

Fast two-time-scale stochastic gradient method with applications in reinforcement learning COLT 2024

TRIP NEGOTIATOR: A Travel Persona-aware Reinforced Dialogue Generation Model for Personalized Integrative Negotiation in Tourism EMNLP 2024

Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL AISTATS 2024

On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization EMNLP 2024

Positivity-free Policy Learning with Observational Data AISTATS 2024

Enhancing Alignment using Curriculum Learning & Ranked Preferences EMNLP 2024

Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability COLT 2024

Any-point Trajectory Modeling for Policy Learning RSS 2024

Do no harm: A counterfactual approach to safe reinforcement learning L4DC 2024

Solving Long-run Average Reward Robust MDPs via Stochastic Games IJCAI 2024

Meta-learning linear quadratic regulators: a policy gradient MAML approach for model-free LQR L4DC 2024

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment EMNLP 2024

Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation EMNLP 2024

Minimax-optimal reward-agnostic exploration in reinforcement learning COLT 2024