conftrace_

reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Towards Sample Efficient Agents through Algorithmic Alignment (Student Abstract) AAAI 2021

Global Fusion Attention for Vision and Language Understanding (Student Abstract) AAAI 2021

Reward based Hebbian Learning in Direct Feedback Alignment (Student Abstract) AAAI 2021

Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning AAAI 2021

Joint Semantic Analysis with Document-Level Cross-Task Coherence Rewards AAAI 2021

Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes AAAI 2021

Learning Task-Distribution Reward Shaping with Meta-Learning AAAI 2021

Self-correcting Q-learning AAAI 2021

Augmenting Policy Learning with Routines Discovered from a Single Demonstration AAAI 2021

The Sample Complexity of Teaching by Reinforcement on Q-Learning AAAI 2021

Learning with Generated Teammates to Achieve Type-Free Ad-Hoc Teamwork IJCAI 2021

Transferable Dialogue Systems and User Simulators ACL 2021

Mitigating Bias in Session-based Cyberbullying Detection: A Non-Compromising Approach ACL 2021

Automated Concatenation of Embeddings for Structured Prediction ACL 2021

Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs ACL 2021

Exploring Dynamic Selection of Branch Expansion Orders for Code Generation ACL 2021

THDA: Treasure Hunt Data Augmentation for Semantic Navigation ICCV 2021

Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation ICCV 2021

PatchMatch-RL: Deep MVS With Pixelwise Depth, Normal, and Visibility ICCV 2021

Reinforcement Learning for Abstractive Question Summarization with Question-aware Semantic Rewards ACL 2021

Efficient Text-based Reinforcement Learning by Jointly Leveraging State and Commonsense Graph Representations ACL 2021

LOA: Logical Optimal Actions for Text-based Interaction Games ACL 2021

Turn-Level User Satisfaction Estimation in E-commerce Customer Service ACL 2021

A Proposal: Interactively Learning to Summarise Timelines by Reinforcement Learning ACL 2021

RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation ACL 2021