conftrace
reinforcement learning
4352 papers
Also known as: RL, REINFORCE
Co-occurring keywords
large language model (13587)
policy learning (702)
markov decision process (790)
policy optimization (657)
policy gradient (520)
deep reinforcement learning (903)
multi-agent system (1819)
imitation learning (744)
regret bound (1926)
language model (4599)
Papers
The Impact of Language Mixing on Bilingual LLM Reasoning (EMNLP 2025)
Learning with Linear Function Approximations in Mean-Field Control (JMLR 2025)
Reinforcement Active Client Selection for Federated Heterogeneous Graph Learning (AAAI 2025)
Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability (JMLR 2025)
DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates (ACL 2025)
Statistical field theory for Markov decision processes under uncertainty (JMLR 2025)
ControlMed: Adding Reasoning Control to Medical Language Model (IJCNLP 2025)
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following (EMNLP 2025)
Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions (EMNLP 2025)
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization (IJCNLP 2025)
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning (EMNLP 2025)
Reinforcement Learning for Infinite-Dimensional Systems (JMLR 2025)
CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards (EMNLP 2025)
Client Selection for Federated Policy Optimization with Environment Heterogeneity (JMLR 2025)
KERLQA: Knowledge-Enhanced Reinforcement Learning for Question Answering in Low-resource Languages (IJCNLP 2025)
BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving (ACL 2025)
QA‐LIGN: Aligning LLMs through Constitutionally Decomposed QA (EMNLP 2025)
Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning (EMNLP 2025)
Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment (EMNLP 2025)
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning (EMNLP 2025)
Structured Document Translation via Format Reinforcement Learning (IJCNLP 2025)
LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits (EMNLP 2025)
s3: You Don’t Need That Much Data to Train a Search Agent via RL (EMNLP 2025)
PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving (EMNLP 2025)
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs (IJCNLP 2025)