conftrace_

reinforcement learning

4352 papers

Also known as

RL, REINFORCE

Co-occurring keywords

large language model (13587), policy learning (702), markov decision process (790), policy optimization (657), policy gradient (520), deep reinforcement learning (903), multi-agent system (1819), imitation learning (744), regret bound (1926), language model (4599)

Papers

The Impact of Language Mixing on Bilingual LLM Reasoning EMNLP 2025

Learning with Linear Function Approximations in Mean-Field Control JMLR 2025

Reinforcement Active Client Selection for Federated Heterogeneous Graph Learning AAAI 2025

Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability JMLR 2025

DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates ACL 2025

Statistical field theory for Markov decision processes under uncertainty JMLR 2025

ControlMed: Adding Reasoning Control to Medical Language Model IJCNLP 2025

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following EMNLP 2025

Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions EMNLP 2025

Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization IJCNLP 2025

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning EMNLP 2025

Reinforcement Learning for Infinite-Dimensional Systems JMLR 2025

CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards EMNLP 2025

Client Selection for Federated Policy Optimization with Environment Heterogeneity JMLR 2025

KERLQA: Knowledge-Enhanced Reinforcement Learning for Question Answering in Low-resource Languages IJCNLP 2025

BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving ACL 2025

QA‐LIGN: Aligning LLMs through Constitutionally Decomposed QA EMNLP 2025

Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning EMNLP 2025

Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment EMNLP 2025

GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning EMNLP 2025

Structured Document Translation via Format Reinforcement Learning IJCNLP 2025

LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits EMNLP 2025

s3: You Don’t Need That Much Data to Train a Search Agent via RL EMNLP 2025

PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving EMNLP 2025

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs IJCNLP 2025