conftrace_

Zhiheng Xi

43 papers · 2022–2026 · 7 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (14) 🌍 Conference Polyglot (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (7)

🌈 Renaissance Researcher (7) 🌍 Conference Polyglot (6) 👥 Mega-Team (27) 🤝 Dynamic Duo (31) 🔬 Deep Specialist (11) 💎 Century Club (33) ❓ The Questioner (2) ⚡ Prolific Year (11) 🗃️ Keyword Collector (155) 🔥 Unstoppable (5)

Conferences

ACL (18) EMNLP (13) AAAI (3) COLING (3) ICLR (3) NAACL (2) ICML (1)

Top co-authors

Qi Zhang (40) Tao Gui (40) Xuanjing Huang (38) Rui Zheng (20) Shihan Dou (13) Yuhao Zhou (12) Xiaoran Fan (11) Ming Zhang (8) Senjie Jin (8) Junjie Ye (8)

Keywords

large language model (18) reinforcement learning (10) language model (5) reinforcement learning from human feedback (4) mathematical reasoning (3) question answering (3) language model alignment (3) benchmark evaluation (2) policy optimization (2) reward model (2) few-shot learning (2) transfer learning (2) reward modeling (2) adversarial defense (2) instruction tuning (2) code generation (2) visual reasoning (2) knowledge distillation (2) model compression (2) adversarial training (2)

Papers

AgentGym2: Benchmarking Large Language Model Agents in De-Idealized Real-World Environments · ACL 2026
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training · ACL 2026
VRPO: Rethinking Value Modeling for Robust RL under Noisy Supervision in LLM Post-Training · ACL 2026
Counteracting the Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing · ACL 2026
LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models · ACL 2026
MetaAct-RL: Training Language Models for Reasoning Through Meta-Action-Based Reinforcement Learning · AAAI 2026
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination · AAAI 2026
What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study · AAAI 2026
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment · ACL 2026
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization · ACL 2026
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation · EMNLP 2025
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use · ACL 2025
CritiQ: Mining Data Quality Criteria from Human Preferences · ACL 2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments · ACL 2025
Multi-Programming Language Sandbox for LLMs · ACL 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts · ACL 2025
Better Process Supervision with Bi-directional Rewarding Signals · ACL 2025
Are LLMs Rational Investors? A Study on the Financial Bias in LLMs · ACL 2025
Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning · EMNLP 2025
LoRACoE: Improving Large Language Model via Composition-based LoRA Expert · EMNLP 2025
Toward Optimal LLM Alignments Using Two-Player Games · EMNLP 2025
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use · EMNLP 2025
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations · EMNLP 2025
Distill Visual Chart Reasoning Ability from LLMs to MLLMs · EMNLP 2025
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs · ICLR 2025
RMB: Comprehensively benchmarking reward models in LLM alignment · ICLR 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling · NAACL 2025
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning · ICLR 2024
StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback · ACL 2024
LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin · ACL 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning · ICML 2024
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models · NAACL 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality · EMNLP 2024
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data · EMNLP 2024
Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals · COLING 2024
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions · COLING 2024
ORTicket: Let One Robust BERT Ticket Transfer across Different Tasks · COLING 2024
Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning · EMNLP 2024
Connectivity Patterns are Task Embeddings · ACL 2023
RealBehavior: A Framework for Faithfully Characterizing Foundation Models’ Human-like Behavior Mechanisms · EMNLP 2023
Characterizing the Impacts of Instances on Robustness · ACL 2023
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement · EMNLP 2023
Efficient Adversarial Training with Robust Early-Bird Tickets · EMNLP 2022