Zirui Wang

37 papers · 2019–2026 · 15 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🌍 Conference Polyglot (14) 🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (55)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏆 Grand Slam 👑 Triple Crown 🧬 Topic Evolution 👥 Mega-Team (29) 💎 Century Club (35) 📈 Trend Setter 🔥 Unstoppable (7) 🗃️ Keyword Collector (130) ⚡ Prolific Year (11) 🚀 Conference Pioneer

Conferences

ICLR (7) CVPR (6) ECCV (3) ICCV (3) NAACL (3) NIPS (3) AAAI (2) EMNLP (2) RSS (2) ACL (1) CORL (1) ICML (1) IJCAI (1) JMLR (1) WACV (1)

Top co-authors

Victor Adrian Prisacariu (5) Jiangmiao Pang (4) Ruoming Pang (3) Yue Deng (3) Bowen Zhang (3) Yulia Tsvetkov (3) Shuai Chen (3) Haotian Zhang (3) Zhuowen Tu (3) Yinfei Yang (3)

Research topics

Techniques (1)

Keywords

domain adaptation (3) large language model (3) knowledge distillation (2) multimodal large language model (2) multi-agent reinforcement learning (2) image captioning (2) text-to-image generation (2) multimodal learning (2) world model (2) benchmark evaluation (2) 3d reconstruction (2) tool use (2) camera pose estimation (2) policy learning (2) medical imaging (2) image restoration (1) reinforcement learning (1) transfer learning (1) sim-to-real transfer (1) deep reinforcement learning (1)

Papers

Disentangling for Transfer: Boosting Limited Modalities via Information-Theoretic Regularization and Cross-Modal Reconstruction · AAAI 2026
AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images · ACL 2026
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation · ICCV 2025
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution · CVPR 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning · ICLR 2025
GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting · ICLR 2025
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities · NAACL 2025
DSQG-Syn: Synthesizing High-quality Data for Text-to-SQL Parsing by Domain Specific Question Generation · NAACL 2025
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains · NAACL 2025
Learning Humanoid Standing-up Control across Diverse Postures · RSS 2025
BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds · RSS 2025
Parallelizing Model-based Reinforcement Learning Over the Sequence Length · NIPS 2024
CrossScore: A Multi-View Approach to Image Evaluation and Scoring · ECCV 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training · ECCV 2024
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response · ICLR 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity · ICLR 2024
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs · NIPS 2024
Language Models as Science Tutors · ICML 2024
Improving Multi-agent Reinforcement Learning with Stable Prefix Policy · IJCAI 2024
Learning H-Infinity Locomotion Control · CORL 2024
TokenCompose: Text-to-Image Diffusion with Token-level Supervision · CVPR 2024
Neural Refinement for Absolute Pose Regression with Feature Synthesis · CVPR 2024
Boosting Multi-agent Reinforcement Learning via Contextual Prompting · JMLR 2023
Guiding Image Captioning Models Toward More Specific Captions · ICCV 2023
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning · ICLR 2023
Language Models Meet World Models: Embodied Experiences Enhance Language Models · NIPS 2023
NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior · CVPR 2023
REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory · CVPR 2023
DFNet: Enhance Absolute Pose Regression with Direct Feature Matching · ECCV 2022
HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images · AAAI 2022
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision · ICLR 2022
Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion · ICCV 2021
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models · ICLR 2021
Efficient Meta Lifelong-Learning with Limited Memory · EMNLP 2020
FlowNet3D++: Geometric Losses For Deep Scene Flow Estimation · WACV 2020
On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment · EMNLP 2020
Characterizing and Avoiding Negative Transfer · CVPR 2019