Boyi Li

28 papers · 2017–2026 · 10 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (9) 🐝 Cross-Pollinator (7) 🤝 Dynamic Duo (13) 🧬 Topic Evolution 💎 Century Club (27) 🗃️ Keyword Collector (71) ⚡ Prolific Year (10) 🔥 Unstoppable (5)

Conferences

ICLR (9) CVPR (4) ICCV (4) CORL (3) NIPS (3) AAAI (1) ACL (1) ECCV (1) EMNLP (1) NAACL (1)

Top co-authors

Marco Pavone (13) Boris Ivanovic (10) Yue Wang (8) Trevor Darrell (7) Yan Wang (5) Yurong You (4) Xinshuo Weng (4) Serge Belongie (4) Long Lian (3) Kilian Q. Weinberger (3)

Keywords

vision-language model (5) large language model (3) multimodal learning (2) autonomous driving (2) gender bias (2) image captioning (2) autonomous vehicle (2) image generation (2) semi-supervised learning (1) data augmentation (1) benchmark evaluation (1) motion planning (1) video captioning (1) explanation generation (1) domain adaptation (1) zero-shot learning (1) visual reasoning (1) image editing (1) text-to-image generation (1) multi-modal learning (1)

Papers

DyC-STG: Dynamic Causal Spatio-Temporal Graph Network for Real-time Data Credibility Analysis in IoT · AAAI 2026
Extrapolated Urban View Synthesis Benchmark · ICCV 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio · CORL 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences · ACL 2025
Scaling Vision Pre-Training to 4K Resolution · CVPR 2025
Describe Anything: Detailed Localized Image and Video Captioning · ICCV 2025
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation · ICCV 2025
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models · ICLR 2025
Language-Image Models with 3D Understanding · ICLR 2025
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes · ICLR 2025
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding · ICLR 2025
LLM-grounded Video Diffusion Models · ICLR 2024
Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving · CORL 2024
Promptable Closed-loop Traffic Simulation · CORL 2024
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision · ICLR 2024
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition · ICLR 2024
Self-correcting LLM-controlled Diffusion Models · CVPR 2024
Driving Everywhere with Large Language Model Policy Adaptation · CVPR 2024
See and Think: Embodied Agent in Virtual Environment · ECCV 2024
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction · NAACL 2024
DiffuBox: Refining 3D Object Detection with Point Diffusion · NIPS 2024
Geometry-Informed Neural Operator for Large-Scale 3D PDEs · NIPS 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation · EMNLP 2023
Fixed Neural Network Steganography: Train the images, not the network · ICLR 2022
Language-driven Semantic Segmentation · ICLR 2022
On Feature Normalization and Data Augmentation · CVPR 2021
Positional Normalization · NIPS 2019
AOD-Net: All-In-One Dehazing Network · ICCV 2017