Ziyu Guo

23 papers · 2022–2025 · 10 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (44) 🌈 Renaissance Researcher (7) 🌍 Conference Polyglot (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer

🐣 Hot Topic Early Bird 🏆 Grand Slam 🤝 Dynamic Duo (18) ❓ The Questioner ⚡ Prolific Year (10) 💎 Century Club (23) 🗃️ Keyword Collector (104)

Conferences

AAAI (5) CVPR (4) ICCV (4) ICLR (3) IJCAI (2) ACL (1) ECCV (1) ICML (1) NIPS (1) WACV (1)

Top co-authors

Renrui Zhang (18) Peng Gao (13) Hongsheng Li (10) Dongzhi Jiang (5) Yu Qiao (5) Pheng-Ann Heng (4) Jiaming Liu (4) Hao Dong (3) Xupeng Miao (3) Bin Cui (3)

Keywords

point cloud (5) zero-shot learning (4) multimodal learning (4) contrastive learning (3) multi-modal learning (3) few-shot learning (3) 3d vision (3) large language model (3) visual reasoning (2) object detection (2) motion generation (2) masked autoencoder (2) autonomous driving (2) chain-of-thought reasoning (2) question answering (1) attention mechanism (1) self-supervised learning (1) direct preference optimization (1) transfer learning (1) depth estimation (1)

Papers

Let's Verify and Reinforce Image Generation Step by Step · CVPR 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency · ICML 2025
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding · AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding · AAAI 2025
Less is More: Improving Motion Diffusion Models with Sparse Keyframes · ICCV 2025
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion · ICCV 2025
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems · ACL 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine · ICLR 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines · ICLR 2025
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights · CVPR 2025
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation · CVPR 2024
X-former Elucidator: Reviving Efficient Attention for Long Context Language Modeling · IJCAI 2024
Personalize Segment Anything Model with One Shot · ICLR 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? · ECCV 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · AAAI 2024
Spatio-Temporal Pivotal Graph Neural Networks for Traffic Flow Forecasting · AAAI 2024
Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis · WACV 2023
CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention · AAAI 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection · ICCV 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning · ICCV 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training · IJCAI 2023
PointCLIP: Point Cloud Understanding by CLIP · CVPR 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training · NIPS 2022