Ziyu Guo
23 papers · 2022–2025 · 10 top CS/AI conferences
Achievements
Taxonomy Completionist (44)
Renaissance Researcher (7)
Conference Polyglot (10)
Interdisciplinary Bridge
Keyword Pioneer
Hot Topic Early Bird
Grand Slam
Dynamic Duo (18)
The Questioner
Prolific Year (10)
Century Club (23)
Keyword Collector (104)
Conferences
AAAI (5)
CVPR (4)
ICCV (4)
ICLR (3)
IJCAI (2)
ACL (1)
ECCV (1)
ICML (1)
NeurIPS (1)
WACV (1)
Keywords
point cloud (5)
zero-shot learning (4)
multimodal learning (4)
contrastive learning (3)
multi-modal learning (3)
few-shot learning (3)
3d vision (3)
large language model (3)
visual reasoning (2)
object detection (2)
motion generation (2)
masked autoencoder (2)
autonomous driving (2)
chain-of-thought reasoning (2)
question answering (1)
attention mechanism (1)
self-supervised learning (1)
direct preference optimization (1)
transfer learning (1)
depth estimation (1)
Papers
Let's Verify and Reinforce Image Generation Step by Step
CVPR 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
Less is More: Improving Motion Diffusion Models with Sparse Keyframes
ICCV 2025
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
ICCV 2025
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
ACL 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
ICLR 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
ICLR 2025
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
CVPR 2025
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
CVPR 2024
X-former Elucidator: Reviving Efficient Attention for Long Context Language Modeling
IJCAI 2024
Personalize Segment Anything Model with One Shot
ICLR 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
ECCV 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
Spatio-Temporal Pivotal Graph Neural Networks for Traffic Flow Forecasting
AAAI 2024
Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis
WACV 2023
CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention
AAAI 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
ICCV 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
ICCV 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
IJCAI 2023
PointCLIP: Point Cloud Understanding by CLIP
CVPR 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
NeurIPS 2022