Rui Shao

23 papers · 2019–2026 · 9 top CS/AI conferences

Achievements


🏃 Academic Marathon (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (9) 🐣 Hot Topic Early Bird

🗺️ Taxonomy Completionist (51) 🤝 Dynamic Duo (12) 🏆 Grand Slam 🧬 Topic Evolution 💎 Century Club (20) 📈 Trend Setter 🗃️ Keyword Collector (113) ⚡ Prolific Year (10) 🔥 Unstoppable (5)

Conferences

CVPR (6) AAAI (3) ECCV (3) ICCV (3) ACL (2) ICML (2) NIPS (2) ICLR (1) IJCAI (1)

Top co-authors

Liqiang Nie (14) Gongwei Chen (9) Kaiwen Zhou (5) Xiang Deng (4) Yinchuan Li (4) Hao Li (3) Leyang Shen (3) Weili Guan (3) Jianye Hao (3) Pong C. Yuen (3)

Keywords

multimodal large language model (7) multimodal learning (4) large language model (4) vision-language model (3) deep learning (2) gui agent (2) face recognition (2) video understanding (2) domain generalization (2) face anti-spoofing (2) agent system (2) biometric security (2) in-context learning (1) knowledge distillation (1) feature learning (1) attention mechanism (1) video prediction (1) multi-task learning (1) contrastive learning (1) hierarchical planning (1)

Papers

H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation · AAAI 2026
SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation · AAAI 2026
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records · ACL 2026
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation · ICLR 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant · CVPR 2025
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy · CVPR 2025
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation · CVPR 2025
Incorporating Legal Logic into Deep Learning: An Intelligent Approach to Probation Prediction · IJCAI 2025
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization · ICML 2025
Less is More: Empowering GUI Agent with Context-Aware Simplification · ICCV 2025
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent · ACL 2025
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation · ICCV 2025
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers · ICCV 2025
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models · NIPS 2024
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks · NIPS 2024
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge · CVPR 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios · ECCV 2024
RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models · ICML 2024
Detecting and Grounding Multi-Modal Media Manipulation · CVPR 2023
Detecting and Recovering Sequential DeepFake Manipulation · ECCV 2022
Open-set Adversarial Defense · ECCV 2020
Regularized Fine-Grained Meta Face Anti-Spoofing · AAAI 2020
Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection · CVPR 2019