Rui Shao
23 papers · 2019–2026 · 9 conferences · across top CS/AI conferences
Achievements
Academic Marathon (6)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (9)
Hot Topic Early Bird
Taxonomy Completionist (51)
Dynamic Duo (12)
Grand Slam
Topic Evolution
Century Club (20)
Trend Setter
Keyword Collector (113)
Prolific Year (10)
Unstoppable (5)
Conferences
CVPR (6)
AAAI (3)
ECCV (3)
ICCV (3)
ACL (2)
ICML (2)
NeurIPS (2)
ICLR (1)
IJCAI (1)
Keywords
multimodal large language model (7)
multimodal learning (4)
large language model (4)
vision-language model (3)
deep learning (2)
gui agent (2)
face recognition (2)
video understanding (2)
domain generalization (2)
face anti-spoofing (2)
agent system (2)
biometric security (2)
in-context learning (1)
knowledge distillation (1)
feature learning (1)
attention mechanism (1)
video prediction (1)
multi-task learning (1)
contrastive learning (1)
hierarchical planning (1)
Papers
H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation
AAAI 2026
SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation
AAAI 2026
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
ACL 2026
SPA-Bench: A Comprehensive Benchmark for Smartphone Agent Evaluation
ICLR 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
CVPR 2025
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
CVPR 2025
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
CVPR 2025
Incorporating Legal Logic into Deep Learning: An Intelligent Approach to Probation Prediction
IJCAI 2025
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization
ICML 2025
Less is More: Empowering GUI Agent with Context-Aware Simplification
ICCV 2025
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
ACL 2025
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
ICCV 2025
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
ICCV 2025
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
NeurIPS 2024
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
NeurIPS 2024
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
CVPR 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
ECCV 2024
RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
ICML 2024
Detecting and Grounding Multi-Modal Media Manipulation
CVPR 2023
Detecting and Recovering Sequential DeepFake Manipulation
ECCV 2022
Open-set Adversarial Defense
ECCV 2020
Regularized Fine-Grained Meta Face Anti-Spoofing
AAAI 2020
Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection
CVPR 2019