Fengyun Rao
16 papers · 2022–2026 · 6 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (43)
Conference Polyglot (6)
Renaissance Researcher (7)
Interdisciplinary Bridge
Keyword Pioneer
Deep Specialist (10)
Trend Setter
Century Club (15)
Prolific Year (8)
Keyword Collector (91)
Conferences
CVPR (6)
ICCV (4)
AAAI (3)
ECCV (1)
ICLR (1)
NIPS (1)
Keywords
multimodal learning (4)
multimodal large language model (4)
video understanding (3)
diffusion model (3)
reinforcement learning (2)
vision-language model (2)
multi-modal learning (2)
visual perception (2)
multimodal reasoning (2)
probabilistic modeling (1)
data augmentation (1)
preference alignment (1)
temporal reasoning (1)
question answering (1)
image generation (1)
visual question answering (1)
video generation (1)
action recognition (1)
supervised learning (1)
image captioning (1)
Papers
MMhops-R1: Multimodal Multi-hop Reasoning
AAAI 2026
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
CVPR 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
CVPR 2025
Number it: Temporal Grounding Videos like Flipping Manga
CVPR 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
ICCV 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
ICCV 2025
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
ICCV 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
ICCV 2025
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
ICLR 2025
Inter-X: Towards Versatile Human-Human Interaction Analysis
CVPR 2024
Image Captioning with Multi-Context Synthetic Data
AAAI 2024
Visual Perception by Large Language Model's Weights
NIPS 2024
ReGenNet: Towards Human Action-Reaction Synthesis
CVPR 2024
Spatial-Semantic Collaborative Cropping for User Generated Content
AAAI 2024
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
CVPR 2022
CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
ECCV 2022