Sipeng Zheng
14 papers · 2022–2025 · 8 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (15)
Interdisciplinary Bridge
Taxonomy Completionist (27)
Keyword Pioneer
Conference Polyglot (8)
Renaissance Researcher (6)
Prolific Year (7)
Century Club (14)
The Questioner (2)
Conferences
ICCV (3)
ICLR (3)
CVPR (2)
ECCV (2)
AAAI (1)
EMNLP (1)
ICML (1)
NAACL (1)
Keywords
multimodal large language model (2)
large language model (2)
vision-language model (2)
video understanding (2)
object detection (1)
chain-of-thought reasoning (1)
zero-shot learning (1)
cross-modal learning (1)
curriculum learning (1)
multimodal learning (1)
video retrieval (1)
human-object interaction (1)
visual grounding (1)
object tracking (1)
human motion synthesis (1)
semantic representation (1)
motion generation (1)
audio processing (1)
vision language model (1)
language modeling (1)
Papers
MotionCtrl: A Real-time Controllable Vision-Language-Motion Model
ICCV 2025
VideoOrion: Tokenizing Object Dynamics in Videos
ICCV 2025
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
ICLR 2025
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
ICLR 2025
Scaling Large Motion Models with Million-Level Human Motions
ICML 2025
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning
EMNLP 2025
Unified Multimodal Understanding via Byte-Pair Visual Encoding
ICCV 2025
LLaMA-Rider: Spurring Large Language Models to Explore the Open World
NAACL 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
ECCV 2024
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
ICLR 2024
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
CVPR 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI 2023
Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning
ECCV 2022
VRDFormer: End-to-End Video Visual Relation Detection With Transformers
CVPR 2022