conftrace_

Sipeng Zheng

14 papers · 2022–2025 · 8 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (15) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (27) 🧭 Keyword Pioneer 🌍 Conference Polyglot (8)

🌈 Renaissance Researcher (6) ⚡ Prolific Year (7) 💎 Century Club (14) ❓ The Questioner (2)

Conferences

ICCV (3) ICLR (3) CVPR (2) ECCV (2) AAAI (1) EMNLP (1) ICML (1) NAACL (1)

Top co-authors

Zongqing Lu (9) Qin Jin (7) Yicheng Feng (6) Wanpeng Zhang (3) Jiazheng Liu (3) Yijiang Li (3) Ye Wang (3) Qianshan Wei (2) Bin Cao (2) Boshen Xu (2)

Keywords

multimodal large language model (2) large language model (2) vision-language model (2) video understanding (2) object detection (1) chain-of-thought reasoning (1) zero-shot learning (1) cross-modal learning (1) curriculum learning (1) multimodal learning (1) video retrieval (1) human-object interaction (1) visual grounding (1) object tracking (1) human motion synthesis (1) semantic representation (1) motion generation (1) audio processing (1) vision language model (1) language modeling (1)

Papers

MotionCtrl: A Real-time Controllable Vision-Language-Motion Model · ICCV 2025
VideoOrion: Tokenizing Object Dynamics in Videos · ICCV 2025
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities · ICLR 2025
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions? · ICLR 2025
Scaling Large Motion Models with Million-Level Human Motions · ICML 2025
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning · EMNLP 2025
Unified Multimodal Understanding via Byte-Pair Visual Encoding · ICCV 2025
LLaMA-Rider: Spurring Large Language Models to Explore the Open World · NAACL 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models · ECCV 2024
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds · ICLR 2024
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework · CVPR 2023
Accommodating Audio Modality in CLIP for Multimodal Processing · AAAI 2023
Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning · ECCV 2022
VRDFormer: End-to-End Video Visual Relation Detection With Transformers · CVPR 2022