Shengbang Tong

16 papers · 2022–2026 · 8 top CS/AI venues

Achievements

🐝 Cross-Pollinator (8) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (7) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5)

🗺️ Taxonomy Completionist (38) 🏆 Grand Slam 🏆 Keyword Champion (2) 🤝 Dynamic Duo (10) ⚡ Prolific Year (6) ❓ The Questioner (2) 📈 Trend Setter 💎 Century Club (15) 🗃️ Keyword Collector (73) 🔥 Unstoppable (5)

Conferences

NIPS (6) ICCV (3) ICLR (2) AAAI (1) ACL (1) CVPR (1) ICML (1) JMLR (1)

Top co-authors

Yi Ma (10) Saining Xie (6) Yann LeCun (5) Yuexiang Zhai (5) Tianzhe Chu (5) Xili Dai (4) Ziyang Wu (3) Zhuang Liu (3) Koustuv Sinha (2) Druv Pai (2)

Research topics

Core AI (1)

Keywords

multimodal learning (5) representation learning (4) self-supervised learning (4) large language model (3) vision-language model (3) vision language model (3) multimodal large language model (3) visual representation learning (3) contrastive learning (3) token compression (2) sparse rate reduction (2) benchmark evaluation (2) transformer architecture (2) visual representation (2) visual reasoning (2) visual grounding (2) decision making (1) manifold learning (1) adversarial robustness (1) vision transformer (1)

Papers

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs · AAAI 2026
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark · ACL 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training · ICML 2025
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning · ICCV 2025
Scaling Language-Free Visual Representation Learning · ICCV 2025
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs · CVPR 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning · NIPS 2024
Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models · ICLR 2024
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? · JMLR 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs · NIPS 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning · NIPS 2024
Unsupervised Manifold Linearizing and Clustering · ICCV 2023
White-Box Transformers via Sparse Rate Reduction · NIPS 2023
Mass-Producing Failures of Multimodal Systems with Language Models · NIPS 2023
Incremental Learning of Structured Memory via Closed-Loop Transcription · ICLR 2023
Revisiting Sparse Convolutional Model for Visual Recognition · NIPS 2022