Jianwei Yang
52 papers · 2016–2025 · across 10 top CS/AI conferences
Achievements
🌍 Conference Polyglot (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🏃 Academic Marathon (9)
🌟 Keyword Trendsetter Combo (4)
🤝 Dynamic Duo (29)
🔬 Deep Specialist (20)
🧬 Topic Evolution
🏆 Keyword Champion (5)
🏆 Grand Slam
⚡ Prolific Year (7)
❓ The Questioner
🗃️ Keyword Collector (188)
💎 Century Club (52)
📈 Trend Setter
🔥 Unstoppable (5)
🚀 Conference Pioneer
Conferences
CVPR (14)
NIPS (13)
ICCV (7)
ECCV (6)
ICLR (5)
EMNLP (2)
ICML (2)
AAAI (1)
CORL (1)
MICCAI (1)
Top co-authors
Keywords
object detection (13)
vision-language model (10)
transfer learning (9)
multimodal learning (8)
convolutional neural network (5)
semantic segmentation (5)
few-shot learning (5)
image segmentation (5)
open-vocabulary segmentation (5)
zero-shot learning (5)
contrastive learning (5)
vision transformer (5)
image classification (4)
representation learning (3)
multi-modal learning (3)
visual question answering (3)
multimodal large language model (3)
visual representation (2)
visual grounding (2)
knowledge distillation (2)
Papers
Simplifying DINO via Coding Rate Regularization
ICML 2025
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
ICML 2025
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
CVPR 2025
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
CVPR 2025
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
ICLR 2025
Latent Action Pretraining from Videos
ICLR 2025
Matryoshka Multimodal Models
ICLR 2025
SITE: towards Spatial Intelligence Thorough Evaluation
ICCV 2025
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
EMNLP 2025
Magma: A Foundation Model for Multimodal AI Agents
CVPR 2025
Structure-Aware Cross-Modal Prompt Tuning for Autonomous Bronchoscopic Navigation
MICCAI 2025
Efficient Modulation for Vision Networks
ICLR 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
NIPS 2024
Interfacing Foundation Models' Embeddings
NIPS 2024
Towards Flexible Visual Relationship Segmentation
NIPS 2024
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
CVPR 2024
Visual In-Context Prompting
CVPR 2024
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
ECCV 2024
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
ECCV 2024
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
ECCV 2024
Segment and Recognize Anything at Any Granularity
ECCV 2024
Pix2Gif: Motion-Guided Diffusion for GIF Generation
ECCV 2024
GLIGEN: Open-Set Grounded Text-to-Image Generation
CVPR 2023
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
Parameter-Efficient Model Adaptation for Vision Transformers
AAAI 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
NIPS 2023
LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following
EMNLP 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
NIPS 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection
ICCV 2023
Segment Everything Everywhere All at Once
NIPS 2023
Learning Customized Visual Models With Retrieval-Augmented Knowledge
CVPR 2023
Focal Modulation Networks
NIPS 2022
Grounded Language-Image Pre-Training
CVPR 2022
RegionCLIP: Region-Based Language-Image Pretraining
CVPR 2022
Unified Contrastive Learning in Image-Text-Label Space
CVPR 2022
Efficient Self-supervised Vision Transformers for Representation Learning
ICLR 2022
K-LITE: Learning Transferable Visual Models with External Knowledge
NIPS 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
NIPS 2022
Focal Attention for Long-Range Interactions in Vision Transformers
NIPS 2021
TACo: Token-Aware Cascade Contrastive Learning for Video-Text Alignment
ICCV 2021
Dynamic DETR: End-to-End Object Detection With Dynamic Attention
ICCV 2021
Learning To Generate Scene Graph From Natural Language Supervision
ICCV 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
ICCV 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
CVPR 2021
Embodied Amodal Recognition: Learning to Move to Perceive Objects
ICCV 2019
Cross-channel Communication Networks
NIPS 2019
Neural Baby Talk
CVPR 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
CORL 2018
Graph R-CNN for Scene Graph Generation
ECCV 2018
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
NIPS 2017
Joint Unsupervised Learning of Deep Representations and Image Clusters
CVPR 2016
Hierarchical Question-Image Co-Attention for Visual Question Answering
NIPS 2016