Shilong Liu

30 papers · 2021–2026 · 8 conferences · across top CS/AI conferences

Achievements


🐝 Cross-Pollinator (14) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (8) 🏃 Academic Marathon (5)

🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (36) 🧬 Topic Evolution 🤝 Dynamic Duo (22) ⚡ Prolific Year (12) 💎 Century Club (28) 🗃️ Keyword Collector (99) 🔥 Unstoppable (5)

Conferences

CVPR (8) ECCV (6) ICLR (5) ICCV (4) AAAI (3) NIPS (2) ACL (1) EMNLP (1)

Top co-authors

Lei Zhang (23) Feng Li (21) Hao Zhang (18) Tianhe Ren (10) Jun Zhu (9) Hongyang Li (8) Hang Su (7) Jianwei Yang (7) Xueyan Zou (6) Chunyuan Li (6)

Keywords

object detection (6) semantic segmentation (4) instance segmentation (3) transformer architecture (3) deformable attention (3) image segmentation (3) multimodal learning (2) visual grounding (2) panoptic segmentation (2) detection transformer (2) open-vocabulary segmentation (2) agent system (2) information retrieval (1) pose estimation (1) in-context learning (1) attention mechanism (1) transfer learning (1) self-supervised learning (1) multi-view fusion (1) visual reasoning (1)

Papers

AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design · AAAI 2026
SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features · AAAI 2026
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought · CVPR 2025
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents · ACL 2025
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection · ECCV 2024
TAPTRv2: Attention-based Position Update Improves Tracking Any Point · NIPS 2024
Visual In-Context Prompting · CVPR 2024
TAPTR: Tracking Any Point with Transformers as Detection · ECCV 2024
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy · ECCV 2024
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models · ECCV 2024
Interfacing Foundation Models' Embeddings · NIPS 2024
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents · ECCV 2024
Segment and Recognize Anything at Any Granularity · ECCV 2024
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent · EMNLP 2024
TOSS: High-quality Text-guided Novel View Synthesis from a Single Image · ICLR 2024
InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image · ICLR 2024
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection · ICLR 2023
Detection Transformer with Stable Matching · ICCV 2023
Neural Interactive Keypoint Detection · ICCV 2023
Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation · ICLR 2023
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding · AAAI 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection · ICCV 2023
Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation · CVPR 2023
PREIM3D: 3D Consistent Precise Image Attribute Editing From a Single Image · CVPR 2023
MP-Former: Mask-Piloted Transformer for Image Segmentation · CVPR 2023
Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR · CVPR 2023
DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting · ICCV 2023
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR · ICLR 2022
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising · CVPR 2022
Unsupervised Part Segmentation Through Disentangling Appearance and Shape · CVPR 2021