Si Liu

93 papers · 2013–2026 · 14 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (13) 🏃 Academic Marathon (12) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird

🌈 Renaissance Researcher (10) 🗺️ Taxonomy Completionist (115) 🏠 Conference Loyalist (40) 🌱 Topic Pioneer 🔬 Deep Specialist (17) 🧬 Topic Evolution 🤝 Dynamic Duo (15) 🏆 Grand Slam 👥 Mega-Team (25) 🗃️ Keyword Collector (394) ⚡ Prolific Year (17) 🚀 Conference Pioneer 💎 Century Club (90) 🔥 Unstoppable (11) 📈 Trend Setter

Conferences

CVPR (40) ICCV (15) ECCV (8) AAAI (7) ICLR (6) NIPS (6) IJCAI (4) ACL (1) EMNLP (1) ICML (1) JMLR (1) MICCAI (1) NSDI (1) OSDI (1)

Top co-authors

Yue Liao (15) Hongsheng Li (12) Shaofei Huang (11) Jizhong Han (10) Tianrui Hui (10) Chen Gao (9) Jinyu Chen (8) Guanbin Li (8) Xiaochun Cao (7) Linjiang Huang (6)

Keywords

semantic segmentation (9) object detection (8) convolutional neural network (7) multimodal learning (6) video understanding (5) knowledge distillation (5) attention mechanism (4) weakly supervised learning (4) feature extraction (4) autonomous driving (4) image generation (4) generative adversarial network (3) reinforcement learning (3) multimodal large language model (3) representation learning (3) object tracking (3) human-object interaction (3) pose estimation (3) object localization (3) human-object interaction detection (3)

Papers

AerialVLA: A Vision-Language-Action Model for Aerial Navigation with Online Dialogue AAAI 2026
VaccineRAG: Boosting Multimodal Large Language Models’ Immunity to Harmful RAG Samples AAAI 2026
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning ACL 2026
ViPE: Visual Perception in Parameter Space for Efficient Video-Language Understanding EMNLP 2025
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation AAAI 2025
GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance AAAI 2025
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology ICLR 2025
Mixture Compressor for Mixture-of-Experts LLMs Gains More ICLR 2025
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding CVPR 2025
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection CVPR 2025
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer CVPR 2025
Generative Map Priors for Collaborative BEV Semantic Segmentation CVPR 2025
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering CVPR 2025
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ICLR 2025
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception ICLR 2025
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization ICCV 2025
CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective ICCV 2025
CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation ICCV 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs ICCV 2025
Image Understanding Makes for A Good Tokenizer for Image Generation NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT NIPS 2024
Communication-Efficient Collaborative Perception via Information Filling with Codebook CVPR 2024
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection CVPR 2024
EASE-DETR: Easing the Competition among Object Queries CVPR 2024
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training CVPR 2024
Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation MICCAI 2024
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE ICLR 2024
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation ICLR 2024
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving ECCV 2024
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection ECCV 2024
Controllable Navigation Instruction Generation with Chain of Thought Prompting ECCV 2024
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction ECCV 2024
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis ECCV 2024
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection CVPR 2024
CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics NIPS 2024
Optimizing the Placement of Roadside LiDARs for Autonomous Driving ICCV 2023
Boosting Verification of Deep Reinforcement Learning via Piece-Wise Linear Decision Neural Networks NIPS 2023
MARBLE: Music Audio Representation Benchmark for Universal Evaluation NIPS 2023
Boosting Verified Training for Robust Image Classifications via Abstraction CVPR 2023
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection CVPR 2023
Bridging Search Region Interaction With Template for RGB-T Tracking CVPR 2023
Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation CVPR 2023
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels CVPR 2023
DETR With Additional Global Aggregation for Cross-Domain Weakly Supervised Object Detection CVPR 2023
Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection CVPR 2023
Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation ICCV 2023
Video Background Music Generation: Dataset, Method and Evaluation ICCV 2023
Object as Query: Lifting Any 2D Object Detector to 3D Detection ICCV 2023
Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation IJCAI 2023
Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding IJCAI 2023
RHINE: Robust and High-performance Internet Naming with E2E Authenticity NSDI 2023
Detecting Transactional Bugs in Database Engines via Graph-Based Oracle Construction OSDI 2023
PAC Guarantees and Effective Algorithms for Detecting Novel Categories JMLR 2022
HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors ECCV 2022
PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation ECCV 2022
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation CVPR 2022
Reinforced Structured State-Evolution for Vision-Language Navigation CVPR 2022
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection CVPR 2022
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation CVPR 2022
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection CVPR 2022
Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism ICCV 2021
General Instance Distillation for Object Detection CVPR 2021
Mining the Benefits of Two-stage and One-stage HOI Detection NIPS 2021
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression CVPR 2021
Reformulating HOI Detection As Adaptive Set Prediction CVPR 2021
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing CVPR 2021
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps AAAI 2021
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation CVPR 2021
Linguistic Structure Guided Context Modeling for Referring Image Segmentation ECCV 2020
A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension CVPR 2020
Referring Image Segmentation via Cross-Modal Progressive Comprehension CVPR 2020
Tree-Structured Policy Based Progressive Reinforcement Learning for Temporally Language Grounding in Video AAAI 2020
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer CVPR 2020
AdversarialNAS: Adversarial Neural Architecture Search for GANs CVPR 2020
PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection CVPR 2020
Rule-Guided Compositional Representation Learning on Knowledge Graphs AAAI 2020
RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment ICCV 2019
Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling CVPR 2019
Open Category Detection with PAC Guarantees ICML 2018
Ensemble Soft-Margin Softmax Loss for Image Classification IJCAI 2018
Surveillance Video Parsing With Single Frame Supervision CVPR 2017
Learning Adaptive Receptive Fields for Deep Image Parsing Network CVPR 2017
Makeup Like a Superstar: Deep Localized Makeup Transfer Network IJCAI 2016
SketchNet: Sketch Classification With Web Images CVPR 2016
Structural Correlation Filter for Robust Visual Tracking CVPR 2016
Matching-CNN Meets KNN: Quasi-Parametric Human Parsing CVPR 2015
Structural Sparse Tracking CVPR 2015
Diversity-Induced Multi-View Subspace Clustering CVPR 2015
Low-Rank Tensor Constrained Multiview Subspace Clustering ICCV 2015
Human Parsing With Contextualized Convolutional Neural Network ICCV 2015
Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection ICCV 2015
Low-Rank Sparse Coding for Image Classification ICCV 2013
SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor ICCV 2013