Renrui Zhang
65 papers · 2021–2026 · 12 top CS/AI conferences
Achievements
Conference Polyglot (12)
Academic Marathon (5)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (5)
Renaissance Researcher (6)
Taxonomy Completionist (86)
Deep Specialist (16)
Mega-Team (22)
Dynamic Duo (31)
Triple Crown
Grand Slam
Prolific Year (5)
Unstoppable (6)
The Questioner
Keyword Collector (223)
Century Club (63)
Conferences
CVPR (17)
AAAI (9)
ICCV (9)
ICLR (9)
ECCV (6)
NIPS (5)
ICML (4)
WACV (2)
ACL (1)
CORL (1)
EMNLP (1)
IJCAI (1)
Keywords
point cloud (12)
multimodal learning (6)
masked autoencoder (6)
few-shot learning (5)
3d vision (5)
self-supervised learning (5)
domain adaptation (5)
zero-shot learning (5)
multi-modal learning (5)
foundation model (5)
large language model (4)
contrastive learning (4)
3d object detection (4)
autonomous driving (3)
transfer learning (3)
robotic manipulation (3)
model compression (3)
object detection (3)
multimodal large language model (3)
continual learning (2)
Papers
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
AAAI 2026
NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework
AAAI 2026
PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models
WACV 2026
3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation
CORL 2025
Let's Verify and Reinforce Image Generation Step by Step
CVPR 2025
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
CVPR 2025
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
CVPR 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
ICLR 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025
LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
ICLR 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
AAAI 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
ICLR 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
Chimera: Improving Generalist Model with Domain-Specific Experts
ICCV 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
ICCV 2025
Detect Anything 3D in the Wild
ICCV 2025
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
ACL 2025
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
ICCV 2025
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
EMNLP 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
ICML 2024
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection
AAAI 2024
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
ICLR 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
ICLR 2024
Personalize Segment Anything Model with One Shot
ICLR 2024
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
ICLR 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
NIPS 2024
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation
NIPS 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
Parsing All Adverse Scenes: Severity-Aware Semantic Segmentation with Mask-Enhanced Cross-Domain Consistency
AAAI 2024
Gradient-based Parameter Selection for Efficient Fine-Tuning
CVPR 2024
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
CVPR 2024
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
CVPR 2024
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
CVPR 2024
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
CVPR 2024
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
CVPR 2024
Cloud-Device Collaborative Learning for Multimodal Large Language Models
CVPR 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
ECCV 2024
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
ECCV 2024
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
ICCV 2023
JourneyDB: A Benchmark for Generative Image Understanding
NIPS 2023
CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention
AAAI 2023
Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation
AAAI 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
CVPR 2023
Starting From Non-Parametric Networks for 3D Point Cloud Analysis
CVPR 2023
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
CVPR 2023
iQuery: Instruments As Queries for Audio-Visual Sound Separation
CVPR 2023
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
CVPR 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
CVPR 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
ICCV 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
ICCV 2023
SparseMAE: Sparse Training Meets Masked Autoencoders
ICCV 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
IJCAI 2023
Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis
WACV 2023
Frozen CLIP Models Are Efficient Video Learners
ECCV 2022
PointCLIP: Point Cloud Understanding by CLIP
CVPR 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
NIPS 2022
Exploring Resolution and Degradation Clues As Self-Supervised Signal for Low Quality Object Detection
ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification
ECCV 2022
Dual-stream Network for Visual Recognition
NIPS 2021