Peng Gao
88 papers · 2018–2026 · 17 top CS/AI conferences
Achievements
Conference Polyglot (16)
Academic Marathon (7)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (11)
Renaissance Researcher (9)
Taxonomy Completionist (105)
Grand Slam
Deep Specialist (16)
Topic Evolution
Mega-Team (23)
Triple Crown
Dynamic Duo (37)
Keyword Collector (278)
Prolific Year (19)
Conference Pioneer
Unstoppable (8)
Century Club (85)
The Questioner (2)
Conferences
ICCV (13)
CVPR (13)
ECCV (11)
AAAI (10)
ICLR (9)
ICML (7)
NIPS (7)
EMNLP (4)
CORL (2)
ACL (2)
IJCAI (2)
INTERSPEECH (2)
RSS (2)
EACL (1)
NAACL (1)
SEMEVAL (1)
WACV (1)
Keywords
multimodal learning (9)
self-supervised learning (7)
few-shot learning (7)
vision transformer (6)
object detection (6)
masked autoencoder (6)
model compression (6)
representation learning (5)
point cloud (5)
knowledge distillation (5)
large language model (4)
contrastive learning (4)
transfer learning (4)
diffusion model (4)
diffusion transformer (4)
image generation (3)
image classification (3)
text-to-image generation (3)
image segmentation (3)
graph matching (3)
Papers
Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies
AAAI 2026
NL2Logic: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models
EACL 2026
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
AAAI 2026
Spatial Preference Rewarding for MLLMs Spatial Understanding
ICCV 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
ICLR 2025
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
CVPR 2025
Let's Verify and Reinforce Image Generation Step by Step
CVPR 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
ICCV 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
ICCV 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
ICLR 2025
A Multi-Focus-Driven Multi-Branch Network for Robust Multimodal Sentiment Analysis
AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
Subteaming and Adaptive Formation Control for Coordinated Multi-Robot Navigation
CORL 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
ACL 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
ICLR 2025
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
ICCV 2025
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process
ICCV 2025
InstructSpeech: Following Speech Editing Instructions via Large Language Models
ICML 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
ICML 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
ICML 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
Phased Consistency Models
NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NIPS 2024
A3VLM: Actionable Articulation-Aware Vision Language Model
CORL 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ACL 2024
Efficient MAE Towards Large-Scale Vision Transformers
WACV 2024
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
CVPR 2024
Digital Life Project: Autonomous 3D Characters with Social Intelligence
CVPR 2024
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
CVPR 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
ECCV 2024
SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
ECCV 2024
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
ECCV 2024
Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models
INTERSPEECH 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
EMNLP 2024
E-Commerce Product Categorization with LLM-based Dual-Expert Classification Paradigm
EMNLP 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
ICLR 2024
Personalize Segment Anything Model with One Shot
ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
ICCV 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
ICCV 2023
Starting From Non-Parametric Networks for 3D Point Cloud Analysis
CVPR 2023
Resilient Binary Neural Network
AAAI 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
ICCV 2023
Stare at What You See: Masked Image Modeling Without Reconstruction
CVPR 2023
Auxiliary Modality Learning with Generalized Curriculum Distillation
ICML 2023
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
CVPR 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
CVPR 2023
Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
CVPR 2023
SparseMAE: Sparse Training Meets Masked Autoencoders
ICCV 2023
Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
NIPS 2022
Recurrent Bilinear Optimization for Binary Neural Networks
ECCV 2022
IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors
ECCV 2022
SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models
NAACL 2022
SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models
SEMEVAL 2022
Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation
ECCV 2022
Frozen CLIP Models Are Efficient Video Learners
ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification
ECCV 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
NIPS 2022
MCMAE: Masked Convolution Meets Masked Autoencoders
NIPS 2022
Exploring representation learning for small-footprint keyword spotting
INTERSPEECH 2022
PointCLIP: Point Cloud Understanding by CLIP
CVPR 2022
Container: Context Aggregation Networks
NIPS 2021
Dual-stream Network for Visual Recognition
NIPS 2021
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
AAAI 2021
Fast Convergence of DETR With Spatially Modulated Co-Attention
ICCV 2021
Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks
IJCAI 2021
Bayesian Deep Graph Matching for Correspondence Identification in Collaborative Perception
RSS 2021
Region Focus Network for Joint Optic Disc and Cup Segmentation
AAAI 2020
Long-Term Loop Closure Detection through Visual-Spatial Information Preserving Multi-Order Graph Matching
AAAI 2020
Pre-training Entity Relation Encoder with Intra-span and Inter-span Information
EMNLP 2020
Learning Where to Focus for Efficient Video Object Detection
ECCV 2020
Regularized Graph Matching for Correspondence Identification under Uncertainty in Collaborative Perception
RSS 2020
Video Object Detection with Locally-Weighted Deformable Neighbors
AAAI 2019
Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks
EMNLP 2019
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
CVPR 2019
Multi-Modality Latent Interaction Network for Visual Question Answering
ICCV 2019
Dynamic Bayesian Logistic Matrix Factorization for Recommendation with Implicit Feedback
IJCAI 2018
Question-Guided Hybrid Convolution for Visual Question Answering
ECCV 2018