conftrace_

Peng Gao

88 papers · 2018–2026 · 17 top CS/AI conferences

Achievements

🌍 Conference Polyglot (16) 🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (11)

🌈 Renaissance Researcher (9) 🗺️ Taxonomy Completionist (105) 🏆 Grand Slam 🔬 Deep Specialist (16) 🧬 Topic Evolution 👥 Mega-Team (23) 👑 Triple Crown 🤝 Dynamic Duo (37) 🗃️ Keyword Collector (278) ⚡ Prolific Year (19) 🚀 Conference Pioneer 🔥 Unstoppable (8) 💎 Century Club (85) ❓ The Questioner (2)

Conferences

ICCV (13) CVPR (13) ECCV (11) AAAI (10) ICLR (9) ICML (7) NIPS (7) EMNLP (4) CORL (2) ACL (2) IJCAI (2) INTERSPEECH (2) RSS (2) EACL (1) NAACL (1) SEMEVAL (1) WACV (1)

Top co-authors

Hongsheng Li (38) Renrui Zhang (32) Yu Qiao (23) Ziyu Guo (13) Kaipeng Zhang (8) Wenqi Shao (8) Le Zhuo (7) Xiaogang Wang (7) Aojun Zhou (7) Siyuan Huang (6)

Keywords

multimodal learning (9) self-supervised learning (7) few-shot learning (7) vision transformer (6) object detection (6) masked autoencoder (6) model compression (6) representation learning (5) point cloud (5) knowledge distillation (5) large language model (4) contrastive learning (4) transfer learning (4) diffusion model (4) diffusion transformer (4) image generation (3) image classification (3) text-to-image generation (3) image segmentation (3) graph matching (3)

Papers

Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies · AAAI 2026
NL2Logic: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models · EACL 2026
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation · AAAI 2026
Spatial Preference Rewarding for MLLMs Spatial Understanding · ICCV 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want · ICLR 2025
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions · ICLR 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding · CVPR 2025
Let's Verify and Reinforce Image Generation Step by Step · CVPR 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction · ICCV 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning · ICCV 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning · ICCV 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency · ICML 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines · ICLR 2025
A Multi-Focus-Driven Multi-Branch Network for Robust Multimodal Sentiment Analysis · AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding · AAAI 2025
Subteaming and Adaptive Formation Control for Coordinated Multi-Robot Navigation · CORL 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models · ACL 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation · ICLR 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework · ICCV 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine · ICLR 2025
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation? · ICCV 2025
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process · ICCV 2025
InstructSpeech: Following Speech Editing Instructions via Large Language Models · ICML 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models · ICML 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion · ICML 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI · ICML 2024
Phased Consistency Models · NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT · NIPS 2024
A3VLM: Actionable Articulation-Aware Vision Language Model · CORL 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · AAAI 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning · ACL 2024
Efficient MAE Towards Large-Scale Vision Transformers · WACV 2024
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation · CVPR 2024
Digital Life Project: Autonomous 3D Characters with Social Intelligence · CVPR 2024
OneLLM: One Framework to Align All Modalities with Language · CVPR 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist · CVPR 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? · ECCV 2024
SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding · ECCV 2024
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding · ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models · ECCV 2024
Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models · INTERSPEECH 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models · EMNLP 2024
E-Commerce Product Categorization with LLM-based Dual-Expert Classification Paradigm · EMNLP 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation · ICLR 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention · ICLR 2024
Personalize Segment Anything Model with One Shot · ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models · ICLR 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models · ICML 2024
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection · ICCV 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning · ICCV 2023
Starting From Non-Parametric Networks for 3D Point Cloud Analysis · CVPR 2023
Resilient Binary Neural Network · AAAI 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement · ICCV 2023
Stare at What You See: Masked Image Modeling Without Reconstruction · CVPR 2023
Auxiliary Modality Learning with Generalized Curriculum Distillation · ICML 2023
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders · CVPR 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners · CVPR 2023
Q-DETR: An Efficient Low-Bit Quantized Detection Transformer · CVPR 2023
SparseMAE: Sparse Training Meets Masked Autoencoders · ICCV 2023
Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer · NIPS 2022
Recurrent Bilinear Optimization for Binary Neural Networks · ECCV 2022
IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors · ECCV 2022
SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models · NAACL 2022
SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models · SEMEVAL 2022
Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation · ECCV 2022
Frozen CLIP Models Are Efficient Video Learners · ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification · ECCV 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training · NIPS 2022
MCMAE: Masked Convolution Meets Masked Autoencoders · NIPS 2022
Exploring representation learning for small-footprint keyword spotting · INTERSPEECH 2022
PointCLIP: Point Cloud Understanding by CLIP · CVPR 2022
Container: Context Aggregation Networks · NIPS 2021
Dual-stream Network for Visual Recognition · NIPS 2021
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers · AAAI 2021
Fast Convergence of DETR With Spatially Modulated Co-Attention · ICCV 2021
Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks · IJCAI 2021
Bayesian Deep Graph Matching for Correspondence Identification in Collaborative Perception · RSS 2021
Region Focus Network for Joint Optic Disc and Cup Segmentation · AAAI 2020
Long-Term Loop Closure Detection through Visual-Spatial Information Preserving Multi-Order Graph Matching · AAAI 2020
Pre-training Entity Relation Encoder with Intra-span and Inter-span Information · EMNLP 2020
Learning Where to Focus for Efficient Video Object Detection · ECCV 2020
Regularized Graph Matching for Correspondence Identification under Uncertainty in Collaborative Perception · RSS 2020
Video Object Detection with Locally-Weighted Deformable Neighbors · AAAI 2019
Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks · EMNLP 2019
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering · CVPR 2019
Multi-Modality Latent Interaction Network for Visual Question Answering · ICCV 2019
Dynamic Bayesian Logistic Matrix Factorization for Recommendation with Implicit Feedback · IJCAI 2018
Question-Guided Hybrid Convolution for Visual Question Answering · ECCV 2018