Jiuxiang Gu
60 papers · 2017–2026 · 14 top CS/AI conferences
Achievements
🌍 Conference Polyglot (13)
🏃 Academic Marathon (8)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (15)
🐣 Hot Topic Early Bird
🤝 Dynamic Duo (18)
🏆 Grand Slam
👥 Mega-Team (34)
🔬 Deep Specialist (12)
🧬 Topic Evolution
🗃️ Keyword Collector (253)
📈 Trend Setter
⚡ Prolific Year (5)
🚀 Conference Pioneer
🔥 Unstoppable (9)
💎 Century Club (57)
Conferences
CVPR (12)
AAAI (7)
EMNLP (6)
ICCV (6)
ECCV (5)
ICLR (5)
ACL (4)
NAACL (4)
NIPS (4)
WACV (3)
COLING (1)
EACL (1)
ICML (1)
INTERSPEECH (1)
Keywords
large language model (6)
vision-language model (6)
multimodal learning (5)
contrastive learning (5)
zero-shot learning (3)
semantic segmentation (3)
text-to-image generation (3)
document understanding (3)
multimodal large language model (3)
self-supervised learning (3)
model compression (3)
diffusion model (3)
cross-modal retrieval (2)
instruction tuning (2)
multi-modal learning (2)
document analysis (2)
question answering (2)
representation learning (2)
image generation (2)
knowledge distillation (2)
Papers
A Survey on LLM-based Conversational User Simulation
EACL 2026
OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive
AAAI 2026
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use
AAAI 2026
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
AAAI 2025
Refer to Any Segmentation Mask Group With Vision-Language Prompts
ICCV 2025
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
ICCV 2025
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
CVPR 2025
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
ICCV 2025
CoMMIT: Coordinated Multimodal Instruction Tuning
EMNLP 2025
ImageFolder: Autoregressive Image Generation with Folded Tokens
ICLR 2025
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
ICLR 2025
From Selection to Generation: A Survey of LLM-based Active Learning
ACL 2025
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
ACL 2025
ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models
WACV 2025
Numerical Pruning for Efficient Autoregressive Models
AAAI 2025
Differential Privacy Mechanisms in Neural Tangent Kernel Regression
WACV 2025
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes
NAACL 2025
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
CVPR 2025
TextLap: Customizing Language Models for Text-to-Layout Planning
EMNLP 2024
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
ICLR 2024
ADOPD: A Large-Scale Document Page Decomposition Dataset
ICLR 2024
LRM: Large Reconstruction Model for Single Image to 3D
ICLR 2024
DocScript: Document-level Script Event Prediction
COLING 2024
Customization Assistant for Text-to-Image Generation
CVPR 2024
TRINS: Towards Multimodal Language Models that Can Read
CVPR 2024
Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances
NAACL 2024
Category-Aware Active Domain Adaptation
ICML 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
ACL 2024
Advancing Vision-Language Models with Adapter Ensemble Strategies
EMNLP 2024
High Quality Entity Segmentation
ICCV 2023
Learning the Visualness of Text Using Large Vision-Language Models
EMNLP 2023
DocEdit: Language-Guided Document Editing
AAAI 2023
AIMS: All-Inclusive Multi-Level Segmentation for Anything
NIPS 2023
A Critical Analysis of Document Out-of-Distribution Detection
EMNLP 2023
LayerDoc: Layer-Wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents
WACV 2023
CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
ECCV 2022
Delving into Out-of-Distribution Detection with Vision-Language Representations
NIPS 2022
TiGAN: Text-Based Interactive Image Generation and Manipulation
AAAI 2022
UNISON: Unpaired Cross-Lingual Image Captioning
AAAI 2022
Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns
ACL 2022
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
CVPR 2022
EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval
CVPR 2022
Towards Language-Free Training for Text-to-Image Generation
CVPR 2022
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
ECCV 2022
Improving the Reliability for Confidence Estimation
ECCV 2022
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding
EMNLP 2022
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis
INTERSPEECH 2022
DocTime: A Document-level Temporal Dependency Graph Parser
NAACL 2022
Multi-Scale Aligned Distillation for Low-Resolution Detection
CVPR 2021
SelfDoc: Self-Supervised Document Representation Learning
CVPR 2021
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
NAACL 2021
UniDoc: Unified Pretraining Framework for Document Understanding
NIPS 2021
Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection
CVPR 2021
Self-Supervised Relationship Probing
NIPS 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
ECCV 2020
Scene Graph Generation With External Knowledge and Image Reconstruction
CVPR 2019
Unpaired Image Captioning via Scene Graph Alignments
ICCV 2019
Unpaired Image Captioning by Language Pivoting
ECCV 2018
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models
CVPR 2018
An Empirical Study of Language CNN for Image Captioning
ICCV 2017