Jiuxiang Gu

60 papers · 2017–2026 · 14 top CS/AI conferences

Achievements


🌍 Conference Polyglot (13) 🏃 Academic Marathon (8) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (15)

🐣 Hot Topic Early Bird 🤝 Dynamic Duo (18) 🏆 Grand Slam 👥 Mega-Team (34) 🔬 Deep Specialist (12) 🧬 Topic Evolution 🗃️ Keyword Collector (253) 📈 Trend Setter ⚡ Prolific Year (5) 🚀 Conference Pioneer 🔥 Unstoppable (9) 💎 Century Club (57)

Conferences

CVPR (12) · AAAI (7) · EMNLP (6) · ICCV (6) · ECCV (5) · ICLR (5) · ACL (4) · NAACL (4) · NIPS (4) · WACV (3) · COLING (1) · EACL (1) · ICML (1) · INTERSPEECH (1)

Top co-authors

Ruiyi Zhang (19) · Jason Kuen (19) · Tong Sun (16) · Tong Yu (13) · Ani Nenkova (11) · Franck Dernoncourt (11) · Handong Zhao (10) · Yufan Zhou (9) · Rajiv Jain (9) · Zhe Lin (9)

Keywords

large language model (6) · vision-language model (6) · multimodal learning (5) · contrastive learning (5) · zero-shot learning (3) · semantic segmentation (3) · text-to-image generation (3) · document understanding (3) · multimodal large language model (3) · self-supervised learning (3) · model compression (3) · diffusion model (3) · cross-modal retrieval (2) · instruction tuning (2) · multi-modal learning (2) · document analysis (2) · question answering (2) · representation learning (2) · image generation (2) · knowledge distillation (2)

Papers

A Survey on LLM-based Conversational User Simulation · EACL 2026
OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive · AAAI 2026
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use · AAAI 2026
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers · AAAI 2025
Refer to Any Segmentation Mask Group With Vision-Language Prompts · ICCV 2025
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation · ICCV 2025
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data · CVPR 2025
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models · ICCV 2025
CoMMIT: Coordinated Multimodal Instruction Tuning · EMNLP 2025
ImageFolder: Autoregressive Image Generation with Folded Tokens · ICLR 2025
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding · ICLR 2025
From Selection to Generation: A Survey of LLM-based Active Learning · ACL 2025
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling · ACL 2025
ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models · WACV 2025
Numerical Pruning for Efficient Autoregressive Models · AAAI 2025
Differential Privacy Mechanisms in Neural Tangent Kernel Regression · WACV 2025
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes · NAACL 2025
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge · CVPR 2025
TextLap: Customizing Language Models for Text-to-Layout Planning · EMNLP 2024
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation · ICLR 2024
ADOPD: A Large-Scale Document Page Decomposition Dataset · ICLR 2024
LRM: Large Reconstruction Model for Single Image to 3D · ICLR 2024
DocScript: Document-level Script Event Prediction · COLING 2024
Customization Assistant for Text-to-Image Generation · CVPR 2024
TRINS: Towards Multimodal Language Models that Can Read · CVPR 2024
Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances · NAACL 2024
Category-Aware Active Domain Adaptation · ICML 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning · ACL 2024
Advancing Vision-Language Models with Adapter Ensemble Strategies · EMNLP 2024
High Quality Entity Segmentation · ICCV 2023
Learning the Visualness of Text Using Large Vision-Language Models · EMNLP 2023
DocEdit: Language-Guided Document Editing · AAAI 2023
AIMS: All-Inclusive Multi-Level Segmentation for Anything · NIPS 2023
A Critical Analysis of Document Out-of-Distribution Detection · EMNLP 2023
LayerDoc: Layer-Wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents · WACV 2023
CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation · ECCV 2022
Delving into Out-of-Distribution Detection with Vision-Language Representations · NIPS 2022
TiGAN: Text-Based Interactive Image Generation and Manipulation · AAAI 2022
UNISON: Unpaired Cross-Lingual Image Captioning · AAAI 2022
Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns · ACL 2022
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling · CVPR 2022
EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval · CVPR 2022
Towards Language-Free Training for Text-to-Image Generation · CVPR 2022
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation · ECCV 2022
Improving the Reliability for Confidence Estimation · ECCV 2022
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding · EMNLP 2022
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis · INTERSPEECH 2022
DocTime: A Document-level Temporal Dependency Graph Parser · NAACL 2022
Multi-Scale Aligned Distillation for Low-Resolution Detection · CVPR 2021
SelfDoc: Self-Supervised Document Representation Learning · CVPR 2021
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models · NAACL 2021
UniDoc: Unified Pretraining Framework for Document Understanding · NIPS 2021
Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection · CVPR 2021
Self-Supervised Relationship Probing · NIPS 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning · ECCV 2020
Scene Graph Generation With External Knowledge and Image Reconstruction · CVPR 2019
Unpaired Image Captioning via Scene Graph Alignments · ICCV 2019
Unpaired Image Captioning by Language Pivoting · ECCV 2018
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models · CVPR 2018
An Empirical Study of Language CNN for Image Captioning · ICCV 2017