Jing Shi

34 papers · 2016–2026 · 12 top CS/AI conferences

Achievements

🌍 Conference Polyglot (12) 🏃 Academic Marathon (9) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (12)

🗺️ Taxonomy Completionist (65) 🐣 Hot Topic Early Bird 🧬 Topic Evolution 👥 Mega-Team (30) 🔥 Unstoppable (8) ⚡ Prolific Year (6) 🚀 Conference Pioneer 💎 Century Club (33) ❓ The Questioner (2) 📈 Trend Setter 🗃️ Keyword Collector (158)

Conferences

CVPR (8) ICCV (5) INTERSPEECH (5) AAAI (3) ECCV (3) ACL (2) IJCAI (2) WACV (2) COLING (1) ICML (1) MICCAI (1) NIPS (1)

Top co-authors

Bo Xu (8) Chenliang Xu (8) Jiaming Xu (6) Zhe Lin (4) Simon Jenni (4) Trung Bui (4) John Collomosse (4) Kushal Kafle (4) Ning Xu (4) Yifei Fan (4)

Keywords

multimodal learning (4) large language model (4) speech separation (4) image editing (3) attention mechanism (3) diffusion model (3) weakly supervised learning (3) image generation (3) sequence-to-sequence model (3) visual question answering (2) zero-shot learning (2) scene graph generation (2) generative adversarial network (2) text-to-image generation (2) representation learning (2) image captioning (2) personalized generation (2) self-supervised learning (2) knowledge distillation (2) contrastive learning (2)

Papers

Plot’n Polish: Zero-Shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models · AAAI 2026
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes · ICCV 2025
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage · ICML 2025
GUI Agents: A Survey · ACL 2025
MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities · ACL 2025
Yo'Chameleon: Personalized Vision and Language Generation · CVPR 2025
Visual Persona: Foundation Model for Full-Body Human Customization · CVPR 2025
The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers · CVPR 2025
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity · CVPR 2025
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters · AAAI 2025
Improving Large Vision and Language Models by Learning from a Panel of Peers · ICCV 2025
Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images · MICCAI 2024
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models · ECCV 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning · AAAI 2024
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning · CVPR 2024
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction · ECCV 2024
Content-Aware Image Color Editing With Auxiliary Color Restoration Tasks · WACV 2024
Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers · INTERSPEECH 2023
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation · INTERSPEECH 2023
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing · CVPR 2022
A Simple Baseline for Weakly-Supervised Scene Graph Generation · ICCV 2021
Learning by Planning: Language-Guided Global Image Editing · CVPR 2021
Learning To Generate Scene Graph From Natural Language Supervision · ICCV 2021
Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism · ICCV 2021
How to Make a BLT Sandwich? Learning VQA Towards Understanding Web Instructional Videos · WACV 2021
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals · NIPS 2020
A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments · INTERSPEECH 2020
Speaker-Conditional Chain Model for Speech Separation and Extraction · INTERSPEECH 2020
GAN-EM: GAN Based EM Learning Framework · IJCAI 2019
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses · CVPR 2019
Which Ones Are Speaking? Speaker-Inferred Model for Multi-Talker Speech Separation · INTERSPEECH 2019
Listen, Think and Listen Again: Capturing Top-down Auditory Attention for Speaker-independent Speech Separation · IJCAI 2018
Audio-Visual Event Localization in Unconstrained Videos · ECCV 2018
Hierarchical Memory Networks for Answer Selection on Unknown Words · COLING 2016