Jing Shi
34 papers · 2016–2026 · 12 top CS/AI conferences
Achievements
Conference Polyglot (12)
Academic Marathon (9)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (12)
Taxonomy Completionist (65)
Hot Topic Early Bird
Topic Evolution
Mega-Team (30)
Unstoppable (8)
Prolific Year (6)
Conference Pioneer
Century Club (33)
The Questioner (2)
Trend Setter
Keyword Collector (158)
Conferences
CVPR (8)
ICCV (5)
INTERSPEECH (5)
AAAI (3)
ECCV (3)
ACL (2)
IJCAI (2)
WACV (2)
COLING (1)
ICML (1)
MICCAI (1)
NIPS (1)
Keywords
multimodal learning (4)
large language model (4)
speech separation (4)
image editing (3)
attention mechanism (3)
diffusion model (3)
weakly supervised learning (3)
image generation (3)
sequence-to-sequence model (3)
visual question answering (2)
zero-shot learning (2)
scene graph generation (2)
generative adversarial network (2)
text-to-image generation (2)
representation learning (2)
image captioning (2)
personalized generation (2)
self-supervised learning (2)
knowledge distillation (2)
contrastive learning (2)
Papers
Plot'n Polish: Zero-Shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models
AAAI 2026
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
ICCV 2025
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
ICML 2025
GUI Agents: A Survey
ACL 2025
MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities
ACL 2025
Yo'Chameleon: Personalized Vision and Language Generation
CVPR 2025
Visual Persona: Foundation Model for Full-Body Human Customization
CVPR 2025
The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers
CVPR 2025
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
CVPR 2025
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
AAAI 2025
Improving Large Vision and Language Models by Learning from a Panel of Peers
ICCV 2025
Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images
MICCAI 2024
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
ECCV 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
AAAI 2024
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
CVPR 2024
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
ECCV 2024
Content-Aware Image Color Editing With Auxiliary Color Restoration Tasks
WACV 2024
Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers
INTERSPEECH 2023
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
INTERSPEECH 2023
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing
CVPR 2022
A Simple Baseline for Weakly-Supervised Scene Graph Generation
ICCV 2021
Learning by Planning: Language-Guided Global Image Editing
CVPR 2021
Learning To Generate Scene Graph From Natural Language Supervision
ICCV 2021
Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism
ICCV 2021
How to Make a BLT Sandwich? Learning VQA Towards Understanding Web Instructional Videos
WACV 2021
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
NIPS 2020
A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments
INTERSPEECH 2020
Speaker-Conditional Chain Model for Speech Separation and Extraction
INTERSPEECH 2020
GAN-EM: GAN Based EM Learning Framework
IJCAI 2019
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses
CVPR 2019
Which Ones Are Speaking? Speaker-Inferred Model for Multi-Talker Speech Separation
INTERSPEECH 2019
Listen, Think and Listen Again: Capturing Top-down Auditory Attention for Speaker-independent Speech Separation
IJCAI 2018
Audio-Visual Event Localization in Unconstrained Videos
ECCV 2018
Hierarchical Memory Networks for Answer Selection on Unknown Words
COLING 2016