Jaemin Cho
27 papers · 2018–2026 · 12 top CS/AI conferences
Achievements
Academic Marathon (7)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (11)
Cross-Pollinator (11)
Renaissance Researcher (6)
Taxonomy Completionist (51)
Grand Slam
Dynamic Duo (21)
Topic Evolution
Keyword Collector (109)
Century Club (26)
Unstoppable (8)
Prolific Year (5)
Conferences
NIPS (7)
CVPR (3)
EMNLP (3)
ICLR (3)
ECCV (2)
ICCV (2)
NAACL (2)
AAAI (1)
EACL (1)
ICML (1)
IJCNLP (1)
WACV (1)
Keywords
multimodal learning (8)
visual reasoning (3)
text-to-image generation (3)
visual question answering (3)
image captioning (3)
parameter efficient transfer learning (2)
content selection (2)
text generation (2)
transfer learning (2)
mixture of experts (2)
diverse generation (2)
video-language model (2)
vision-language model (2)
multi-task learning (2)
diffusion model (2)
video understanding (2)
vision language model (2)
spatial reasoning (2)
video captioning (1)
knowledge extraction (1)
Papers
RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation
EACL 2026
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
ICLR 2025
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
ICLR 2025
CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
ICCV 2025
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning
EMNLP 2025
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
CVPR 2024
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
NIPS 2024
DOCCI: Descriptions of Connected and Contrasting Images
ECCV 2024
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
ECCV 2024
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
ICLR 2024
Hierarchical Video-Moment Retrieval and Step-Captioning
CVPR 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
NIPS 2023
Perceiver-VL: Efficient Vision-and-Language Modeling With Iterative Latent Attention
WACV 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
NIPS 2023
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
ICCV 2023
Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation
NIPS 2023
Fine-grained Image Captioning with CLIP Reward
NAACL 2022
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
NIPS 2022
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
CVPR 2022
TVLT: Textless Vision-Language Transformer
NIPS 2022
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
AAAI 2022
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
NIPS 2021
Unifying Vision-and-Language Tasks via Text Generation
ICML 2021
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
EMNLP 2020
Mixture Content Selection for Diverse Sequence Generation
IJCNLP 2019
Mixture Content Selection for Diverse Sequence Generation
EMNLP 2019
A Hierarchical Latent Structure for Variational Conversation Modeling
NAACL 2018