Jiebo Luo
134 papers · 2015–2026 · 15 top CS/AI conferences
Achievements
Taxonomy Completionist (15)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Conference Polyglot (15)
Keyword Trendsetter Combo (4)
Conference Loyalist (44)
Dynamic Duo (12)
Grand Slam
Mega-Team (34)
Triple Crown
Deep Specialist (23)
Topic Evolution
Keyword Champion (2)
Keyword Collector (534)
Trend Setter
Conference Pioneer
Prolific Year (15)
Unstoppable (11)
The Questioner
Century Club (130)
Conferences
CVPR (44)
ICCV (18)
AAAI (17)
ECCV (13)
EMNLP (8)
IJCAI (7)
NIPS (7)
ACL (5)
ICLR (4)
NAACL (4)
ICML (2)
WACV (2)
COLING (1)
IJCNLP (1)
MIDL (1)
Research topics
Keywords
multimodal learning (10)
image captioning (9)
domain adaptation (9)
image generation (9)
diffusion model (8)
knowledge distillation (8)
representation learning (7)
attention mechanism (7)
metric learning (6)
graph neural network (6)
video understanding (6)
video generation (6)
multi-modal learning (6)
recurrent neural network (5)
large language model (5)
few-shot learning (5)
style transfer (5)
video captioning (5)
convolutional neural network (5)
unsupervised learning (4)
Papers
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
AAAI 2026
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension
AAAI 2026
Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling
AAAI 2026
RealUHR: Harnessing Patch-Cascade Flows for Photorealistic Ultra-High-Resolution Synthesis
AAAI 2026
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
CVPR 2025
From Selection to Generation: A Survey of LLM-based Active Learning
ACL 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
EMNLP 2025
Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning
EMNLP 2025
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
ICCV 2025
Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training
ICCV 2025
Aligning Global Semantics and Local Textures in Generative Video Enhancement
ICCV 2025
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
ICCV 2025
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
ICCV 2025
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
CVPR 2025
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
AAAI 2025
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
AAAI 2025
On Path to Multimodal Generalist: General-Level and General-Bench
ICML 2025
Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
COLING 2025
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
CVPR 2025
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
CVPR 2024
LLM-Rec: Personalized Recommendation via Prompting Large Language Models
NAACL 2024
Bring Metric Functions into Diffusion Models
IJCAI 2024
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
AAAI 2024
Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation
ACL 2024
SoMeLVLM: A Large Vision Language Model for Social Media Processing
ACL 2024
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
ICLR 2024
Mixture of Weak and Strong Experts on Graphs
ICLR 2024
Deceptive Fairness Attacks on Graphs via Meta Learning
ICLR 2024
BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis
EMNLP 2024
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
ECCV 2024
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
CVPR 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
NIPS 2024
PromptFix: You Prompt and We Fix the Photo
NIPS 2024
FeDXL: Provable Federated Learning for Deep X-Risk Optimization
ICML 2023
AnchorFormer: Point Cloud Completion From Discriminative Nodes
CVPR 2023
Stare at What You See: Masked Image Modeling Without Reconstruction
CVPR 2023
Meta-Causal Learning for Single Domain Generalization
CVPR 2023
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment
ICLR 2023
Wyze Rule: Federated Rule Dataset for Rule Recommendation Benchmarking
NIPS 2023
SegPrompt: Using Segmentation Map as a Better Prompt to Finetune Deep Models for Kidney Stone Classification
MIDL 2023
Is Bigger Always Better? An Empirical Study on Efficient Architectures for Style Transfer and Beyond
WACV 2023
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
CVPR 2023
Grounding 3D Object Affordance from 2D Interactions in Images
ICCV 2023
Spatial-Aware Token for Weakly Supervised Object Localization
ICCV 2023
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3
ICCV 2023
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
CVPR 2023
Facial Attribute Transformers for Precise and Robust Makeup Transfer
WACV 2022
Learning a Grammar Inducer from Massive Uncurated Instructional Videos
EMNLP 2022
Localized Adversarial Domain Generalization
CVPR 2022
Image Inpainting with Cascaded Modulation GAN and Object-Aware Training
ECCV 2022
Automatic Relation-Aware Graph Network Proliferation
CVPR 2022
Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning
CVPR 2022
Stand-Alone Inter-Frame Attention in Video Models
CVPR 2022
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing
CVPR 2022
Structured Multi-Level Interaction Network for Video Moment Localization via Language Query
CVPR 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training
NIPS 2021
Multi-modal Dependency Tree for Video Captioning
NIPS 2021
XraySyn: Realistic View Synthesis From a Single Radiograph Through CT Priors
AAAI 2021
Spatial-temporal Causal Inference for Partial Image-to-video Adaptation
AAAI 2021
ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows
CVPR 2021
Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship
CVPR 2021
Group-aware Label Transfer for Domain Adaptive Person Re-identification
CVPR 2021
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption
CVPR 2021
Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization
ICCV 2021
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
ICCV 2021
SAT: 2D Semantics Assisted Training for 3D Visual Grounding
ICCV 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-Based Policy Learning
ICCV 2021
Video-aided Unsupervised Grammar Induction
NAACL 2021
Noise Stability Regularization for Improving BERT Fine-tuning
NAACL 2021
Joint Commonsense and Relation Reasoning for Image and Video Captioning
AAAI 2020
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
AAAI 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
ECCV 2020
Improving One-stage Visual Grounding by Recursive Sub-query Construction
ECCV 2020
Example-Guided Image Synthesis using Masked Spatial-Channel Attention and Self-Supervision
ECCV 2020
Ultrafast Photorealistic Style Transfer via Neural Architecture Search
AAAI 2020
Neural Simile Recognition with Cyclic Multitask Learning and Local Attention
AAAI 2020
Enhancing Pointer Network for Sentence Ordering with Pairwise Ordering Predictions
AAAI 2020
Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification
ECCV 2020
Fine-Grained Image-to-Image Transformation Towards Visual Recognition
CVPR 2020
On Vocabulary Reliance in Scene Text Recognition
CVPR 2020
TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning
CVPR 2020
An Iterative Multi-Source Mutual Knowledge Transfer Framework for Machine Reading Comprehension
IJCAI 2020
Asymmetric Distribution Measure for Few-shot Learning
IJCAI 2020
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
ACL 2020
Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching
AAAI 2020
Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection
CVPR 2020
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
EMNLP 2020
Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning
CVPR 2020
Learning to Localize Actions from Moments
ECCV 2020
TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images
ECCV 2020
Structured Landmark Detection via Topology-Adapting Deep Graph Learning
ECCV 2020
Learning Semantic-aware Normalization for Generative Adversarial Networks
NIPS 2020
Iterative Dual Domain Adaptation for Neural Machine Translation
EMNLP 2019
DuDoNet: Dual Domain Network for CT Metal Artifact Reduction
CVPR 2019
Revisiting Local Descriptor Based Image-To-Class Measure for Few-Shot Learning
CVPR 2019
Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation
CVPR 2019
Foreground-Aware Image Inpainting
CVPR 2019
Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition
CVPR 2019
Unsupervised Image Captioning
CVPR 2019
Attentive Relational Networks for Mapping Images to Scene Graphs
CVPR 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
ICCV 2019
Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning
ICCV 2019
Large-Scale Tag-Based Font Retrieval With Generative Feature Learning
ICCV 2019
Distribution Consistency Based Covariance Metric Networks for Few-Shot Learning
AAAI 2019
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations Rather Than Data
CVPR 2019
Spatio-Temporal Video Re-Localization by Warp LSTM
CVPR 2019
Gaussian Temporal Awareness Networks for Action Localization
CVPR 2019
Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis
ACL 2019
Graph-based Neural Sentence Ordering
IJCAI 2019
Iterative Dual Domain Adaptation for Neural Machine Translation
IJCNLP 2019
Localizing Natural Language in Videos
AAAI 2019
Learning Deep Bilinear Transformation for Fine-grained Image Representation
NIPS 2019
Determining Code Words in Euphemistic Hate Speech Using Word Embedding Networks
EMNLP 2018
Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks
CVPR 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
EMNLP 2018
VizWiz Grand Challenge: Answering Visual Questions From Blind People
CVPR 2018
Fast Factorization-free Kernel Learning for Unlabeled Chunk Data Streams
IJCAI 2018
Multi-Task Clustering with Model Relation Learning
IJCAI 2018
Video Re-localization
ECCV 2018
DOTA: A Large-Scale Dataset for Object Detection in Aerial Images
CVPR 2018
End-to-End Convolutional Semantic Embeddings
CVPR 2018
stagNet: An Attentive Semantic RNN for Group Activity Recognition
ECCV 2018
"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention
ECCV 2018
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
ECCV 2018
Improving Pairwise Ranking for Multi-Label Image Classification
CVPR 2017
Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition
ICCV 2017
Deep Multimodal Representation Learning From Temporal Data
CVPR 2017
Learning From Noisy Labels With Distillation
ICCV 2017
TGIF: A New Dataset and Benchmark on Animated GIF Description
CVPR 2016
Unsupervised Alignment of Actions in Video with Text Descriptions
IJCAI 2016
Image Captioning With Semantic Attention
CVPR 2016
Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments
NAACL 2015
Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection
CVPR 2015
Semantic Video Entity Linking Based on Visual Content and Metadata
ICCV 2015