Jianfeng Wang

48 papers · 2017–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (9) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (14)

🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (74) 🏆 Keyword Champion (2) 👑 Triple Crown 🔬 Deep Specialist (11) 🤝 Dynamic Duo (36) 🏆 Grand Slam 💎 Century Club (48) ⚡ Prolific Year (10) 🔥 Unstoppable (8) 📈 Trend Setter 🚀 Conference Pioneer 🗃️ Keyword Collector (175)

Conferences

CVPR (16) ICLR (7) ECCV (6) NIPS (6) ICML (3) AAAI (2) ICCV (2) IJCAI (2) WACV (2) ACL (1) EMNLP (1)

Top co-authors

Lijuan Wang (36) Zhengyuan Yang (23) Zicheng Liu (21) Linjie Li (21) Kevin Lin (13) Zhe Gan (12) Chung-Ching Lin (7) Xiaowei Hu (6) Lei Zhang (5) Xiaolin Hu (4)

Keywords

multimodal learning (9) object detection (7) image captioning (5) semi-supervised learning (4) image segmentation (4) zero-shot learning (4) vision-language model (4) video generation (4) diffusion model (4) semantic segmentation (3) transfer learning (3) image classification (3) visual question answering (3) image generation (2) convolutional neural network (2) few-shot learning (2) weak supervision (2) open-vocabulary segmentation (2) in-context learning (2) autoregressive generation (2)

Papers

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising WACV 2026
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing ICLR 2025
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation ICLR 2025
GenXD: Generating Any 3D and 4D Scenes ICLR 2025
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos ICLR 2025
LiVOS: Light Video Object Segmentation with Gated Linear Matching CVPR 2025
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos CVPR 2024
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ICLR 2024
GRiT: A Generative Region-to-text Transformer for Object Understanding ECCV 2024
Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation ECCV 2024
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation ECCV 2024
Bring Metric Functions into Diffusion Models IJCAI 2024
Interfacing Foundation Models' Embeddings NIPS 2024
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation NIPS 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning CVPR 2024
Segment and Caption Anything CVPR 2024
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities ICML 2024
Prompting GPT-3 To Be Reliable ICLR 2023
Segment Everything Everywhere All at Once NIPS 2023
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation ACL 2023
ReCo: Region-Controlled Text-to-Image Generation CVPR 2023
Generalized Decoding for Pixel, Image, and Language CVPR 2023
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding CVPR 2023
NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation ICML 2023
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images IJCAI 2023
Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation CVPR 2022
Scaling Up Vision-Language Pre-Training for Image Captioning CVPR 2022
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling ECCV 2022
Injecting Semantic Concepts Into End-to-End Image Captioning CVPR 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone NIPS 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA AAAI 2022
NP-Match: When Neural Processes meet Semi-Supervised Learning ICML 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis NIPS 2022
An Empirical Study of Training End-to-End Vision-and-Language Transformers CVPR 2022
A Simple Approach and Benchmark for 21,000-Category Object Detection ECCV 2022
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption CVPR 2021
DAP: Detection-Aware Pre-Training With Weak Supervision CVPR 2021
NICE: Neural Image Commenting with Empathy EMNLP 2021
Compressing Visual-Linguistic Model via Knowledge Distillation ICCV 2021
End-to-End Semi-Supervised Object Detection With Soft Teacher ICCV 2021
SEED: Self-supervised Distillation For Visual Representation ICLR 2021
RSG: A Simple but Effective Module for Learning Imbalanced Datasets CVPR 2021
End-to-End Object Detection With Fully Convolutional Network CVPR 2021
Anchor Box Optimization for Object Detection WACV 2020
Label Distribution Learning on Auxiliary Label Space Graphs for Facial Expression Recognition CVPR 2020
Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer ECCV 2020
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation AAAI 2019
Gated Recurrent Convolution Neural Network for OCR NIPS 2017