Xintao Wang

76 papers · 2018–2026 · 11 top CS/AI conferences

Achievements


🏃 Academic Marathon (7) 🌍 Conference Polyglot (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (11)

🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (88) 🏠 Conference Loyalist (22) 🧬 Topic Evolution 🤝 Dynamic Duo (37) 🏆 Keyword Champion (7) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (25) ⚡ Prolific Year (12) 🗃️ Keyword Collector (279) 🔥 Unstoppable (5) ❓ The Questioner 💎 Century Club (74)

Conferences

CVPR (22) AAAI (10) NIPS (8) ECCV (7) ICCV (7) ACL (6) ICLR (6) EMNLP (5) ICML (3) IJCAI (1) NAACL (1)

Top co-authors

Ying Shan (37) Chao Dong (15) Yanghua Xiao (13) Yong Zhang (10) Xiaodong Cun (10) Di Zhang (9) Menghan Xia (9) Liangbin Xie (9) Pengfei Wan (9) Jiaqing Liang (8)

Keywords

diffusion model (21) video generation (12) image restoration (8) large language model (8) video super-resolution (7) role-playing agent (6) text-to-image generation (6) image super-resolution (6) text-to-video generation (5) image generation (5) generative adversarial network (5) image editing (4) convolutional neural network (3) video editing (3) latent space (3) attention mechanism (3) zero-shot learning (3) language model (3) image synthesis (3) motion control (3)

Papers

Can LLMs Learn to Map the World from Local Descriptions? · ACL 2026
HumanLLM: Benchmarking and Improving LLM Anthropomorphism via Human Cognitive Patterns · ACL 2026
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video · ICCV 2025
Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents · EMNLP 2025
Character is Destiny: Can Persona-assigned Language Models Make Personal Choices? · EMNLP 2025
Curse of Knowledge: Your Guidance and Provided Knowledge are biasing LLM Judges in Complex Evaluation · EMNLP 2025
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution · CVPR 2025
StyleMaster: Stylize Your Video with Artistic Generation and Translation · CVPR 2025
SketchVideo: Sketch-based Video Generation and Editing · CVPR 2025
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints · ICLR 2025
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention · ICCV 2025
GameFactory: Creating New Games with Generative Interactive Videos · ICCV 2025
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles · ICML 2025
Image Conductor: Precision Control for Interactive Video Synthesis · AAAI 2025
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities · AAAI 2025
Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models · AAAI 2025
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation · NAACL 2025
BOOKWORLD: From Novels to Interactive Agent Societies for Story Creation · ACL 2025
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation · ICLR 2025
Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals · ACL 2025
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling · ICLR 2024
ReVideo: Remake a Video with Motion and Content Control · NIPS 2024
VideoTetris: Towards Compositional Text-to-Video Generation · NIPS 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions · NIPS 2024
Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos · AAAI 2024
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models · AAAI 2024
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model · AAAI 2024
InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews · ACL 2024
Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models · ACL 2024
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners · CVPR 2024
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis · CVPR 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding · CVPR 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models · CVPR 2024
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild · CVPR 2024
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model · CVPR 2024
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing · CVPR 2024
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models · CVPR 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models · CVPR 2024
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model · ECCV 2024
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion · ECCV 2024
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment · ECCV 2024
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation · ECCV 2024
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors · ECCV 2024
Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works · EMNLP 2024
Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data · EMNLP 2024
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models · ICLR 2024
Making LLaMA SEE and Draw with SEED Tokenizer · ICLR 2024
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models · ICLR 2024
Unifying Image Processing as Visual Prompting Question Answering · ICML 2024
Accelerating the Training of Video Super-resolution Models · AAAI 2023
DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models · ICML 2023
Inserting Anybody in Diffusion Models via Celeb Basis · NIPS 2023
OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer · CVPR 2023
Activating More Pixels in Image Super-Resolution Transformer · CVPR 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models · CVPR 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation · ICCV 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing · ICCV 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing · ICCV 2023
MAPS-KB: A Million-Scale Probabilistic Simile Knowledge Base · AAAI 2023
Mitigating Artifacts in Real-World Video Super-resolution Models · AAAI 2023
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models · NIPS 2023
AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos · NIPS 2022
Language Models as Knowledge Embeddings · IJCAI 2022
Metric Learning Based Interactive Modulation for Real-World Super-Resolution · ECCV 2022
Rethinking Alignment in Video Super-Resolution Transformers · NIPS 2022
VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder · ECCV 2022
Towards Real-World Blind Face Restoration With Generative Facial Prior · CVPR 2021
Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution · NIPS 2021
Positional Encoding As Spatial Inductive Bias in GANs · CVPR 2021
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond · CVPR 2021
Robust Reference-Based Super-Resolution via C2-Matching · CVPR 2021
Towards Vivid and Diverse Image Colorization With Generative Color Prior · ICCV 2021
Understanding Deformable Alignment in Video Super-Resolution · AAAI 2021
GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution · CVPR 2021
Deep Network Interpolation for Continuous Imagery Effect Transition · CVPR 2019
Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform · CVPR 2018