Ying Shan
150 papers · 2020–2026 · 11 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Taxonomy Completionist (14)
Keyword Pioneer
Renaissance Researcher (5)
Conference Polyglot (11)
Conference Loyalist (22)
Grand Slam
Triple Crown
Dynamic Duo (43)
Deep Specialist (37)
Topic Evolution
Keyword Champion (4)
The Questioner (2)
Keyword Collector (559)
Century Club (149)
Unstoppable (6)
Prolific Year (49)
Conferences
CVPR (55)
ICCV (22)
ECCV (17)
NIPS (16)
AAAI (14)
ICLR (9)
ACL (5)
ICML (5)
IJCAI (3)
INTERSPEECH (3)
NAACL (1)
Keywords
diffusion model
(26)
neural radiance field
(13)
novel view synthesis
(12)
3d reconstruction
(11)
multimodal learning
(10)
video generation
(9)
image generation
(7)
multi-modal learning
(7)
object detection
(7)
representation learning
(6)
video understanding
(6)
text-to-image generation
(5)
zero-shot learning
(5)
neural network
(5)
transfer learning
(5)
vision transformer
(5)
multimodal large language model
(5)
image synthesis
(4)
depth estimation
(4)
generative adversarial network
(4)
Papers
MMhops-R1: Multimodal Multi-hop Reasoning
AAAI 2026
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
ICCV 2025
Image Conductor: Precision Control for Interactive Video Synthesis
AAAI 2025
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
AAAI 2025
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
ICCV 2025
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
ICCV 2025
Taming Rectified Flow for Inversion and Editing
ICML 2025
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
NAACL 2025
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
ICCV 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
ICML 2025
LoRA-Gen: Specializing Large Language Model via Online LoRA Generation
ICML 2025
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
CVPR 2025
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
CVPR 2025
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
CVPR 2025
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
CVPR 2025
VisionMath: Vision-Form Mathematical Problem-Solving
ICCV 2025
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
CVPR 2025
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
CVPR 2025
DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation
ICCV 2025
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning
ICCV 2025
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
ICCV 2025
AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild
INTERSPEECH 2024
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
NIPS 2024
ReVideo: Remake a Video with Motion and Content Control
NIPS 2024
SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images
AAAI 2024
SC-NeuS: Consistent Neural Surface Reconstruction from Sparse and Noisy Views
AAAI 2024
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models
AAAI 2024
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
AAAI 2024
A Pre-convolved Representation for Plug-and-Play Neural Illumination Fields
AAAI 2024
Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views
AAAI 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
NIPS 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
NIPS 2024
MambaTree: Tree Topology is All You Need in State Space Model
NIPS 2024
LLaMA Pro: Progressive LLaMA with Block Expansion
ACL 2024
Programmable Motion Generation for Open-Set Motion Control Tasks
CVPR 2024
ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
CVPR 2024
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
CVPR 2024
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
CVPR 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
CVPR 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
CVPR 2024
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
CVPR 2024
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
CVPR 2024
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
CVPR 2024
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
CVPR 2024
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
CVPR 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
ViT-Lens: Towards Omni-modal Representations
CVPR 2024
YOLO-World: Real-Time Open-Vocabulary Object Detection
CVPR 2024
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
CVPR 2024
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
CVPR 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
CVPR 2024
GS-IR: 3D Gaussian Splatting for Inverse Rendering
CVPR 2024
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
CVPR 2024
SEED-Bench: Benchmarking Multimodal Large Language Models
CVPR 2024
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
CVPR 2024
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
ECCV 2024
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
ECCV 2024
Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
ECCV 2024
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
ECCV 2024
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
ECCV 2024
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
ECCV 2024
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
ECCV 2024
EA-VTR: Event-Aware Video-Text Retrieval
ECCV 2024
DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes
ECCV 2024
ST-LLM: Large Language Models Are Effective Temporal Learners
ECCV 2024
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
ECCV 2024
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
ICLR 2024
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
ICLR 2024
TapMo: Shape-aware Motion Generation of Skeleton-free Characters
ICLR 2024
Making LLaMA SEE and Draw with SEED Tokenizer
ICLR 2024
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
ICLR 2024
Masked Image Modeling with Denoising Contrast
ICLR 2023
PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas
NIPS 2023
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
NIPS 2023
CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation
NIPS 2023
Exploiting Contextual Objects and Relations for 3D Visual Grounding
NIPS 2023
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
NIPS 2023
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
NIPS 2023
Inserting Anybody in Diffusion Models via Celeb Basis
NIPS 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
AAAI 2023
Accelerating the Training of Video Super-resolution Models
AAAI 2023
Mitigating Artifacts in Real-World Video Super-resolution Models
AAAI 2023
Darwinian Model Upgrades: Model Evolving with Selective Compatibility
AAAI 2023
What Does Your Face Sound Like? 3D Face Shape towards Voice
AAAI 2023
DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization
ACL 2023
A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition
ACL 2023
Characterizing the Impacts of Instances on Robustness
ACL 2023
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection
ACL 2023
Accelerating Vision-Language Pretraining With Free Language Modeling
CVPR 2023
3D GAN Inversion With Facial Symmetry Prior
CVPR 2023
Generating Human Motion From Textual Descriptions With Discrete Representations
CVPR 2023
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
CVPR 2023
DropMAE: Masked Autoencoders With Spatial-Attention Dropout for Tracking Tasks
CVPR 2023
Improved Test-Time Adaptation for Domain Generalization
CVPR 2023
HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth Distributions
CVPR 2023
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors
CVPR 2023
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
CVPR 2023
Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
CVPR 2023
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
CVPR 2023
OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer
CVPR 2023
Learning Anchor Transformations for 3D Garment Animation
CVPR 2023
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval
CVPR 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
CVPR 2023
SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
CVPR 2023
Skinned Motion Retargeting With Residual Perception of Motion Semantics & Geometry
CVPR 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
CVPR 2023
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
ICCV 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
ICCV 2023
Order-Prompted Tag Sequence Generation for Video Tagging
ICCV 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
ICCV 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
ICCV 2023
Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video
ICCV 2023
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
ICCV 2023
OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution
ICCV 2023
Exploring Model Transferability through the Lens of Potential Energy
ICCV 2023
$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
ICML 2023
DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models
ICML 2023
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
IJCAI 2023
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing
INTERSPEECH 2023
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training
ECCV 2022
VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
ECCV 2022
Metric Learning Based Interactive Modulation for Real-World Super-Resolution
ECCV 2022
UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection
CVPR 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation
CVPR 2022
BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild
CVPR 2022
Object-Aware Video-Language Pre-Training for Retrieval
CVPR 2022
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022
DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes
NIPS 2022
Towards Universal Backward-Compatible Representation Learning
IJCAI 2022
AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos
NIPS 2022
A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion
INTERSPEECH 2022
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
ECCV 2022
Dynamic Token Normalization improves Vision Transformers
ICLR 2022
Uncertainty Modeling for Out-of-Distribution Generalization
ICLR 2022
Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval
ICLR 2022
Crossover Learning for Fast Online Video Instance Segmentation
ICCV 2021
Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
NIPS 2021
Instances As Queries
ICCV 2021
Towards Real-World Blind Face Restoration With Generative Facial Prior
CVPR 2021
Open-Book Video Captioning With Retrieve-Copy-Generate Network
CVPR 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
CVPR 2021
Towards Vivid and Diverse Image Colorization With Generative Color Prior
ICCV 2021
Detecting Interactions from Neural Networks via Topological Analysis
NIPS 2020
Feature Augmented Memory with Global Attention Network for VideoQA
IJCAI 2020
Fast Video Object Segmentation using the Global Context Module
ECCV 2020