Di Zhang
59 papers · 2023–2026 · 12 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (12) 🐝 Cross-Pollinator (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7)
🏆 Grand Slam
👑 Triple Crown
🤝 Dynamic Duo (20)
🔬 Deep Specialist (11)
⚡ Prolific Year (16)
🚀 Conference Pioneer
🗃️ Keyword Collector (244)
💎 Century Club (54)
Conferences
CVPR (10)
ACL (8)
EMNLP (8)
ICCV (8)
ICLR (7)
AAAI (4)
ICML (4)
COLING (3)
NIPS (3)
NAACL (2)
CORL (1)
MICCAI (1)
Top co-authors
Keywords
large language model (11)
video generation (10)
diffusion model (9)
instruction tuning (5)
reinforcement learning (4)
multimodal large language model (3)
text-to-video generation (3)
preference optimization (3)
diffusion transformer (3)
image generation (3)
text-to-image generation (3)
video diffusion (2)
benchmark evaluation (2)
language model (2)
temporal coherence (2)
gaussian splatting (2)
video captioning (2)
instruction following (2)
prompt engineering (2)
video understanding (2)
Papers
FilmWeaver: Weaving Consistent Multi-Shot Videos with Cache-Guided Autoregressive Diffusion
AAAI 2026
Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings
AAAI 2026
From Detection to Understanding: Multi-Turn Reasoning for Video Misinformation Analysis
ACL 2026
Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts
ACL 2026
TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs
AAAI 2026
iMOVE: Instance-Motion-Aware Video Understanding
ACL 2025
Stable Segment Anything Model
ICLR 2025
KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation
CORL 2025
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
AAAI 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
ACL 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
ACL 2025
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
ICCV 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
ICCV 2025
GameFactory: Creating New Games with Generative Interactive Videos
ICCV 2025
Imbalance in Balance: Online Concept Balancing in Generation Models
ICCV 2025
GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
ICCV 2025
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
ICCV 2025
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
ICCV 2025
Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification
ICCV 2025
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
ICLR 2025
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
ICLR 2025
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
ICLR 2025
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
ICLR 2025
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
ICLR 2025
MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
ICML 2025
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
ICML 2025
CERTAIN: Context Uncertainty-aware One-Shot Adaptation for Context-based Offline Meta Reinforcement Learning
ICML 2025
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
NAACL 2025
Chain-of-Specificity: Enhancing Task-Specific Constraint Adherence in Large Language Models
COLING 2025
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
COLING 2025
SketchVideo: Sketch-based Video Generation and Editing
CVPR 2025
StyleMaster: Stylize Your Video with Artistic Generation and Translation
CVPR 2025
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
CVPR 2025
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
CVPR 2025
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
CVPR 2025
GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
CVPR 2025
Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model
CVPR 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
CVPR 2025
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
CVPR 2025
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
EMNLP 2025
SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin
EMNLP 2025
Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
EMNLP 2025
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs
COLING 2024
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
EMNLP 2024
Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
ACL 2024
Learning Multi-Dimensional Human Preference for Text-to-Image Generation
CVPR 2024
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
ICLR 2024
Focus On What Matters: Separated Models For Visual-Based RL Generalization
NIPS 2024
Hierarchical multiple instance learning for COPD grading with relatively specific similarity
MICCAI 2024
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
NAACL 2024
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
EMNLP 2024
VideoTetris: Towards Compositional Text-to-Video Generation
NIPS 2024
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
ACL 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
ICML 2024
Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios
ACL 2024
Evaluating Readability and Faithfulness of Concept-based Explanations
EMNLP 2024
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector
EMNLP 2024
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
EMNLP 2024
How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization
NIPS 2023