Yuki Mitsufuji
39 papers · 2018–2026 · 11 top CS/AI conferences
Achievements
Conference Polyglot (10)
Academic Marathon (7)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (7)
Taxonomy Completionist (71)
Dynamic Duo (16)
Triple Crown
Grand Slam
Deep Specialist (10)
Prolific Year (9)
Century Club (37)
Keyword Collector (149)
Unstoppable (5)
Conferences
ICLR (10)
ICML (6)
CVPR (4)
INTERSPEECH (4)
ACL (3)
EMNLP (3)
NIPS (3)
AAAI (2)
ICCV (2)
IJCAI (1)
NAACL (1)
Keywords
diffusion model (10)
multimodal learning (5)
commonsense knowledge (3)
generative model (3)
zero-shot learning (2)
image generation (2)
music editing (2)
text-to-image model (2)
commonsense reasoning (2)
novel view synthesis (2)
knowledge graph (2)
large language model (2)
speech enhancement (2)
audio source separation (2)
text-to-video diffusion (2)
dialogue generation (1)
knowledge distillation (1)
image restoration (1)
personalized generation (1)
attention mechanism (1)
Papers
SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing
AAAI 2026
Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry
AAAI 2026
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
ICLR 2025
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
CVPR 2025
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization
CVPR 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
CVPR 2025
DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
EMNLP 2025
CARE: Multilingual Human Preference Learning for Cultural Awareness
EMNLP 2025
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
ICCV 2025
TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models
ICCV 2025
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
ICLR 2025
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
ICLR 2025
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
ICLR 2025
Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
ICLR 2025
Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
ICLR 2025
Distillation of Discrete Diffusion through Dimensional Correlations
ICML 2025
Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching
ICML 2025
VCT: Training Consistency Models with Variational Noise Coupling
ICML 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation
NAACL 2025
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
IJCAI 2024
PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
NIPS 2024
SilentCipher: Deep Audio Watermarking
INTERSPEECH 2024
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
ICLR 2024
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
NIPS 2024
DiffuCOMET: Contextual Commonsense Knowledge Diffusion
ACL 2024
On the Language Encoder of Contrastive Cross-modal Models
ACL 2024
Manifold Preserving Guided Diffusion
ICLR 2024
Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
ICLR 2024
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
NIPS 2023
PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
ACL 2023
Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
INTERSPEECH 2023
FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation
ICML 2023
GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
ICML 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
ICLR 2023
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
ICML 2022
ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
EMNLP 2022
Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks
CVPR 2021
Recursive Speech Separation for Unknown Number of Speakers
INTERSPEECH 2019
PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation
INTERSPEECH 2018