Yuki Mitsufuji

39 papers · 2018–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (7) 🗺️ Taxonomy Completionist (71) 🤝 Dynamic Duo (16) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (10) ⚡ Prolific Year (9) 💎 Century Club (37) 🗃️ Keyword Collector (149) 🔥 Unstoppable (5)

Conferences

ICLR (10) ICML (6) CVPR (4) INTERSPEECH (4) ACL (3) EMNLP (3) NIPS (3) AAAI (2) ICCV (2) IJCAI (1) NAACL (1)

Top co-authors

Yuhta Takida (16) Naoki Murata (15) Chieh-Hsin Lai (15) Takashi Shibuya (12) Hiromi Wakaki (9) Toshimitsu Uesaka (9) Wei-Hsiang Liao (8) Mengjie Zhao (6) Naoya Takahashi (6) Dongjun Kim (5)

Research topics

Privacy (1)

Keywords

diffusion model (10) multimodal learning (5) commonsense knowledge (3) generative model (3) zero-shot learning (2) image generation (2) music editing (2) text-to-image model (2) commonsense reasoning (2) novel view synthesis (2) knowledge graph (2) large language model (2) speech enhancement (2) audio source separation (2) text-to-video diffusion (2) dialogue generation (1) knowledge distillation (1) image restoration (1) personalized generation (1) attention mechanism (1)

Papers

SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing AAAI 2026
Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry AAAI 2026
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric ICLR 2025
VinaBench: Benchmark for Faithful and Consistent Visual Narratives CVPR 2025
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization CVPR 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis CVPR 2025
DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning EMNLP 2025
CARE: Multilingual Human Preference Learning for Cultural Awareness EMNLP 2025
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models ICCV 2025
TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models ICCV 2025
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation ICLR 2025
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation ICLR 2025
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning ICLR 2025
Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models ICLR 2025
Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models ICLR 2025
Distillation of Discrete Diffusion through Dimensional Correlations ICML 2025
Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching ICML 2025
VCT: Training Consistency Models with Variational Noise Coupling ICML 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation NAACL 2025
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models IJCAI 2024
PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher NIPS 2024
SilentCipher: Deep Audio Watermarking INTERSPEECH 2024
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer ICLR 2024
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping NIPS 2024
DiffuCOMET: Contextual Commonsense Knowledge Diffusion ACL 2024
On the Language Encoder of Contrastive Cross-modal Models ACL 2024
Manifold Preserving Guided Diffusion ICLR 2024
Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion ICLR 2024
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events NIPS 2023
PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives ACL 2023
Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement INTERSPEECH 2023
FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation ICML 2023
GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration ICML 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos ICLR 2023
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization ICML 2022
ComFact: A Benchmark for Linking Contextual Commonsense Knowledge EMNLP 2022
Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks CVPR 2021
Recursive Speech Separation for Unknown Number of Speakers INTERSPEECH 2019
PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation INTERSPEECH 2018