Papers
MultiModal Action Conditioned Video Simulation
Yichen Li, Antonio Torralba
Multi-Modal Few-Shot Temporal Action Segmentation
Zijia Lu, Ehsan Elhamifar
Multi-modal Identity Extraction
Ryan Webster, Teddy Furon
Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning
Xinyu Sun, Zhikun Zhao, Congyan Lang et al.
Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation
Shengqi Liu, Yuhao Cheng, Zhuo Chen et al.
Multimodal LLM Guided Exploration and Active Mapping using Fisher Information
Wen Jiang, Boshu Lei, Katrina Ashton et al.
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.
Multi-modal Multi-platform Person Re-Identification: Benchmark and Method
Ruiyang Ha, Songyi Jiang, Bin Li et al.
Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework
Rohan Sharma, Changyou Chen, Feng-Ju Chang et al.
Multimodal Prompt Alignment for Facial Expression Recognition
Fuyan Ma, Yiran He, Bin Sun et al.
Multi-modal Segment Anything Model for Camouflaged Scene Segmentation
Guangyu Ren, Hengyan Liu, Michalis Lazarou et al.
Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
Jingyu Liu, Zijie Xin, Yuhan Fu et al.
Multi-scenario Overlapping Text Segmentation with Depth Awareness
Yang Liu, Xudong Xie, Yuliang Liu et al.
Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi, Xiangbo Yin, Yeyun Chen et al.
Multispectral Demosaicing via Dual Cameras
SaiKiran Tedla, Junyong Lee, Beixuan Yang et al.
Multi-turn Consistent Image Editing
Zijun Zhou, Yingying Deng, Xiangyu He et al.
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee E. Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.
Multi-View 3D Point Tracking
Frano Rajič, Haofei Xu, Marko Mihajlovic et al.
Multi-view Gaze Target Estimation
Qiaomu Miao, Vivek Raju Golani, Jingyi Xu et al.
Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
Jeongmin Yu, Susang Kim, Kisu Lee et al.
MUNBa: Machine Unlearning via Nash Bargaining
Jing Wu, Mehrtash Harandi
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng, Junqiang Wu, Yan Li et al.
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding
Rongchang Xie, Chen Du, Ping Song et al.
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
Xiaojie Li, Ronghui Li, Shukai Fang et al.