Papers
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao, Haowei Zhang, Lujing Xie et al.
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
Tianchen Deng, Guole Shen, Chen Xun et al.
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
Zifan Wang, Ziqing Chen, Junyu Chen et al.
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
Haoyang He, Jiangning Zhang, Yuxuan Cai et al.
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
Jianwen Jiang, Gaojie Lin, Zhengkun Rong et al.
MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis
Yinghao Wu, Shihui Guo, Yipeng Qin
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.
Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing
Xuanbai Chen, Xiang Xu, Zhihua Li et al.
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia et al.
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
Jiayu Jiang, Changxing Ding, Wentao Tan et al.
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
Zikang Zhou, Hengjian Zhou, Haibo Hu et al.
MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining
Shanglin Liu, Jianming Lv, Jingdan Kang et al.
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
Yanfeng Li, Kahou Chan, Yue Sun et al.
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
Huaize Liu, Wenzhang Sun, Donglin Di et al.
MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation
Yuxiang Fu, Qi Yan, Lele Wang et al.
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang, Sicheng Xu, Cassie Dai et al.
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Matt Deitke, Christopher Clark, Sangho Lee et al.
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Songsong Yu, Yuxin Chen, Zhongang Qi et al.
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
Hongkai Wei, Yang Yang, Shijie Sun et al.
Monocular and Generalizable Gaussian Talking Head Animation
Shengjie Gong, Haojie Li, Jiapeng Tang et al.
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
Fanqi Pu, Yifan Wang, Jiru Deng et al.
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Wenyuan Zhang, Yixiao Yang, Han Huang et al.