Papers
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng, Tao Jin, Rongjie Huang et al.
MixSynthFormer: A Transformer Encoder-like Structure with Mixed Synthetic Self-attention for Efficient Human Pose Estimation
Yuran Sun, Alan William Dougherty, Zhuoying Zhang et al.
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer
Fudong Lin, Summer Crawford, Kaleb Guillot et al.
MMVP: Motion-Matrix-Based Video Prediction
Yiqi Zhong, Luming Liang, Ilya Zharkov et al.
Modality Unifying Network for Visible-Infrared Person Re-Identification
Hao Yu, Xu Cheng, Wei Peng et al.
MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions
Yunfei Liu, Lijian Lin, Fei Yu et al.
Model Calibration in Dense Classification with Adaptive Label Perturbation
Jiawei Liu, Changkun Ye, Shan Wang et al.
ModelGiF: Gradient Fields for Model Functional Distance
Jie Song, Zhengqi Xu, Sai Wu et al.
Modeling the Relative Visual Tempo for Self-supervised Skeleton-based Action Recognition
Yisheng Zhu, Hu Han, Zhengtao Yu et al.
MolGrapher: Graph-based Visual Recognition of Chemical Structures
Lucas Morin, Martin Danelljan, Maria Isabel Agea et al.
Moment Detection in Long Tutorial Videos
Ioana Croitoru, Simion-Vlad Bogolin, Samuel Albanie et al.
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
Xianpeng Liu, Ce Zheng, Kelvin B Cheng et al.
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
Renrui Zhang, Han Qiu, Tai Wang et al.
MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection
Junkai Xu, Liang Peng, Haoran Cheng et al.
MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
Fengrui Tian, Shaoyi Du, Yueqi Duan
Monte Carlo Linear Clustering with Single-Point Supervision is Enough for Infrared Small Target Detection
Boyang Li, Yingqian Wang, Longguang Wang et al.
MoreauGrad: Sparse and Robust Interpretation of Neural Networks via Moreau Envelope
Jingwei Zhang, Farzan Farnia
MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers
Daniel Silver, Tirthak Patel, William Cutler et al.
MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
Henghui Ding, Chang Liu, Shuting He et al.
Most Important Person-Guided Dual-Branch Cross-Patch Attention for Group Affect Recognition
Hongxia Xie, Ming-Xian Lee, Tzu-Jui Chen et al.
MOST: Multiple Object Localization with Self-Supervised Transformers for Object Discovery
Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa et al.
MoTIF: Learning Motion Trajectories with Local Implicit Neural Functions for Continuous Space-Time Video Super-Resolution
Yi-Hsin Chen, Si-Cun Chen, Yi-Hsin Chen et al.
MotionBERT: A Unified Perspective on Learning Human Motion Representations
Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu et al.
MotionDeltaCNN: Sparse CNN Inference of Frame Differences in Moving Camera Videos with Spherical Buffers and Padded Convolutions
Mathias Parger, Chengcheng Tang, Thomas Neff et al.
Motion-Guided Masking for Spatiotemporal Representation Learning
David Fan, Jue Wang, Shuai Liao et al.