Papers
MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed, Jingtao Li, Weiming Zhuang et al.
MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
Syed Talal Wasim, Hamid Suleman, Olga Zatsarynna et al.
MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective
Weitian Wang, Rai Shubham, Cecilia De La Parra et al.
Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu et al.
MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
Xinhang Liu, Jiawei Shi, Zheng Dang et al.
Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection
Giacomo D'Amicantonio, Snehashis Majhi, Quan Kong et al.
Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code
Sitong Wu, Haoru Tan, Yukang Chen et al.
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li, Pengyu Liu, Dan Guo et al.
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao, Yu Zhong, Ziqi Wang et al.
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao, Yannian Fu, Weiqun Wu et al.
mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework
Bingyi Liu, Jian Teng, Hongfei Xue et al.
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian, Zheng Lu, Mingqi Gao et al.
MMGeo: Multimodal Compositional Geo-Localization for UAVs
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding, Shenxi Wu, Xiangyu Zhao et al.
MMOne: Representing Multiple Modalities in One Scene
Zhifeng Gu, Bing Wang
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Huanjin Yao, Jiaxing Huang, Yawen Qiu et al.
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger, Nina Wenzel, David Griffiths et al.
M-Net: MRI Brain Tumor Sequential Segmentation Network via Mesh-Cast
Jiacheng Lu, Hui Ding, Shiyu Zhang et al.
MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices
Hailong Yan, Ao Li, Xiangtao Zhang et al.
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang, Zihan Jia, Zhilin Dai et al.
Mobile Video Diffusion
Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas et al.
MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning
Mattia Segu, Marta Tintore Gazulla, Yongqin Xian et al.
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Vishwesh Ramanathan, Tony Xu, Pushpak Pati et al.
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio et al.
Modeling Saliency Dataset Bias
Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge