Papers
498 papers found
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang, Chunyu Xie, Dawei Leng et al.
A Large-Scale Chinese Multimodal NER Dataset with Speech Clues
Dianbo Sui, Zhengkun Tian, Yubo Chen et al.
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
Weihong Zhong, Xiaocheng Feng, Liang Zhao et al.
Zhoumou at SemEval-2025 Task 1: Leveraging Multimodal Data Augmentation and Large Language Models for Enhanced Idiom Understanding
Yingzhou Zhao, Bowen Guan, Liang Yang et al.
JNLP at SemEval-2025 Task 1: Multimodal Idiomaticity Representation with Large Language Models
Blake Matheny, Phuong Minh Nguyen, Minh Le Nguyen
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.
Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
Anastasia Kritharoula, Maria Lymperaiou, Giorgos Stamou
Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models
Hongzhan Lin, Ziyang Luo, Jing Ma et al.
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao, Yannian Fu, Weiqun Wu et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel et al.
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Zehan Wang, Ziang Zhang, Minjie Hong et al.
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan, Haozheng Luo, Manling Li et al.
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Fuxiao Liu, Xiaoyang Wang, Wenlin Yao et al.
Detect, Disambiguate, and Translate: On-Demand Visual Reasoning for Multimodal Machine Translation with Large Vision-Language Models
Danyang Liu, Fanjie Kong, Xiaohang Sun et al.
MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection
Kellie Corona, Katie Osterdahl, Roderic Collins et al.
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
Yu Wu, Ke Shu, Jonas Fischer et al.
TraveLLaMA: A Multimodal Travel Assistant with Large-Scale Dataset and Structured Reasoning
Meng Chu, Yukang Chen, Haokun Gui et al.
From Dialogue to Destination: Geography-Aware Large Language Models with Multimodal Fusion for Conversational Recommendation
Yeming Li, Chenxi Liu, Jie Zou et al.
FAM: Fine-Grained Alignment Matters in Multimodal Embedding Learning with Large Vision-Language Models
Tianhang Xiang, Yirui Li, Lizhao Liu et al.
Multimodal and Multilingual Embeddings for Large-Scale Speech Mining
Paul-Ambroise Duquenne, Hongyu Gong, Holger Schwenk
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang, Mahani Aljunied, Chang Gao et al.