Papers
352 papers found
Detect, Disambiguate, and Translate: On-Demand Visual Reasoning for Multimodal Machine Translation with Large Vision-Language Models
Danyang Liu, Fanjie Kong, Xiaohang Sun et al.
FAM: Fine-Grained Alignment Matters in Multimodal Embedding Learning with Large Vision-Language Models
Tianhang Xiang, Yirui Li, Lizhao Liu et al.
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si et al.
LIFTED: Multimodal Clinical Trial Outcome Prediction via Large Language Models and Mixture-of-Experts
Wenhao Zheng, Liaoyaqi Wang, Dongshen Peng et al.
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang, Jiajia Li, Lu Yang et al.
AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering
Yungeng Liu, Zan Chen, Yuguang Wang et al.
MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
Baixuan Xu, Weiqi Wang, Haochen Shi et al.
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia, Siwei Han, Shi Qiu et al.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI
Tianbin Li, Yanzhou Su, Wei Li et al.
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Shangda Wu, Yashan Wang, Ruibin Yuan et al.
Improving Large Molecular Language Model via Relation-aware Multimodal Collaboration
Jinyoung Park, Minseong Bae, Jeehye Na et al.
SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions
Sameera Horawalavithana, Sai Munikoti, Ian Stewart et al.
AdaptMerge: Inference Time Adaptive Visual and Language-Guided Token Merging for Efficient Large Multimodal Models
Zahidul Islam, Mrigank Rochan
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering
Lei Wang, Yi Hu, Jiabang He et al.
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang et al.
M5 – A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
Florian Schneider, Sunayana Sitaram
UMUTeam at SemEval-2024 Task 4: Multimodal Identification of Persuasive Techniques in Memes through Large Language Models
Ronghao Pan, José Antonio García-díaz, Rafael Valencia-garcía
UMUTeam at SemEval-2024 Task 4: Multimodal Identification of Persuasive Techniques in Memes through Large Language Models
Ronghao Pan, José Antonio García-díaz, Rafael Valencia-garcía
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu, Yuan Yao, Chongyi Wang et al.
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng, Jin Wang, Chuanhao Li et al.
Leveraging Generative Large Language Models with Visual Instruction and Demonstration Retrieval for Multimodal Sarcasm Detection
Binghao Tang, Boda Lin, Haolong Yan et al.
VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
Cuong Le Chi, Chau Truong Vinh Hoang, Phan Nhật Huy et al.
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li, Yuqi Wang, Runxin Xu et al.
Causal-ERC: A Multimodal Framework with Causal Prompting for Emotion Recognition in Conversations with Large Language Models
Ran Jing, Geng Tu, Yice Zhang et al.