Papers
498 papers found
Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers
Somnath Banerjee, Rima Hazra, Animesh Mukherjee
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering
Lei Wang, Yi Hu, Jiabang He et al.
AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering
Yungeng Liu, Zan Chen, Yuguang Wang et al.
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu, Yuan Yao, Chongyi Wang et al.
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng, Jin Wang, Chuanhao Li et al.
HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count
Noah Wiederhold, Ava Megyeri, DiMaggio Paris et al.
BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation
Liyan Kang, Luyang Huang, Ningxin Peng et al.
SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions
Sameera Horawalavithana, Sai Munikoti, Ian Stewart et al.
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang, Yinan He, Yizhuo Li et al.
UMUTeam at SemEval-2024 Task 4: Multimodal Identification of Persuasive Techniques in Memes through Large Language Models
Ronghao Pan, José Antonio García-Díaz, Rafael Valencia-García
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Shangda Wu, Yashan Wang, Ruibin Yuan et al.
Improving Large Molecular Language Model via Relation-aware Multimodal Collaboration
Jinyoung Park, Minseong Bae, Jeehye Na et al.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI
Tianbin Li, Yanzhou Su, Wei Li et al.
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li, Yuqi Wang, Runxin Xu et al.
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang et al.
Leveraging Generative Large Language Models with Visual Instruction and Demonstration Retrieval for Multimodal Sarcasm Detection
Binghao Tang, Boda Lin, Haolong Yan et al.
VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
Cuong Le Chi, Chau Truong Vinh Hoang, Phan Nhật Huy et al.
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai, Vasileios Saveris, Chen Chen et al.
Causal-ERC: A Multimodal Framework with Causal Prompting for Emotion Recognition in Conversations with Large Language Models
Ran Jing, Geng Tu, Yice Zhang et al.
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Sukmin Yun, Haokun Lin, Rusiru Thushara et al.
Large Language Models Know What is Key Visual Entity: An LLM-assisted Multimodal Retrieval for VQA
Pu Jian, Donglei Yu, Jiajun Zhang
Multimodal Machine Translation for Low-Resource Indic Languages: A Chain-of-Thought Approach Using Large Language Models
Pawan Rajpoot, Nagaraj Bhat, Ashish Shrivastava