Papers
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
INTERSPEECH 2024
Leveraging Language Model Capabilities for Sound Event Detection
INTERSPEECH 2024
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval
INTERSPEECH 2024
Pandora's Box: Towards Building Universal Attackers against Real-World Large Vision-Language Models
NeurIPS 2024
Born a BabyNet with Hierarchical Parental Supervision for End-to-End Text Image Machine Translation
COLING 2024
Bridging Textual and Tabular Worlds for Fact Verification: A Lightweight, Attention-Based Model
COLING 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
NeurIPS 2024