Papers
cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers
NeurIPS 2024
CVcoders on Semeval-2024 Task 4
NAACL 2024
VIEWS: Entity-Aware News Video Captioning
EMNLP 2024
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
NeurIPS 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
EMNLP 2024