conftrace_

multimodal learning

4645 papers


Co-occurring keywords

large language model (13587) vision-language model (2348) visual question answering (1017) video understanding (1658) multi-modal learning (1278) contrastive learning (4032) representation learning (6206) transfer learning (5449) zero-shot learning (3650) vision language model (767)

Papers

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities CVPR 2024

Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers AAAI 2024

Well, Now We Know! Unveiling Sarcasm: Initiating and Exploring Multimodal Conversations with Reasoning AAAI 2024

ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-Guided Optimization AAAI 2024

msLPCC: A Multimodal-Driven Scalable Framework for Deep LiDAR Point Cloud Compression AAAI 2024

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction CVPR 2024

All in One Framework for Multimodal Re-identification in the Wild CVPR 2024

Octopi: Object Property Reasoning with Large Tactile-Language Models RSS 2024

Digital Life Project: Autonomous 3D Characters with Social Intelligence CVPR 2024

Speech ReaLLM – Real-time Speech Recognition with Multimodal Language Models by Teaching the Flow of Time INTERSPEECH 2024

MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving AAAI 2024

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval CVPR 2024

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis AAAI 2024

Exploring Curriculum Learning for Vision-Language Tasks: A Study on Small-Scale Multimodal Training CONLL 2024

Prompting Multi-Modal Image Segmentation with Semantic Grouping AAAI 2024

Data Roaming and Quality Assessment for Composed Image Retrieval AAAI 2024

Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living CVPR 2024

Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos WACV 2024

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora CONLL 2024

Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding AAAI 2024

A Multimodal Large Language Model “Foresees” Objects Based on Verb Information but Not Gender CONLL 2024

MHGRL: An Effective Representation Learning Model for Electronic Health Records COLING 2024

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning INTERSPEECH 2024

Bi-directional Adapter for Multimodal Tracking AAAI 2024

MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling AAAI 2024