conftrace_

multimodal learning

4645 papers


Co-occurring keywords

large language model (13587) vision-language model (2348) visual question answering (1017) video understanding (1658) multi-modal learning (1278) contrastive learning (4032) representation learning (6206) transfer learning (5449) zero-shot learning (3650) vision language model (767)

Papers

Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos ICCV 2023

Audio-Visual Class-Incremental Learning ICCV 2023

LIMITR: Leveraging Local Information for Medical Image-Text Representation ICCV 2023

Multi3DRefer: Grounding Text Description to Multiple 3D Objects ICCV 2023

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment ICCV 2023

Multimodal High-order Relation Transformer for Scene Boundary Detection ICCV 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models ICCV 2023

HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation ICCV 2023

Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images ICCV 2023

Do DALL-E and Flamingo Understand Each Other? ICCV 2023

Multimodal Distillation for Egocentric Action Recognition ICCV 2023

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation ICCV 2023

Controllable Visual-Tactile Synthesis ICCV 2023

Discovering Spatio-Temporal Rationales for Video Question Answering ICCV 2023

Distribution-Consistent Modal Recovering for Incomplete Multimodal Learning ICCV 2023

CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation ICCV 2023

CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No ICCV 2023

Attentive Mask CLIP ICCV 2023

CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos ICCV 2023

Video Background Music Generation: Dataset, Method and Evaluation ICCV 2023

Generative Action Description Prompts for Skeleton-based Action Recognition ICCV 2023

Cross-Domain Product Representation Learning for Rich-Content E-Commerce ICCV 2023

PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval ICCV 2023

Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation ICCV 2023

ReactioNet: Learning High-Order Facial Behavior from Universal Stimulus-Reaction by Dyadic Relation Reasoning ICCV 2023