conftrace_

multimodal learning

4622 papers

Co-occurring keywords

large language model (12755), vision-language model (2235), visual question answering (1000), video understanding (1647), multi-modal learning (1276), contrastive learning (3979), representation learning (6174), transfer learning (5442), zero-shot learning (3637), vision language model (752)

Papers

Large-scale Pre-training for Grounded Video Caption Generation ICCV 2025

Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking COLING 2025

GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation ICCV 2025

Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP AAAI 2025

Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning AAAI 2025

Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition ICCV 2025

Utilizing Vision-Language Models for Detection of Leaf-Based Diseases in Tomatoes AAAI 2025

Multimodal Argumentative Fallacy Classification in Political Debates ACL 2025

PALI-NLP at SemEval 2025 Task 1: Multimodal Idiom Recognition and Alignment ACL 2025

Multimodal Commonsense Knowledge Distillation for Visual Question Answering (Student Abstract) AAAI 2025

Multimodal Variational Autoencoder: A Barycentric View AAAI 2025

DiffCLIP: Few-shot Language-driven Multimodal Classifier AAAI 2025

Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization AAAI 2025

SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models ICCV 2025

Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning AAAI 2025

DisenQ: Disentangling Q-Former for Activity-Biometrics ICCV 2025

Ambiguity-aware Multi-level Incongruity Fusion Network for Multi-Modal Sarcasm Detection COLING 2025

MEPNet: Medical Entity-Balanced Prompting Network for Brain CT Report Generation AAAI 2025

Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges AAAI 2025

MLAN: Language-Based Instruction Tuning Preserves and Transfers Knowledge in Multimodal Language Models ACL 2025

Multi-View Collaborative Learning Network for Speech Deepfake Detection AAAI 2025

Understanding Figurative Meaning through Explainable Visual Entailment NAACL 2025

M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving AAAI 2025

What If LLMs Can Smell: A Prototype IJCAI 2025

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison NAACL 2025