conftrace
multimodal learning
4645 papers
Co-occurring keywords
large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)
Papers
Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding (AAAI 2024)
Bi-directional Adapter for Multimodal Tracking (AAAI 2024)
Improving Audio-Visual Segmentation with Bidirectional Generation (AAAI 2024)
Prompting Multi-Modal Image Segmentation with Semantic Grouping (AAAI 2024)
A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis (AAAI 2024)
MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving (AAAI 2024)
Data Roaming and Quality Assessment for Composed Image Retrieval (AAAI 2024)
AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis (AAAI 2024)
ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-Guided Optimization (AAAI 2024)
msLPCC: A Multimodal-Driven Scalable Framework for Deep LiDAR Point Cloud Compression (AAAI 2024)
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer (AAAI 2024)
Mono3DVG: 3D Visual Grounding in Monocular Images (AAAI 2024)
Learning Task-Aware Language-Image Representation for Class-Incremental Object Detection (AAAI 2024)
Vulnerabilities of Large Language Models to Adversarial Attacks (ACL 2024)
Towards Artwork Explanation in Large-scale Vision Language Models (ACL 2024)
Naming, Describing, and Quantifying Visual Objects in Humans and LLMs (ACL 2024)
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark (ACL 2024)
REFINESUMM: Self-Refining MLLM for Generating a Multimodal Summarization Dataset (ACL 2024)
Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks (ACL 2024)
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models (EMNLP 2024)
Uncertainty-Guided Modal Rebalance for Hateful Memes Detection (ACL 2024)
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding (EMNLP 2024)
PixT3: Pixel-based Table-To-Text Generation (ACL 2024)
Foundation Model for Biomedical Graphs: Integrating Knowledge Graphs and Protein Structures to Large Language Models (ACL 2024)
Boosting Textural NER with Synthetic Image and Instructive Alignment (ACL 2024)