conftrace
Papers
Trends
Conferences
Explore
Authors
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
EssayDetect at GenAI Detection Task 2: Guardians of Academic Integrity: Multilingual Detection of AI-Generated Essays
COLING 2025
OVQA: A Dataset for Visual Question Answering and Multimodal Research in Odia Language
COLING 2025
ExMute: A Context-Enriched Multimodal Dataset for Hateful Memes
COLING 2025
Bridging Language and Scenes through Explicit 3-D Model Construction
COLING 2025
VCRMNER: Visual Cue Refinement in Multimodal NER using CLIP Prompts
COLING 2025
Experiential Semantic Information and Brain Alignment: Are Multimodal Models Better than Language Models?
CoNLL 2025
An Appraisal Theoretic Approach to Modelling Affect Flow in Conversation Corpora
CoNLL 2025
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
CVPR 2025
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models
CVPR 2025
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
Cross-modal Causal Relation Alignment for Video Question Grounding
CVPR 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
CVPR 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
CVPR 2025
UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References
CVPR 2025
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
CVPR 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
CVPR 2025
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset
CVPR 2025
Evaluating Vision-Language Models as Evaluators in Path Planning
CVPR 2025
SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization
CVPR 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
CVPR 2025
Exploring Timeline Control for Facial Motion Generation
CVPR 2025
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
CVPR 2025
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
CVPR 2025
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
CVPR 2025
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
CVPR 2025