Foundation Models

4845 directly classified papers

Papers

UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models WACV 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models EACL 2026

PatchEAD: Unifying Industrial Visual Prompting Frameworks for Patch-Exclusive Anomaly Detection WACV 2026

See, Think, Learn: A Self-Taught Multimodal Reasoner WACV 2026

T2-RAGBench: Text-and-Table Benchmark for Evaluating Retrieval-Augmented Generation EACL 2026

BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok EACL 2026

OPFormer: Object Pose Estimation Leveraging Foundation Model with Geometric Encoding WACV 2026

Zero-Shot Domain Generalisation via Prompt-Driven Feature Refinement WACV 2026

One-Shot Fine-Grained Re-Identification of Paint Marked Honey Bees using Vision Foundation Models WACV 2026

Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention WACV 2026

PVeRA: Probabilistic Vector-Based Random Matrix Adaptation WACV 2026

Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis EACL 2026

The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models EACL 2026

Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning EACL 2026

MAFM3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI WACV 2026

HistoMILKD: A Multiple Instance Learning based Multi-Teacher Knowledge Distillation Framework for Whole Slide Image Classification WACV 2026

Grounding Degradations in Natural Language for All-In-One Video Restoration WACV 2026

Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial Imagery WACV 2026

T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation WACV 2026

ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos WACV 2026

Do Generative Video Models Understand Physical Principles? WACV 2026

Enhanced Back-Projection of Vision Features for 3D Symmetry Detection WACV 2026

ZonUI-3B: Competitive GUI Grounding with a 3B VLM Trained on a Single Consumer GPU WACV 2026

OW-Rep: Open World Object Detection with Instance Representation Learning WACV 2026

Adaptive LLM-Symbolic Reasoning via Dynamic Logical Solver Composition EACL 2026