Foundation Models

278 papers

Papers per year: 5, 13, 23, 104, 117, 16

Papers

(Almost) Free Modality Stitching of Foundation Models EMNLP 2025

Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D EMNLP 2025

fLSA: Learning Semantic Structures in Document Collections Using Foundation Models EMNLP 2025

SEMMA: A Semantic Aware Knowledge Graph Foundation Model EMNLP 2025

SciSketch: An Open-source Framework for Automated Schematic Diagram Generation in Scientific Papers EMNLP 2025

Enhancing Foundation Models in Transaction Understanding with LLM-based Sentence Embeddings EMNLP 2025

VisualEDU: A Benchmark for Assessing Coding and Visual Comprehension through Educational Problem-Solving Video Generation EMNLP 2025

jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval EMNLP 2025

Detect Anything 3D in the Wild ICCV 2025

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection ICCV 2025

Find Any Part in 3D ICCV 2025

SAM4D: Segment Anything in Camera and LiDAR Streams ICCV 2025

Scaling Laws for Native Multimodal Models ICCV 2025

SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images ICCV 2025

Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection ICCV 2025

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration ICCV 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ICCV 2025

Unified Multimodal Understanding via Byte-Pair Visual Encoding ICCV 2025

F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration ICCV 2025

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing ICCV 2025

FE-CLIP: Frequency Enhanced CLIP Model for Zero-Shot Anomaly Detection and Segmentation ICCV 2025

RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation ICCV 2025

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation ICCV 2025

Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models? ICCV 2025

Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild ICCV 2025