conftrace_

Transformers

9,294 papers

Papers

DONUT: A Decoder-Only Model for Trajectory Prediction ICCV 2025

MultiModal Action Conditioned Video Simulation ICCV 2025

Breaking Rectangular Shackles: Cross-View Object Segmentation for Fine-Grained Object Geo-Localization ICCV 2025

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts ICCV 2025

What If: Understanding Motion Through Sparse Interactions ICCV 2025

Auto-Regressively Generating Multi-View Consistent Images ICCV 2025

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching ICCV 2025

MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation ICCV 2025

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing ICCV 2025

Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing ICCV 2025

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping ICCV 2025

Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation ICCV 2025

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction ICCV 2025

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features ICCV 2025

Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection ICCV 2025

Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-of-distribution Detection ICCV 2025

FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models ICCV 2025

VACE: All-in-One Video Creation and Editing ICCV 2025

HORT: Monocular Hand-held Objects Reconstruction with Transformers ICCV 2025

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling ICCV 2025

SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning ICCV 2025

MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence ICCV 2025

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration ICCV 2025

Bridging Local Inductive Bias and Long-Range Dependencies with Pixel-Mamba for End-to-end Whole Slide Image Analysis ICCV 2025

LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association ICCV 2025