Transformers

9294 directly classified papers

Papers

Seurat: From Moving Points to Depth CVPR 2025

EntitySAM: Segment Everything in Video CVPR 2025

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention CVPR 2025

InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model ACL 2025

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers CVPR 2025

Dual Prompting Image Restoration with Diffusion Transformers CVPR 2025

SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection CVPR 2025

Language Repository for Long Video Understanding ACL 2025

MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking CVPR 2025

Spiking Transformer with Spatial-Temporal Attention CVPR 2025

SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers CVPR 2025

Smarter, Not Harder: Training-Free Adaptive Computation for Transformers ACL 2025

LaVin-DiT: Large Vision Diffusion Transformer CVPR 2025

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget CVPR 2025

SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception CVPR 2025

From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities ACL 2025

Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection CVPR 2025

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer CVPR 2025

LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living CVPR 2025

VP-MEL: Visual Prompts Guided Multimodal Entity Linking ACL 2025

Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation CVPR 2025

Mamba-Reg: Vision Mamba Also Needs Registers CVPR 2025

Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation CVPR 2025

NBF at SemEval-2025 Task 5: Light-Burst Attention Enhanced System for Multilingual Subject Recommendation ACL 2025

EfficientCrackNet: A Lightweight Model for Crack Segmentation WACV 2025