Research Explorer
Transformers
9294 directly classified papers
Papers per year
2011: 1
2014: 2
2015: 6
2016: 17
2017: 67
2018: 156
2019: 404
2020: 769
2021: 1217
2022: 1446
2023: 1628
2024: 1574
2025: 1647
2026: 360
Papers
Seurat: From Moving Points to Depth
CVPR 2025
EntitySAM: Segment Everything in Video
CVPR 2025
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
CVPR 2025
InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model
ACL 2025
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
CVPR 2025
Dual Prompting Image Restoration with Diffusion Transformers
CVPR 2025
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection
CVPR 2025
Language Repository for Long Video Understanding
ACL 2025
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
CVPR 2025
Spiking Transformer with Spatial-Temporal Attention
CVPR 2025
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
CVPR 2025
Smarter, Not Harder: Training-Free Adaptive Computation for Transformers
ACL 2025
LaVin-DiT: Large Vision Diffusion Transformer
CVPR 2025
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
CVPR 2025
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
CVPR 2025
From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
ACL 2025
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
CVPR 2025
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
CVPR 2025
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living
CVPR 2025
VP-MEL: Visual Prompts Guided Multimodal Entity Linking
ACL 2025
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
CVPR 2025
Mamba-Reg: Vision Mamba Also Needs Registers
CVPR 2025
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
CVPR 2025
NBF at SemEval-2025 Task 5: Light-Burst Attention Enhanced System for Multilingual Subject Recommendation
ACL 2025
EfficientCrackNet: A Lightweight Model for Crack Segmentation
WACV 2025