Research Explorer
Transformers
9294 directly classified papers
Papers per year
2011: 1
2014: 2
2015: 6
2016: 17
2017: 67
2018: 156
2019: 404
2020: 769
2021: 1217
2022: 1446
2023: 1628
2024: 1574
2025: 1647
2026: 360
Papers
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
CVPR 2025
Vision-Language Embodiment for Monocular Depth Estimation
CVPR 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
CVPR 2025
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
CVPR 2025
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
CVPR 2025
ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance
CVPR 2025
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
CVPR 2025
Open Ad-hoc Categorization with Contextualized Feature Learning
CVPR 2025
TSP-Mamba: The Travelling Salesman Problem Meets Mamba for Image Super-resolution and Beyond
CVPR 2025
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
CVPR 2025
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
CVPR 2025
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
EMNLP 2025
Cluster Based Heterogeneous Federated Foundation Model Adaptation and Fine-Tuning
AAAI 2025
Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters
AAAI 2025
Detecting Legal Citations in United Kingdom Court Judgments
EMNLP 2025
Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings
EMNLP 2025
VRoPE: Rotary Position Embedding for Video Large Language Models
EMNLP 2025
WST: Wavelet-Based Multi-scale Tuning for Visual Transfer Learning
AAAI 2025
Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models
EMNLP 2025
UniMuMo: Unified Text, Music, and Motion Generation
AAAI 2025
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
CVPR 2025
Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer
AAAI 2025
mmFAS: Multimodal Face Anti-Spoofing Using Multi-Level Alignment and Switch-Attention Fusion
AAAI 2025
IMAGDressing-v1: Customizable Virtual Dressing
AAAI 2025
A Generative Pre-Trained Language Model for Channel Prediction in Wireless Communications Systems
EMNLP 2025