Research Explorer
Transformers
9294 directly classified papers
Papers per year
2011: 1
2014: 2
2015: 6
2016: 17
2017: 67
2018: 156
2019: 404
2020: 769
2021: 1217
2022: 1446
2023: 1628
2024: 1574
2025: 1647
2026: 360
Papers
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
CVPR 2025
Learning Visual Generative Priors without Text
CVPR 2025
EntitySAM: Segment Everything in Video
CVPR 2025
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
CVPR 2025
AudioGenX: Explainability on Text-to-Audio Generative Models
AAAI 2025
Video Language Model Pretraining with Spatio-temporal Masking
CVPR 2025
Analytical-Chemistry-Informed Transformer for Infrared Spectra Modeling
AAAI 2025
CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning
AAAI 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
CVPR 2025
Super-Class Guided Transformer for Zero-Shot Attribute Classification
AAAI 2025
On the Power of Convolution-Augmented Transformer
AAAI 2025
Modeling All Response Surfaces in One for Conditional Search Spaces
AAAI 2025
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
CVPR 2025
ERUPT: Efficient Rendering with Unposed Patch Transformer
CVPR 2025
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
CVPR 2025
ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
CVPR 2025
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens
CVPR 2025
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
CVPR 2025
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
CVPR 2025
VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond
CVPR 2025
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
CVPR 2025
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
CVPR 2025
Star with Bilinear Mapping
CVPR 2025
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
CVPR 2025
Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings
EMNLP 2025