Research Explorer
Transformers
9294 directly classified papers
Papers per year
2011: 1
2014: 2
2015: 6
2016: 17
2017: 67
2018: 156
2019: 404
2020: 769
2021: 1217
2022: 1446
2023: 1628
2024: 1574
2025: 1647
2026: 360
Papers
Graph-Augmented Open-Domain Multi-Document Summarization
COLING 2025
Sparse Rewards Can Self-Train Dialogue Agents
ACL 2025
VCRMNER: Visual Cue Refinement in Multimodal NER using CLIP Prompts
COLING 2025
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
CVPR 2025
VITED: Video Temporal Evidence Distillation
CVPR 2025
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
CVPR 2025
Probing Subphonemes in Morphology Models
ACL 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
CVPR 2025
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
CVPR 2025
Streamlining the Collaborative Chain of Models into A Single Forward Pass in Generation-Based Tasks
ACL 2025
Linear Attention Modeling for Learned Image Compression
CVPR 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
CVPR 2025
MATCHA: Towards Matching Anything
CVPR 2025
Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
ACL 2025
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment
CVPR 2025
Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models
NAACL 2025
IRIS: Interpretable Retrieval-Augmented Classification for Long Interspersed Document Sequences
ACL 2025
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
ICCV 2025
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
CVPR 2025
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
CVPR 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
CVPR 2025
Continuous 3D Perception Model with Persistent State
CVPR 2025
SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
ACL 2025
DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
CVPR 2025
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
CVPR 2025