Research Explorer
Transformers
9294 directly classified papers
Papers per year
2011: 1
2014: 2
2015: 6
2016: 17
2017: 67
2018: 156
2019: 404
2020: 769
2021: 1217
2022: 1446
2023: 1628
2024: 1574
2025: 1647
2026: 360
Papers
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
ICCV 2025
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
ICCV 2025
VSSD: Vision Mamba with Non-Causal State Space Duality
ICCV 2025
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
ICCV 2025
Context Guided Transformer Entropy Modeling for Video Compression
ICCV 2025
Deeply Supervised Flow-Based Generative Models
ICCV 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
ICCV 2025
OminiControl: Minimal and Universal Control for Diffusion Transformer
ICCV 2025
AIComposer: Any Style and Content Image Composition via Feature Integration
ICCV 2025
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
ICCV 2025
UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
ICCV 2025
LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
ICCV 2025
CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
ICCV 2025
Robust Adverse Weather Removal via Spectral-based Spatial Grouping
ICCV 2025
CVPT: Cross Visual Prompt Tuning
ICCV 2025
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
ICCV 2025
SPE Attention: Making Attention Equivariant to Semantic-Preserving Permutation for Code Processing
EMNLP 2025
Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation
EMNLP 2025
ARXSA: A General Negative Feedback Control Theory in Vision-Language Models
EMNLP 2025
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
ICCV 2025
Dense Policy: Bidirectional Autoregressive Learning of Actions
ICCV 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
CVPR 2025
DONUT: A Decoder-Only Model for Trajectory Prediction
ICCV 2025
Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models
ICCV 2025
Accelerating Diffusion Transformer via Gradient-Optimized Cache
ICCV 2025