Transformers

9294 directly classified papers

Papers

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization CVPR 2025

Learning Visual Generative Priors without Text CVPR 2025

EntitySAM: Segment Everything in Video CVPR 2025

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention CVPR 2025

AudioGenX: Explainability on Text-to-Audio Generative Models AAAI 2025

Video Language Model Pretraining with Spatio-temporal Masking CVPR 2025

Analytical-Chemistry-Informed Transformer for Infrared Spectra Modeling AAAI 2025

CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning AAAI 2025

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning CVPR 2025

Super-Class Guided Transformer for Zero-Shot Attribute Classification AAAI 2025

On the Power of Convolution-Augmented Transformer AAAI 2025

Modeling All Response Surfaces in One for Conditional Search Spaces AAAI 2025

Hypergraph Vision Transformers: Images are More than Nodes, More than Edges CVPR 2025

ERUPT: Efficient Rendering with Unposed Patch Transformer CVPR 2025

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion CVPR 2025

ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping CVPR 2025

SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens CVPR 2025

SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining CVPR 2025

Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition CVPR 2025

VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond CVPR 2025

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation CVPR 2025

Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better CVPR 2025

Star with Bilinear Mapping CVPR 2025

HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views CVPR 2025

Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings EMNLP 2025