vision transformer

1091 papers

Also known as

VITE, VIT, CLIP-VIT, VT

Co-occurring keywords

image classification (1943)
semantic segmentation (3179)
model compression (3283)
self-supervised learning (3751)
attention mechanism (3975)
convolutional neural network (4216)
object detection (2759)
transfer learning (5442)
representation learning (6174)
knowledge distillation (3680)

Papers

FLatten Transformer: Vision Transformer using Focused Linear Attention ICCV 2023

Robustifying Token Attention for Vision Transformers ICCV 2023

Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? ICCV 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding ICCV 2023

Cross-Modal Orthogonal High-Rank Augmentation for RGB-Event Transformer-Trackers ICCV 2023

LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization ICCV 2023

STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition ICCV 2023

Token-Label Alignment for Vision Transformers ICCV 2023

A Multidimensional Analysis of Social Biases in Vision Transformers ICCV 2023

Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking ICCV 2023

FLIP: Cross-domain Face Anti-spoofing with Language Guidance ICCV 2023

Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking ICCV 2023

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection ICCV 2023

ParCNetV2: Oversized Kernel with Enhanced Attention ICCV 2023

Scratching Visual Transformer's Back with Uniform Attention ICCV 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers ICCV 2023

UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction ICCV 2023

InterFormer: Real-time Interactive Image Segmentation ICCV 2023

ASIC: Aligning Sparse in-the-wild Image Collections ICCV 2023

Revisiting Vision Transformer from the View of Path Ensemble ICCV 2023

Adaptive Frequency Filters As Efficient Global Token Mixers ICCV 2023

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers ICCV 2023

Evaluating Data Attribution for Text-to-Image Models ICCV 2023

SG-Former: Self-guided Transformer with Evolving Token Reallocation ICCV 2023

Building Vision Transformers with Hierarchy Aware Feature Aggregation ICCV 2023