conftrace_

Transformers

9,294 papers

Papers

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection CVPR 2024

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation CVPR 2024

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach CVPR 2024

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining CVPR 2024

Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation CVPR 2024

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation CVPR 2024

MoST: Multi-Modality Scene Tokenization for Motion Prediction CVPR 2024

PIGEON: Predicting Image Geolocations CVPR 2024

Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification CVPR 2024

HomoFormer: Homogenized Transformer for Image Shadow Removal CVPR 2024

UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets CVPR 2024

FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation CVPR 2024

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching CVPR 2024

Multi-modal Learning for Geospatial Vegetation Forecasting CVPR 2024

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields CVPR 2024

AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation CVPR 2024

OMG-Seg: Is One Model Good Enough For All Segmentation? CVPR 2024

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis CVPR 2024

Discovering Syntactic Interaction Clues for Human-Object Interaction Detection CVPR 2024

FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation CVPR 2024

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers CVPR 2024

Mean-Shift Feature Transformer CVPR 2024

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation CVPR 2024

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning CVPR 2024

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions CVPR 2024