conftrace_

Transformers

9,294 papers

Papers

Structure-Aware Sparse-View X-ray 3D Reconstruction CVPR 2024

Dexterous Grasp Transformer CVPR 2024

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models CVPR 2024

TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression CVPR 2024

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment CVPR 2024

SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling CVPR 2024

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model CVPR 2024

State Space Models for Event Cameras CVPR 2024

Resource-Efficient Transformer Pruning for Finetuning of Large Models CVPR 2024

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models CVPR 2024

GLID: Pre-training a Generalist Encoder-Decoder Vision Model CVPR 2024

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation CVPR 2024

Putting the Object Back into Video Object Segmentation CVPR 2024

Adapters Strike Back CVPR 2024

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology CVPR 2024

PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding CVPR 2024

Video-Based Human Pose Regression via Decoupled Space-Time Aggregation CVPR 2024

Random Entangled Tokens for Adversarially Robust Vision Transformer CVPR 2024

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction CVPR 2024

Deformable One-shot Face Stylization via DINO Semantic Guidance CVPR 2024

Test-Time Domain Generalization for Face Anti-Spoofing CVPR 2024

Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features CVPR 2024

Do Vision and Language Encoders Represent the World Similarly? CVPR 2024

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images CVPR 2024

Class Tokens Infusion for Weakly Supervised Semantic Segmentation CVPR 2024