Transformers

9294 directly classified papers

Papers

GMT: Guided Mask Transformer for Leaf Instance Segmentation WACV 2025

Exploration Lab IITK at SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection ACL 2025

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders ICCV 2025

CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations ICCV 2025

Bandit Based Attention Mechanism in Vision Transformers WACV 2025

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution ICCV 2025

Jamo-Level Subword Tokenization in Low-Resource Korean Machine Translation NAACL 2025

OuroMamba: A Data-Free Quantization Framework for Vision Mamba ICCV 2025

LLM-assisted Entropy-based Adaptive Distillation for Unsupervised Fine-grained Visual Representation Learning ICCV 2025

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats ICCV 2025

MAGMA: Manifold Regularization for MAEs WACV 2025

F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring WACV 2025

Transferable-Guided Attention is All You Need for Video Domain Adaptation WACV 2025

Multilingual State Space Models for Structured Question Answering in Indic Languages NAACL 2025

FreEformer: Frequency Enhanced Transformer for Multivariate Time Series Forecasting IJCAI 2025

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion ICCV 2025

MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge ICCV 2025

Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner ICCV 2025

Frequency-Dynamic Attention Modulation For Dense Prediction ICCV 2025

Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction ICCV 2025

Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus ICCV 2025

STLSP: Integrating Structure and Text with Large Language Models for Link Sign Prediction of Networks IJCAI 2025

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ICCV 2025

Lidar Waveforms are Worth 40x128x33 Words ICCV 2025

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method ICCV 2025