conftrace_

Transformers

9,294 papers

Papers

Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction ICCV 2025

Lidar Waveforms are Worth 40x128x33 Words ICCV 2025

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion ICCV 2025

MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation ICCV 2025

FonTS: Text Rendering With Typography and Style Controls ICCV 2025

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution ICCV 2025

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection ICCV 2025

BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment ICCV 2025

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats ICCV 2025

MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge ICCV 2025

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation ICCV 2025

Sparse Fine-Tuning of Transformers for Generative Tasks ICCV 2025

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models ICCV 2025

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method ICCV 2025

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy ICCV 2025

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis ICCV 2025

SDFormer: Vision-based 3D Semantic Scene Completion via SAM-assisted Dual-channel Voxel Transformer ICCV 2025

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory ICCV 2025

UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-view Images ICCV 2025

SP2T: Sparse Proxy Attention for Dual-stream Point Transformer ICCV 2025

ASCENT-ViT: Attention-based Scale-aware Concept Learning Framework for Enhanced Alignment in Vision Transformers IJCAI 2025

IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation IJCAI 2025

MonoMixer: Marrying Convolution and Vision Transformer for Efficient Self-Supervised Monocular Depth Estimation IJCAI 2025

Categorical Attention: Fine-grained Language-guided Noise Filtering Network for Occluded Person Re-Identification IJCAI 2025

Multi-Modal Point Cloud Completion with Interleaved Attention Enhanced Transformer IJCAI 2025