conftrace_

Transformers

9,294 papers

Papers

Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages NIPS 2024

Selective Attention: Enhancing Transformer through Principled Context Control NIPS 2024

Slot State Space Models NIPS 2024

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation NIPS 2024

Peri-midFormer: Periodic Pyramid Transformer for Time Series Analysis NIPS 2024

QKFormer: Hierarchical Spiking Transformer using Q-K Attention NIPS 2024

Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks NIPS 2024

AROMA: Preserving Spatial Structure for Latent PDE Modeling with Local Neural Fields NIPS 2024

$SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation NIPS 2024

Pseudo-Siamese Blind-spot Transformers for Self-Supervised Real-World Denoising NIPS 2024

Video Token Merging for Long Video Understanding NIPS 2024

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing NIPS 2024

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning NIPS 2024

On the Role of Attention Masks and LayerNorm in Transformers NIPS 2024

No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations NIPS 2024

From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When NIPS 2024

Loki: Low-rank Keys for Efficient Sparse Attention NIPS 2024

Supra-Laplacian Encoding for Transformer on Dynamic Graphs NIPS 2024

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model NIPS 2024

Multiview Scene Graph NIPS 2024

Addressing Spatial-Temporal Heterogeneity: General Mixed Time Series Analysis via Latent Continuity Recovery and Alignment NIPS 2024

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization NIPS 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate NIPS 2024

MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models NIPS 2024

Learning and Transferring Sparse Contextual Bigrams with Linear Transformers NIPS 2024