transformer efficiency

12 papers

Co-occurring keywords

model efficiency (205), model compression (3283), efficient computing (779), sparse attention (133), token selection (25), neural network optimization (1293), inference efficiency (245), model optimization (75), attention sparsity (6), attention mechanism (3975)

Papers

Principles of Visual Tokens for Efficient Video Understanding ICCV 2025

Smarter, Not Harder: Training-Free Adaptive Computation for Transformers ACL 2025

Numerical Pruning for Efficient Autoregressive Models AAAI 2025

CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling EMNLP 2024

Fast Attention Requires Bounded Entries NeurIPS 2023

Linearizing Transformer with Key-Value Memory EMNLP 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection ACL 2022

Fine- and Coarse-Granularity Hybrid Self-Attention for Efficient BERT ACL 2022

Sparsifying Transformer Models with Trainable Representation Pooling ACL 2022

ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer ACL 2022

Predicting Attention Sparsity in Transformers ACL 2022

Bag of Tricks for Optimizing Transformer Efficiency EMNLP 2021