Efficient Computing

596 directly classified papers

Papers

Sustainability of Data Center Digital Twins with Reinforcement Learning AAAI 2024

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference ACL 2024

ETAS: Zero-Shot Transformer Architecture Search via Network Trainability and Expressivity ACL 2024

Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ACL 2024

A Comprehensive Evaluation of Quantization Strategies for Large Language Models ACL 2024

Cache Me if You Can: Accelerating Diffusion Models through Block Caching CVPR 2024

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers CVPR 2024

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization EMNLP 2024

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models EMNLP 2024

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models EMNLP 2024

Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training EMNLP 2024

Turn Waste into Worth: Rectifying Top-k Router of MoE EMNLP 2024

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs EMNLP 2024

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification EMNLP 2024

AMPO: Automatic Multi-Branched Prompt Optimization EMNLP 2024

RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference EMNLP 2024

LongHeads: Multi-Head Attention is Secretly a Long Context Processor EMNLP 2024

MobileQuant: Mobile-friendly Quantization for On-device Language Models EMNLP 2024

Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy EMNLP 2024

Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles EMNLP 2024

Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs EMNLP 2024

Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators NeurIPS 2024

EnOF-SNN: Training Accurate Spiking Neural Networks via Enhancing the Output Feature NeurIPS 2024

Efficient LLM Scheduling by Learning to Rank NeurIPS 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling NeurIPS 2024