Efficient Computing

6,876 papers

Papers

Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference ACL 2024

Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation ACL 2024

RelayAttention for Efficient Large Language Model Serving with Long System Prompts ACL 2024

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? ACL 2024

Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning ACL 2024

SparseFlow: Accelerating Transformers by Sparsifying Information Flows ACL 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ACL 2024

FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection ACL 2024

SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget ACL 2024

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers ACL 2024

NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time ACL 2024

Moûsai: Efficient Text-to-Music Diffusion Models ACL 2024

Full Parameter Fine-tuning for Large Language Models with Limited Resources ACL 2024

Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models ACL 2024

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step ACL 2024

Dodo: Dynamic Contextual Compression for Decoder-only LMs ACL 2024

On the Impact of Calibration Data in Post-training Quantization and Pruning ACL 2024

Exploring Precision and Recall to assess the quality and diversity of LLMs ACL 2024

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition ACL 2024

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models ACL 2024

LLM in a flash: Efficient Large Language Model Inference with Limited Memory ACL 2024

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs ACL 2024

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs ACL 2024

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers ACL 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech ACL 2024