conftrace_

model compression

3302 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3725); large language model (13587); neural network (6616); efficient computing (781); neural network optimization (1293); transfer learning (5449); convolutional neural network (4226); neural network pruning (265); language model (4599); parameter efficiency (417)

Papers

Efficient Vocabulary Reduction for Small Language Models COLING 2025

Low-Rank Interconnected Adaptation across Layers ACL 2025

OptiPrune: Effective Pruning Approach for Every Target Sparsity COLING 2025

Slender-Mamba: Fully Quantized Mamba in 1.58 Bits From Head to Toe COLING 2025

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers ICCV 2025

Q-Mamba: Towards more efficient Mamba models via post-training quantization ACL 2025

NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning EMNLP 2025

Best Practices for Distilling Large Language Models into BERT for Web Search Ranking COLING 2025

Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications ACL 2025

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models COLING 2025

Logits-Based Finetuning EMNLP 2025

AAIG at GenAI Detection Task 1: Exploring Syntactically-Aware, Resource-Efficient Small Autoregressive Decoders for AI Content Detection COLING 2025

Efficient One-shot Compression via Low-Rank Local Feature Distillation NAACL 2025

Octopus: On-device language model for function calling of software APIs NAACL 2025

ZigZagKV: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty COLING 2025

Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models EMNLP 2025

GAP: a Global Adaptive Pruning Method for Large Language Models EMNLP 2025

Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer ACL 2025

Let’s Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model COLING 2025

EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models EMNLP 2025

MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines EMNLP 2025

DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers ICCV 2025

Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models EMNLP 2025

GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression EMNLP 2025

DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs EMNLP 2025