model architecture

324 papers

Co-occurring keywords

neural network (6616); convolutional neural network (4216); vision transformer (1091); image classification (1943); neural architecture search (665); neural network optimization (1293); efficient computing (779); model compression (3283); attention mechanism (3975); deep learning (2111)

Papers

LSNet: See Large, Focus Small CVPR 2025

LegoMT2: Selective Asynchronous Sharded Data Parallel Training for Massive Neural Machine Translation ACL 2025

MossNet: Mixture of State-Space Experts is a Multi-Head Attention IJCNLP 2025

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance IJCNLP 2025

From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts ACL 2025

ModRWKV: Transformer Multimodality in Linear Time EMNLP 2025

Deconstructing Attention: Investigating Design Principles for Effective Language Modeling IJCNLP 2025

OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation CVPR 2025

PowerMLP: An Efficient Version of KAN AAAI 2025

Building Vision Models upon Heat Conduction CVPR 2025

Talent: A Tabular Analytics and Learning Toolbox JMLR 2025

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models CVPR 2025

Deconstructing Attention: Investigating Design Principles for Effective Language Modeling AACL 2025

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network CVPR 2025

Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research EMNLP 2025

Temporally Streaming Audio-Visual Synchronization for Real-World Videos WACV 2025

DenseSSM: State Space Models with Dense Hidden Connection for Efficient Large Language Models NAACL 2025

From Parameters to Performance: A Data-Driven Study on LLM Structure and Development EMNLP 2025

SpiralMLP: A Lightweight Vision MLP Architecture WACV 2025

Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning EMNLP 2025

Bandit Based Attention Mechanism in Vision Transformers WACV 2025

A Closer Look into Mixture-of-Experts in Large Language Models NAACL 2025

Diffusion Model is Effectively Its Own Teacher CVPR 2025

Cost-Optimal Grouped-Query Attention for Long-Context Modeling EMNLP 2025

LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models NAACL 2025