conftrace_

Papers

Measuring In-Context Computation Complexity via Hidden State Prediction (ICML 2025)
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models (ICLR 2025)
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention (NIPS 2024)
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations (EMNLP 2024)
MoEUT: Mixture-of-Experts Universal Transformers (NIPS 2024)
Randomized Positional Encodings Boost Length Generalization of Transformers (ACL 2023)
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions (EMNLP 2023)
Approximating Two-Layer Feedforward Networks for Efficient Transformers (EMNLP 2023)
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations (EMNLP 2022)
The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization (ICLR 2022)
The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention (ICML 2022)
A Modern Self-Referential Weight Matrix That Learns to Modify Itself (ICML 2022)
The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers (EMNLP 2021)
Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks (ICLR 2021)
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers (NIPS 2021)
Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control (ICLR 2019)