conftrace_
2021 UAI UAI 2021

Variance reduction in frequency estimators via control variates method

Abstract

Generating succinct summaries (also known as sketches) of massive data streams is becoming increasingly important. Such a task typically requires fast, accurate, and small space algorithms in order to support the downstream applications, mainly in areas such as data analysis, machine learning and data mining. A fundamental and well-studied problem in this context is that of estimating the frequencies of the items appearing in a data stream. The Count-Min-Sketch (Cormode and Muthukrishnan, J. Algorithms, 55(1):58–75, 2005) and Count-Sketch (Charikar et al., Theor. Comput. Sci., 312(1):3–15, 2004) are two known classical algorithms for this purpose. However, a limitation of these techniques is that the variance of their estimate tends to be large. In this work, we address this problem and suggest a technique that reduces the variance in their respective estimates, at the cost of little computational overhead. Our technique relies on the classical Control-Variate trick (Lavenberg and Welch, Manage. Sci., 27:322–335, 1981) used for reducing variance in Monte-Carlo simulation. We present a theoretical analysis of our proposal by carefully choosing the control variates and complement them with experiments on synthetic as well as real-world datasets.

🌉 Interdisciplinary Bridge - Machine Learning and Mathematics & Optimization
🐝 Cross-Pollinator - Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy