Enthuse: Efficient Adaptable High-throughput Streaming Aggregation Engines
arXiv:2405.18168v4 Announce Type: replace
Abstract: Aggregation queries are a class of computationally demanding analytics operations on counted, grouped or time series data. They include tasks such as computing the sum or the median of the items in the same group, or of a specified number of the most recently observed tuples in sliding window aggregation (SWAG). They have a wide range of applications, including database analytics, operating systems, bank security and medical sensors. Existing challenges include the hardware complexity of efficiently handling per-group state with hash-based approaches. This paper presents Enthuse, an adaptable pipeline for calculating a wide range of aggregation queries with high throughput. We then adapt it for SWAG, where it achieves up to a 476x speedup over the CPU core of the same platform. It achieves unparalleled levels of performance and functionality, such as a throughput of 1 GT/s on our setup for SWAG without groups. With groups, as an approximation of SWAG with per-group windows, it supports more advanced operators and window sizes up to 4x larger than the state of the art, while using a fraction of the resources and no DRAM.
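To make the SWAG semantics concrete, here is a minimal software reference sketch of a count-based sliding window sum kept separately for each group. It is not Enthuse's hardware pipeline: the function name `grouped_sliding_sum` is hypothetical, and a Python dictionary stands in for hash-based per-group state as an illustrative assumption. It computes exact per-group windows and does not model the approximation described in the abstract.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Iterable, List, Tuple


def grouped_sliding_sum(
    stream: Iterable[Tuple[Hashable, int]], window: int
) -> List[Tuple[Hashable, int]]:
    """For each incoming (key, value) tuple, emit the sum of the last
    `window` values observed for that key (count-based, per-group window).

    Hypothetical reference model; the dict plays the role of per-group
    state that a hardware design would keep in a hash-based structure.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buffers: Dict[Hashable, Deque[int]] = defaultdict(deque)  # per-group window contents
    totals: Dict[Hashable, int] = defaultdict(int)  # per-group running sums
    results: List[Tuple[Hashable, int]] = []
    for key, value in stream:
        buf = buffers[key]
        buf.append(value)
        totals[key] += value
        if len(buf) > window:
            # Sum is invertible, so the oldest value is evicted by subtracting it.
            totals[key] -= buf.popleft()
        results.append((key, totals[key]))
    return results


if __name__ == "__main__":
    events = [("a", 1), ("b", 10), ("a", 2), ("a", 3), ("b", 20)]
    print(grouped_sliding_sum(events, window=2))
```

With `window=2`, the example stream yields `[('a', 1), ('b', 10), ('a', 3), ('a', 5), ('b', 30)]`. Using a single constant key gives SWAG without groups. The running-total trick only works for invertible operators such as sum; non-invertible operators such as the median would need the whole window kept in an ordered structure.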