Benchmarking Power and Energy Consumption of Stream Processing Frameworks
Real-time, data-centric systems, which are growing exponentially, rely on the continuous processing of large data streams by stream processing frameworks. However, these frameworks are resource-intensive, require numerous dependencies, and impose high computing loads, which increases energy consumption and carbon emissions. Selecting an appropriate stream processing framework can therefore significantly reduce the energy consumption and carbon footprint of a company's IT infrastructure. While several studies have compared the performance of stream processing frameworks, none have examined them specifically from a sustainability standpoint. This report aims to bridge this gap by comparing the energy consumption and performance of three frameworks: Kafka Streams, Apache Flink, and Spark Structured Streaming. To this end, we implemented a real-world use case in Java and ran multiple experiments at varying streaming rates, monitoring the systems with Prometheus and Grafana. Our findings indicate that, on average, Kafka Streams and Apache Flink consume less power and energy than Spark Structured Streaming. Because these two frameworks draw similar levels of power, the choice between them depends on throughput: Flink is the most power-efficient option for medium- to high-throughput applications, whereas Kafka Streams is the better choice when lower throughput is acceptable. We also analysed the CPU and RAM usage of each framework and found a distinct usage pattern for each stream processing engine.