Thulawa: Key-Aware Intra-Partition Parallelism extending Kafka Streams

Keshan Pathirana, Teshan Jayakody, Bihesha Dilshan, Manula Gunatilleke, Nuwan Kodagoda, Samadhi Rathnayake

Abstract

Event Stream Processing (ESP) is critical to real-time data analytics, enabling organizations to process and act on continuous streams of events. However, existing frameworks such as Apache Kafka Streams are limited in how efficiently they parallelize event processing, particularly in parallelizing beyond the level of individual partitions, which can create performance bottlenecks in high-throughput scenarios. To overcome these limitations, this research introduces a parallel processing design for Kafka Streams that combines parallel event processing within partitions with fair scheduling of events. In the proposed architecture, scheduling, event submission, and execution are handled in parallel: a dedicated scheduling mechanism fairly selects events for processing, while event submission and execution proceed concurrently, improving resource utilization and throughput. These enhancements were implemented in an extended Kafka Streams library, Kafka-Thulawa. The findings show a substantial improvement in processing throughput compared with Kafka Streams. This study advances real-time stream processing by integrating parallel event execution within partitions, fair event scheduling, and an improved architecture, addressing key limitations of the existing Kafka Streams.
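To make the idea of key-aware parallelism within a partition concrete, the following is a minimal Java sketch. It is not the Kafka-Thulawa implementation and uses no Kafka APIs: the class name KeyAwareDispatcher, the lane count, and the hash-based routing are all assumptions chosen for illustration. Events that share a key are routed to the same single-threaded lane, so per-key order is preserved while events with different keys can run in parallel. The fair scheduling component described in the abstract is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of Kafka Streams or Kafka-Thulawa.
class KeyAwareDispatcher {
    // Each lane is a single-threaded executor, so tasks in one lane run in submission order.
    private final ExecutorService[] lanes;
    // Per-key record of the values processed, in the order they were processed.
    private final Map<String, List<Integer>> processed = new ConcurrentHashMap<>();

    KeyAwareDispatcher(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Route an event to a lane chosen by its key (assumed routing rule: hash modulo lane count).
    void submit(String key, int value) {
        int lane = Math.floorMod(key.hashCode(), lanes.length);
        lanes[lane].execute(() ->
            // A given key always maps to one lane, so its list is only touched by one thread.
            processed.computeIfAbsent(key, k -> new ArrayList<>()).add(value));
    }

    // Stop accepting events, wait for all lanes to finish, and return the per-key results.
    Map<String, List<Integer>> drain() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("lane did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        KeyAwareDispatcher dispatcher = new KeyAwareDispatcher(4);
        // Events for three keys arrive interleaved, as they might within one partition.
        for (int i = 0; i < 5; i++) {
            dispatcher.submit("user-a", i);
            dispatcher.submit("user-b", i);
            dispatcher.submit("user-c", i);
        }
        dispatcher.drain().forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

In this sketch the trade-off is that a single hot key still serializes onto one lane; parallelism only helps when the partition carries several distinct keys.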
