Scheduling Strategies for Apache Storm

Master's Thesis

Motivation
Data Stream Processing (DSP) systems have emerged as a way to process, in a timely manner, the real-time data generated by a variety of sources. Continuous streams of data from these sources pass through a series of operators, each of which performs some computation on the data. The operators are chained, and together they form the overall logic of a stream processing application.
With new concepts such as in-network processing and edge computing, more and more devices become available to process data in the core network or at the extreme edge of the network, rather than in distant cloud data centers. Given this scenario, the question arises of where to place and how to schedule operators. Placement and scheduling strategies can optimize for different metrics (e.g. latency, resource utilization, or user-defined QoS). Furthermore, the problem is NP-hard, so efficient heuristics are needed to solve it in reasonable time.
This thesis examines the problem using Apache Storm as the DSP system. Several schedulers have been proposed for Storm in the literature; however, each employs a different approach and optimizes for a different goal. This makes it very hard to compare them and to identify ways of solving the placement problem efficiently.

Start: 01.05.2017

End: 31.10.2017

Supervisor:

  • Julien Gedeon (gedeon(a-t)tk.tu-darmstadt.de)

Research areas: Telecooperation, SUN – Smart Urban Networks