Scalable Data Management in the Presence of High-Speed Networks

I-Store – Skew-resilient Database for Fast Networks

Scalable distributed in-memory databases are at the core of data-intensive computation. Although scaling-out solutions help to handle large amounts of data, more nodes do not necessarily lead to improved query performance. In fact, recent papers have shown that performance can even degrade when scaling out due to higher communication overhead (e.g., shuffling data across nodes) and limited bandwidth. Thus, current distributed database systems are built with the assumption that the network is the major bottleneck and should be avoided at all costs.
In recent years, high-speed networks with a bandwidth close to the local memory bus have become economically viable. These network technologies provide Remote Direct Memory Access (RDMA) to allow direct memory access to a remote host and also reduce the latency of data transfer through bypassing the remote’s CPU. Therefore, the assumption that the network is the bottleneck no longer holds

Although the higher network bandwidth enables scalability to a certain extent, this approach fails if the data or workload is skewed and cannot be evenly partitioned. Additionally, in modern data centers the network or the compute power is often shared, which can lead to network/compute skew.

Therefore, in this project, we develop a fast, scalable, and skew-resilient approach to execute distributed queries on fast networks. Our main contribution is a novel execution strategy, which enables collaborative query processing by remote work stealing to mitigate skew, as this is a common issues that hinders scalable query execution. Moreover, this approach also solves compute and network skew, which enables scalability.

Funding-Notice

This project is funded by Huawei Innovation Research Program (HIRP)

Research Team

  • Prof. Dr. Carsten Binnig
  • Tobias Ziegler

Publications

  • Tobias Ziegler, Uwe Röhm, Carsten Binnig: Skew-resilient Query Processing for Fast Networks. NoDMC@BTW 2019
  • Tobias Ziegler, Sumukha Tumkur Vani, Carsten Binnig, Rodrigo Fonseca, Tim Kraska: Designing Distributed Tree-based Index Structures for Fast RDMA-capable Networks. SIGMOD 2019
  • Gustavo Alonso, Carsten Binnig, Ippokratis Pandis, Kenneth Salem, Jan Skrzypczak, Ryan Stutsman, Lasse Thostrup, Tianzheng Wang, Zeke Wang, Tobias Ziegler: DPI: The Data Processing Interface for Modern Networks. CIDR 2019
  • Marcel Blöcher, Tobias Ziegler, Carsten Binnig, Patrick Eugster: Boosting scalable data analytics with modern programmable networks. DaMoN 2018
  • Abdallah Salama, Carsten Binnig, Tim Kraska, Ansgar Scherp, Tobias Ziegler: Rethinking Distributed Query Execution on High-Speed Networks. IEEE Data Eng. Bull 2017
  • Carsten Binnig, Erfan Zamanian, Tim Harris, Tim Kraska: The End of a Myth: Distributed Transaction can Scale. Research Paper, PVLDB 2017
  • Carsten Binnig, Andrew Crotty, Alex Galakatos et al.: The End of Slow Networks: It’s Time for a Redesign. Vision Paper, PVLDB 2016
  • Carsten Binnig, Erfan Zamanian, Tim Kraska et al.:Making Distributed Transactions Scale. search Paper, Nedbday 2016
  • Carsten Binnig, Ugur Cetintemel, Tim Kraska et al.: I-Store: Data Management for Fast Networks. Research Paper, Nedbday 2015