AnyScale – DBMS Architectures for Any Hardware and Workload
AnyScale is a research project exploring adaptive database architectures that scale efficiently across a breadth of modern hardware and deliver robust performance for diverse workloads.
Within the last decade, disruptive hardware evolution has significantly affected the design of database systems. Increasing main-memory capacities disrupted disk-centric DBMS designs and let in-memory DBMSs emerge, providing unprecedented performance. Soon after, however, the end of Moore’s Law required adaptation to multi-core and multi-socket hardware, which impacted reliable performance and again required drastic redesign. Recently, the rise of the Cloud has amplified the challenge of finding good DBMS designs, as the Cloud requires database designs to perform well on a breadth of hardware for a wide range of workloads. As a result, database systems are constantly redesigned in pursuit of optimal performance for a given hardware and workload.
AnyScale explores the efficient adaptation of DBMS architectures to conditions that are unforeseeable at the time a DBMS is designed, i.e., diverse workloads, evolving hardware, and variable system scale. The goal is to simplify DBMS design while also guaranteeing robust performance under these changing or even unforeseeable conditions.
Conceptually, we approach this efficient adaptation of DBMS architectures with two fundamental abstractions: configurable design and automatic architecture optimization. That is, we propose configurable DBMS designs to establish general building blocks which are then configured by optimization procedures deriving optimized DBMS architectures for given conditions (i.e., workload and hardware).
As a first step, starting at the foundation of modern in-memory DBMSs, we present a new approach for achieving robust performance of data structures, making it easier to reuse the same design for different hardware generations and different workloads. The main idea is to strictly separate the data structure design from the strategies used to execute access operations, and to adjust these execution strategies by configuration instead of hard-wiring them into the data structure. For this new configuration approach, we demonstrate performance benefits over existing approaches for individual data structures as well as for complex OLTP workloads.
(cf. “SIGMOD'20: Robust Performance of Main Memory Data Structures by Configuration”)
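The separation described above can be illustrated with a minimal sketch. It is not the implementation from the paper; the class names (`ConfigurableMap`, `LockedExecution`, `DelegatedExecution`) and the `"execution"` configuration key are hypothetical and chosen only for illustration. The data structure defines what an operation does, while a configuration entry selects how the operation is executed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical execution strategies: they decide HOW an access operation runs.
class LockedExecution:
    """Run each operation on the calling thread, guarded by a single lock."""
    def __init__(self):
        self._lock = threading.Lock()

    def run(self, op, *args):
        with self._lock:
            return op(*args)

    def close(self):
        pass  # nothing to release


class DelegatedExecution:
    """Ship each operation to one owner thread, so no lock is needed."""
    def __init__(self):
        self._owner = ThreadPoolExecutor(max_workers=1)

    def run(self, op, *args):
        # Block until the owner thread has executed the operation.
        return self._owner.submit(op, *args).result()

    def close(self):
        self._owner.shutdown()


STRATEGIES = {"locked": LockedExecution, "delegated": DelegatedExecution}


class ConfigurableMap:
    """The data structure defines WHAT put/get do; the config picks the strategy."""
    def __init__(self, config):
        self._data = {}
        self._exec = STRATEGIES[config["execution"]]()

    # Plain data structure logic, unaware of threads or locks.
    def _put(self, key, value):
        self._data[key] = value

    def _get(self, key):
        return self._data.get(key)

    # Public operations are routed through the configured strategy.
    def put(self, key, value):
        self._exec.run(self._put, key, value)

    def get(self, key):
        return self._exec.run(self._get, key)

    def close(self):
        self._exec.close()


if __name__ == "__main__":
    for name in STRATEGIES:
        m = ConfigurableMap({"execution": name})
        m.put("a", 1)
        print(name, m.get("a"))
        m.close()
```

Switching from `"locked"` to `"delegated"` changes only the configuration, not the data structure code, which is the property the approach relies on when moving between hardware generations or workloads.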
As a second step, we evaluate characteristics of concurrency control, an essential DBMS feature, on evolving hardware. We revisit the evaluation by Xiangyao Yu et al. from 2015, which analysed the characteristics of concurrency control on hardware that was expected to be in use today. Contrary to the authors' original assumption, we do not see single-socket CPUs with 1000 cores today. Instead, multi-socket hardware has made its way into production data centers. Hence, we follow up on this prior work with an evaluation of the characteristics of concurrency control schemes on real production multi-socket hardware with 1568 cores. We made several interesting findings. Most importantly, projecting the behaviour of DBMSs (and their components) to future hardware proved difficult, which confirms both the challenges of a single exhaustive DBMS design/architecture and the need for configurable DBMS designs/optimized DBMS architectures. Additionally, the findings from this evaluation guide our subsequent proposal for a radical new approach to future-proof DBMS designs/architectures.
(cf. “DaMoN@SIGMOD'20: The Tale of 1000 Cores: An Evaluation of Concurrency Control on Real(ly) Large Multi-Socket Hardware”)
As a third step, following our previous findings and the advent of new platforms (e.g., FaaS), we propose a radical new approach for scale-out distributed DBMSs. Instead of hard-baking an architectural model, such as a shared-nothing architecture, into the distributed DBMS design, we aim for a new class of so-called architecture-less DBMSs. The main idea is that an architecture-less DBMS can mimic any architecture on a per-query basis on the fly, without any additional overhead for reconfiguration. Our initial results show that our architecture-less DBMS AnyDB can provide significant speedup across varying workloads compared to a traditional DBMS implementing a static architecture. We also expect this architecture-less approach to further simplify DBMS design and to extend our previous approach to adaptivity of DBMS architectures with elasticity, hardware heterogeneity, and the bridging of scale-up and scale-out.
(cf. “CIDR'21: AnyDB: An Architecture-less DBMS for Any Workload”)
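To make the idea of choosing an architecture per query more concrete, the following toy sketch routes the work of each query to generic workers according to an architecture picked for that query alone. It is not how AnyDB is implemented; the names `Query`, `choose_architecture`, and `route`, the two architecture labels, and the selection rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Query:
    name: str
    keys: Tuple[int, ...]  # records the query touches (hypothetical model)


def choose_architecture(query: Query) -> str:
    """Toy rule (an assumption): small queries stay on one worker,
    larger ones are spread over partition owners."""
    return "single_worker" if len(query.keys) <= 2 else "shared_nothing"


def route(query: Query, architecture: str, num_workers: int) -> Dict[int, List[int]]:
    """Return a plan mapping worker id -> keys that worker processes."""
    if architecture == "shared_nothing":
        # Every key is processed by the worker owning its partition.
        plan: Dict[int, List[int]] = {}
        for key in query.keys:
            plan.setdefault(key % num_workers, []).append(key)
        return plan
    if architecture == "single_worker":
        # The whole query runs on one worker, picked from its first key.
        worker = min(query.keys, default=0) % num_workers
        return {worker: list(query.keys)}
    raise ValueError(f"unknown architecture: {architecture}")


if __name__ == "__main__":
    workload = [Query("point_lookup", (5,)), Query("scan", (0, 1, 2, 3, 4))]
    for q in workload:
        arch = choose_architecture(q)
        # The same generic workers serve both queries; only the routing differs.
        print(q.name, arch, route(q, arch, num_workers=4))
```

In this sketch no worker is reconfigured between queries: the architecture exists only in the routing decision, which is one simple way to picture a system that does not fix its architecture at design time.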
Currently, we are striving for an extensive implementation of our vision of the architecture-less DBMS.
This research project is funded and generously supported by SAP SE under the SAP HANA Campus.
Tiemo Bang, M.Sc.
Below you can find material produced as part of the AnyScale research project:
- CIDR'21: AnyDB: An Architecture-less DBMS for Any Workload [Paper] [Slides] [Talk Recording]
- DaMoN@SIGMOD'20: The Tale of 1000 Cores: An Evaluation of Concurrency Control on Real(ly) Large Multi-Socket Hardware [Twitter] [Paper] [Slides] [Talk Recording]
- SIGMOD'20: Robust Performance of Main Memory Data Structures by Configuration [Paper] [Slides] [Talk Recording]
Bang, Tiemo ; May, Norman ; Petrov, Ilia ; Binnig, Carsten (2021):
AnyDB: An Architecture-less DBMS for Any Workload.
11th Annual Conference on Innovative Data Systems Research (CIDR 2021), virtual Conference, 10.-15.01.2021, [Conference publication]
Bang, Tiemo ; May, Norman ; Petrov, Ilia ; Binnig, Carsten (2020):
The Tale of 1000 Cores: An Evaluation of Concurrency Control on Real(ly) Large Multi-Socket Hardware.
DaMoN ’20: 16th International Workshop on Data Management on New Hardware, virtual Conference, 15.06.2020, ISBN 9781450380249
Bang, Tiemo ; Oukid, Ismail ; May, Norman ; Petrov, Ilia ; Binnig, Carsten (2020):
Robust Performance of Main Memory Data Structures by Configuration.
pp. 1651-1666, SIGMOD ’20: 2020 ACM SIGMOD International Conference on Management of Data, virtual Conference, 14.-19.06.2020, ISBN 9781450367356