Scalable Data Management on Modern Hardware

In this research area, we focus, among other topics, on the design of distributed and parallel data management systems for next-generation data center and enterprise cluster hardware.

For example, high-speed RDMA-capable networks such as InfiniBand FDR/EDR used to be a very expensive technology deployed only in high-performance computing (HPC) clusters. However, InfiniBand has recently become cost-competitive with Ethernet and is becoming an interesting alternative for future data centers and enterprise clusters. Our initial results from building an InfiniBand-optimized system called NAM-DB show that this trend toward high-speed networks enables a new breed of distributed data management systems that achieve major performance gains over existing systems for both analytical and transactional workloads.

We are also working on research problems that arise in large-scale data management scenarios in the cloud. To this end, we have been developing an open-source distributed data management system called XDB. XDB is designed to analyze data in parallel on cloud deployments composed of commodity hardware and slow networks. One of the major contributions of XDB is a locality-aware and elastic partitioning scheme that minimizes the communication costs of data-intensive analytical workloads, resulting in a significant speed-up compared to existing partitioning schemes. We received a best paper award at the IEEE Big Data 2014 conference and a best demo award at the SIGMOD 2014 conference for our research results in this area.
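To illustrate why data placement affects communication costs, the following minimal sketch compares two ways of partitioning a table that is joined with another one. It is not XDB's partitioning scheme: it shows only plain hash co-partitioning on the join key, and the tables, the helper names (`assign`, `remote_join_pairs`), and the node count are hypothetical.

```python
def assign(rows, key_index, num_nodes):
    """Map each row to a node by hashing one of its columns."""
    return {row: hash(row[key_index]) % num_nodes for row in rows}


def remote_join_pairs(customers, orders, cust_node, order_node):
    """Count orders whose matching customer row sits on a different node."""
    by_id = {c[0]: c for c in customers}
    return sum(1 for o in orders if cust_node[by_id[o[1]]] != order_node[o])


# Hypothetical data: 8 customers, 5 orders per customer.
customers = [(cid, f"name{cid}") for cid in range(8)]
orders = [(oid, oid // 5) for oid in range(40)]  # (order_id, customer_id)

NUM_NODES = 4
cust_node = assign(customers, 0, NUM_NODES)

# Placement that ignores the join: orders are hashed on order_id.
naive = assign(orders, 0, NUM_NODES)
# Locality-aware placement: orders are hashed on the join key (customer_id).
colocated = assign(orders, 1, NUM_NODES)

naive_remote = remote_join_pairs(customers, orders, cust_node, naive)
colocated_remote = remote_join_pairs(customers, orders, cust_node, colocated)
print(naive_remote, colocated_remote)
```

With the locality-aware placement, every order lands on the node that holds its customer, so the join needs no network transfer. The placement on `order_id` forces some of the matching rows to be shipped between nodes.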

More projects in this research area can be found below.

AnyScale – DBMS Architectures for Any Hardware and Workload

AnyScale is a research project exploring adaptive database architectures that scale efficiently across a breadth of modern hardware and deliver robust performance for diverse workloads.


Data Processing Interface (DPI)

DPI is an interface that provides a set of simple yet powerful abstractions for data processing, flexible enough to exploit features of modern networks (e.g., RDMA or in-network processing).


Distributed Storage built with Specialized Hardware

Multes implements smart distributed storage built with FPGAs that can be shared efficiently by a large number of tenants. It is smart because filtering can be offloaded into the storage nodes, which can also perform scans on the data. It is distributed because it runs on multiple FPGAs that replicate the data using a leader-based consensus protocol offering both low latency and high throughput. It is storage because it stores key-value pairs in a Cuckoo hash table and implements slab-based memory allocation. Multes is open source and can be easily deployed on various AMD/Xilinx FPGAs as well as on the AMD HACC clusters.
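For readers unfamiliar with cuckoo hashing, the following software sketch shows the basic idea of a two-table cuckoo hash map together with a filtered scan. It is an illustration only and does not reflect the Multes hardware design: the class name, the salted use of Python's built-in `hash`, the growth policy, and the `scan` method are all assumptions made for this example.

```python
class CuckooHashTable:
    """Two-table cuckoo hash map (illustrative sketch, not the Multes design)."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Two hash functions, derived by salting the built-in hash.
        return hash((which, key)) % self.capacity

    def get(self, key, default=None):
        # A key can only live in one of its two candidate slots,
        # so a lookup inspects at most two locations.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        # Otherwise insert, evicting ("kicking") occupants to their
        # alternate table until an empty slot is found.
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which
        # Too many displacements: grow both tables and reinsert everything.
        self._grow(entry)

    def _grow(self, pending):
        old = [e for t in self.tables for e in t if e is not None] + [pending]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in old:
            self.put(k, v)

    def scan(self, predicate):
        # Filtered scan: the predicate is evaluated next to the data
        # and only matching pairs are returned to the caller.
        return [e for t in self.tables for e in t
                if e is not None and predicate(*e)]
```

A short usage example: after `t = CuckooHashTable()` and `t.put("user:1", 42)`, the call `t.get("user:1")` returns the stored value, and `t.scan(lambda k, v: v > 10)` returns only the pairs that satisfy the filter, which mirrors the idea of pushing filtering down to where the data resides.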