Research area A.1 addresses the conflict between the benefits of using large amounts of data and the protection of private data by investigating technical means of data protection. In particular, cryptographic methods such as secure multi-party computation can help here, as they allow, e.g., a machine learning service to be computed securely without disclosing sensitive data. However, these techniques still suffer from efficiency and usability drawbacks. Area A.1 therefore aims to make them more efficient and usable in crucial application scenarios.
Current PhD project of subarea A.1:
Scalable Application of Secure Multi-Party Computation
With the ongoing progress of digitalisation, data collection has risen to unprecedented levels. Mobile devices in particular make it possible to gather large quantities of user data, which is of interest, e.g., for training machine learning (ML) models in the health, finance, or insurance sector. At the same time, users are becoming more aware of and concerned about the collection and use of their private data, and legislation such as the General Data Protection Regulation (GDPR) restricts which personal data can be used and how. New mechanisms are therefore needed that make it possible to profit from access to large data collections while protecting users from undue data usage, providing transparency on how their data is used, and satisfying statutory provisions.
Secure multi-party computation (MPC) is a cryptographic technique that allows multiple parties to jointly compute a fixed function on their private input data without any party disclosing its private information. MPC can thus be used to process data from different users while keeping everything but the result private. Moreover, the mechanism fixes exactly what is computed on the data, ensuring full transparency about how it is used. These privacy and transparency guarantees do not come for free: the underlying cryptographic primitives often drastically reduce scalability, which makes processing large data sets difficult or even infeasible.
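The core idea can be illustrated with additive secret sharing, the arithmetic backbone of many MPC protocols. The following is a minimal Python sketch of a secure sum over a public prime field; it is a toy illustration of the principle, not one of the protocols developed in this project:

```python
import secrets

P = 2**61 - 1  # public prime modulus; all shares live in Z_P

def share(value, n_parties):
    """Split a private value into n additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine shares; any strict subset of shares is uniformly random."""
    return sum(shares) % P

# Three parties each secret-share their private input ...
inputs = [42, 17, 99]
all_shares = [share(x, 3) for x in inputs]

# ... each party locally adds the one share it holds of every input ...
local_sums = [sum(col) % P for col in zip(*all_shares)]

# ... and only recombining ALL local results reveals the aggregate,
# never the individual inputs.
assert reconstruct(local_sums) == sum(inputs) % P
```

Each individual share is statistically independent of the secret, which is exactly the "keep everything but the result private" guarantee described above, here for the special case of a sum.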
In this work, we identify use cases that strongly benefit from using large datasets from different users, determine which security and privacy guarantees are required, and then build corresponding MPC solutions tailored to maximise efficiency and scalability. In addition, we identify broader fields and develop more generic building blocks for them, which simplifies the later construction of more specific, scalable protocols. This also increases the usability of MPC by providing building blocks and tools for developing new protocols for upcoming use cases.
We focus on two general directions. First, the use of MPC in machine learning has received considerable interest across many disciplines; increasing its scalability is of particular interest, as even non-privacy-preserving optimisations already tend towards excessive use of computing resources. Second, revisiting graph problems with privacy in mind yields multiple opportunities for MPC protocols, as has been demonstrated for, e.g., contact tracing or the assignment of apprenticeships.
Current PhD project of subarea A.1:
Mechanisms for Protecting Privacy in Applications
Today, mobile applications are central to our lives. Driven by the goal of personalized user experience through machine learning (ML) techniques, operators collect large quantities of individual user data. As a result, user data has become essential to them, raising the need for privacy protection and spawning legislation like the General Data Protection Regulation (GDPR).
Using privacy-preserving technologies from applied cryptography, such as secure computation (SC), has been shown to be a promising approach for preserving privacy while still allowing an application to process user data. Recently, research has focused on making machine learning techniques privacy-preserving. However, these techniques usually require large-scale computations even without privacy in mind. Existing solutions with optimal (i.e., minimal) leakage do not scale well and require expert knowledge to deploy, which disincentivizes privacy protection in real-world applications. Some privacy-preserving solutions gain efficiency by leaking some information, but the real-world impact of this leakage on privacy remains open, partly because attacks exploiting it have only been studied in artificial settings.
In this work, we evaluate and build mechanisms for protecting privacy, focusing on large-scale applications from the domain of machine learning. Our goal is for these mechanisms to enable practical, effective privacy protection in real-world applications, usable even by non-experts.
To achieve this, we develop SC-based methods for efficient, privacy-preserving applications at large scale. Building on existing private ML work, which so far focused solely on privacy-preserving neural networks and decision trees, we show how to practically protect privacy in crucial upcoming variants of machine learning. As an important use case that requires the protection of biometric information due to international standardization efforts, we demonstrate how to apply SC techniques for highly efficient, privacy-preserving speaker recognition. Further, in collaboration with legal experts, we design a novel system building on SC technologies that allows security agencies to exchange suspect information in a manner that satisfies European data protection laws, moving towards a privacy-friendly answer to the perception that data protection hinders modern law enforcement. Our tools are published as open source and are designed to be usable by non-experts.
Additionally, we examine the practical security of existing solutions. We prove the insecurity of a protocol central to a line of prior privacy-preserving ML research and show how to learn private inputs. We also provide a first understanding of the practical impact of information leakage in searchable encryption schemes, an SC mechanism for querying databases used in private ML. To this end, we evaluate existing attacks in scenarios grounded in real-world data, laying out in which use cases common leakage profiles violate privacy.
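The kind of leakage such attacks exploit can be illustrated with a toy frequency analysis against deterministic query tokens. This is a heavily simplified sketch of the general idea behind known leakage-abuse attacks; the keywords, frequencies, and the hash-based "encryption" are purely illustrative, not a scheme studied in this project:

```python
from collections import Counter
import hashlib

def token(keyword):
    # Deterministic "encryption" stand-in: equal keywords yield equal
    # tokens, so the server observes the query frequency distribution.
    return hashlib.sha256(keyword.encode()).hexdigest()[:8]

# Queries as seen by the server (tokens only, keywords hidden).
observed = [token(k) for k in
            ["flu", "flu", "flu", "diabetes", "diabetes", "asthma"]]

# Public auxiliary knowledge: rough keyword popularity (illustrative).
aux = {"flu": 0.50, "diabetes": 0.33, "asthma": 0.17}

# Attack: match observed token frequencies to auxiliary frequencies
# by rank, recovering the keyword behind each token.
obs_ranked = [t for t, _ in Counter(observed).most_common()]
aux_ranked = sorted(aux, key=aux.get, reverse=True)
guess = dict(zip(obs_ranked, aux_ranked))
```

Even though no token is ever "decrypted", the frequency profile alone deanonymizes every query here, which is why evaluating such attacks against realistic data distributions matters.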