Mission Distributed-Cyber-Security: Federated Machine Learning
Machine learning (ML) has grown rapidly in importance in recent years, and the technologies involved are maturing. ML is used in many applications, such as computational linguistics (CL), speech and object recognition, and vulnerability and malware detection.
An important reason ML has become indispensable is that it allows us to make predictions and automate complex tasks without human intervention. Machine learning is therefore expected to play an even more important role in almost all software applications in the near future and to be integrated ever more directly into applications and devices.
The traditional approach to machine learning is built around a centralized infrastructure that is typically owned by one party. This infrastructure is used to store data and train predictive models. However, due to recent trends such as IoT, smart cities, and autonomous driving, a typical ML setup has become much more complex. Not only has the number of devices, applications, and services that collect data to train predictive models increased significantly, but so has the number of parties involved. To better support these setups, federated machine learning, which allows predictive models to be trained in a decentralized manner, has become more important.
The main idea of federated learning is that devices learn predictive models collaboratively while keeping all training data local. Federated machine learning is particularly useful when the model is based on data collected and processed by a very large number of devices owned by different parties. It thus reduces the costs and risks associated with processing sensitive data, since models can be trained directly on the end devices without moving sensitive data through the network.
In the typical federated learning setup today, a central server is still used: the parties involved in the training process send their local model updates to a central server, which combines these updates into a global model. This approach is called 'centralized federated machine learning'. However, it has been shown to open up many possible attacks that manipulate the predictive models (e.g., poisoning attacks) or even breach the privacy of participants.
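One round of the centralized setup described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a plain linear model, a single gradient-descent step per client, and aggregation by a sample-weighted average (the scheme commonly known as federated averaging). The names `local_update`, `federated_average`, and `run_round` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, target)


def local_update(global_w: Sequence[float], data: Sequence[Example],
                 lr: float = 0.1) -> Vector:
    """Client side: one gradient-descent step on a linear model.

    Only the resulting weights leave the device; `data` stays local.
    """
    grad = [0.0] * len(global_w)
    for x, y in data:
        # Prediction error of the current global model on one local example.
        err = sum(w_i * x_i for w_i, x_i in zip(global_w, x)) - y
        for i, x_i in enumerate(x):
            grad[i] += 2.0 * err * x_i / len(data)
    return [w_i - lr * g_i for w_i, g_i in zip(global_w, grad)]


def federated_average(updates: Sequence[Tuple[Vector, int]]) -> Vector:
    """Server side: combine client models, weighted by local example count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no client updates to aggregate")
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def run_round(global_w: Sequence[float],
              client_datasets: Sequence[Sequence[Example]]) -> Vector:
    """One training round: every client trains locally, the server averages."""
    updates = [(local_update(global_w, d), len(d)) for d in client_datasets]
    return federated_average(updates)
```

Note that the server in this sketch accepts every update without inspection, which is exactly the property a poisoning attack would exploit.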
Hence, solutions for efficient, secure, and fully distributed federated learning architectures are essential, especially as learning algorithms are now used in many IT applications and autonomous systems in which the attacks outlined above could cause severe damage.
This project is part of the mission Decentralized Cyber Security of the National Research Center for Applied Cybersecurity ATHENE. Our goal is to develop a new approach for secure federated ML. Instead of building on a centralized approach, the main idea of this project is to build a fully decentralized framework for federated ML based on blockchain technology.
Two main characteristics make blockchains attractive for federated ML scenarios. First, blockchains store their state (e.g., the parameters of a predictive model) in an immutable append-only ledger that contains the history of all model updates. Blockchains thereby enable auditability and traceability, which help to detect potentially malicious operations on the shared model (e.g., model poisoning).
Second, blockchains can be operated reliably in a decentralized manner and tolerate Byzantine failures without relying on a central trusted instance, which often does not exist in federated ML.
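The first characteristic, an append-only history of model updates that can be audited, can be illustrated with a hash-chained log. The sketch below is a single-process toy under stated assumptions: it uses SHA-256 over a JSON encoding of each entry, and the names `Block` and `UpdateLedger` are hypothetical. It involves no replication or consensus, so it does not demonstrate the Byzantine fault tolerance that an actual blockchain provides.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Sequence

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _digest(index: int, client_id: str, update: Sequence[float],
            prev_hash: str) -> str:
    """Hash an entry's content together with the hash of its predecessor."""
    payload = json.dumps(
        {"index": index, "client": client_id,
         "update": list(update), "prev": prev_hash},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Block:
    index: int
    client_id: str
    update: List[float]
    prev_hash: str
    block_hash: str


class UpdateLedger:
    """Append-only log of model updates, linked by hashes."""

    def __init__(self) -> None:
        self._blocks: List[Block] = []

    def append(self, client_id: str, update: Sequence[float]) -> Block:
        prev = self._blocks[-1].block_hash if self._blocks else GENESIS_HASH
        index = len(self._blocks)
        block = Block(index, client_id, list(update), prev,
                      _digest(index, client_id, update, prev))
        self._blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash; any edit to past entries breaks the chain."""
        prev = GENESIS_HASH
        for i, b in enumerate(self._blocks):
            if b.index != i or b.prev_hash != prev:
                return False
            if b.block_hash != _digest(b.index, b.client_id,
                                       b.update, b.prev_hash):
                return False
            prev = b.block_hash
        return True
```

Because each entry commits to its predecessor's hash, an auditor who replays the log can both trace which party submitted which update and detect after-the-fact modification of the history.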
As a result, blockchains provide an interesting platform for federated ML. However, there still exist many challenging and open problems when using blockchain technology as a main building block for federated ML. In the context of our project, we want to investigate these challenges further and propose novel techniques to tackle them.
This research work is funded by the German Federal Ministry of Education and Research and the Hessen State Ministry for Higher Education, Research and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.
|Muhammad El-Hindi M.Sc.|
|S2|02 E115||+49 6151 email@example.com-...|
El-Hindi, Muhammad ; Zhao, Zheguang ; Binnig, Carsten
Rezig, El Kindi ; Gadepally, Vijay ; Mattson, Timothy ; Stonebraker, Michael ; Kraska, Tim ; Wang, Fusheng ; Luo, Gang ; Kong, Jun ; Dubovitskaya, Alevtina (eds.) (2021):
ACID-V: Towards a New Class of DBMSs for Data Sharing.
In: Heterogeneous Data Management, Polystores, and Analytics for Healthcare: VLDB Workshops, Poly 2021 and DMAH 2021, Lecture Notes in Computer Science, 12921, pp. 60-64,
Springer, 47th International Conference on Very Large Data Bases (VLDB 2021), virtual Conference, 20.08.2021, ISBN 978-3-030-93662-4,
El-Hindi, Muhammad ; Karrer, Simon ; Doci, Gloria ; Binnig, Carsten (2020):
TrustDBle: Towards Trustable Shared Databases.
3rd International Symposium on Foundations and Applications of Blockchain, virtual Conference, 01.05.2020, [Conference publication]