F-LION – Private and Secure Federated Learning for Collaborative Data Utilization

With ongoing digitalization and its diverse applications, ever more data is generated that needs to be transmitted, stored, and processed. In particular, analyzing Big Data to fight cyberattacks and cybercrime poses multiple challenges to companies and law enforcement agencies. Artificial Intelligence (AI), and especially Machine Learning (ML), makes it possible to model correlations in the data and analyze them automatically, preventing the sheer mass of data from overwhelming the processing agencies.

In many cases, the AI learning process requires joint collaboration at the state, federal, and international levels, as different organizations and companies have access to different data sets that can contribute to more effective data analysis. At the same time, factors such as competition and data protection regulations (e.g., the GDPR) often prevent organizations from sharing their data.

Federated Learning (FL) allows multiple parties to jointly train ML models, such as Deep Neural Networks (DNNs), without having to share the underlying data. The raw data never leaves the participants’ systems; only so-called model updates are sent to a central server, which merges them into a global model. In addition, FL offers improved computational efficiency and scalability, since the training of deep neural networks is distributed across many participants and executed in parallel. FL has therefore become a dynamic research topic.

The figure shows an example of such an FL system, in which several agencies jointly train a DNN for automated data processing. Each participant holds a private data set that must not be shared with the other participants, represented by the locks. Instead of sending the data to a server, each participant trains a model locally on its own data. This local model is then sent to the server, which aggregates the individual models into a global model. In this way, no participant has to share its data, either with other participants or with the server.

Exemplary structure of an FL scheme for cooperative data processing between different security agencies.
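To make the aggregation step concrete, the following is a minimal sketch of one training round with federated averaging (FedAvg), the most common FL aggregation rule. The function and variable names (local_training, federated_averaging, agency_A, ...) are illustrative assumptions and do not describe F-LION’s actual implementation.

```python
# Minimal sketch of one federated learning round using federated averaging
# (FedAvg). All names are illustrative and not part of F-LION itself.
import numpy as np

def local_training(global_weights, local_data):
    """Placeholder for a participant's local training.

    In a real system this would run several SGD steps on the participant's
    private data and return the updated weights; only these weights (the
    'model update') are ever sent to the server, never the data itself.
    """
    return [w + 0.01 * np.random.randn(*w.shape) for w in global_weights]

def federated_averaging(client_updates, client_sizes):
    """Server-side aggregation: weighted average of the clients' weights."""
    total = sum(client_sizes)
    return [
        sum((n / total) * w for w, n in zip(layer_weights, client_sizes))
        for layer_weights in zip(*client_updates)
    ]

# One communication round with two participants holding private data sets.
global_model = [np.zeros((4, 4)), np.zeros(4)]   # toy model: one weight matrix, one bias
private_data = {"agency_A": "local_dataset_A", "agency_B": "local_dataset_B"}
data_sizes = {"agency_A": 1000, "agency_B": 500}

updates = [local_training(global_model, d) for d in private_data.values()]
global_model = federated_averaging(updates, list(data_sizes.values()))
```

In this sketch the server only ever receives the clients’ weight updates and their data set sizes; this is the property that makes FL attractive for inter-organizational data analysis.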

With the growing popularity of FL systems, the number and complexity of attacks against them have increased. Several attacks have recently been demonstrated that either manipulate the resulting model or extract information about the underlying training data from the model updates. Model manipulations are considered security attacks and can pursue different goals, such as degrading prediction accuracy or introducing backdoors that cause predefined wrong predictions for certain inputs.
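To illustrate the backdoor case, here is a minimal, purely hypothetical sketch of how a malicious participant could poison its local data before local training. The trigger pattern, target label, and all names are illustrative assumptions, not a description of F-LION’s threat model or of any specific published attack.

```python
# Illustrative sketch of a backdoor data-poisoning attack on an FL participant.
# Trigger pattern, target label, and poisoning rate are hypothetical examples.
import numpy as np

def poison_local_data(images, labels, target_label=7, poison_fraction=0.2):
    """Stamp a small trigger onto a fraction of the samples and relabel them.

    A model trained on such data behaves normally on clean inputs but predicts
    `target_label` whenever the trigger is present -- a backdoor that the
    participant's model update can carry into the aggregated global model.
    """
    images = images.copy()
    labels = labels.copy()
    n_poison = int(poison_fraction * len(images))
    idx = np.random.choice(len(images), n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0          # 3x3 bright square in the corner as trigger
    labels[idx] = target_label           # attacker-chosen prediction
    return images, labels

# A malicious participant poisons its data, then runs ordinary local training.
clean_images = np.random.rand(100, 28, 28)
clean_labels = np.random.randint(0, 10, size=100)
poisoned_images, poisoned_labels = poison_local_data(clean_images, clean_labels)
```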

Existing defense approaches can often counter only one of these two types of attacks: they either protect against security attacks or prevent the reconstruction of training data (one example of the former is sketched below). In this project, we will therefore design and develop a dynamically deployable federated learning framework, called F-LION, for confidential, intelligent, and inter-organizational data analysis using neural networks.
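As an illustration of the first category, the following is a minimal sketch of a robust aggregation rule (coordinate-wise median), a generic technique from the FL literature rather than F-LION’s own defense mechanism. It blunts manipulated updates but does nothing to prevent the reconstruction of training data from individual updates.

```python
# Example of a defense against manipulated updates: coordinate-wise median
# aggregation instead of plain averaging. This is a generic robust-aggregation
# technique from the literature, not F-LION's actual defense mechanism.
import numpy as np

def median_aggregation(client_updates):
    """Aggregate layer-wise by taking the coordinate-wise median across clients.

    Outlier values injected by a few malicious participants have far less
    influence on the median than on a (weighted) mean. Note that this only
    addresses manipulation; the individual updates are still visible to the
    server, so training data could still be reconstructed from them.
    """
    return [
        np.median(np.stack(layer_weights), axis=0)
        for layer_weights in zip(*client_updates)
    ]
```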

F-LION is intended to enable the confidential coordination of the FL participants while simultaneously preventing model manipulation. As an example use case, the detection of extremist content across different participants (e.g., security agencies) will be implemented. The resulting AI enables the automatic scanning of public data sources to detect potential threats at an early stage. The developed methods are intended to be general-purpose and format-independent, so that F-LION will be applicable to other use cases without any adaptations. FL provides a solid basis for data confidentiality, since the data never leaves the servers of the providing authority.

The project will also explore approaches to prevent the leakage and extraction of the training data from the model updates, while significantly increasing the framework’s robustness against manipulated or erroneous training data.