F-LION – Private and Secure Federated Learning for Collaborative Data Utilization

With ongoing digitalization and its diverse applications, ever more data is generated that needs to be transmitted, stored, and processed. Artificial Intelligence (AI), and especially Machine Learning (ML), makes it possible to model correlations in the data and analyze them automatically, so that the mass of data does not overwhelm the agencies processing it.

However, in many cases, the AI learning process requires collaboration at the state, federal, and international levels, as different organizations and companies have access to different data sets that can contribute to more effective data analysis. At the same time, factors such as competition and data protection regulations (GDPR) often prevent organizations from sharing their data.

Exemplary structure of an FL scheme for cooperative data processing between different safety authorities.

Federated Learning (FL) allows multiple parties to jointly train ML models, such as Deep Neural Networks (DNNs), without having to share the source data to be analyzed. The user data never leaves the participants’ systems: only so-called ‘model updates’ are sent to a central server, which merges them into a global model. The figure shows an example of such an FL system, in which several agencies jointly train a DNN for automated data processing. Each participant has a private data set that must not be shared with others. Instead of sharing the data, each participant trains its own model locally and shares only the parameters of the DNN.
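To illustrate how a central server might merge model updates, the following minimal sketch uses weighted averaging of parameter vectors, a common choice often referred to as federated averaging. It is an assumption made for illustration, not a description of the aggregation rule used in the project. The function name, the flat-list representation of the parameters, and the example values are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    updates: Sequence[Sequence[float]],
    num_samples: Sequence[int],
) -> List[float]:
    """Merge local parameter vectors into one global vector.

    Each participant's parameters are weighted by the size of its
    local data set (an illustrative assumption).
    """
    if not updates or len(updates) != len(num_samples):
        raise ValueError("need one sample count per update")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    # Weighted mean, computed coordinate by coordinate.
    return [
        sum(u[i] * n for u, n in zip(updates, num_samples)) / total
        for i in range(dim)
    ]


# Three hypothetical agencies send parameters; no raw data is exchanged.
agency_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
agency_sizes = [100, 100, 200]
global_model = federated_average(agency_updates, agency_sizes)
print(global_model)  # [3.5, 4.5]
```

In this sketch the server only ever sees parameter vectors and sample counts; the private data sets stay on the participants' systems.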

In the project, we design a flexible FL framework that can be dynamically applied to arbitrary applications, allowing the involved parties to easily set up a new use case, collaborate, and benefit from each other without exchanging data. We design secure and robust algorithms ensuring that malformed data from individual participants does not degrade the overall performance of the DNN that results from the collaboration. In a case study, we design novel ML algorithms to automatically pre-filter data, allowing the humans involved to focus on the data samples that are actually relevant.
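As a rough illustration of what robustness against malformed contributions can mean at the aggregation step, the sketch below replaces averaging with a coordinate-wise median, a well-known robust aggregation rule. It is shown only as an example and is not claimed to be the algorithm designed in the project. The function name and all values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def median_aggregate(updates: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate parameter vectors with a coordinate-wise median.

    A single extreme update cannot pull the median arbitrarily far,
    unlike a plain mean.
    """
    if not updates:
        raise ValueError("need at least one update")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    return [median(u[i] for u in updates) for i in range(dim)]


# Four well-behaved updates and one malformed update (hypothetical values).
well_behaved = [[1.0, 2.0], [1.5, 1.5], [0.5, 2.5], [1.25, 2.25]]
malformed = [[100.0, -100.0]]
robust_model = median_aggregate(well_behaved + malformed)
print(robust_model)  # [1.25, 2.0]
```

With a plain mean, the malformed update would shift the first coordinate to about 20.9; the median stays within the range of the well-behaved updates.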