Research subarea B.2 focuses on privacy and empowerment of users through added value in collectives.
The first goal here is to use collective machine learning (ML) for generating added value within digital collectives.
The second goal is to introduce key privacy figures for ML models that can be assessed within digital collectives, in order to quantify the influence of individual users' data on these models.
Third, hybrid apps for digital collectives provide the basis for collective ML.
The fourth goal is to establish trust in apps and services through transparency in order to empower users.
Current PhD project of subarea B.2:
User Empowerment Through Technical Transparency
Platform providers of large Internet services such as online social networks are known to collect large amounts of data on their users, which is monetized and used, for example, to provide targeted advertising. This data collection is often not in the users' best interest, nor are users aware of its extent and impact.
Existing work on transparency-enhancing technologies (TETs) reveals what data apps could have accessed in the worst case, not what was actually collected and subsequently used by platform providers.
As such, current approaches like TETs still lack completeness in the information they gather and fail to illustrate what platform providers can do with such information. Thus, subproject B.2 of the Research Training Group (RTG) 2050 focuses on empowering users by increasing the transparency of this data collection by platform providers on mobile devices. Compared to previous work, the expected benefits of this approach include a more complete view of the information gathered on users and the potential to explain how derived data is used, e.g., for targeted advertising.
For this, a TET is proposed that utilizes information from different levels:
• First, by adopting an approach from the insider detection domain, we discover what data is collected at a low level of the operating system, whether this data collection is (ab)normal, and with whom this information is potentially shared.
• Second, on the application level, shadow profiles are created that depict what information a service provider has collected about a user.
• Third, data from the network level may be used to complement this information.
• Finally, the information of different users is combined in order to simulate the platform provider's view and derive what additional information can be inferred that was not apparent beforehand. For that purpose, machine learning techniques such as federated learning will be investigated to facilitate the approach in a privacy-preserving manner and potentially improve it.
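The federated learning mentioned above rests on one core mechanism: clients share only model updates, and a server averages them. The following is a minimal sketch of federated averaging (FedAvg) under illustrative assumptions (models as flat lists of floats, one plain gradient-descent step as "local training"); it is not the project's actual implementation.

```python
# Minimal FedAvg sketch. Hypothetical setup: each client trains locally on
# its own data and only shares updated weights, never raw data.

def local_update(weights, gradient, lr=0.1):
    """One hypothetical local training step: plain gradient descent."""
    return [w - lr * g for w, g in zip(weights, gradient)]

def federated_average(client_weights, client_sizes):
    """Server side: aggregate client models, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients start from the same global model and train locally
# (gradients here are made-up example values).
global_model = [0.0, 0.0]
client_a = local_update(global_model, gradient=[1.0, -1.0])
client_b = local_update(global_model, gradient=[3.0, 1.0])

# Server aggregates: client_a holds 10 samples, client_b holds 30.
new_global = federated_average([client_a, client_b], [10, 30])
```

The privacy intuition is that the server only ever sees weight vectors, not the underlying user data; the attacks discussed in the later subproject phase probe exactly how much this intuition holds.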
Previous PhD project of subarea B.2 (Phase II)
On Privacy-Enhanced Distributed Analytics in Online Social Networks
Within subproject B.2 of RTG 2050, we focus on enhancing privacy in online social networks (OSNs) through three research pillars:
Online Social Network Architecture
• We propose the concept of hybrid online social networks (HOSNs), which combines the usage of centralized OSNs (COSNs) and decentralized OSNs, so that users benefit from both the market penetration of COSNs and the privacy advantages of decentralized OSNs. Users can post their public content to the COSN, while sharing their private content only with their friends through the decentralized OSN, beyond the knowledge of the service providers.
• Understanding the user perception of HOSNs is crucial to calibrating the development of the concept. That is, we study the relationships between four aspects that influence users' perception and behavior: privacy concerns, trust beliefs, risk beliefs, and the willingness to use. In light of the relationships between these aspects, we develop software features to address users' privacy concerns, increase trustworthiness, and increase the willingness to use.
Distributed Analytics Application
• Recommender systems are essential to improving services in OSNs. Several analytics techniques can be used to generate recommendations. We work on the Association Rule Mining (ARM) technique and enable efficient privacy-preserving ARM on distributed data. To achieve that, we combine graph sampling with distributed ARM algorithms.
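ARM discovers rules of the form "users who have A also tend to have B", ranked by the support and confidence measures. The sketch below illustrates only these two basic measures on a toy, made-up transaction set; the project's distributed, graph-sampling-based variant is not shown here.

```python
# Illustrative ARM basics (not the project's actual algorithm): computing
# support and confidence for one candidate rule over toy transactions.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears when the antecedent does."""
    return support(antecedent | consequent) / support(antecedent)

# Rule {bread} -> {milk}:
sup = support({"bread", "milk"})        # appears in 2 of 4 transactions
conf = confidence({"bread"}, {"milk"})  # milk in 2 of the 3 bread baskets
```

In the distributed setting studied in the subproject, the transactions would be spread across users' devices, which is what makes computing these measures in a privacy-preserving way non-trivial.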
• We look into enhancing the privacy of the emerging distributed machine learning technique Federated Learning (FL). In particular, we explore the privacy benefits of applying FL in a hierarchical architecture, where the aggregation of updates happens in multiple layers of the hierarchy.
Threats to Distributed Analytics
• We study attacks against FL. For that, we identify the foci and gaps in the research literature. We point out issues in the assumptions and evaluation setups commonly used by researchers, and their implications for (1) the applicability of the proposed attacks and (2) the generalizability of the conclusions.
• The labels of user data can be highly sensitive, e.g., in medical applications. We highlight the information-leakage risk of sharing gradients in FL by investigating novel attacks that extract ground-truth labels from gradients.
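A simple, known observation illustrates why gradients can leak labels at all (it underlies prior attacks such as iDLG, and is shown here only as background, not as the subproject's novel attack): for a single example under softmax cross-entropy, the gradient with respect to the logits is softmax(z) minus the one-hot label, so its only negative entry sits exactly at the ground-truth label.

```python
# Background sketch of gradient-based label leakage. All concrete logit
# values below are made up for illustration.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def logit_gradient(logits, true_label):
    """d(cross-entropy)/d(logits) for one sample: softmax(z) - onehot(y)."""
    p = softmax(logits)
    return [p_i - (1.0 if i == true_label else 0.0) for i, p_i in enumerate(p)]

def infer_label(grad):
    """Attacker side: the unique negative entry reveals the label."""
    return min(range(len(grad)), key=lambda i: grad[i])

# A client computes a gradient for one sample with true label 2; an
# observer of the gradient can recover that label without seeing the data.
grad = logit_gradient(logits=[0.2, -1.3, 0.9], true_label=2)
recovered = infer_label(grad)
```

Real FL shares gradients of weights rather than logits, but the same sign structure propagates into the last layer's weight gradients, which is what makes label extraction feasible.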