A.1 Quantifying Privacy in Large Sparse Datasets

A.1 Multidimensional privacy metrics for user empowerment

- Spyros Boukoros -

Users leak private information in a variety of ways, depending on the device they use, their activity, and their mobility. Research area A.1 aims to empower users by giving them tangible privacy metrics and tools for a variety of daily activities in which private information is leaked. Helping users understand their privacy level can raise awareness and possibly alter their perception of the value of their data.

Our research begins by creating privacy metrics for users on the move, whose phone (or, more generally, mobile device) passively or actively collects data in a crowd-sourced fashion. We investigate privacy for this special type of location data and create privacy metrics that enable both users and application providers to easily understand privacy concepts.

We continue by examining the place where users spend a significant amount of their time and where privacy is a major issue: their homes. More specifically, we study privacy in smart meters, that is, devices that measure a building's electricity consumption at frequent intervals and that might become a de facto standard in every home. We show that the most prominent way of anonymizing user data, the aggregation of smart meter readings, does not work, and we illustrate the challenges in aggregating smart meter data correctly. Our proposed privacy metric, which takes the form of a cryptographic game, enables us to quantify all of these challenges. Our work paves the way for more careful planning of aggregations.

Finally, we investigate the privacy leakage that occurs when users share their preferences (shopping lists, movies, books, music, etc.) online. Previous research has shown that this kind of data, usually called microdata, can identify individuals even in pseudonymous datasets. We develop a lightweight and user-friendly tool that enables users to understand their privacy level before they share their data with providers. To do so, we rely on users' (offline) preferences and on publicly available information about their data, more specifically the rarity of their choices.

Tandem partner: A.2