Privacy and utility assessment within statistical data bases

Abstract: Personal data promise significant improvements in almost every economic sector, thanks to the knowledge that can be extracted from them. As evidence, some of the largest companies in the world, Google, Amazon, Facebook and Apple (GAFA), rely on this resource to provide their services. However, although personal data can be very useful for improving and developing services, they can also, intentionally or not, harm data respondents' privacy. Indeed, many studies have shown how data releases that were intended to protect respondents' personal data were ultimately used to leak private information. It therefore becomes necessary to provide methods that protect respondents' privacy while preserving the utility of data for services. To this end, the European Union has adopted a new regulation, the General Data Protection Regulation (EU, 2016), which aims to protect European citizens' personal data. However, the regulation addresses only one side of the problem: it focuses on citizens' privacy, whereas the actual goal is the best trade-off between privacy and utility. Privacy and utility usually pull in opposite directions: the greater the privacy, the lower the data utility. One of the main approaches to this trade-off is data anonymization. In the literature, anonymization refers either to anonymization mechanisms or to anonymization metrics. While mechanisms are used to anonymize data, metrics are needed to validate whether the best trade-off has been reached. However, existing metrics have several flaws, including a lack of accuracy and complex implementation. Moreover, each existing metric assesses either privacy or utility, which makes assessing the trade-off between the two more difficult. In this thesis, we propose a novel approach, called the Discrimination Rate (DR), for assessing both utility and privacy.
The DR is an information-theoretic approach that provides practical and fine-grained measurements. It measures the capability of attributes to refine a set of respondents, on a scale from 0 to 1, where the best refinement isolates single respondents. For example, an identifier has a DR equal to 1, since it completely refines a set of respondents. We are therefore able to provide fine-grained assessments and comparisons of anonymization mechanisms (whether different instantiations of the same mechanism or different anonymization mechanisms) in terms of utility and privacy. Moreover, using the DR, we provide formal definitions of identifiers (Personally Identifying Information), which have been recognized as one of the main concerns of privacy regulations. The DR can therefore be used by both companies and regulators to tackle personal data protection issues.
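The abstract does not state the DR formula, so the sketch below is only an illustration under an assumption we make here: the DR of an attribute Y is taken to be the normalized reduction in uncertainty about which respondent X is meant, DR = (H(X) - H(X|Y)) / H(X), with X uniform over the n respondents (one row per respondent). This assumption reproduces the properties described above: a unique value per respondent (an identifier) gives 1, and a constant attribute gives 0. The function name `discrimination_rate` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def discrimination_rate(attribute_values: Sequence[Hashable]) -> float:
    """Hypothetical sketch of a discrimination rate for one attribute.

    Assumption (not taken from the thesis text): each entry is one distinct
    respondent, X is uniform over the n respondents, and
    DR = (H(X) - H(X|Y)) / H(X) = 1 - H(X|Y) / H(X).
    """
    n = len(attribute_values)
    if n < 2:
        raise ValueError("need at least two respondents for H(X) > 0")
    # Entropy of picking one respondent uniformly among n.
    h_x = math.log2(n)
    # H(X|Y): value y occurs with probability c/n, and given y the respondent
    # is uniform among the c rows sharing it, contributing log2(c) bits.
    counts = Counter(attribute_values)
    h_x_given_y = sum((c / n) * math.log2(c) for c in counts.values())
    return 1.0 - h_x_given_y / h_x


if __name__ == "__main__":
    ages = [23, 27, 31, 35]                         # exact ages, all distinct
    decades = ["20-29", "20-29", "30-39", "30-39"]  # same ages, generalized
    print(discrimination_rate(ages))     # 1.0: behaves like an identifier
    print(discrimination_rate(decades))  # 0.5: generalization reduces refinement
```

Under this assumed definition, comparing the two calls illustrates how the same scale could rank two instantiations of a generalization mechanism: the coarser attribute leaks less about individual respondents (privacy) but also distinguishes them less (utility).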

Cited literature: 110 references
Contributor: Abes Star
Submitted on: Sunday, June 2, 2019 - 1:03:48 AM
Last modification on: Saturday, June 22, 2019 - 3:48:58 AM


Version validated by the jury (STAR)


  • HAL Id: tel-02145208, version 1


Louis-Philippe Sondeck. Privacy and utility assessment within statistical data bases. Cryptography and Security [cs.CR]. Institut National des Télécommunications, 2017. English. ⟨NNT : 2017TELE0023⟩. ⟨tel-02145208⟩

