Machine learning helps us distill the unreasonable complexity of the world around us into (relatively) simple models. In theory, models should learn facts about the population from which their training dataset is sampled. In practice, models often learn the idiosyncrasies of the data they are fed. As a result, there is a concern that machine learning models could leak sensitive information in unpredictable ways. The goal of this project is to understand when, how, and why this can occur and what can be done about it.
In his premonitory book “The Assault on Privacy: Computers, Data Banks, and Dossiers,” Arthur R. Miller warns of the threat of information technology. 45 years later, we are all too aware of the importance of collecting, sharing, and analyzing sensitive data with care. In particular, the problem of sharing (or publishing) datasets is a critical one. Although anonymization sounds like a simple, bullet-proof procedure, past attempts at sharing data through anonymization have ended in catastrophe. Incidents experienced by companies like AOL and Netflix remind us of the consequences of underestimating this problem.
The goal of this research is to design data sharing protocols and mechanisms that achieve theoretically-sound privacy guarantees such as differential privacy. This project seeks to explore solutions spanning multiple applications, ranging from secure data aggregation to micro-data publishing, as well as multiple technical approaches, ranging from secure multiparty computation to data synthesis.
- Plausible Deniability for Privacy-Preserving Data Synthesis
- Achieving Differential Privacy in Secure Multiparty Data Aggregation Protocols on Star Networks
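To make the flavor of a differential privacy guarantee concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. The function names, the toy data, and the parameter choices are all illustrative assumptions for this sketch, not code from any of the projects above.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one record
    changes the true count by at most 1, so Laplace noise with scale
    1/epsilon suffices to mask any individual's presence.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative use: release how many (hypothetical) users are 40 or older.
ages = [25, 31, 47, 52, 68, 19]
noisy = dp_count(ages, lambda age: age >= 40, epsilon=0.5)
```

Smaller values of epsilon mean more noise and stronger privacy; the same mechanism underlies many of the secure-aggregation and data-synthesis settings mentioned above, though real protocols compose it with cryptographic machinery.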
Data breaches are an ever-increasing concern nowadays. For instance, the 2017 Equifax breach exposed sensitive financial information of an estimated 143 million U.S. persons. Furthermore, companies are often powerless to protect themselves, as exemplified by Yahoo!, which was breached in 2013 and then again in 2014.
As a defense, some security researchers have advocated for the use of always-encrypted databases. A prominent approach to constructing always-encrypted databases is to leverage weak forms of encryption that expose, in their ciphertexts, some plaintext information in order to preserve query-processing functionality. The level of security offered by such constructions remains unclear, and recent academic papers have suggested that straightforward inference techniques such as frequency analysis undermine their security. This project seeks to analyze the security of popular encrypted database constructions through the design of sophisticated inference attacks that exploit the complexity of real-world data, such as correlations between data record attributes.
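To see why frequency analysis is so effective against these constructions, here is a toy sketch. Deterministic encryption maps equal plaintexts to equal ciphertexts, so the frequency histogram of an encrypted column mirrors that of the underlying data; an attacker with an auxiliary dataset drawn from a similar distribution can match values by rank. The "encryption" below is a stand-in substitution table, and all data is invented for illustration.

```python
from collections import Counter

def frequency_analysis(ciphertexts, auxiliary_plaintexts):
    """Guess each ciphertext's plaintext by matching frequency ranks.

    Sort both histograms by frequency and pair values rank-for-rank:
    the most common ciphertext is guessed to be the most common
    plaintext in the auxiliary data, and so on.
    """
    ct_ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    pt_ranked = [p for p, _ in Counter(auxiliary_plaintexts).most_common()]
    return {c: p for c, p in zip(ct_ranked, pt_ranked)}

# Toy deterministic "encryption" of a state column (fixed substitution).
enc = {"CA": "x9", "TX": "q2", "VT": "m7"}
column = ["CA"] * 5 + ["TX"] * 3 + ["VT"] * 1
ciphertexts = [enc[v] for v in column]

# Public, census-like auxiliary distribution with the same skew.
aux = ["CA"] * 50 + ["TX"] * 30 + ["VT"] * 10
guesses = frequency_analysis(ciphertexts, aux)
```

The attack needs no key material at all, which is why the project's more sophisticated attacks (e.g., exploiting cross-attribute correlations) start from this baseline.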
Side-channel attacks exploit the implementation of a system rather than its design. For example, such attacks include extracting cryptographic keys by probing the CPU cache during a decryption operation or through inference from a power-analysis trace. Despite a growing set of prominent instances targeting real-world systems, such as Spectre and Meltdown, side-channel threats remain poorly understood. There is a dearth of systematic knowledge about the causes of side-channel vulnerabilities and ways to effectively mitigate them. This project seeks to better understand side-channel threats by studying attacks on diverse systems, ranging from CPU secure enclaves to cloud-based multi-tenant search systems.
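As a self-contained illustration of how implementation behavior (rather than design) leaks secrets, here is a toy timing-style side channel. Instead of measuring wall-clock time, the oracle deterministically counts comparison steps, which plays the same role; every name here is hypothetical and the example is far simpler than the cache and enclave attacks this project studies.

```python
def leaky_equals(secret: str, guess: str):
    """Early-exit string comparison. Returns (match, steps).

    The loop stops at the first mismatch, so the step count reveals
    how long a prefix of the guess matches the secret -- exactly the
    kind of data-dependent behavior a timing attack exploits.
    """
    steps = 0
    for i in range(min(len(secret), len(guess))):
        steps += 1
        if secret[i] != guess[i]:
            return False, steps
    steps += 1  # final length-equality check
    return len(secret) == len(guess), steps

def recover_secret(oracle, alphabet, length):
    """Recover the secret one character at a time by maximizing steps."""
    known = ""
    for _ in range(length):
        pad = "\x00" * (length - len(known) - 1)
        known += max(alphabet, key=lambda ch: oracle(known + ch + pad)[1])
    return known

# Attacker only sees the oracle's side channel, never the secret directly.
secret = "acme"
oracle = lambda g: leaky_equals(secret, g)
recovered = recover_secret(oracle, "abcdefghijklmnopqrstuvwxyz", len(secret))
```

The standard mitigation is a constant-time comparison (e.g., Python's `hmac.compare_digest`), whose running time does not depend on where the inputs differ.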
Location data is a particularly sensitive type of data because human mobility reveals so much information about our lives. Where we are and where we go defines the blueprint of our lives and divulges our social relationships. Yet in the age of smartphones and location-enhanced services, our location data is increasingly exposed to untrusted and sometimes untrustworthy third parties. The broad goal of this project is to understand salient threats to location privacy in our daily lives, as well as to design and analyze techniques to preserve privacy when sharing location data.
- Synthesizing Plausible Privacy-Preserving Location Traces
- A Location-Privacy Threat Stemming from the Use of Shared Public IP Addresses
- How Others Compromise Your Location Privacy: The Case of Shared Public IPs at Hotspots