Private Information Leakage from Polygenic Risk Scores

Abstract

Polygenic Risk Scores (PRSs) estimate an individual’s likelihood of developing certain diseases based on their genetic variants. However, the privacy implications of publicly sharing PRS values are often underestimated. In this work, we demonstrate that PRSs can be exploited to recover genotypes and potentially de-anonymize individuals. Using dynamic programming and population-based likelihood estimation, we show that it is possible to reconstruct a portion of an individual’s genome from a single PRS value. We further highlight the risk that combining multiple PRSs improves genotype-recovery accuracy, which can lead to the re-identification of individuals or their relatives in genomic databases, or to the prediction of additional health risks not originally associated with the disclosed PRSs. We then develop an analytical framework to assess the privacy risk of releasing individual PRS values and propose a potential solution that facilitates sharing without decreasing the utility of the shared PRS models. These results underscore the importance of treating individual PRSs as sensitive data and of implementing stronger safeguards for genetic privacy.
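
To make the attack idea concrete, the following is a minimal sketch of how dynamic programming and a population-based prior could be combined to recover genotypes from one score. It is not the method of this work, only an illustration under stated assumptions: the PRS is an additive sum of per-variant weights times allele counts (0, 1, or 2), the score is released at a fixed decimal precision, genotypes follow Hardy-Weinberg proportions, and variants are independent. The function names, the `scale` parameter, and all numeric values are hypothetical.

```python
import math


def genotype_log_prior(g, freq):
    """Log-probability of allele count g (0, 1, 2) under Hardy-Weinberg.

    Assumes 0 < freq < 1 so that no probability is zero.
    """
    p, q = freq, 1.0 - freq
    return math.log((q * q, 2.0 * p * q, p * p)[g])


def recover_genotypes(score, weights, freqs, scale=1000):
    """Return the most likely genotype vector whose weighted sum equals score.

    Weights and score are converted to integers (scale = 10**precision) so
    that partial sums can be used as exact dynamic-programming states.
    Returns None when no genotype vector reproduces the score.
    """
    target = round(score * scale)
    int_weights = [round(w * scale) for w in weights]

    # best[s] = (log prior, genotypes) of the likeliest prefix with sum s.
    best = {0: (0.0, ())}
    for w, f in zip(int_weights, freqs):
        nxt = {}
        for s, (lp, gs) in best.items():
            for g in (0, 1, 2):
                s2 = s + g * w
                lp2 = lp + genotype_log_prior(g, f)
                if s2 not in nxt or lp2 > nxt[s2][0]:
                    nxt[s2] = (lp2, gs + (g,))
        best = nxt

    hit = best.get(target)
    return None if hit is None else list(hit[1])


# Hypothetical model: four variants with made-up weights and frequencies.
weights = [0.12, -0.05, 0.31, 0.07]
freqs = [0.3, 0.5, 0.1, 0.2]
recovered = recover_genotypes(0.09, weights, freqs)
```

In this toy model, two genotype vectors, `[0, 1, 0, 2]` and `[1, 2, 0, 1]`, both produce a score of 0.09; the population prior selects the second as more likely. The number of reachable partial sums grows with the number of variants and the score precision, so a sketch like this is only practical for small models.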

Publication
Under submission
Date