IEEE Transactions on Information Forensics and Security | 2019

Secure Approximate String Matching for Privacy-Preserving Record Linkage

 

Abstract


Real-world applications of record linkage often require matching to be robust in spite of small variations in string fields. For example, two health care providers should be able to detect a patient in common, even if one record contains a typo or transcription error. In the privacy-preserving setting, however, the problem of approximate string matching has been cast as a trade-off between security and practicality, and the literature has mainly focused on Bloom filter encodings, an approach which can leak significant information about the underlying records. We present a novel public-key construction for secure two-party evaluation of threshold functions in restricted domains based on embeddings found in the message spaces of additively homomorphic encryption schemes. We use this to construct an efficient two-party protocol for privately computing the threshold Dice coefficient. Relative to the approach of Bloom filter encodings, our proposal offers formal security guarantees and greater matching accuracy. We implement the protocol and demonstrate the feasibility of this approach in linking medium-sized patient databases with tens of thousands of records.
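For readers unfamiliar with the threshold Dice coefficient named above, the following is a minimal plaintext sketch of the function being computed. It is not the paper's protocol and provides no privacy: both strings are visible to the caller. The use of character bigrams as the set representation, the function names, and the default threshold of 0.8 are assumptions made for illustration only.

```python
def bigrams(s):
    """Return the set of overlapping character bigrams of s (assumed tokenization)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over the bigram sets of a and b."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Both strings too short to yield bigrams; treat as identical (a convention).
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def threshold_dice(a, b, t=0.8):
    """Return True when the Dice coefficient meets the (hypothetical) threshold t."""
    return dice(a, b) >= t
```

In the setting the abstract describes, each party would hold one of the two strings and learn only the outcome of the threshold comparison, not the coefficient itself or the other party's record.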

Volume 14
Pages 2623-2632
DOI 10.1109/TIFS.2019.2903651
Language English
Journal IEEE Transactions on Information Forensics and Security
