A Distributed Near-Optimal LSH-based Framework for Privacy-Preserving Record Linkage

Dimitrios Karapiperis and Vassilios S. Verykios

School of Science and Technology, Hellenic Open University
{dkarapiperis,verykios}@eap.gr

Abstract

We present a framework that relies on the Map/Reduce paradigm to distribute computations uniformly among underutilized commodity hardware resources, without imposing extra overhead on the existing infrastructure. The number of distance computations required for record comparison is greatly reduced by Locality-Sensitive Hashing, which is optimally tuned to avoid highly redundant computations. Experimental results illustrate the effectiveness of our distributed framework in finding the matched record pairs in large data sets.
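
The general idea of LSH-based blocking in a map/reduce style can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes records have already been encoded as fixed-length bit vectors (for example Bloom filters, as the key words suggest), uses simple bit sampling as the hash family, and leaves out the tuning of the parameters entirely. The function names and the values of `num_tables` and `bits_per_key` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hash_tables(num_bits, num_tables, bits_per_key, seed=0):
    """For each table, pick a random set of bit positions to sample."""
    rng = random.Random(seed)
    return [rng.sample(range(num_bits), bits_per_key) for _ in range(num_tables)]


def map_record(record_id, bits, tables):
    """Map step: emit one (blocking key, record id) pair per hash table."""
    for t, positions in enumerate(tables):
        key = (t, tuple(bits[p] for p in positions))
        yield key, record_id


def reduce_candidates(pairs):
    """Reduce step: group ids by blocking key, emit each candidate pair once."""
    buckets = defaultdict(list)
    for key, record_id in pairs:
        buckets[key].append(record_id)
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


def hamming(a, b):
    """Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))


# Toy data (invented for this sketch): "a" and "b" are identical,
# "c" is the bitwise complement of "a".
records = {
    "a": [1, 0, 1, 1, 0, 0, 1, 0],
    "b": [1, 0, 1, 1, 0, 0, 1, 0],
    "c": [0, 1, 0, 0, 1, 1, 0, 1],
}

tables = make_hash_tables(num_bits=8, num_tables=4, bits_per_key=3)
pairs = [kv for rid, bits in records.items() for kv in map_record(rid, bits, tables)]
candidates = reduce_candidates(pairs)

# Only candidate pairs are compared, instead of all pairs of records.
distances = {pair: hamming(records[pair[0]], records[pair[1]]) for pair in candidates}
```

In this sketch only records that collide in at least one table are compared, which is how blocking cuts down the number of distance computations. In a real deployment the map and reduce functions would run on separate nodes, and the number of tables and sampled bits would have to be chosen to balance missed matches against redundant comparisons.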

Key words

Locality-Sensitive Hashing, Bloom filter, Map/Reduce

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS140215040K

Publication information

Volume 11, Issue 2 (June 2014)
Year of Publication: 2014
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Karapiperis, D., Verykios, V. S.: A Distributed Near-Optimal LSH-based Framework for Privacy-Preserving Record Linkage. Computer Science and Information Systems, Vol. 11, No. 2, 745–763. (2014), https://doi.org/10.2298/CSIS140215040K