Linkja is a privacy-preserving record linkage (PPRL) system designed to securely link records that belong to the same individual using de-identified data. Linkja joins records without exposing the underlying protected health information (PHI). Welcome users, developers, and window shoppers!

Why Linkja?

Linkja was developed to fill a gap in PPRL systems. Notably, it uses clear rules, offers a flexible interface for including new fields, and is released as open-source code so that the solution can evolve through engagement with a larger development community.

Linkja was validated on historically hard-to-link populations (many individuals were homeless, of low socioeconomic status, or missing Social Security numbers), in a system with a high proportion of missing data.

Linkja was built for the public good: namely, to link records between systems with variable data capture and quality across diverse health sectors.

Linkja is open source and available under the General Public License. It comprises three modules: the salt and crypto engines, de-identification, and disambiguation.


Salt engine

The salt engine generates a unique key for each site and a shared project key used in generating the secret keys for tokenization.
Language: Java
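
To illustrate the idea of per-site keys plus a shared project key, here is a minimal sketch. It is not Linkja's actual implementation: the class name SaltSketch, the "PROJECT" entry name, the 32-byte key length, and the Base64 encoding are all assumptions made for the example.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; names and key format are assumptions, not Linkja's API.
public class SaltSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns numBytes of cryptographically strong random data, Base64-encoded. */
    static String randomKey(int numBytes) {
        byte[] key = new byte[numBytes];
        RANDOM.nextBytes(key);
        return Base64.getEncoder().encodeToString(key);
    }

    /** Builds one random key per site, plus one shared key stored under "PROJECT". */
    static Map<String, String> generateKeys(List<String> siteIds, int numBytes) {
        Map<String, String> keys = new LinkedHashMap<>();
        for (String siteId : siteIds) {
            keys.put(siteId, randomKey(numBytes));
        }
        keys.put("PROJECT", randomKey(numBytes));
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> keys = generateKeys(Arrays.asList("site-a", "site-b"), 32);
        keys.forEach((name, key) ->
            System.out.println(name + " -> " + key.length() + " Base64 characters"));
    }
}
```

In this sketch every key comes from SecureRandom rather than from a fixed seed, so the values differ on each run. How the real salt engine distributes and protects the keys is outside the scope of the example.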


Crypto engine

The crypto engine generates a shareable crypto library. The library contains the algorithm and a secret that is used to generate the keys in memory during tokenization.
Language: Java


De-identification: Data standardization, exclusion, and hashing

This module provides a data pipeline that ingests and validates data, standardizes it, manages data exclusions, creates composite identifiers from patient identifiers, and tokenizes them using the SHA-256 algorithm and the secret keys.
Language: Java
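
The standardize, combine, and hash steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's real code: the class TokenSketch, the choice of fields (first name, last name, date of birth), the "|" separator, and the way the secret key is appended before hashing are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical sketch only; field choice and key handling are assumptions.
public class TokenSketch {

    /** Uppercases the value and drops everything except A-Z and 0-9. */
    static String standardize(String value) {
        return value.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]", "");
    }

    /** Joins the standardized fields into one composite identifier. */
    static String composite(String firstName, String lastName, String dateOfBirth) {
        return standardize(firstName) + "|" + standardize(lastName) + "|" + standardize(dateOfBirth);
    }

    /** Returns the lowercase hex SHA-256 digest of the composite identifier followed by the secret key. */
    static String tokenize(String compositeId, String secretKey) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest((compositeId + secretKey).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String id = composite("Mary-Ann ", "o'brien", "1980-01-31");
        System.out.println(id); // prints MARYANN|OBRIEN|19800131
        System.out.println(tokenize(id, "example-secret"));
    }
}
```

Standardizing before hashing matters because a hash changes completely when its input changes by a single character, so "O'Brien" and "obrien" would otherwise produce unrelated tokens.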


Disambiguation

This module allows the aggregator to merge hashed data, disambiguate the hashes, and assign a master (global) patient ID to matched and non-matched patients using deterministic algorithms.
Language: Java
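
As a rough illustration of deterministic matching on hashes, the sketch below gives the same global ID to any records that share at least one hash, directly or through a chain of shared hashes. The matching rule, the class DisambiguationSketch, and the integer ID scheme are assumptions chosen for the example; Linkja's actual rules may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; the "any shared hash" rule is an assumption.
public class DisambiguationSketch {

    /** Follows parent links to the representative of a record's group, shortening the path as it goes. */
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /**
     * Assigns a global ID to each record, where records.get(i) holds the hashes of record i.
     * Records linked by a shared hash end up with the same ID; unmatched records get their own.
     */
    static int[] assignGlobalIds(List<List<String>> records) {
        int[] parent = new int[records.size()];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }

        // Merge each record with the first record that carried the same hash.
        Map<String, Integer> firstRecordWithHash = new HashMap<>();
        for (int i = 0; i < records.size(); i++) {
            for (String hash : records.get(i)) {
                Integer other = firstRecordWithHash.putIfAbsent(hash, i);
                if (other != null) {
                    parent[find(parent, i)] = find(parent, other);
                }
            }
        }

        // Number the groups 1, 2, 3, ... in order of first appearance.
        Map<Integer, Integer> idByRoot = new HashMap<>();
        int[] globalIds = new int[records.size()];
        for (int i = 0; i < globalIds.length; i++) {
            int root = find(parent, i);
            Integer id = idByRoot.get(root);
            if (id == null) {
                id = idByRoot.size() + 1;
                idByRoot.put(root, id);
            }
            globalIds[i] = id;
        }
        return globalIds;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
            Arrays.asList("h1", "h2"),
            Arrays.asList("h3"),
            Arrays.asList("h2", "h4"),
            Arrays.asList("h4"));
        System.out.println(Arrays.toString(assignGlobalIds(records))); // prints [1, 2, 1, 1]
    }
}
```

In the usage example, the first and third records share h2 and the third and fourth share h4, so all three receive global ID 1, while the second record matches nothing and receives its own ID.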


Each module is available as a separate tool and can be used independently. To promote data contributions, the hashing module is available in multiple formats.

Acknowledgements

The concept and code are the culmination of years of effort and contributions by several individuals:

William Trick, MD, Principal Investigator, Cook County Health
Abel Kho, MD, Principal Investigator, Northwestern University
Francisco Angulo, MBA, Technical Director, Cook County Health
Luke Rasmussen, MS, Lead Developer, Northwestern University
Kruti Doshi, MBA, Project Lead, Cook County Health
Larry Lemmon, Java Developer, independent

For a full list of contributors, please see our Acknowledgements page.