As a follow-up to our blog on de-identified and anonymized healthcare data, we’re wrapping up the third and final part of this series by discussing tokenized healthcare data and continuing to explore David’s* story.
It’s important to note that tokenization is not the same as de-identification, nor does tokenizing health data alone meet HIPAA standards (this is where the value of expert determination comes front and center).
The linguistic definition of the word tokenize is “to separate (text) into discrete words, sequences, symbols, or other linguistic units.” The most basic form of tokenization has existed for centuries: think of subway and casino tokens, which serve as substitutes for actual money. In short, tokenization creates a unique identifier that stands in for the real thing while still carrying context about an individual patient.
When you apply this definition and these examples to healthcare, tokenization is the process of taking information that could identify a patient (e.g., name, address, ZIP code, email, medical claims) and converting it into an encrypted token: a string of letters, numbers, and symbols.
Think of a token as a pseudonym. It prevents personally identifiable information (PII) from being exposed. How? By replacing the identifying value with a unique string of letters, numbers, and symbols. So, for instance, David Smith Jr. goes from his specific identity to U^7×123!.
Tokens, however, can still be leveraged for re-identification of individuals; in fact, that is one of the reasons they are used, so the original source can be traced back. But the data can ONLY be re-identified by a trusted third party that holds the initial key, or relational linkage, to the original data.
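As a minimal sketch of this idea (not LexisNexis’ actual method), a deterministic keyed hash such as HMAC-SHA256 can turn a PII value into a token: the same patient always yields the same token, so records stay linkable, but only the party holding the secret key and its lookup table can ever trace a token back. All names and values below are illustrative.

```python
import hmac
import hashlib

# Hypothetical secret key held only by the trusted third party.
SECRET_KEY = b"key-held-by-trusted-third-party"

def tokenize(pii: str) -> str:
    """Replace a PII value with a deterministic keyed token.

    The same input always produces the same token, so records for one
    patient can be linked; without the key, the token cannot be
    reversed to reveal the original value.
    """
    digest = hmac.new(SECRET_KEY, pii.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability

# The key holder keeps a lookup table mapping tokens back to original
# identities -- this is what makes re-identification possible for the
# key holder and no one else.
token_vault = {}

def tokenize_and_store(pii: str) -> str:
    token = tokenize(pii)
    token_vault[token] = pii
    return token

token = tokenize_and_store("David Smith Jr.")
assert tokenize("David Smith Jr.") == token     # deterministic: linkable
assert token_vault[token] == "David Smith Jr."  # key holder can re-identify
```

Real tokenization solutions add further safeguards (key rotation, site-specific tokens, expert certification); this only illustrates the pseudonym-plus-key principle described above.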
For instance, our LexisNexis® Gravitas™ Token is an expert-certified de-identified, patent-pending token. This means that independent experts have validated that the risk of re-identification has been mitigated and determined that the tokenization solution meets the thresholds of de-identification under HIPAA.
Scenarios where tokenized health data comes into play include:
- Comparative effectiveness of two different medications
- Evaluating treatment adherence related to provider proximity/services provided
- Clinical trial of a new oncology treatment
- Genomics research
- Research detailing effects of disease burden on socioeconomic status
- Retrospective cohort studies
- Study using blood samples for clinical care
- Effects of social determinants of health on fetal mortality rates
Now let’s take the same individual we’ve been following but apply tokenization.
With tokenization, all of David Jr.’s PII is swapped out for a token, which preserves the ability to link back to a specific patient – but only where the data originated. This enables us to piece together disparate information, such as prescriptions from different providers, to generate the patient’s complete medication list. For instance, perhaps we can see that David Jr. is taking depression medication and an antibiotic for a recent infection, but, unlike his father David Smith Sr., no heart medications.
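The linkage step described above can be sketched as a simple join on the shared token. The feeds, tokens, and drug names below are entirely fictitious; the point is only that no PII is needed to assemble one patient’s medication list across sources.

```python
# Two hypothetical prescription feeds from different providers, already
# tokenized so no PII is present. The shared token lets us assemble a
# complete medication list without ever seeing the patient's identity.
provider_a = [
    {"token": "a1b2c3", "drug": "sertraline"},   # depression medication
    {"token": "ff99ee", "drug": "lisinopril"},   # a different patient
]
provider_b = [
    {"token": "a1b2c3", "drug": "amoxicillin"},  # antibiotic, same patient
]

def medication_list(token, *feeds):
    """Collect every prescription sharing a token across the feeds."""
    return [rec["drug"] for feed in feeds for rec in feed if rec["token"] == token]

meds = medication_list("a1b2c3", provider_a, provider_b)
# meds -> ['sertraline', 'amoxicillin']
```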
Leveraging a Referential Data Layer, the tokenization technology behind LexisNexis® Gravitas enables better precision when matching patient data, so the health records of David Jr. and David Sr., who have vastly different medical histories, are never mixed. It helps mitigate false positives, driving added precision and confidence, which is critical in healthcare.
Unlike the “old” way of matching based on probabilistic or deterministic techniques, referential matching links two data sets by referencing an aggregate layer of data that combines all known insights about a person. The precision of referential data is unmatched.
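To make the contrast concrete, here is a toy sketch of the referential idea (not the Gravitas implementation): instead of fuzzily comparing two records to each other, each record is resolved against a reference layer that aggregates known variants of each identity. All reference IDs, names, and addresses below are fictitious.

```python
# A toy reference layer: each stable reference ID aggregates all known
# name/address variants for one person (entirely fictitious data).
REFERENCE_LAYER = {
    "REF-001": {"names": {"david smith jr", "david smith jr."},
                "addresses": {"12 oak st"}},
    "REF-002": {"names": {"david smith sr", "david smith sr."},
                "addresses": {"12 oak st"}},
}

def referential_match(name, address):
    """Resolve a record to a stable reference ID by checking it against
    aggregated known variants, rather than comparing raw records to
    each other probabilistically."""
    name, address = name.strip().lower(), address.strip().lower()
    for ref_id, entity in REFERENCE_LAYER.items():
        if name in entity["names"] and address in entity["addresses"]:
            return ref_id
    return None  # no confident match: an unmatched record beats a false positive

# Father and son share an address yet resolve to different reference IDs,
# because the reference layer distinguishes their name variants.
assert referential_match("David Smith Jr.", "12 Oak St") == "REF-001"
assert referential_match("David Smith Sr.", "12 Oak St") == "REF-002"
```

A production reference layer would of course be far richer (histories of addresses, name changes, etc.); the sketch only shows why checking against aggregated identities separates two people a pairwise match might confuse.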
What About Tokenized Social Determinants of Health (SDoH)?
Merely tokenizing data isn’t the be-all and end-all of leveraging data for research. It is widely understood that data resources are heavily siloed and lack real-world context. Tokenizing data AND appending socioeconomic context such as social determinants of health (SDoH) enables connections and more comprehensive context between datasets, while preserving privacy for patients and members without stigma or bias.
There can be hesitancy to disclose personal SDoH information due to misunderstanding, lack of trust, or even pride. The ability to share SDoH data via a token reduces incomplete data sets and enables a more complete picture of a patient. Knowing that socioeconomic factors play such an important role in health outcomes, why wouldn’t you want to make every study a health equity study?
In Summary: Our Healthcare Data Series
Through this blog series, we hope you’ve gained clarity on the distinct roles of de-identified, anonymized, and tokenized data in healthcare, along with expert determination, and how each fits into the healthcare ecosystem.
Below you’ll find a cheat sheet summarizing how each approach applies to our fictional patient, David, as discussed throughout this blog series.
|Data Type|How David Smith Jr.’s Data is Impacted|
|---|---|
|De-Identified Health Data: De-identified healthcare data breaks the link between the data and the individual because PII is removed or replaced.|A life science company may want to de-identify David’s patient data to find insights into other depression patients by creating a cohort of others that “look” like him. Since enough personal information is stripped away, the risk of identifying David Smith Jr. is minimized. Creating a cohort of patients like David Smith Jr. can help identify patients for future clinical trials or research studies.|
|Anonymized Health Data: Anonymized health data never contains identifiable information, so the data cannot be tied back to a single individual.|Using anonymized health data for David Jr., we can find insights into other patients with depression; however, all identifiable information is completely removed. The data set can now only be filtered via simple parameters like age and geographic region, and it can never link back to the patient level due to the lack of identifiers.|
|Tokenized Health Data: Tokenization is the process of replacing PII with a random token that only the original data holder, the key holder, can use to re-identify the data.|All of David Jr.’s PII is swapped out for a token, which preserves the ability to link back to a specific patient, but only where the data originated. This enables us to piece together disparate information, such as prescriptions from different providers, to generate the patient’s complete medication list. For instance, perhaps we can see that David Jr. is taking depression medication and an antibiotic for a recent infection, but, unlike his father David Sr., no heart medications.|
Are you seeking a source of trusted de-identified healthcare data?
*Please note, all names, diagnoses, and other information included in this infographic are fictitious. They are included for illustrative purposes and do not identify any actual persons (living or deceased).