Publication


Featured research published by Fida Kamal Dankar.


Journal of the American Medical Informatics Association | 2009

A Globally Optimal k-Anonymity Method for the De-Identification of Health Data

Khaled El Emam; Fida Kamal Dankar; Romeo Issa; Elizabeth Jonker; Daniel Amyot; Elise Cogo; Jean-Pierre Corriveau; Mark Walker; Sadrul Habib Chowdhury; Régis Vaillancourt; Tyson Roffey; Jim Bottomley

BACKGROUND Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Legislative requirements to obtain consent are often waived if the information collected or disclosed is de-identified. OBJECTIVE The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and is suitable for health datasets. DESIGN The authors empirically compared OLA (Optimal Lattice Anonymization) with three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. MEASUREMENT Three information loss metrics were used for the comparison: precision, discernibility metric, and non-uniform entropy. The performance speed of each algorithm was also evaluated. RESULTS The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. CONCLUSIONS For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
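The k-anonymity criterion and the discernibility metric named above can be made concrete with a small sketch. Assuming each record has already been reduced to a tuple of quasi-identifier values, the code below applies one hypothetical generalization step, checks whether every equivalence class has at least k records, and computes one common formulation of the discernibility metric, in which records in classes smaller than k are treated as suppressed and charged n each. The helper names (`generalize_age`, `is_k_anonymous`, `discernibility`) are illustrative assumptions; this is not the OLA lattice search itself, which compares many generalization levels to find the one with minimal information loss.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Record = Tuple[Hashable, ...]


def generalize_age(age: int, width: int = 10) -> str:
    """Hypothetical generalization step: replace an exact age with a fixed-width band."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records: Sequence[Record], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(size >= k for size in Counter(records).values())


def discernibility(records: Sequence[Record], k: int) -> int:
    """|E|^2 for each class with |E| >= k, plus n * |E| for smaller (suppressed) classes."""
    n = len(records)
    total = 0
    for size in Counter(records).values():
        if size >= k:
            total += size * size
        else:
            total += n * size
    return total


raw = [(34, "F"), (37, "F"), (41, "M"), (45, "M"), (48, "M")]
generalized = [(generalize_age(age), sex) for age, sex in raw]
print(generalized)
print(is_k_anonymous(generalized, 2), discernibility(generalized, 2))  # True 13
print(is_k_anonymous(generalized, 3), discernibility(generalized, 3))  # False 19
```

Coarser generalizations make the k check easier to pass but raise the discernibility score, which is the trade-off a globally optimal search has to balance.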


Journal of the American Medical Informatics Association | 2008

Protecting Privacy Using k-Anonymity

Khaled El Emam; Fida Kamal Dankar

OBJECTIVE There is increasing pressure to share health information and even make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate these concerns, the data can be anonymized before disclosure. One popular anonymization approach is k-anonymity, yet there have been no evaluations of the actual re-identification probability of k-anonymized data sets. DESIGN Through a simulation, we evaluated the re-identification risk of k-anonymization and three different improvements on three large data sets. MEASUREMENT Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernibility metric. RESULTS For one of the re-identification scenarios, k-anonymity consistently over-anonymizes data sets, and this over-anonymization is most pronounced with small sampling fractions. Over-anonymization results in excessive distortion of the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared to baseline k-anonymity. CONCLUSION Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.
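Why small sampling fractions lead to over-anonymization can be illustrated with a rough calculation. If risk is computed from the released sample alone, the worst-case probability of re-identifying a record is 1 divided by the smallest equivalence class in the sample. If the adversary instead matches against the population, the relevant class sizes are larger. The sketch below crudely approximates each population class size as the sample class size divided by the sampling fraction; that scaling and the function names are illustrative assumptions, not the estimation or hypothesis testing approach evaluated in the paper.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def max_risk_in_sample(records: Sequence[Tuple[Hashable, ...]]) -> float:
    """Worst-case re-identification probability when classes are counted in the sample itself."""
    return 1.0 / min(Counter(records).values())


def max_risk_in_population(records: Sequence[Tuple[Hashable, ...]], sampling_fraction: float) -> float:
    """Same worst case, with each class size scaled up by 1 / sampling_fraction (crude population estimate)."""
    smallest = min(Counter(records).values())
    return sampling_fraction / smallest


sample = [("30-39", "F")] * 2 + [("40-49", "M")] * 5
print(max_risk_in_sample(sample))            # 0.5
print(max_risk_in_population(sample, 0.1))   # 0.05
```

Under these assumptions, a 10% sample that is only 2-anonymous already sits near a 0.05 population-level risk, so enforcing a much larger k on the sample would distort the data more than the population-based risk requires.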


EDBT/ICDT Workshops | 2012

The application of differential privacy to health data

Fida Kamal Dankar; Khaled El Emam

Differential privacy has gained considerable attention in recent years as a general model for protecting personal information that is used and disclosed for secondary purposes. It has also been proposed as an appropriate model for health data. In this paper we review the current literature on differential privacy and highlight important general limitations of the model and the proposed mechanisms. We then examine some practical challenges to the application of differential privacy to health data. The review concludes by identifying areas that researchers and practitioners need to address to increase the adoption of differential privacy for health data.
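As background for the review, the standard Laplace mechanism for a counting query is sketched below. A count has sensitivity 1 (adding or removing one patient changes it by at most 1), so adding noise drawn from a Laplace distribution with scale 1/ε gives ε-differential privacy for that single query. The function names are hypothetical, and because Python's standard `random` module has no Laplace sampler, the noise is generated as the difference of two independent exponential draws, which is Laplace distributed. The sketch deliberately ignores the practical issues the paper raises, such as budgeting ε across many queries and the utility loss on small health data sets.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Difference of two i.i.d. exponential variables with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records: Iterable[T], predicate: Callable[[T], bool],
                epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Release a count under epsilon-differential privacy using the Laplace mechanism."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one patient changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 47, 52, 61, 68, 74, 80]
print(noisy_count(ages, lambda a: a >= 65, epsilon=0.5, rng=random.Random(42)))
```

Smaller values of ε give stronger privacy but larger noise; on a small cohort the noise can be of the same order as the true count.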


BMC Medical Informatics and Decision Making | 2012

Estimating the re-identification risk of clinical data sets

Fida Kamal Dankar; Khaled El Emam; Angelica Neisa; Tyson Roffey

Background De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. One type of attack that de-identification protects against is linking the disclosed patient data with public and semi-public registries. Uniqueness is a commonly used measure of re-identification risk under this attack. If uniqueness can be measured accurately, then the risk from this kind of attack can be managed. In practice, it is often not possible to measure uniqueness directly; therefore, it must be estimated. Methods We evaluated the accuracy of uniqueness estimators on clinically relevant data sets. Four candidate estimators were identified, either because they had been evaluated in the past and found to have good accuracy or because they were new and had not been evaluated comparatively before: the Zayatz estimator, the slide negative binomial estimator, Pitman’s estimator, and mu-argus. A Monte Carlo simulation was performed to evaluate the uniqueness estimators on six clinically relevant data sets. We varied the sampling fraction and the uniqueness in the population (the value being estimated). The median relative error and inter-quartile range of the uniqueness estimates were measured across 1000 runs. Results No single estimator performed well across all of the conditions. We developed a decision rule that selected among the Pitman, slide negative binomial, and Zayatz estimators depending on the sampling fraction and the difference between estimates. This decision rule had the best consistent median relative error across multiple conditions and data sets. Conclusion This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets. The decision rule provides a reliable way to measure re-identification risk.
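The Monte Carlo evaluation design can be sketched as follows: draw repeated samples at a given sampling fraction, apply an estimator of population uniqueness to each sample, and summarize the relative error against the true population value by its median. The Zayatz, slide negative binomial, Pitman, and mu-argus estimators are not reproduced here; the hypothetical `naive_estimator` simply reports sample uniqueness as if it were population uniqueness, which shows why dedicated estimators are needed. The synthetic population and all function names are assumptions made for illustration.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Hashable, List, Tuple

Record = Tuple[Hashable, ...]


def uniqueness(records: List[Record]) -> float:
    """Proportion of records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(records)
    return sum(1 for c in counts.values() if c == 1) / len(records)


def naive_estimator(sample: List[Record], sampling_fraction: float) -> float:
    """Hypothetical baseline: ignore sampling and report sample uniqueness."""
    return uniqueness(sample)


def median_relative_error(population: List[Record],
                          estimator: Callable[[List[Record], float], float],
                          sampling_fraction: float, runs: int = 1000, seed: int = 0) -> float:
    """Median of (estimate - truth) / truth over repeated random samples."""
    rng = random.Random(seed)
    truth = uniqueness(population)
    n = max(1, round(len(population) * sampling_fraction))
    errors = []
    for _ in range(runs):
        sample = rng.sample(population, n)
        errors.append((estimator(sample, sampling_fraction) - truth) / truth)
    return statistics.median(errors)


# Synthetic population: 100 unique records plus 450 pairs, so true uniqueness is 0.1.
population = [("u", i) for i in range(100)] + [("p", j) for j in range(450) for _ in range(2)]
print(uniqueness(population))  # 0.1
print(median_relative_error(population, naive_estimator, 0.1, runs=200))
```

With a 10% sample, most members of a population pair appear alone in the sample, so the naive estimate lands far above the true 0.1; a real estimator would be plugged into the same harness in place of `naive_estimator`.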


PLOS ONE | 2012

A Protocol for the Secure Linking of Registries for HPV Surveillance

Khaled El Emam; Saeed Samet; Jun Hu; Liam Peyton; Craig C. Earle; Gayatri C. Jayaraman; Tom Wong; Murat Kantarcioglu; Fida Kamal Dankar; Aleksander Essex

Introduction In order to monitor the effectiveness of HPV vaccination in Canada, the linkage of multiple data registries may be required. These registries may not always be managed by the same organization, and privacy legislation or practices may restrict the record linkages that can actually be performed among registries. The objective of this study was to develop a secure protocol for linking data from different registries and allowing ongoing monitoring of HPV vaccine effectiveness. Methods A secure linking protocol using commutative hash functions and secure multi-party computation techniques was developed. This protocol allows for the exact matching of records among registries and the computation of statistics on the linked data while meeting five practical requirements to ensure patient confidentiality and privacy. The statistics considered were the odds ratio and its confidence interval, the chi-square test, and the relative risk and its confidence interval. Additional statistics on contingency tables, such as other measures of association, can be added using the same principles. The computation time performance of this protocol was evaluated. Results The protocol has acceptable computation time and scales linearly with the size of the data set and the size of the contingency table. The worst-case computation time for up to 100,000 patients returned by each query and a 16-cell contingency table is less than 4 hours for basic statistics, and the best case is under 3 hours. Discussion A computationally practical protocol for the secure linking of data from multiple registries has been demonstrated in the context of HPV vaccine initiative impact assessment. The basic protocol can be generalized to the surveillance of other conditions, diseases, or vaccination programs.
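The core idea behind exact matching with commutative functions can be illustrated with keyed modular exponentiation, where applying two parties' keys in either order gives the same result. In the toy sketch below, each registry blinds its hashed identifiers with a private exponent, the blinded values are exchanged and blinded again with the other registry's key, and equal doubly blinded values reveal a match without either side seeing the other's raw identifiers. The modulus, the SHA-256 mapping, and the function names are assumptions for illustration; this is not the paper's protocol, it omits the secure multi-party computation of statistics, and it is not hardened for production use.

```python
import hashlib
import math
import secrets
from typing import List, Sequence

# Toy modulus: the Mersenne prime 2**127 - 1. Real deployments need vetted group parameters.
P = 2**127 - 1


def keygen() -> int:
    """Private exponent coprime with P - 1, so blinding is a bijection on 1..P-1."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(identifier: str) -> int:
    """Map an identifier (e.g. a health card number) to an integer mod P via SHA-256."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def blind(value: int, key: int) -> int:
    """Commutative keyed transform: (x^a)^b == (x^b)^a mod P."""
    return pow(value, key, P)


def link(ids_a: Sequence[str], ids_b: Sequence[str]) -> List[str]:
    """Simulate both registries in one process and return the identifiers they share."""
    key_a, key_b = keygen(), keygen()
    # Round 1: each registry blinds its own identifiers with its private key.
    a_once = [blind(to_group(i), key_a) for i in ids_a]
    b_once = [blind(to_group(i), key_b) for i in ids_b]
    # Round 2: each registry blinds the values it received with its own key.
    a_twice = {blind(v, key_b): i for v, i in zip(a_once, ids_a)}
    b_twice = {blind(v, key_a) for v in b_once}
    # Doubly blinded values match when the identifiers match (barring hash collisions).
    return sorted(i for v, i in a_twice.items() if v in b_twice)


print(link(["A1", "B2", "C3"], ["C3", "D4", "A1"]))  # ['A1', 'C3']
```

Because exponentiation commutes, neither registry has to reveal its key or its raw identifiers for the match to succeed; in this single-process simulation, registry A simply keeps track of which of its own records each blinded value came from.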


BMC Medical Informatics and Decision Making | 2011

De-identifying a public use microdata file from the Canadian national discharge abstract database

Khaled El Emam; David Paton; Fida Kamal Dankar; Gunes Koru

Background The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of these data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records. Methods Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. Results Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to have the highest rates of suppression. Conclusions The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
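The threshold-driven suppression step can be sketched by taking each record's probability of correct re-identification as 1 divided by the size of its equivalence class and suppressing a chosen field in records above the threshold. A threshold of 0.05 corresponds to requiring classes of at least 20 records (0.04 to at least 25). This single greedy pass, the field layout, and the function names are hypothetical illustrations, not the paper's algorithm, which was designed to minimize suppression while maximizing precision.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Record = Tuple[Hashable, ...]


def record_risks(records: List[Record]) -> List[float]:
    """Probability of correct re-identification per record: 1 / size of its equivalence class."""
    counts = Counter(records)
    return [1.0 / counts[r] for r in records]


def suppress_field(records: List[Record], field: int, threshold: float) -> Tuple[List[Record], float]:
    """Replace `field` with '*' in records whose risk exceeds `threshold`; also return the share suppressed."""
    out: List[Record] = []
    suppressed = 0
    for record, risk in zip(records, record_risks(records)):
        if risk > threshold:
            record = record[:field] + ("*",) + record[field + 1:]
            suppressed += 1
        out.append(record)
    return out, suppressed / len(records)


# (region, age group, diagnosis code); a threshold of 0.5 keeps the toy example small.
records = [("East", "40-49", "I21")] * 2 + [("East", "40-49", "J18"), ("West", "70-79", "C34")]
released, share = suppress_field(records, field=2, threshold=0.5)
print(released)
print(share)                        # 0.5
print(max(record_risks(released)))  # 1.0
```

After one pass, both suppressed records still sit in classes of size 1, so risk has to be recomputed and further generalization or suppression applied; choosing which values to suppress with the least information loss is what a dedicated algorithm must optimize.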


BMC Medical Informatics and Decision Making | 2013

Evaluating the risk of patient re-identification from adverse drug event reports

Khaled El Emam; Fida Kamal Dankar; Angelica Neisa; Elizabeth Jonker

Background Our objective was to develop a model for measuring re-identification risk that more closely mimics the behaviour of an adversary by accounting for repeated attempts at matching and verification of matches, and to apply it to evaluate the risk of re-identification for Canada’s post-marketing adverse drug event database (ADE). Re-identification is only demonstrably plausible for deaths in ADE. A matching experiment between ADE records and virtual obituaries constructed from Statistics Canada vital statistics was simulated. A new re-identification risk measure is considered; it assumes that after gathering all the potential matches for a patient record (all records in the obituaries that are potential matches for an ADE record), an adversary tries to verify these potential matches. Two adversary scenarios were considered: (a) a mildly motivated adversary who will stop after one verification attempt, and (b) a highly motivated adversary who will attempt to verify all the potential matches and is limited only by practical or financial considerations. Methods The mean percentage of records in ADE that had a high probability of being re-identified was computed. Results Under scenario (a), the risk of re-identification from disclosing the province, age at death, gender, and exact date of the report is quite high, but removing the province brings the risk down significantly. When only the date of reporting is generalized to month and year and all other variables are included, the risk is always low. All ADE records have a high risk of re-identification under scenario (b), but the plausibility of that scenario is limited because of financial and practical deterrents, even for highly motivated adversaries. Conclusions It is possible to disclose Canada’s adverse drug event database while ensuring that plausible re-identification risks are acceptably low. Our new re-identification risk model is suitable for such risk assessments.
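The two adversary scenarios can be compared with a simple model, assuming that for each ADE record the adversary has m candidate obituary records sharing the same quasi-identifier values and that the true match is among them. Verifying one randomly chosen candidate succeeds with probability 1/m, whereas verifying every candidate succeeds with certainty. The function name, threshold, and toy records below are hypothetical, and the paper's actual risk model is not reproduced.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple

Record = Tuple[Hashable, ...]


def high_risk_percentage(ade: List[Record], obituaries: List[Record],
                         threshold: float, attempts: float) -> float:
    """Percentage of ADE records whose success probability exceeds `threshold` given `attempts` verifications."""
    candidates = Counter(obituaries)
    high = 0
    for record in ade:
        m = candidates[record]
        if m == 0:
            continue  # no potential match in the obituaries
        probability = min(1.0, attempts / m)  # distinct candidates checked out of m
        if probability > threshold:
            high += 1
    return 100.0 * high / len(ade)


obituaries = [("ON", 80, "F")] * 5 + [("NS", 45, "M")]
ade = [("ON", 80, "F"), ("NS", 45, "M")]
print(high_risk_percentage(ade, obituaries, threshold=0.3, attempts=1))         # 50.0  (scenario a)
print(high_risk_percentage(ade, obituaries, threshold=0.3, attempts=math.inf))  # 100.0 (scenario b)
```

Under scenario (b) every record with at least one candidate is high risk regardless of m, which mirrors the finding that all ADE records are at high risk in that scenario and that its plausibility rests on the cost of verification.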


EDBT/ICDT Workshops | 2010

A method for evaluating marketer re-identification risk

Fida Kamal Dankar; Khaled El Emam

Disclosures of health databases for secondary purposes are increasing rapidly. In this paper, we develop and evaluate a re-identification risk metric for the case where an intruder wishes to re-identify as many records as possible in a disclosed database. In this case, the intruder is concerned with the overall matching success rate. The metric is evaluated on public and health datasets, and recommendations for its use are provided.
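For the marketer scenario, a minimal sketch assumes the intruder's identification database contains exactly the same individuals as the disclosed file and that, within each equivalence class, records are matched at random. Each class of size f then contributes f × (1/f) = 1 expected correct match, so the expected success rate is the number of classes divided by the number of records. These assumptions and the function name are illustrative; the paper's metric for other settings, such as a disclosed sample, is not reproduced.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def marketer_risk(records: List[Tuple[Hashable, ...]]) -> float:
    """Expected proportion of records correctly matched by an intruder who links at random within classes."""
    classes = Counter(records)
    # A class of size f has f records, each matched correctly with probability 1/f,
    # which gives one expected correct match per class.
    expected_matches = len(classes)
    return expected_matches / len(records)


records = [("30-39", "F")] * 2 + [("40-49", "M")] * 3
print(marketer_risk(records))  # 0.4
```

Unlike a worst-case measure driven by the smallest class, this rate averages over the whole file, which matches an intruder who cares about the overall number of successful matches rather than any particular individual.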


The Canadian Journal of Hospital Pharmacy | 2009

Evaluating the Risk of Re-identification of Patients from Hospital Prescription Records

Khaled El Emam; Fida Kamal Dankar; Régis Vaillancourt; Tyson Roffey; Mark Lysyk


EDBT/ICDT Workshops | 2016

Using Robust Estimation Theory to Design Efficient Secure Multiparty Linear Regression

Fida Kamal Dankar; Sabri Boughorbel; Radja Badji

Collaboration



Top Co-Authors

Tyson Roffey
Children's Hospital of Eastern Ontario

Angelica Neisa
Children's Hospital of Eastern Ontario

Elizabeth Jonker
Children's Hospital of Eastern Ontario

Régis Vaillancourt
Children's Hospital of Eastern Ontario

Craig C. Earle
Ontario Institute for Cancer Research

David Paton
Canadian Institute for Health Information

Elise Cogo
Children's Hospital of Eastern Ontario