Elizabeth Jonker
Children's Hospital of Eastern Ontario
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Elizabeth Jonker.
Journal of the American Medical Informatics Association | 2009
Khaled El Emam; Fida Kamal Dankar; Romeo Issa; Elizabeth Jonker; Daniel Amyot; Elise Cogo; Jean-Pierre Corriveau; Mark Walker; Sadrul Habib Chowdhury; Régis Vaillancourt; Tyson Roffey; Jim Bottomley
BACKGROUND Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified. OBJECTIVE The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets. DESIGN Authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. Measurement Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithms performance speed was also evaluated. RESULTS The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. CONCLUSIONS For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
Journal of Medical Internet Research | 2009
Khaled El Emam; Elizabeth Jonker; Margaret Sampson; Karmela Krleža-Jerić; Angelica Neisa
Background Electronic data capture (EDC) tools provide automated support for data collection, reporting, query resolution, randomization, and validation, among other features, for clinical trials. There is a trend toward greater adoption of EDC tools in clinical trials, but there is also uncertainty about how many trials are actually using this technology in practice. A systematic review of EDC adoption surveys conducted up to 2007 concluded that only 20% of trials are using EDC systems, but previous surveys had weaknesses. Objectives Our primary objective was to estimate the proportion of phase II/III/IV Canadian clinical trials that used an EDC system in 2006 and 2007. The secondary objectives were to investigate the factors that can have an impact on adoption and to develop a scale to assess the extent of sophistication of EDC systems. Methods We conducted a Web survey to estimate the proportion of trials that were using an EDC system. The survey was sent to the Canadian site coordinators for 331 trials. We also developed and validated a scale using Guttman scaling to assess the extent of sophistication of EDC systems. Trials using EDC were compared by the level of sophistication of their systems. Results We had a 78.2% response rate (259/331) for the survey. It is estimated that 41% (95% CI 37.5%-44%) of clinical trials were using an EDC system. Trials funded by academic institutions, government, and foundations were less likely to use an EDC system compared to those sponsored by industry. Also, larger trials tended to be more likely to adopt EDC. The EDC sophistication scale had six levels and a coefficient of reproducibility of 0.901 (P< .001) and a coefficient of scalability of 0.79. There was no difference in sophistication based on the funding source, but pediatric trials were likely to use a more sophisticated EDC system. Conclusion The adoption of EDC systems in clinical trials in Canada is higher than the literature indicated: a large proportion of clinical trials in Canada use some form of automated data capture system. To inform future adoption, research should gather stronger evidence on the costs and benefits of using different EDC systems.
BMC Medical Informatics and Decision Making | 2011
Khaled El Emam; David L. Buckeridge; Angelica Neisa; Elizabeth Jonker; Aman Verma
BackgroundThe public is less willing to allow their personal health information to be disclosed for research purposes if they do not trust researchers and how researchers manage their data. However, the public is more comfortable with their data being used for research if the risk of re-identification is low. There are few studies on the risk of re-identification of Canadians from their basic demographics, and no studies on their risk from their longitudinal data. Our objective was to estimate the risk of re-identification from the basic cross-sectional and longitudinal demographics of Canadians.MethodsUniqueness is a common measure of re-identification risk. Demographic data on a 25% random sample of the population of Montreal were analyzed to estimate population uniqueness on postal code, date of birth, and gender as well as their generalizations, for periods ranging from 1 year to 11 years.ResultsAlmost 98% of the population was unique on full postal code, date of birth and gender: these three variables are effectively a unique identifier for Montrealers. Uniqueness increased for longitudinal data. Considerable generalization was required to reach acceptably low uniqueness levels, especially for longitudinal data. Detailed guidelines and disclosure policies on how to ensure that the re-identification risk is low are provided.ConclusionsA large percentage of Montreal residents are unique on basic demographics. For non-longitudinal data sets, the three character postal code, gender, and month/year of birth represent sufficiently low re-identification risk. Data custodians need to generalize their demographic information further for longitudinal data sets.
Journal of Medical Internet Research | 2011
Khaled El Emam; Katherine Moreau; Elizabeth Jonker
Background Findings and statements about how securely personal health information is managed in clinical research are mixed. Objective The objective of our study was to evaluate the security of practices used to transfer and share sensitive files in clinical trials. Methods Two studies were performed. First, 15 password-protected files that were transmitted by email during regulated Canadian clinical trials were obtained. Commercial password recovery tools were used on these files to try to crack their passwords. Second, interviews with 20 study coordinators were conducted to understand file-sharing practices in clinical trials for files containing personal health information. Results We were able to crack the passwords for 93% of the files (14/15). Among these, 13 files contained thousands of records with sensitive health information on trial participants. The passwords tended to be relatively weak, using common names of locations, animals, car brands, and obvious numeric sequences. Patient information is commonly shared by email in the context of query resolution. Files containing personal health information are shared by email and, by posting them on shared drives with common passwords, to facilitate collaboration. Conclusion If files containing sensitive patient information must be transferred by email, mechanisms to encrypt them and to ensure that password strength is high are necessary. More sophisticated collaboration tools are required to allow file sharing without password sharing. We provide recommendations to implement these practices.
BMC Medical Informatics and Decision Making | 2013
Khaled El Emam; Fida Kamal Dankar; Angelica Neisa; Elizabeth Jonker
BackgroundOur objective was to develop a model for measuring re-identification risk that more closely mimics the behaviour of an adversary by accounting for repeated attempts at matching and verification of matches, and apply it to evaluate the risk of re-identification for Canada’s post-marketing adverse drug event database (ADE).Re-identification is only demonstrably plausible for deaths in ADE. A matching experiment between ADE records and virtual obituaries constructed from Statistics Canada vital statistics was simulated. A new re-identification risk is considered, it assumes that after gathering all the potential matches for a patient record (all records in the obituaries that are potential matches for an ADE record), an adversary tries to verify these potential matches. Two adversary scenarios were considered: (a) a mildly motivated adversary who will stop after one verification attempt, and (b) a highly motivated adversary who will attempt to verify all the potential matches and is only limited by practical or financial considerations.MethodsThe mean percentage of records in ADE that had a high probability of being re-identified was computed.ResultsUnder scenario (a), the risk of re-identification from disclosing the province, age at death, gender, and exact date of the report is quite high, but the removal of province brings down the risk significantly. By only generalizing the date of reporting to month and year and including all other variables, the risk is always low. All ADE records have a high risk of re-identification under scenario (b), but the plausibility of that scenario is limited because of the financial and practical deterrent even for highly motivated adversaries.ConclusionsIt is possible to disclose Canada’s adverse drug event database while ensuring that plausible re-identification risks are acceptably low. Our new re-identification risk model is suitable for such risk assessments.
American Journal of Bioethics | 2013
Khaled El Emam; Elizabeth Jonker; Ester Moher; Luk Arbuckle
Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.
canadian conference on artificial intelligence | 2010
Marina Sokolova; Khaled El Emam; Sadrul Chowdhury; Emilio Neri; Sean Rose; Elizabeth Jonker
This study analyzes evaluation measures for rare event detection We introduce a procedure which is built upon the characteristics of rare events We propose properties for evaluation measures which assess the measure applicability to classification of rare events Prevention of leaks of personal health information supports the empirical evidence.
Journal of Medical Internet Research | 2013
Victoria Bobicev; Marina Sokolova; Khaled El Emam; Yasser Jafer; Brian Dewar; Elizabeth Jonker; Stan Matwin
Background Participants in medical forums often reveal personal health information about themselves in their online postings. To feel comfortable revealing sensitive personal health information, some participants may hide their identity by posting anonymously. They can do this by using fake identities, nicknames, or pseudonyms that cannot readily be traced back to them. However, individual writing styles have unique features and it may be possible to determine the true identity of an anonymous user through author attribution analysis. Although there has been previous work on the authorship attribution problem, there has been a dearth of research on automated authorship attribution on medical forums. The focus of the paper is to demonstrate that character-based author attribution works better than word-based methods in medical forums. Objective The goal was to build a system that accurately attributes authorship of messages posted on medical forums. The Authorship Attributor system uses text analysis techniques to crawl medical forums and automatically correlate messages written by the same authors. Authorship Attributor processes unstructured texts regardless of the document type, context, and content. Methods The messages were labeled by nicknames of the forum participants. We evaluated the system’s performance through its accuracy on 6000 messages gathered from 2 medical forums on an in vitro fertilization (IVF) support website. Results Given 2 lists of candidate authors (30 and 50 candidates, respectively), we obtained an F score accuracy in detecting authors of 75% to 80% on messages containing 100 to 150 words on average, and 97.9% on longer messages containing at least 300 words. Conclusions Authorship can be successfully detected in short free-form messages posted on medical forums. This raises a concern about the meaningfulness of anonymous posting on such medical forums. Authorship attribution tools can be used to warn consumers wishing to post anonymously about the likelihood of their identity being determined.
PLOS ONE | 2014
Khaled El Emam; Luk Arbuckle; Aleksander Essex; Saeed Samet; Benjamin Eze; Grant Middleton; David L. Buckeridge; Elizabeth Jonker; Ester Moher; Craig C. Earle
Background There is stigma attached to the identification of residents carrying antimicrobial resistant organisms (ARO) in long term care homes, yet there is a need to collect data about their prevalence for public health surveillance and intervention purposes. Objective We conducted a point prevalence study to assess ARO rates in long term care homes in Ontario using a secure data collection system. Methods All long term care homes in the province were asked to provide colonization or infection counts for methicillin-resistant Staphylococcus aureus (MRSA), vancomycin-resistant enterococci (VRE), and extended-spectrum beta-lactamase (ESBL) as recorded in their electronic medical records, and the number of current residents. Data was collected online during the October-November 2011 period using a Paillier cryptosystem that allows computation on encrypted data. Results A provably secure data collection system was implemented. Overall, 82% of the homes in the province responded. MRSA was the most frequent ARO identified at 3 cases per 100 residents, followed by ESBL at 0.83 per 100 residents, and VRE at 0.56 per 100 residents. The microbiological findings and their distribution were consistent with available provincial laboratory data reporting test results for AROs in hospitals. Conclusions We describe an ARO point prevalence study which demonstrated the feasibility of collecting data from long term care homes securely across the province and providing strong privacy and confidentiality assurances, while obtaining high response rates.
PLOS ONE | 2015
Khaled El Emam; Elizabeth Jonker; Luk Arbuckle; Bradley Malin
The following information is missing from the Competing Interests section: KEE is the founder of Privacy Analytics Inc.