Publication


Featured research published by Jennifer A. Pacheco.


Nature Biotechnology | 2013

Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data

Joshua C. Denny; Marylyn D. Ritchie; Robert J. Carroll; Raquel Zink; Jonathan D. Mosley; Julie R. Field; Jill M. Pulley; Andrea H. Ramirez; Erica Bowton; Melissa A. Basford; David Carrell; Peggy L. Peissig; Abel N. Kho; Jennifer A. Pacheco; Luke V. Rasmussen; David R. Crosslin; Paul K. Crane; Jyotishman Pathak; Suzette J. Bielinski; Sarah A. Pendergrass; Hua Xu; Lucia A. Hindorff; Rongling Li; Teri A. Manolio; Christopher G. Chute; Rex L. Chisholm; Eric B. Larson; Gail P. Jarvik; Murray H. Brilliant; Catherine A. McCarty

Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10^−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.


Science Translational Medicine | 2011

Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium

Abel N. Kho; Jennifer A. Pacheco; Peggy L. Peissig; Luke V. Rasmussen; Katherine M. Newton; Noah Weston; Paul K. Crane; Jyotishman Pathak; Christopher G. Chute; Suzette J. Bielinski; Iftikhar J. Kullo; Rongling Li; Teri A. Manolio; Rex L. Chisholm; Joshua C. Denny

Clinical data captured in electronic medical records accurately identify cases and controls for genome-wide association studies.

Where Electronic Records and Genomics Meet

There has been a surge of interest in using electronic medical records in hospitals and clinics to capture information about patients that is normally buried in doctors' handwritten notes. Indeed, the U.S. government has made the implementation of electronic medical records a priority area and has established standards for the recording and use of these records. The clinical data captured in electronic medical records, including diagnoses, medical tests, and medications, provide accurate clinical information that will improve patient care. With the ability to sequence the genomes of individuals faster and cheaper than ever before, it may be possible in the future to include the genome sequences of patients in their electronic medical records. A consortium called the Electronic Medical Records and Genomics Network (eMERGE) has set out to investigate whether clinical data captured in electronic medical records could be used to accurately identify patients with particular diseases for inclusion in genome-wide association studies (GWAS). GWAS scrutinize the genomes of individuals with particular diseases to identify tiny genetic variations that are associated with the risk of developing that disease. Here, the eMERGE consortium reports its study of the electronic medical records from five clinical centers and how accurately these records identified patients with one of five diseases: dementia, cataracts, peripheral arterial disease, type 2 diabetes, and cardiac conduction defects. The investigators show that even though the electronic medical records were of different types and not all sites used natural language processing to extract information from the records, they were able to identify patients with these diseases with positive and negative predictive values high enough for use in GWAS. They conclude that widespread adoption of electronic medical records will provide real-world clinical data that will be valuable for GWAS and other types of genetic research.

Clinical data in electronic medical records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73 to 98% and negative predictive values of 98 to 100%. Most EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.


Journal of the American Medical Informatics Association | 2012

Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study

Abel N. Kho; M. Geoffrey Hayes; Laura J. Rasmussen-Torvik; Jennifer A. Pacheco; William K. Thompson; Loren L. Armstrong; Joshua C. Denny; Peggy L. Peissig; Aaron W. Miller; Wei Qi Wei; Suzette J. Bielinski; Christopher G. Chute; Cynthia L. Leibson; Gail P. Jarvik; David R. Crosslin; Christopher S. Carlson; Katherine M. Newton; Wendy A. Wolf; Rex L. Chisholm; William L. Lowe

OBJECTIVE Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems. MATERIALS AND METHODS An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated against clinician review at three of the five participating institutions. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions. RESULTS The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from the five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D. DISCUSSION By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS. CONCLUSIONS An algorithm using commonly available data from five different EMR systems can accurately identify T2D cases and controls for genetic study across multiple institutions.


Journal of the American Medical Informatics Association | 2013

Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network

Katherine M. Newton; Peggy L. Peissig; Abel N. Kho; Suzette J. Bielinski; Richard L. Berg; Vidhu Choudhary; Melissa A. Basford; Christopher G. Chute; Iftikhar J. Kullo; Rongling Li; Jennifer A. Pacheco; Luke V. Rasmussen; Leslie Spangler; Joshua C. Denny

BACKGROUND Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. OBJECTIVE To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. MATERIALS AND METHODS The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. RESULTS By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables shortens validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. CONCLUSIONS Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.


Journal of the American Medical Informatics Association | 2012

Portability of an algorithm to identify rheumatoid arthritis in electronic health records

Robert J. Carroll; William K. Thompson; Anne E. Eyler; Arthur M. Mandelin; Tianxi Cai; Raquel Zink; Jennifer A. Pacheco; Chad S. Boomershine; Thomas A. Lasko; Hua Xu; Elizabeth W. Karlson; Raul Guzman Perez; Vivian S. Gainer; Shawn N. Murphy; Eric Ruderman; Richard M. Pope; Robert M. Plenge; Abel N. Kho; Katherine P. Liao; Joshua C. Denny

OBJECTIVES Electronic health records (EHR) can allow for the generation of large cohorts of individuals with given diseases for clinical and genomic research. A rate-limiting step is the development of electronic phenotype selection algorithms to find such cohorts. This study evaluated the portability of a published phenotype algorithm to identify rheumatoid arthritis (RA) patients from EHRs at three institutions with different EHR systems. MATERIALS AND METHODS Physicians reviewed charts from three institutions to identify patients with RA. Each institution compiled attributes from various sources in the EHR, including codified data and clinical narratives, which were searched using one of two natural language processing (NLP) systems. The performance of the published model was compared with locally retrained models. RESULTS Applying the previously published model from Partners Healthcare to datasets from Northwestern and Vanderbilt Universities, the area under the receiver operating characteristic curve was found to be 92% for Northwestern and 95% for Vanderbilt, compared with 97% at Partners. Retraining the model improved the average sensitivity at a specificity of 97% from the original 65% to 72%. Both the original logistic regression models and locally retrained models were superior to simple billing code count thresholds. DISCUSSION These results show that a previously published algorithm for RA is portable to two external hospitals using different EHR systems, different NLP systems, and different target NLP vocabularies. Retraining the algorithm primarily increased the sensitivity at each site. CONCLUSION Electronic phenotype algorithms allow rapid identification of case populations in multiple sites with little retraining.
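The RA abstract reports sensitivity at a fixed specificity of 97%. As a rough illustration of that metric, the Python sketch below scans score thresholds and returns the best sensitivity among thresholds whose specificity meets the target. The function name and the toy scores are hypothetical; this is one common way to read off such a value, not the authors' code or data.

```python
from typing import Sequence


def sensitivity_at_specificity(
    scores: Sequence[float],
    labels: Sequence[int],
    target_specificity: float,
) -> float:
    """Best sensitivity over thresholds whose specificity >= target.

    A subject is called a case when its score >= threshold.
    labels: 1 = case (e.g. chart-confirmed RA), 0 = non-case.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")

    # A threshold above every score yields sensitivity 0 and specificity 1.
    best = 0.0
    for threshold in sorted(set(scores)):
        # Specificity: share of non-cases scored below the threshold.
        specificity = sum(s < threshold for s in negatives) / len(negatives)
        if specificity >= target_specificity:
            # Sensitivity: share of cases scored at or above the threshold.
            sensitivity = sum(s >= threshold for s in positives) / len(positives)
            best = max(best, sensitivity)
    return best


# Hypothetical toy data, not from the study.
toy_scores = [0.10, 0.50, 0.60, 0.70]
toy_labels = [0, 1, 0, 1]
print(sensitivity_at_specificity(toy_scores, toy_labels, 0.97))  # 0.5
print(sensitivity_at_specificity(toy_scores, toy_labels, 0.50))  # 1.0
```

Each threshold corresponds to one point on the ROC curve; the 92%, 95%, and 97% figures in the abstract are areas under that curve, a separate summary of the same scores.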


Clinical Pharmacology & Therapeutics | 2014

Design and anticipated outcomes of the eMERGE-PGx project: a multicenter pilot for preemptive pharmacogenomics in electronic health record systems

Laura J. Rasmussen-Torvik; Sarah Stallings; Adam S. Gordon; Berta Almoguera; Melissa A. Basford; Suzette J. Bielinski; Ariel Brautbar; Murray H. Brilliant; David Carrell; John J. Connolly; David R. Crosslin; Kimberly F. Doheny; Carlos J. Gallego; Omri Gottesman; Daniel Seung Kim; Kathleen A. Leppig; Rongling Li; Simon Lin; Shannon Manzi; Ana R. Mejia; Jennifer A. Pacheco; Vivian Pan; Jyotishman Pathak; Cassandra Perry; Josh F. Peterson; Cynthia A. Prows; James D. Ralston; Luke V. Rasmussen; Marylyn D. Ritchie; Senthilkumar Sadhasivam

We describe here the design and initial implementation of the eMERGE‐PGx project. eMERGE‐PGx, a partnership of the Electronic Medical Records and Genomics Network and the Pharmacogenomics Research Network, has three objectives: (i) to deploy PGRNseq, a next‐generation sequencing platform assessing sequence variation in 84 proposed pharmacogenes, in nearly 9,000 patients likely to be prescribed drugs of interest in a 1‐ to 3‐year time frame across several clinical sites; (ii) to integrate well‐established clinically validated pharmacogenetic genotypes into the electronic health record with associated clinical decision support and to assess process and clinical outcomes of implementation; and (iii) to develop a repository of pharmacogenetic variants of unknown significance linked to a repository of electronic health record–based clinical phenotype data for ongoing pharmacogenomics discovery. We describe site‐specific project implementation and anticipated products, including genetic variant and phenotype data repositories, novel variant association studies, clinical decision support modules, clinical and process outcomes, approaches to managing incidental findings, and patient and clinician education methods.


Journal of the American Medical Informatics Association | 2012

Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus

Wei Qi Wei; Cynthia L. Leibson; Jeanine E. Ransom; Abel N. Kho; Pedro J. Caraballo; High Seng Chai; Barbara P. Yawn; Jennifer A. Pacheco; Christopher G. Chute

OBJECTIVE To evaluate how data fragmentation across healthcare centers affects the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm developed to differentiate (1) patients with type 2 diabetes mellitus (T2DM) from (2) patients with no diabetes. MATERIALS AND METHODS This population-based study identified all Olmsted County, Minnesota residents in 2007. We used provider-linked electronic medical record data from the two healthcare centers that provide >95% of all care to County residents (ie, Olmsted Medical Center and Mayo Clinic in Rochester, Minnesota, USA). Subjects were limited to residents with one or more encounters at both healthcare centers between January 1, 2006 and December 31, 2007. DM-relevant data on diagnoses, laboratory results, and medications from both centers were obtained for this period. The algorithm was first executed using data from both centers (ie, the gold standard) and then using data from Mayo Clinic alone. Positive predictive values and false-negative rates were calculated, and the McNemar test was used to compare categorization based on Mayo Clinic data alone with the gold standard. Age and sex were compared between true-positive and false-negative subjects with T2DM. Statistical significance was accepted as p<0.05. RESULTS With data from both medical centers, 765 subjects with T2DM (4256 non-DM subjects) were identified. When single-center data were used, 252 T2DM subjects (1573 non-DM subjects) were missed, and an additional 27 T2DM subjects (215 non-DM subjects) were falsely identified. The positive predictive values and false-negative rates were 95.0% (513/540) and 32.9% (252/765), respectively, for T2DM subjects and 92.6% (2683/2898) and 37.0% (1573/4256), respectively, for non-DM subjects. Age and sex distributions differed between true-positive (mean age 62.1; 45% female) and false-negative (mean age 65.0; 56.0% female) T2DM subjects.
CONCLUSION The findings show that application of an HTCP algorithm using data from a single medical center contributes to misclassification. These findings should be considered carefully by researchers when developing and executing HTCP algorithms.
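The positive predictive values and false-negative rates in the data-fragmentation abstract follow directly from the reported counts, treating the two-center classification as the gold standard: true positives are the gold-standard subjects not missed, PPV is true positives over everything the single-center run labeled, and the false-negative rate is missed subjects over the gold-standard total. The short Python check below reproduces those figures; the class and field names are illustrative, not from the paper.

```python
from dataclasses import dataclass


@dataclass
class FragmentationCounts:
    # Illustrative names, not from the paper.
    reference_total: int  # labeled by the algorithm using both centers (gold standard)
    missed: int           # gold-standard subjects not labeled with single-center data
    false_positive: int   # extra subjects labeled only with single-center data

    @property
    def true_positive(self) -> int:
        return self.reference_total - self.missed

    def ppv(self) -> float:
        return self.true_positive / (self.true_positive + self.false_positive)

    def false_negative_rate(self) -> float:
        return self.missed / self.reference_total


t2dm = FragmentationCounts(reference_total=765, missed=252, false_positive=27)
non_dm = FragmentationCounts(reference_total=4256, missed=1573, false_positive=215)

for name, c in [("T2DM", t2dm), ("non-DM", non_dm)]:
    print(
        f"{name}: PPV {c.ppv():.1%} ({c.true_positive}/{c.true_positive + c.false_positive}), "
        f"FNR {c.false_negative_rate():.1%} ({c.missed}/{c.reference_total})"
    )
# T2DM: PPV 95.0% (513/540), FNR 32.9% (252/765)
# non-DM: PPV 92.6% (2683/2898), FNR 37.0% (1573/4256)
```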


Journal of the American Medical Informatics Association | 2016

PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability

Jacqueline Kirby; Peter Speltz; Luke V. Rasmussen; Melissa A. Basford; Omri Gottesman; Peggy L. Peissig; Jennifer A. Pacheco; Gerard Tromp; Jyotishman Pathak; David Carrell; Stephen Ellis; Todd Lingren; William K. Thompson; Guergana Savova; Jonathan L. Haines; Dan M. Roden; Paul A. Harris; Joshua C. Denny

OBJECTIVE Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. MATERIALS AND METHODS We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. RESULTS As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Diseases codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). DISCUSSION These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. CONCLUSION By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.


Clinical and Translational Science | 2012

High Density GWAS for LDL Cholesterol in African Americans Using Electronic Medical Records Reveals a Strong Protective Variant in APOE

Laura J. Rasmussen-Torvik; Jennifer A. Pacheco; Russell A. Wilke; William K. Thompson; Marylyn D. Ritchie; Abel N. Kho; Arun Muthalagu; M. Geoff Hayes; Loren L. Armstrong; Douglas A. Scheftner; John T. Wilkins; Rebecca L. Zuvich; David R. Crosslin; Dan M. Roden; Joshua C. Denny; Gail P. Jarvik; Christopher S. Carlson; Iftikhar J. Kullo; Suzette J. Bielinski; Catherine A. McCarty; Rongling Li; Teri A. Manolio; Dana C. Crawford; Rex L. Chisholm

Only one low‐density lipoprotein cholesterol (LDL‐C) genome‐wide association study (GWAS) has been previously reported in African Americans. We performed a GWAS of LDL‐C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL‐C measurements, prescriptions, and diagnoses of concomitant disease were extracted from EMR. We created two analytic datasets: one with median LDL‐C calculated after excluding some lab values based on comorbidities and medication (n = 618), and another with median LDL‐C calculated without any exclusions (n = 1,249). SNP rs7412 in APOE was strongly associated with LDL‐C in both datasets (p < 5 × 10^−8). In the dataset with exclusions, a decrease of 20.0 mg/dL per minor allele was observed. The effect size was attenuated (12.3 mg/dL) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large and important SNP association has been poorly captured by large GWAS because rs7412 was not included on many genotyping arrays. Use of median LDL‐C extracted from EMR after exclusions for medications and comorbidities increased the percentage of trait variance explained by genetic variation. Clin Trans Sci 2012; Volume 5: 394–399


Journal of the American Medical Informatics Association | 2015

Desiderata for computable representations of electronic health records-driven phenotype algorithms

Huan Mo; William K. Thompson; Luke V. Rasmussen; Jennifer A. Pacheco; Guoqian Jiang; Richard C. Kiefer; Qian Zhu; Jie Xu; Enid Montague; David Carrell; Todd Lingren; Frank D. Mentch; Yizhao Ni; Firas H. Wehbe; Peggy L. Peissig; Gerard Tromp; Eric B. Larson; Christopher G. Chute; Jyotishman Pathak; Joshua C. Denny; Peter Speltz; Abel N. Kho; Gail P. Jarvik; Cosmin Adrian Bejan; Marc S. Williams; Kenneth M. Borthwick; Terrie Kitchner; Dan M. Roden; Paul A. Harris

Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.

Collaboration



Top Co-Authors

Joshua C. Denny
Vanderbilt University Medical Center

Abel N. Kho
Northwestern University

Gail P. Jarvik
University of Washington