Publication


Featured research published by Peggy L. Peissig.


Nature Biotechnology | 2013

Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data

Joshua C. Denny; Marylyn D. Ritchie; Robert J. Carroll; Raquel Zink; Jonathan D. Mosley; Julie R. Field; Jill M. Pulley; Andrea H. Ramirez; Erica Bowton; Melissa A. Basford; David Carrell; Peggy L. Peissig; Abel N. Kho; Jennifer A. Pacheco; Luke V. Rasmussen; David R. Crosslin; Paul K. Crane; Jyotishman Pathak; Suzette J. Bielinski; Sarah A. Pendergrass; Hua Xu; Lucia A. Hindorff; Rongling Li; Teri A. Manolio; Christopher G. Chute; Rex L. Chisholm; Eric B. Larson; Gail P. Jarvik; Murray H. Brilliant; Catherine A. McCarty

Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
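The abstract reports a P-value cutoff tied to a false discovery rate below 0.1, but does not say which FDR procedure was used. As an illustration only, the sketch below applies the widely used Benjamini-Hochberg step-up procedure to a handful of made-up p-values. The function name and the toy data are assumptions, not taken from the study.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return the indices of tests declared significant at the given FDR."""
    m = len(p_values)
    # Rank the tests by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff_rank])


# Toy p-values, invented for illustration (not results from the paper).
toy_p = [0.001, 0.04, 0.03, 0.5, 0.012]
significant = benjamini_hochberg(toy_p, fdr=0.1)
print(significant)
```

In a scan of thousands of variants against more than a thousand phenotypes, a correction of this kind is what keeps the expected share of false discoveries bounded.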


Science Translational Medicine | 2011

Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium

Abel N. Kho; Jennifer A. Pacheco; Peggy L. Peissig; Luke V. Rasmussen; Katherine M. Newton; Noah Weston; Paul K. Crane; Jyotishman Pathak; Christopher G. Chute; Suzette J. Bielinski; Iftikhar J. Kullo; Rongling Li; Teri A. Manolio; Rex L. Chisholm; Joshua C. Denny

Clinical data captured in electronic medical records accurately identify cases and controls for genome-wide association studies.

Where Electronic Records and Genomics Meet

There has been a surge of interest in using electronic medical records in hospitals and clinics to capture information about patients that is normally buried in doctors’ handwritten notes. Indeed, the U.S. government has made the implementation of electronic medical records a priority area and has instigated standards for the recording and use of these records. The clinical data captured in electronic medical records including diagnoses, medical tests, and medications provide accurate clinical information that will improve patient care. With the ability to sequence the genomes of individuals faster and cheaper than ever before, it may be possible in the future to include the genome sequences of patients in their electronic medical records. A consortium called the Electronic Medical Records and Genomics Network (eMERGE) has set out to investigate whether clinical data captured in electronic medical records could be used to accurately identify patients with particular diseases for inclusion in genome-wide association studies (GWAS). GWAS scrutinize the genomes of individuals with particular diseases to identify tiny genetic variations that are associated with the risk of developing that disease. Here, the eMERGE consortium reports its study of the electronic medical records from five clinical centers and how accurately it identified patients with one of five diseases: dementia, cataracts, peripheral arterial disease, type 2 diabetes, and cardiac conduction defects. The investigators show that even though the electronic medical records were of different types and did not all use natural language processing to extract information from the records, they were able to obtain robust positive and negative values for identifying patients with these diseases with sufficient accuracy for use in GWAS. They conclude that widespread adoption of electronic medical records will provide real-world clinical data that will be valuable for GWAS and other types of genetic research.

Clinical data in electronic medical records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73 to 98% and negative predictive values of 98 to 100%. Most EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
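The positive and negative predictive values quoted in this abstract follow the standard definitions: PPV is the share of algorithm-flagged cases that chart review confirms, and NPV is the share of algorithm-flagged controls that chart review confirms. The sketch below shows the arithmetic. The counts are invented for illustration and do not come from the eMERGE data.

```python
def predictive_values(tp, fp, tn, fn):
    """Compute (PPV, NPV) from chart-review confusion counts."""
    ppv = tp / (tp + fp)  # confirmed cases among all flagged cases
    npv = tn / (tn + fn)  # confirmed controls among all flagged controls
    return ppv, npv


# Hypothetical validation sample: 100 flagged cases, 200 flagged controls.
ppv, npv = predictive_values(tp=90, fp=10, tn=196, fn=4)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```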


Journal of the American Medical Informatics Association | 2012

Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study

Abel N. Kho; M. Geoffrey Hayes; Laura J. Rasmussen-Torvik; Jennifer A. Pacheco; William K. Thompson; Loren L. Armstrong; Joshua C. Denny; Peggy L. Peissig; Aaron W. Miller; Wei Qi Wei; Suzette J. Bielinski; Christopher G. Chute; Cynthia L. Leibson; Gail P. Jarvik; David R. Crosslin; Christopher S. Carlson; Katherine M. Newton; Wendy A. Wolf; Rex L. Chisholm; William L. Lowe

OBJECTIVE Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype-phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems. MATERIALS AND METHODS An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated at three of the five participating institutions compared against clinician review. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions. RESULTS The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D. DISCUSSION By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS. CONCLUSIONS An algorithm using commonly available data from five different EMR systems can accurately identify T2D cases and controls for genetic study across multiple institutions.
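The abstract says the algorithm combines diagnoses, medications, and laboratory results, but gives no rules or thresholds. The sketch below is therefore not the published algorithm. It is a deliberately simplified, hypothetical rule set that only illustrates the general idea of requiring agreement between evidence types for cases and the absence of any evidence for controls. The field names and the "two of three" rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical, pre-computed flags; a real EMR extract would derive
    # these from coded diagnoses, medication lists, and lab values.
    has_t2d_diagnosis_code: bool
    on_t2d_medication: bool
    abnormal_glucose_lab: bool


def classify(record: PatientRecord) -> str:
    """Assign 'case', 'control', or 'excluded' using an assumed rule set."""
    evidence = sum([
        record.has_t2d_diagnosis_code,
        record.on_t2d_medication,
        record.abnormal_glucose_lab,
    ])
    if evidence >= 2:   # assumed: two independent evidence types agree
        return "case"
    if evidence == 0:   # assumed: no sign of diabetes anywhere in the record
        return "control"
    return "excluded"   # ambiguous records are left out of the GWAS


print(classify(PatientRecord(True, True, False)))
print(classify(PatientRecord(False, False, False)))
print(classify(PatientRecord(True, False, False)))
```

Excluding ambiguous records trades sample size for specificity, which matches the abstract's emphasis on stringent criteria.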


Journal of the American Medical Informatics Association | 2013

Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network

Katherine M. Newton; Peggy L. Peissig; Abel N. Kho; Suzette J. Bielinski; Richard L. Berg; Vidhu Choudhary; Melissa A. Basford; Christopher G. Chute; Iftikhar J. Kullo; Rongling Li; Jennifer A. Pacheco; Luke V. Rasmussen; Leslie Spangler; Joshua C. Denny

BACKGROUND Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. OBJECTIVE To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. MATERIALS AND METHODS The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. RESULTS By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. CONCLUSIONS Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.


Journal of the American Medical Informatics Association | 2012

Importance of multi-modal approaches to effectively identify cataract cases from electronic health records

Peggy L. Peissig; Luke V. Rasmussen; Richard L. Berg; James G. Linneman; Catherine A. McCarty; Carol Waudby; Lin Chen; Joshua C. Denny; Russell A. Wilke; Jyotishman Pathak; David Carrell; Abel N. Kho; Justin Starren

OBJECTIVE There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. MATERIALS AND METHODS We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. RESULTS An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. DISCUSSION A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. CONCLUSION We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.


Clinical Medicine & Research | 2007

Use of an Electronic Medical Record for the Identification of Research Subjects with Diabetes Mellitus

Russell A. Wilke; Richard L. Berg; Peggy L. Peissig; Terrie Kitchner; Bozana Sijercic; Catherine A. McCarty; Daniel J. McCarty

Diabetes mellitus is a rapidly increasing and costly public health problem. Large studies are needed to understand the complex gene-environment interactions that lead to diabetes and its complications. The Marshfield Clinic Personalized Medicine Research Project (PMRP) represents one of the largest population-based DNA biobanks in the United States. As part of an effort to begin phenotyping common diseases within the PMRP, we now report on the construction of a diabetes case-finding algorithm using electronic medical record data from adult subjects aged ≥50 years living in one of the target PMRP ZIP codes. Based upon diabetic diagnostic codes alone, we observed a false positive case rate ranging from 3.0% (in subjects with the highest glycosylated hemoglobin values) to 44.4% (in subjects with the lowest glycosylated hemoglobin values). We therefore developed an improved case finding algorithm that utilizes diabetic diagnostic codes in combination with clinical laboratory data and medication history. This algorithm yielded an estimated prevalence of 24.2% for diabetes mellitus in adult subjects aged ≥50 years.


PLOS ONE | 2011

Knowledge-driven multi-locus analysis reveals gene-gene interactions influencing HDL cholesterol level in two independent EMR-linked biobanks

Stephen D. Turner; Richard L. Berg; James G. Linneman; Peggy L. Peissig; Dana C. Crawford; Joshua C. Denny; Dan M. Roden; Catherine A. McCarty; Marylyn D. Ritchie; Russell A. Wilke

Genome-wide association studies (GWAS) are routinely being used to examine the genetic contribution to complex human traits, such as high-density lipoprotein cholesterol (HDL-C). Although HDL-C levels are highly heritable (h² ∼ 0.7), the genetic determinants identified through GWAS contribute to a small fraction of the variance in this trait. Reasons for this discrepancy may include rare variants, structural variants, gene-environment (GxE) interactions, and gene-gene (GxG) interactions. Clinical practice-based biobanks now allow investigators to address these challenges by conducting GWAS in the context of comprehensive electronic medical records (EMRs). Here we apply an EMR-based phenotyping approach, within the context of routine care, to replicate several known associations between HDL-C and previously characterized genetic variants: CETP (rs3764261, p = 1.22e-25), LIPC (rs11855284, p = 3.92e-14), LPL (rs12678919, p = 1.99e-7), and the APOA1/C3/A4/A5 locus (rs964184, p = 1.06e-5), all adjusted for age, gender, body mass index (BMI), and smoking status. By using a novel approach which censors data based on relevant co-morbidities and lipid modifying medications to construct a more rigorous HDL-C phenotype, we identified an association between HDL-C and TRIB1, a gene which previously resisted identification in studies with larger sample sizes. Through the application of additional analytical strategies incorporating biological knowledge, we further identified 11 significant GxG interaction models in our discovery cohort, 8 of which show evidence of replication in a second biobank cohort. The strongest predictive model included a pairwise interaction between LPL (which modulates the incorporation of triglyceride into HDL) and ABCA1 (which modulates the incorporation of free cholesterol into HDL). These results demonstrate that gene-gene interactions modulate complex human traits, including HDL cholesterol.


Journal of the American Medical Informatics Association | 2016

PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability

Jacqueline Kirby; Peter Speltz; Luke V. Rasmussen; Melissa A. Basford; Omri Gottesman; Peggy L. Peissig; Jennifer A. Pacheco; Gerard Tromp; Jyotishman Pathak; David Carrell; Stephen Ellis; Todd Lingren; William K. Thompson; Guergana Savova; Jonathan L. Haines; Dan M. Roden; Paul A. Harris; Joshua C. Denny

OBJECTIVE Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. MATERIALS AND METHODS We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. RESULTS As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). DISCUSSION These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. CONCLUSION By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.


Pacific Symposium on Biocomputing | 2004

Study of effect of drug lexicons on medication extraction from electronic medical records

E. Sirohi; Peggy L. Peissig

Extraction of relevant information from free-text clinical notes is becoming increasingly important in healthcare to provide personalized care to patients. The purpose of this dictionary-based NLP study was to determine the effects of using varying drug lexicons to automatically extract medication information from electronic medical records. A convenience training sample of 52 documents, each containing at least one medication, and a randomized test sample of 100 documents were used in this study. The training and test set documents contained a total of 681 and 641 medications respectively. Three sets of drug lexicons were used as sources for medication extraction: first, containing drug name and generic name; second with drug, generic and short names; third with drug, generic and short names followed by filtering techniques. Extraction with the first drug lexicon resulted in 83.7% sensitivity and 96.2% specificity for the training set and 85.2% sensitivity and 96.9% specificity for the test set. Adding the list of short names used for drugs resulted in increasing sensitivity to 95.0%, but decreased the specificity to 79.2% for the training set. Similar results of increased sensitivity of 96.4% and 80.1% specificity were obtained for the test set. Combination of a set of filtering techniques with data from the second lexicon increased the specificity to 98.5% and 98.8% for the training and test sets respectively while slightly decreasing the sensitivity to 94.1% (training) and 95.8% (test). Overall, the lexicon with filtering resulted in the highest precision, i.e., extracted the highest number of medications while keeping the number of extracted non-medications low.
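The trade-off described here (short names raise sensitivity but lower specificity, and filtering recovers specificity) can be illustrated with a toy dictionary matcher. The abstract does not describe the actual lexicons or filtering techniques, so everything below is an assumption: the lexicon entries are invented, and the filter (keep a short name only when a dose-like token follows it) is just one plausible heuristic.

```python
import re

# Toy lexicons, invented for illustration.
DRUG_NAMES = {"metformin", "lisinopril"}
SHORT_NAMES = {"asa", "mom"}  # abbreviations that can collide with ordinary words
DOSE = re.compile(r"\d+(mg|ml|mcg)?")  # assumed "dose-like" token pattern


def extract(text, use_short=False, use_filter=False):
    """Return lexicon matches found in the text, in order of appearance."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    for i, tok in enumerate(tokens):
        if tok in DRUG_NAMES:
            found.append(tok)
        elif use_short and tok in SHORT_NAMES:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            # Hypothetical filter: a short name must be followed by a dose.
            if not use_filter or DOSE.fullmatch(nxt):
                found.append(tok)
    return found


note = "Patient takes metformin 500 mg and ASA 81 mg daily. Her mom has diabetes."
print(extract(note))                                   # full names only
print(extract(note, use_short=True))                   # adds short names
print(extract(note, use_short=True, use_filter=True))  # short names, filtered
```

In this toy note, the short-name lexicon picks up the abbreviated drug that full names miss, but it also matches the ordinary word "mom"; the filter removes that false positive while keeping the true match.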


Genetics in Medicine | 2013

Practical challenges in integrating genomic data into the electronic health record

Abel N. Kho; Luke V. Rasmussen; John J. Connolly; Peggy L. Peissig; Justin Starren; Hakon Hakonarson; M. Geoffrey Hayes

Genetic testing has had limited impact on routine clinical care. Widespread adoption of electronic health records presents a promising means of disseminating genetic testing into diverse care settings. Practical challenges to integration of genomic data into electronic health records include size and complexity of genetic test results, inadequate use of standards for clinical and genetic data, and limitations in electronic health record capacity to store and analyze genetic data. Related challenges include uncertainty in the interpretation of regulatory requirements for return of results, and privacy concerns specific to genetic testing. Successful integration of genomic data may require significant redesign of existing electronic health record systems. Genetics in Medicine (2013); 15(10), 772–778. doi:10.1038/gim.2013.131

Collaboration


Dive into Peggy L. Peissig's collaborations.

Top Co-Authors

David C. Page
University of Wisconsin-Madison

Abel N. Kho
Northwestern University

Marylyn D. Ritchie
Pennsylvania State University