Publication


Featured research published by Chao Pang.


Journal of Biomedical Semantics | 2014

CLO: The cell line ontology

Sirarat Sarntivijai; Yu Lin; Zuoshuang Xiang; Terrence F. Meehan; Alexander D. Diehl; Uma D. Vempati; Stephan C. Schürer; Chao Pang; James Malone; Helen Parkinson; Yue Liu; Terue Takatsuki; Kaoru Saijo; Hiroshi Masuya; Yukio Nakamura; Matthew H. Brush; Melissa Haendel; Jie Zheng; Christian J. Stoeckert; Bjoern Peters; Christopher J. Mungall; Thomas E. Carey; David J. States; Brian D. Athey; Yongqun He

Background: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigation (OBI), and logical extensions. Construction and content: Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built based on the hierarchy of the in vivo cell types defined in CL and tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms. Utility and discussion: The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development.


Human Mutation | 2012

Observ-OM and Observ-TAB: Universal Syntax Solutions for the Integration, Search, and Exchange of Phenotype And Genotype Information

Tomasz Adamusiak; Helen Parkinson; Juha Muilu; E Roos; Kasper Joeri van der Velde; Gudmundur A. Thorisson; Myles Byrne; Chao Pang; Sirisha Gollapudi; Vincent Ferretti; Hans L. Hillege; Anthony J. Brookes; Morris A. Swertz

Genetic and epidemiological research increasingly employs large collections of phenotypic and molecular observation data from high-quality human and model organism samples. Standardization efforts have produced a few simple formats for exchange of these various data, but a lightweight and convenient data representation scheme for all data modalities does not exist, hindering successful data integration, such as assignment of mouse models to orphan diseases and phenotypic clustering for pathways. We report a unified system to integrate and compare observation data across experimental projects, disease databases, and clinical biobanks. The core object model (Observ-OM) comprises only four basic concepts to represent any kind of observation: Targets, Features, Protocols (and their Applications), and Values. An easy-to-use file format (Observ-TAB) employs Excel to represent individual and aggregate data in straightforward spreadsheets. The systems have been tested successfully on human biobank, genome-wide association studies, quantitative trait loci, model organism, and patient registry data using the MOLGENIS platform to quickly set up custom data portals. Our system will dramatically lower the barrier for future data sharing and facilitate integrated search across panels and species. All models, formats, documentation, and software are freely available as open source (LGPLv3) at http://www.observ-om.org. Hum Mutat 33:867–873, 2012.
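As a rough illustration of the four concepts named in the abstract above, the following is a minimal Python sketch. The class names, field names, and the `values_for` helper are assumptions made for this example; they are not the published Observ-OM schema or the Observ-TAB column layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Target:
    """The thing being observed, e.g. an individual or a sample."""
    name: str


@dataclass(frozen=True)
class Feature:
    """What is observed, e.g. body weight (the unit field is an assumption)."""
    name: str
    unit: Optional[str] = None


@dataclass(frozen=True)
class Protocol:
    """A procedure that measures one or more features."""
    name: str
    features: Tuple[Feature, ...]


@dataclass(frozen=True)
class ProtocolApplication:
    """One concrete run of a protocol (hypothetical date field)."""
    protocol: Protocol
    performed_on: str


@dataclass(frozen=True)
class ObservedValue:
    """A value of one feature for one target, optionally tied to a run."""
    target: Target
    feature: Feature
    value: str
    application: Optional[ProtocolApplication] = None


def values_for(target, observations):
    """Collect all observed values of one target as {feature name: value}."""
    return {o.feature.name: o.value for o in observations if o.target == target}


# Hypothetical example data.
mouse = Target("mouse_001")
weight = Feature("body weight", unit="g")
weighing = Protocol("weighing", (weight,))
run = ProtocolApplication(weighing, "2012-01-15")
observations = [ObservedValue(mouse, weight, "23.4", run)]
print(values_for(mouse, observations))
```

Keeping every observation as a (target, feature, value) triple is what lets heterogeneous data sets share one structure in this sketch; the real model may organize these concepts differently.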


Journal of the American Medical Informatics Association | 2015

BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing

Chao Pang; Dennis Hendriksen; Martijn Dijkstra; K. Joeri van der Velde; Joël Kuiper; Hans L. Hillege; Morris A. Swertz

Objective: Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure is arduous and time-consuming. Materials and methods: To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. Results: We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. Conclusions: BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app from http://www.github.com/molgenis, with a demo available at http://www.biobankconnect.org.
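The expand, search, and shortlist steps described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a hand-written synonym dictionary stands in for the BioPortal and OntoCAT steps, a token-overlap (Jaccard) score stands in for Lucene scoring, and all function names and example data are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a label and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(term, synonyms):
    """Return the term plus its synonyms (stand-in for ontology expansion)."""
    return [term] + list(synonyms.get(term, []))


def match_score(query, label):
    """Jaccard token overlap; a simple stand-in for a Lucene relevance score."""
    q, l = tokenize(query), tokenize(label)
    if not q or not l:
        return 0.0
    return len(q & l) / len(q | l)


def shortlist(desired, available, synonyms, top_k=10):
    """Score each available element by its best expanded query, best first."""
    scored = []
    for element in available:
        best = max(match_score(q, element) for q in expand_query(desired, synonyms))
        if best > 0:
            scored.append((element, best))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical data: one desired element and three available elements.
synonyms = {"hypertension": ["high blood pressure"]}
available = [
    "History of high blood pressure",
    "Body weight in kg",
    "Hypertension diagnosed by doctor",
]
matches = shortlist("hypertension", available, synonyms)
for label, score in matches:
    print(f"{score:.2f}  {label}")
```

In this toy run the synonym is what surfaces "History of high blood pressure", which shares no token with the original query term; that is the effect query expansion is meant to have.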


Bioinformatics | 2016

MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks

Chao Pang; David van Enckevort; Mark de Haan; Fleur Kelpin; Jonathan Jetten; Dennis Hendriksen; Tommy de Boer; Bart Charbon; Erwin Winder; K. Joeri van der Velde; Dany Doiron; Isabel Fortier; Hans L. Hillege; Morris A. Swertz

Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. Availability and Implementation: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. Supplementary information: Supplementary data are available at Bioinformatics online.
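To make the three kinds of transformation mentioned above concrete (unit conversion, categorical value matching, and a derived variable such as BMI), here is a hand-written Python sketch. It is not output of MOLGENIS/connect; the source attribute names (`HEIGHT_CM`, `WEIGHT_KG`, `SEX`), the target attribute names, and the category mapping are assumptions chosen for the example.

```python
def cm_to_m(height_cm):
    """Unit conversion: centimetres to metres."""
    return height_cm / 100.0


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Categorical value matching: hypothetical source codes mapped to target categories.
SEX_CATEGORIES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def harmonize(record):
    """Transform one source record into a (hypothetical) common target schema."""
    height_m = cm_to_m(record["HEIGHT_CM"])
    return {
        "height_m": height_m,
        "bmi": bmi(record["WEIGHT_KG"], height_m),
        "sex": SEX_CATEGORIES.get(str(record["SEX"]).strip().lower(), "unknown"),
    }


source_record = {"HEIGHT_CM": 180, "WEIGHT_KG": 81, "SEX": "M"}
target_record = harmonize(source_record)
print(target_record)
```

Mapping unrecognized categories to "unknown" rather than raising is a design choice of this sketch; in a curated pipeline such values would more likely be flagged for review.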


Database | 2015

SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data.

Chao Pang; Annet Sollie; Anna Sijtsma; Dennis Hendriksen; Bart Charbon; Mark de Haan; Tommy de Boer; Fleur Kelpin; Jonathan Jetten; K. Joeri van der Velde; Nynke Smidt; Rolf H. Sijmons; Hans L. Hillege; Morris A. Swertz

There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching of original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Diseases) and HPO (Human Phenotype Ontology). This data curation is usually a time-consuming process performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon delimited format) to a target coding system (uploaded in Excel spreadsheet, OWL ontology web language or OBO open biomedical ontologies format). It then semi-automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA’s applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects.
Database URL: http://molgenis.org/sorta or as an open source download from http://www.molgenis.org/wiki/SORTA
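The idea of n-gram based matching mentioned in the abstract can be illustrated with a short Python sketch. It assumes character bigrams and a Dice coefficient as the similarity measure; the abstract does not specify SORTA's actual scoring, so the functions and the example labels below are hypothetical.

```python
from collections import Counter


def ngrams(text, n=2):
    """Count character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient over n-gram multisets: 2 * overlap / total n-grams."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    overlap = sum((ga & gb).values())
    return 2 * overlap / total


def shortlist_codes(value, labels, top_k=3):
    """Rank candidate code labels for one free-text value, best match first."""
    ranked = sorted(
        ((label, dice_similarity(value, label)) for label in labels),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_k]


# Hypothetical free-text value (with a typo) and candidate code labels.
candidates = shortlist_codes("runing 5km", ["running", "swimming", "cycling"])
for label, score in candidates:
    print(f"{score:.2f}  {label}")
```

Because the comparison works on character fragments rather than whole words, the misspelled "runing" still ranks "running" first, which is the property that makes n-gram matching attractive for free-text values.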


Bioinformatics | 2018

MOLGENIS Research: Advanced bioinformatics data software for non-bioinformaticians

K. Joeri van der Velde; Floris Imhann; Bart Charbon; Chao Pang; David van Enckevort; Mariska Slofstra; Ruggero Barbieri; Rudi Alberts; Dennis Hendriksen; Fleur Kelpin; Mark de Haan; Tommy de Boer; Sido Haakma; Connor Stroomberg; Salome Scholtens; Gert-Jan van de Geijn; Eleonora A. Festen; Rinse K. Weersma; Morris A. Swertz

Motivation: The volume and complexity of biological data increase rapidly. Many clinical professionals and biomedical researchers without a bioinformatics background are generating big '-omics' data, but do not always have the tools to manage, process or publicly share these data. Results: Here we present MOLGENIS Research, an open-source web application to collect, manage, analyze, visualize and share large and complex biomedical datasets, without the need for advanced bioinformatics skills. Availability and implementation: MOLGENIS Research is freely available (open source software). It can be installed from source code (see http://github.com/molgenis), downloaded as a precompiled WAR file (for your own server), set up inside a Docker container (see http://molgenis.github.io), or requested as a Software-as-a-Service subscription. For a public demo instance and complete installation instructions see http://molgenis.org/research.


Bioinformatics | 2017

BiobankUniverse: Automatic matchmaking between datasets for biobank data discovery and integration

Chao Pang; Fleur Kelpin; David van Enckevort; Niina Eklund; Kaisa Silander; Dennis Hendriksen; Mark de Haan; Jonathan Jetten; Tommy de Boer; Bart Charbon; Petr Holub; Hans L. Hillege; Morris A. Swertz

Motivation: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. Results: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers their desired study variables; BiobankUniverse then automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks/data elements matching their research. They can also curate matches and define personalized data-universes. Availability and implementation: BiobankUniverse is available at http://biobankuniverse.com or can be downloaded as part of the open source MOLGENIS suite at http://github.com/molgenis/molgenis. Supplementary information: Supplementary data are available at Bioinformatics online.


Scopus | 2012

Observ-OM and observ-TAB: Universal syntax solutions for the integration, search, and exchange of phenotype and genotype information

Tomasz Adamusiak; Helen E. Parkinson; Juha Muilu; Gudmundur A. Thorisson; Myles Byrne; Sirisha Gollapudi; Anthony J. Brookes; Morris A. Swertz; E Roos; Chao Pang; Vincent Ferretti; Hans L. Hillege; K. Joeri van der Velde



BMC Endocrine Disorders | 2014

The prevalence of metabolic syndrome and metabolically healthy obesity in Europe: a collaborative analysis of ten large cohort studies

Jana V. van Vliet-Ostaptchouk; Marja-Liisa Nuotio; Sandra N. Slagter; Dany Doiron; Krista Fischer; Luisa Foco; Amadou Gaye; Martin Gögele; Margit Heier; Tero Hiekkalinna; Anni Joensuu; Christopher Newby; Chao Pang; Eemil Partinen; Eva Reischl; Christine Schwienbacher; Mari-Liis Tammesoo; Morris A. Swertz; Paul R. Burton; Vincent Ferretti; Isabel Fortier; Lisette Giepmans; Jennifer R. Harris; Hans L. Hillege; Jostein Holmen; Antti Jula; Jenny E. Kootstra-Ros; Kirsti Kvaløy; Turid Lingaas Holmen; Satu Männistö


2nd International Conference on Biomedical Ontology, ICBO 2011 | 2011

Cell line ontology: Redesigning the cell line knowledgebase to aid integrative translational informatics

Sirarat Sarntivijai; Zuoshuang Xiang; Terrence F. Meehan; Alexander D. Diehl; Uma D. Vempati; Stephan C. Schürer; Chao Pang; James Malone; Helen Parkinson; Brian D. Athey; Yongqun He

Collaboration


Dive into Chao Pang's collaborations.

Top Co-Authors

Morris A. Swertz, University Medical Center Groningen
Dennis Hendriksen, University Medical Center Groningen
Hans L. Hillege, University Medical Center Groningen
Fleur Kelpin, University Medical Center Groningen
K. Joeri van der Velde, University Medical Center Groningen
Mark de Haan, University Medical Center Groningen
Tommy de Boer, University Medical Center Groningen
Bart Charbon, University Medical Center Groningen
David van Enckevort, University Medical Center Groningen
Jonathan Jetten, University Medical Center Groningen