Publication


Featured research published by Nansu Zong.


Nature Genetics | 2017

Finding useful data across multiple biomedical data repositories using DataMed

Lucila Ohno-Machado; Susanna-Assunta Sansone; George Alter; Ian Fore; Jeffrey S. Grethe; Hua Xu; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Anupama E. Gururaj; Elizabeth A. Bell; Ergin Soysal; Nansu Zong; Hyeoneui Kim

The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. These characteristics, along with interoperability and reusability, compose the four FAIR principles to facilitate knowledge discovery in today’s big data–intensive science landscape.


Scientific Data | 2017

DATS, the data tag suite to enable discoverability of datasets

Susanna-Assunta Sansone; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; George Alter; Jeffrey S. Grethe; Hua Xu; Ian Fore; Jared Lyle; Anupama E. Gururaj; Xiaoling Chen; Hyeoneui Kim; Nansu Zong; Yueling Li; Ruiling Liu; I. Burak Ozyurt; Lucila Ohno-Machado

Today’s science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)’s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed’s goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.


Bioinformatics | 2017

Deep mining heterogeneous networks of biomedical linked data to predict novel drug-target associations.

Nansu Zong; Hyeoneui Kim; Victoria Ngo; Olivier Harismendy

Motivation: A heterogeneous network topology possessing abundant interactions between biomedical entities has yet to be utilized in similarity-based methods for predicting drug-target associations based on the array of varying features of drugs and their targets. Deep learning reveals features of vertices of a large network that can be adapted in accommodating the similarity-based solutions to provide a flexible method of drug-target prediction. Results: We propose a similarity-based drug-target prediction method that enhances existing association discovery methods by using a topology-based similarity measure. DeepWalk, a deep learning method, is adopted in this study to calculate the similarities within the Linked Tripartite Network (LTN), a heterogeneous network generated from biomedical linked datasets. This proposed method shows promising results for drug-target association prediction: 98.96% AUC ROC score with a 10-fold cross-validation and 99.25% AUC ROC score with a Monte Carlo cross-validation with LTN. By utilizing DeepWalk, we demonstrate that: (i) this method outperforms other existing topology-based similarity computation methods, (ii) the performance is better for tripartite than for bipartite networks and (iii) the measure of similarity using network topology outperforms the ones derived from chemical structure (drugs) or genomic sequence (targets). Our proposed methodology proves to be capable of providing a promising solution for drug-target prediction based on topological similarity with a heterogeneous network, and may be readily re-purposed and adapted to existing similarity-based methodologies. Availability and Implementation: The proposed method has been developed in Java and it is available, along with the data, at the following URL: https://github.com/zongnansu1982/drug-target-prediction. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
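
DeepWalk learns vertex embeddings by feeding truncated random walks over a graph into a skip-gram model. As a rough illustration of the walk-generation step only, the Python sketch below samples walks from a toy adjacency list. The graph, the function name `random_walks`, and all parameter values are hypothetical and are not taken from the authors' Java implementation; the embedding and similarity steps are omitted.

```python
import random


def random_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Sample truncated random walks from an adjacency-list graph.

    graph maps each node to a list of its neighbours (hypothetical toy input).
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical tripartite toy graph: drugs, targets and a disease.
toy_graph = {
    "drugA": ["targetX"],
    "drugB": ["targetX", "targetY"],
    "targetX": ["drugA", "drugB", "disease1"],
    "targetY": ["drugB"],
    "disease1": ["targetX"],
}

walks = random_walks(toy_graph)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

In a full pipeline, walks like these would be treated as sentences for a skip-gram model, and the resulting vertex vectors compared to obtain topology-based similarities.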


Neurocomputing | 2017

xStore: Federated temporal query processing for large scale RDF triples on a cloud environment

Jinhyun Ahn; Jae-Hong Eom; Sejin Nam; Nansu Zong; Dong-Hyuk Im; Hong-Gee Kim

Temporal information retrieval tasks have a long history in the information retrieval field and have also attracted neuroscientists working on memory systems. They become more important in the Semantic Web, where structured data in RDF triples, often with temporal information, accumulate rapidly over time. Existing triple stores already support loading RDF triples and answering a given SPARQL query with time interval constraints. However, few triple stores have been optimized for processing time interval queries, which are important for temporal information retrieval tasks. In this paper, we propose xStore, a federated SPARQL engine running on a cloud environment, which supports fast processing of temporal queries. xStore is built on top of heterogeneous storage systems such as key-value stores and conventional triple stores. Experiments over real-world temporal datasets showed that our approach is faster than a conventional SPARQL engine for processing temporal queries.


bioRxiv | 2016

DataMed: Finding useful data across multiple biomedical data repositories

Lucila Ohno-Machado; Susanna-Assunta Sansone; George Alter; Ian Fore; Jeffrey S. Grethe; Hua Xu; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Ergin Soysal; Nansu Zong; Hyeoneui Kim

The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the NIH Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various datasets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports Findability and Accessibility of datasets. These characteristics - along with Interoperability and Reusability - compose the four FAIR principles to facilitate knowledge discovery in today’s big data-intensive science landscape.


Computers in Biology and Medicine | 2016

Structuralizing biomedical abstracts with discriminative linguistic features

Sejin Nam; Senator Jeong; Sang-Kyun Kim; Hong-Gee Kim; Victoria Ngo; Nansu Zong

OBJECTIVE Nearly 75% of the abstracts in MEDLINE papers are presented in an unstructured format. This study aims to automate the reformatting of unstructured abstracts into the Introduction, Methods, Results, and Discussion (IMRAD) format. The quality of this reformatting relies on the features used in sentence classification. Therefore, we explored the most effective linguistic features in MEDLINE papers. METHODS We constructed a feature set consisting of bag of words, linguistic features, grammatical features, and structural features. To evaluate effectiveness, defined as the sentence-classification capability of the features, three datasets from the PubMed Central Open Access Subset were selected and constructed: (1) structured abstract (SA) for training, (2) unstructured RCT abstract (UA-1) and (3) unstructured general abstract (UA-2). F-score and accuracy were used to measure the effectiveness at the IMRAD section level and for the overall classification. RESULTS Adding linguistic features improves the accuracy of abstract sentence classification by between 1.2% and 35.8% across the three abstract datasets. The highest accuracies achieved were 91.7% in SA, 86.3% in UA-1, and 77.9% in UA-2. Linguistic features (dimensions=15) had fewer dimensions than bag-of-words (dimensions=1541). All representative linguistic features (n-gram, verb phrase, and noun phrase) for each section are identified in our system (available at http://abstract.bike.re.kr). CONCLUSION Linguistic features can be used to effectively classify sentences in MEDLINE abstracts with a low computational burden.


Journal of the American Medical Informatics Association | 2018

DataMed – an open source discovery index for finding biomedical datasets

Xiaoling Chen; Anupama E. Gururaj; Burak Ozyurt; Ruiling Liu; Ergin Soysal; Trevor Cohen; Firat Tiryaki; Yueling Li; Nansu Zong; Min Jiang; Deevakar Rogith; Mandana Salimi; Hyeoneui Kim; Philippe Rocca-Serra; Alejandra Gonzalez-Beltran; Claudiu Farcas; Todd R. Johnson; Ron Margolis; George Alter; Susanna-Assunta Sansone; Ian Fore; Lucila Ohno-Machado; Jeffrey S. Grethe; Hua Xu

Objective Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. Materials and Methods DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Results and Conclusion Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of the top 10 search results that are relevant) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
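
For readers unfamiliar with the P@10 metric reported above, the following Python sketch shows how precision at k is conventionally computed for a single query. The function name, document identifiers and relevance judgments are made up for illustration and have no connection to the DataMed benchmark; inferred average precision is a different measure and is not covered here.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Return the fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalises result lists shorter than k.
    return hits / k


# Hypothetical ranking returned for one query, best match first.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10", "d11"]
# Hypothetical set of datasets judged relevant for that query.
relevant = {"d1", "d2", "d4", "d5", "d8", "d10", "d11"}

score = precision_at_k(ranking, relevant, k=10)
print(f"P@10 = {score:.2f}")
```

A system-level figure such as the one in the abstract would typically be the mean of this per-query value over all benchmark queries.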


Data and Knowledge Engineering | 2017

Constructing faceted taxonomy for heterogeneous entities based on object properties in linked data

Nansu Zong; Hong-Gee Kim; Sejin Nam

The interlinking of data across the web, a concept known as Linked Data, fosters opportunities in data sharing and reusability. However, it may also pose some challenges, including the absence of concept taxonomies by which to organize heterogeneous entities that are from different data sources and diverse domains. Learning T-Box (Terminology Box) from A-Box (Assertion Box) has been studied to provide users with concept taxonomies, and is considered a better solution than mapping Linked Data sets with published ontologies. Yet existing processes for automatically generating taxonomies, which classify entities in one particular manner, can be improved. Thus, this study aims to automatically create a faceted taxonomy to organize heterogeneous entities, enabling varying classifications of entities by diverse sub-taxonomies, to support faceted search and navigation for linked data applications. The authors have developed a framework in which each facet, represented by an object property, is used to extract portions of data in the data space, and an Instance-based Concept Taxonomy generation algorithm is developed to build a sub-taxonomy. Additionally, strategies for sub-taxonomy refinement are proposed. Two experiments have been conducted to demonstrate the promising performance of the proposed method in terms of efficiency and effectiveness.


Computers in Biology and Medicine | 2017

Supporting inter-topic entity search for biomedical Linked Data based on heterogeneous relationships

Nansu Zong; Sungin Lee; Jinhyun Ahn; Hong-Gee Kim

OBJECTIVE The keyword-based entity search restricts search space based on the preference of search. When given keywords and preferences are not related to the same biomedical topic, existing biomedical Linked Data search engines fail to deliver satisfactory results. This research aims to tackle this issue by supporting an inter-topic search: improving search when the inputs, keywords and preferences, fall under different topics. METHODS This study developed an effective algorithm in which the relations between biomedical entities were used in tandem with a keyword-based entity search, Siren. The algorithm, PERank, which is an adaptation of Personalized PageRank (PPR), takes a pair of inputs, (1) search preferences and (2) entities from a keyword-based entity search with a keyword query, to formalize the search results on the fly based on the index of the precomputed Individual Personalized PageRank Vectors (IPPVs). RESULTS Our experiments were performed over ten linked life datasets for two query sets, one with keyword-preference topic correspondence (intra-topic search), and the other without (inter-topic search). The experiments showed that the proposed method achieved better search results, for example a 14% increase in precision for the inter-topic search over the baseline keyword-based search engine. CONCLUSION The proposed method improved the keyword-based biomedical entity search by supporting the inter-topic search without affecting the intra-topic search based on the relations between different entities.
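
PERank is described above as an adaptation of Personalized PageRank. The Python sketch below shows plain Personalized PageRank by power iteration on a toy directed graph, to illustrate how a preference set biases the ranking. It is not the PERank algorithm and does not use precomputed IPPVs; the graph, node names, damping factor and iteration count are all assumptions chosen for illustration.

```python
def personalized_pagerank(graph, preferences, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an adjacency-list graph.

    graph maps each node to the list of nodes it links to; every linked node
    must also be a key. preferences maps preferred nodes to positive weights.
    """
    nodes = list(graph)
    total = sum(preferences.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one preferred node must be in the graph")
    # Restart distribution: random jumps land only on preferred nodes.
    restart = {n: preferences.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out_links = graph[n]
            if out_links:
                share = damping * rank[n] / len(out_links)
                for m in out_links:
                    nxt[m] += share
            else:
                # Dangling node: hand its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


# Hypothetical toy graph of linked biomedical entities.
toy_links = {
    "drug": ["target"],
    "target": ["disease"],
    "disease": ["drug"],
    "gene": ["disease"],
}

scores = personalized_pagerank(toy_links, {"drug": 1.0})
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

In this toy run the node `gene` has no incoming links and is not preferred, so it receives no score, while nodes reachable from the preferred node `drug` are ranked by how much walk mass reaches them.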


AMIA | 2017

Explorative Analyses on Indexing OMOP based Clinical Datasets with DATS.

Ethan Park; Diana Guijarro; Imho Jang; Stephen Trac; Nansu Zong; Hyeoneui Kim

Collaboration

Top Co-Authors

Hyeoneui Kim
University of California

Hua Xu
University of Texas Health Science Center at Houston

Ian Fore
National Institutes of Health

Hong-Gee Kim
Seoul National University