Hsiang-Yuan Yeh
National Tsing Hua University
Publication
Featured research published by Hsiang-Yuan Yeh.
BMC Medical Genomics | 2009
Hsiang-Yuan Yeh; Shih-Wu Cheng; Cheng-Yu Yeh; Shih-Fang Lin; Von-Wun Soo
Background: Prostate cancer is a leading cancer worldwide and is characterized by aggressive metastasis. Reflecting its clinical heterogeneity, prostate cancer presents at different stages and grades that are related to aggressive metastatic disease. Although numerous studies have used microarray analysis and traditional clustering methods to identify individual genes involved in the disease process, the important gene regulations remain unclear. We present a computational method that automatically infers genetic regulatory networks from microarray data, using transcription factor analysis and conditional independence testing, to explore potentially significant gene regulatory networks that are correlated with cancer, tumor grade and stage in prostate cancer. Results: To deal with missing values in the microarray data, we used a K-nearest-neighbors (KNN) algorithm to estimate the missing expression values. We applied web services technology to wrap bioinformatics toolkits and databases so as to automatically extract the promoter regions of DNA sequences and predict the transcription factors that regulate gene expression. We adopted microarray datasets consisting of 62 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD) as a target dataset to evaluate our method. The predicted results identified possible biomarker genes related to cancer and indicated that androgen functions and processes may be involved in the development of prostate cancer and may promote cell death in the cell cycle. Our predicted results showed that sub-networks of the genes SREBF1, STAT6 and PBX1 are strongly related to a high extent, while the ETS transcription factors ELK1, JUN and EGR2 are related to a low extent. The gene SLC22A3 may clinically explain the differentiation associated with high-grade cancer compared with low-grade cancer. Enhancer of Zeste Homolog 2 (EZH2), regulated by RUNX1 and STAT3, is correlated with the pathological stage. Conclusions: We provide a computational framework to reconstruct the genetic regulatory network from microarray data using biological knowledge and constraint-based inference. Our method is helpful in verifying possible interaction relations in gene regulatory networks and in filtering out incorrect relations inferred by imperfect methods. We not only predicted individual genes related to cancer but also discovered significant gene regulatory networks. Our method was also validated against several published papers and databases, and the significant gene regulatory networks perform critical biological functions and processes, including cell adhesion molecules, androgen and estrogen metabolism, smooth muscle contraction, and GO-annotated processes. These significant gene regulations and the critical concept of tumor progression are useful for understanding cancer biology and disease treatment.
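The KNN step for missing microarray values mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a genes-by-samples matrix with `None` for missing entries, Euclidean distance over the columns both genes share, and a plain average of the neighbours' values; the function name `knn_impute`, the value of `k` and the toy matrix are hypothetical and not taken from the paper.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries in a genes x samples matrix from the k nearest gene rows."""
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Neighbours must have an observed value in the missing column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

# Toy expression matrix (hypothetical values): rows are genes, columns are samples.
expression = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 2.8],
    [5.0, 6.0, 9.0],
]
filled = knn_impute(expression, k=2)
```

Here the missing value in the first gene is filled from the two genes whose observed profiles are closest to it; the distant fourth gene does not contribute.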
Journal of Clinical Bioinformatics | 2012
Shih-Heng Yeh; Hsiang-Yuan Yeh; Von-Wun Soo
Background: A systematic approach to drug discovery is an emerging discipline in systems biology research. It aims to integrate interaction data and experimental data to elucidate diseases, and it also raises new issues in drug discovery for cancer treatment. However, drug target discovery is still at a trial-and-error experimental stage, and it is challenging to develop a prediction model that can systematically detect possible drug targets for complex diseases. Methods: We integrate gene expression, disease genes and interaction networks to identify effective drug targets that have a strong influence on disease genes, using a network flow approach. In the experiments, our test data comprise a microarray dataset containing 62 prostate cancer samples and 41 normal samples, 108 known prostate cancer genes, and 322 approved human drug targets extracted from the DrugBank database as candidate proteins. Using our method, we prioritize the candidate proteins and validate them against the known prostate cancer drug targets. Results: We successfully identify potential drug targets that are strongly related to well-known drugs for prostate cancer treatment and also discover further potential drug targets that are currently attracting the attention of biologists. We note that it is hard to discover drug targets based only on differential expression changes, because genes that are drug targets do not always show significant expression changes. For previous methods that depend on network topology attributes, it turns out that genes with potential as drug targets are only weakly correlated with critical points in a network. In comparison with previous methods, our results have the highest mean average precision and also rank the true drug targets higher, which verifies the effectiveness of our method. Conclusions: Our method does not know the real ideal routes in the disease network, but it tries to find a feasible flow that exerts a strong influence on the disease genes through possible paths. We successfully formulate drug target prediction as a maximum flow problem on biological networks and discover potential drug targets in an accurate manner.
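As a rough illustration of treating target prioritization as a maximum flow problem, the sketch below scores each candidate by the maximum flow it can send to a set of disease genes. The Edmonds-Karp routine is standard; everything else is an assumption made for the example: the helper `rank_candidates`, the artificial `"SINK"` node that collects flow from all disease genes, the node names and the edge capacities. The abstract does not specify how the paper builds its network or capacities.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Walk back from the sink to collect the path edges.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

def rank_candidates(network, candidates, disease_genes):
    """Hypothetical helper: rank candidates by max flow into the disease genes."""
    # A finite capacity larger than any possible flow stands in for "unbounded".
    big = sum(c for nbrs in network.values() for c in nbrs.values()) + 1.0
    scores = {}
    for candidate in candidates:
        graph = {u: dict(nbrs) for u, nbrs in network.items()}
        for gene in disease_genes:
            graph.setdefault(gene, {})["SINK"] = big
        scores[candidate] = max_flow(graph, candidate, "SINK")
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy directed network with made-up capacities.
network = {
    "T1": {"A": 3.0, "B": 2.0},
    "T2": {"B": 1.0},
    "A": {"D1": 2.0},
    "B": {"D1": 1.0, "D2": 1.0},
}
ranking = rank_candidates(network, ["T1", "T2"], ["D1", "D2"])
```

In this toy network, candidate `T1` can push more flow to the disease genes `D1` and `D2` than `T2` can, so it ranks first.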
BMC Medical Genomics | 2013
Yu-Fen Huang; Hsiang-Yuan Yeh; Von-Wun Soo
Background: In the last few years, knowledge of drugs, disease phenotypes and proteins has accumulated rapidly, and more and more scientists have turned their attention to inferring drug-disease associations by computational methods. Developing an integrated approach for systematically discovering drug-disease associations from these data is an important issue. Methods: We combine three different networks of drug, genomic and disease phenotype data and assign weights to the edges from available experimental data and knowledge. Given a specific disease, we use our network propagation approach to infer drug-disease associations. Results: We use prostate cancer and colorectal cancer as our test data, and the manually curated drug-disease associations from the Comparative Toxicogenomics Database as our benchmark. The ranked results show that our proposed method obtains higher specificity and sensitivity and clearly outperforms previous methods. Our results also show that our method with off-target information performs better than with primary drug targets only, on both test datasets. Conclusions: We clearly demonstrate the feasibility and benefits of using network-based analyses of chemical, genomic and phenotype data to reveal drug-disease associations. The potential associations inferred by our method provide new perspectives for toxicogenomics and drug repositioning evaluation.
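The abstract above does not give the propagation rule, so the following is only a generic sketch of network propagation, written as a random walk with restart on a weighted graph that mixes disease, gene and drug nodes. The function name `propagate`, the restart probability, the node names and the edge weights are all assumptions made for illustration.

```python
def propagate(edges, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Random walk with restart; returns a score per node (seeds must be in the graph)."""
    # Build a symmetric weighted adjacency map.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)
    strength = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for u in nodes:
            # Each neighbour v passes on its score in proportion to its edge weights.
            inflow = sum(p[v] * adj[v][u] / strength[v] for v in adj[u])
            nxt[u] = (1 - restart) * inflow + restart * p0[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy heterogeneous network with hypothetical nodes and weights.
edges = [
    ("disease", "gene1", 1.0),
    ("gene1", "drugA", 1.0),
    ("gene1", "gene2", 0.5),
    ("gene2", "drugB", 1.0),
]
scores = propagate(edges, seeds={"disease"})
drug_ranking = sorted(["drugA", "drugB"], key=lambda d: scores[d], reverse=True)
```

Starting from the disease node, `drugA` receives more of the propagated score than `drugB` because it sits closer to the seed through a strongly weighted gene.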
BMC Systems Biology | 2012
Liang-Chun Chen; Hsiang-Yuan Yeh; Cheng-Yu Yeh; Carlos Roberto Arias; Von-Wun Soo
Background: Drug resistance now poses more severe and urgent threats to human health and infectious disease treatment. However, wet-lab approaches alone have so far achieved limited success in countering drug resistance because of limited knowledge of its underlying mechanisms. Our approach applies a heuristic search algorithm to extract the active network under drug treatment and uses a random walk model to identify potential co-targets for effective antibacterial drugs. Results: As test data, we use the interactome network of Mycobacterium tuberculosis and gene expression data from treatment with two antibiotics, isoniazid and ethionamide. Our analysis shows that the active drug-treated networks are associated with the triggering of fatty acid metabolism and synthesis and of nicotinamide adenine dinucleotide (NADH)-related processes, and these results are consistent with recent experimental findings. Efflux pump processes appear to be the major mechanisms of resistance, but the SOS response is significantly up-regulated under isoniazid treatment. We also successfully identify potential co-targets, supported by evidence from the literature, that are related to the glycine-rich membrane, adenosine triphosphate energy and cell wall processes. Conclusions: Supported by gene expression and interactome data, our study points out possible pathways leading to the emergence of drug resistance under drug treatment. We develop a computational workflow that gives new insights into bacterial drug resistance through a systematic and global analysis of the bacterial regulation network. Our study also discovers potential co-targets with good properties in both biological and graph-theoretic terms to overcome the problem of drug resistance.
The Scientific World Journal | 2012
Carlos Roberto Arias; Hsiang-Yuan Yeh; Von-Wun Soo
Finding a gene related to a genetic disease is not a trivial task. Computational methods are therefore needed to give the biomedical community clues about which genes are more likely to be related to a specific disease as biomarkers. We address the biomarker identification problem with a gene prioritization method called gene prioritization from microarray data based on shortest paths, extended with structural and biological properties and edge flux using voting scheme (GP-MIDAS-VXEF). The method finds relevant interactions in protein interaction networks, scores the genes using shortest paths and topological analysis, and integrates the results using a voting scheme and biological boosting. We ran two experiments: one on prostate primary tumor and normal samples, and the other on prostate primary tumors with and without lymph node metastasis. We used 137 true prostate cancer genes as the benchmark. In the first experiment, GP-MIDAS-VXEF outperforms all the other state-of-the-art methods in the benchmark by retrieving the most true related genes from the candidate set in the top 50 scores. We applied the same technique to infer significant biomarkers in prostate cancer with lymph node metastasis, which is not yet well established.
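Two of the ingredients named in the abstract above, shortest-path scoring and a voting scheme, can be sketched in a few lines. This is not GP-MIDAS-VXEF itself: the sketch assumes Dijkstra distances from a single known disease gene as one ranking, a made-up expression-change score as a second ranking, and a simple Borda count to merge them. All gene names, edge weights and scores are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source on a weighted adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def borda_vote(rankings):
    """Merge rankings: a gene at position i in a list of n gets n - i points."""
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            points[gene] = points.get(gene, 0) + (n - position)
    # Ties are broken alphabetically to keep the result deterministic.
    return sorted(points, key=lambda g: (-points[g], g))

# Toy protein interaction network; lower edge weight means a more relevant interaction.
graph = {
    "seed": {"g1": 1.0, "g2": 4.0},
    "g1": {"seed": 1.0, "g2": 1.0, "g3": 2.0},
    "g2": {"seed": 4.0, "g1": 1.0},
    "g3": {"g1": 2.0},
}
candidates = ["g1", "g2", "g3"]
dist = dijkstra(graph, "seed")
by_distance = sorted(candidates, key=lambda g: dist[g])

# Hypothetical expression-change scores used as a second, independent ranking.
expression_change = {"g1": 0.5, "g2": 2.0, "g3": 1.0}
by_expression = sorted(candidates, key=lambda g: -expression_change[g])

consensus = borda_vote([by_distance, by_expression])
```

The consensus ranking differs from either input ranking alone, which is the point of combining several scoring views by voting.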
Archive | 2011
Carlos Roberto Arias; Hsiang-Yuan Yeh; Von-Wun Soo
The identification of genes is an ongoing research issue in the biomedical and bioinformatics community. The Human Genome Project, which was completed in 2003, identified approximately 20,000+ genes in human DNA, but the function or role of many of these genes is still unknown, and this accounts only for healthy DNA. Genetic diseases such as cancer, Alzheimer's disease, hemophilia and others have mechanisms that we have only just started to understand. For instance, the genes BRCA1 and BRCA2, famous for their role in breast cancer (Friedman et al., 1994), account for only 5% of the incidence of this cancer (Oldenburg et al., 2007). Many questions arise: What are the rest of the mechanisms involved in this cancer type? Are there other genes involved? How? This accounts for only one type of cancer, and there are at least 177 different types according to the National Cancer Institute. The straightforward method to deal with this problem is to do wet lab experiments with large samples of normal and diseased tissue, testing the reactions under different conditions and checking the expression, or lack of it, of different genes. The complication with this method is the cost: it takes time and requires specialized equipment, and thus the economic price tag is high. Fortunately, bioinformatics has matured in recent years: biological data is becoming available in different formats across different databases, and publications are providing new insights. Thanks to these, computational methods can be developed that save time, effort and money and that help biomedical researchers get clues about which genes to explore in the wet laboratory, so that time is not wasted on genes that are unlikely to contribute to a given disease. Gene prioritization methods can be used to find genes that were previously unknown to be related to a given disease.
The general definition of gene prioritization is: given a disease D, a candidate gene set C, and training data T, the method takes all these data as input and computes a score for each candidate gene; higher-scoring genes are supposed to be the genes most likely related to disease D (see fig. 1). Methods can be classified according to the type of input data they use, as Text and Data Mining Methods and Network Based Methods. Text and Data Mining methods use training data like genetic localisation, gene expression, phenotypic data (van Driel et al., 2003), PubMed abstracts (Tiffin et al., 2005), spatial gene expression profiles, linkage analysis (Piro et al., 2010), gene ontology and others (Adie et al., 2005; Ashburner et al., 2000; Schlicker et al., 2010); as the name suggests this
bioinformatics and bioengineering | 2007
Hsiang-Yuan Yeh; Shih-Wu Cheng; Von-Wun Soo
Microarrays are widely used in cancer research to identify differential expression of specific genes. We present a computational method for constructing cancer and normal gene regulatory networks from microarray data based on transcription factor analysis and independence testing. Web service technology is used to wrap bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predict the transcription factors that regulate gene expression. After reconstructing the gene regulatory networks, we use network statistical measures and network motifs to extract potential genes and to compare sub-networks between the cancer and normal gene networks. We adopt the prostate cancer microarray datasets from the Stanford Microarray Database as a target application to evaluate the methods.
bioinformatics and bioengineering | 2007
Yu-Ting Huang; Shih-Fang Lin; Chung-Cheng Chiu; Hsiang-Yuan Yeh; Von-Wun Soo
Adverse drug reactions (ADRs) can consume substantial medical resources unnecessarily and cause extra suffering for patients. Providing prompt information about ADRs and reducing their rate of occurrence is an important task yet to be done. The US Food and Drug Administration (FDA) provides an Adverse Event Reporting System (AERS), which contains many clinical reports about ADRs. However, biologists still do not know precisely which observed events are directly caused by drug-drug interactions. We use a probability analysis method to find associations between a set of drugs and symptoms for predicting ADRs, and apply a decision tree to discover association rules between drug-drug interactions and symptoms.
systems, man and cybernetics | 2006
Yu-Ting Huang; Hsiang-Yuan Yeh; Shih-Wu Cheng; Chien-Chih Tu; Chi-Li Kuo; Von-Wun Soo
We develop a framework using ontology inference and semantic processing techniques to help biologists extract knowledge directly from the large body of biological literature in NCBI PubMed. The system integrates various sharable thesauri, namely WordNet, MeSH (Medical Subject Headings) and GO (Gene Ontology), to support automatic semantic annotation and analysis. Natural language processing and semantic processing are facilitated by ontological inference, and the system can automatically extract the correct molecular interactions from complex sentences in an abstract. It helps biologists not only to save time and effort in constructing and analyzing biological pathways, but also to discover novel molecular interactions by comparing the information extracted from the literature with that in existing pathway databases such as KEGG. We evaluated the system performance on pathways in the apoptosis domain.
The Scientific World Journal | 2012
Cheng-Yu Yeh; Hsiang-Yuan Yeh; Carlos Roberto Arias; Von-Wun Soo
With the wide availability of protein interaction networks and microarray data, identifying linear paths that have biological significance in the search for a potential pathway is a challenging issue. We propose a color-coding method based on the characteristics of biological network topology and apply heuristic search to speed it up. In the experiments, we tested our method on two datasets: yeast and human prostate cancer networks with gene expression data. Comparisons of our method with other existing methods on known yeast MAPK pathways in terms of precision and recall show that we can find the maximum number of proteins and perform comparably well. Moreover, our method is more efficient than previous ones and detects paths of length 10 within 40 seconds using a 1.73 GHz Intel CPU and 1 GB of main memory under the Windows operating system.
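For readers unfamiliar with color-coding, the basic randomized technique can be sketched as follows: assign each vertex one of k random colors, use dynamic programming over (end vertex, set of colors used) to look for a path whose k vertices all have different colors, and repeat with fresh colorings. This is the plain textbook version only; the topology-based coloring and heuristic search speed-up described in the abstract are not reproduced here, and the function names, trial count and toy graph are assumptions.

```python
import random

def colorful_path(graph, k, rng):
    """One color-coding trial: return a path on k distinctly colored vertices, or None."""
    colors = {v: rng.randrange(k) for v in graph}
    # Maps (end vertex, bitmask of colors used) to one path that realises it.
    layer = {(v, 1 << colors[v]): [v] for v in graph}
    for _ in range(k - 1):
        nxt = {}
        for (u, used), path in layer.items():
            for v in graph[u]:
                bit = 1 << colors[v]
                if used & bit:
                    continue  # color already on the path, so v cannot extend it
                nxt.setdefault((v, used | bit), path + [v])
        layer = nxt
        if not layer:
            return None
    return next(iter(layer.values()))

def find_simple_path(graph, k, trials=200, seed=0):
    """Repeat random colorings until a simple path with k vertices is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        path = colorful_path(graph, k, rng)
        if path is not None:
            return path
    return None

# Toy undirected interaction network (hypothetical protein names).
graph = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}
path = find_simple_path(graph, k=4)
too_long = find_simple_path(graph, k=6)
```

Because every vertex on a colorful path has a distinct color, the path is automatically simple, which is why the dynamic program only needs to track color sets rather than visited vertices. A single trial can miss an existing path, so the number of trials trades running time against the chance of failure.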