Timothy H. Wu
National Yang-Ming University
Publication
Featured research published by Timothy H. Wu.
BMC Genomics | 2012
Ting-Wen Chen; Ruei-Chi Richie Gan; Timothy H. Wu; Po-Jung Huang; Cheng-Yang Lee; Yi-Ywan M. Chen; Che-Chun Chen; Petrus Tang
Background: Recent developments in high-throughput sequencing (HTS) technologies have made it feasible to sequence the complete transcriptomes of non-model organisms or metatranscriptomes from environmental samples. The challenge after generating hundreds of millions of sequences is to annotate these transcripts and classify the transcripts based on their putative functions. Because many biological scientists lack the knowledge to install Linux-based software packages or maintain databases used for transcript annotation, we developed an automatic annotation tool with an easy-to-use interface.
Methods: To elucidate the potential functions of gene transcripts, we integrated well-established annotation tools (Blast2GO, PRIAM and RPS BLAST) in a web-based service, FastAnnotator, which can assign Gene Ontology (GO) terms, Enzyme Commission numbers (EC numbers) and functional domains to query sequences.
Results: Using six transcriptome sequence datasets as examples, we demonstrated the ability of FastAnnotator to assign functional annotations. FastAnnotator annotated 88.1% and 81.3% of the transcripts from the well-studied organisms Caenorhabditis elegans and Streptococcus parasanguinis, respectively. Furthermore, FastAnnotator annotated 62.9%, 20.4%, 53.1% and 42.0% of the sequences from the transcriptomes of sweet potato, clam, amoeba, and Trichomonas vaginalis, respectively, which lack reference genomes. We demonstrated that FastAnnotator can complete the annotation process in a reasonable amount of time and is suitable for the annotation of transcriptomes from model organisms or organisms for which annotated reference genomes are not available.
Conclusions: The sequencing process no longer represents the bottleneck in the study of genomics, and automatic annotation tools have become invaluable as the annotation procedure has become the limiting step. We present FastAnnotator, an automated annotation web tool designed to efficiently annotate sequences with their gene functions, enzyme functions or domains. FastAnnotator is useful in transcriptome studies, especially those focusing on non-model organisms or metatranscriptomes. FastAnnotator does not require local installation and is freely available at http://fastannotator.cgu.edu.tw.
BMC Genomics | 2014
Ting-Wen Chen; Hsin-Pai Li; Chi-Ching Lee; Ruei-Chi Gan; Po-Jung Huang; Timothy H. Wu; Cheng-Yang Lee; Yi-Feng Chang; Petrus Tang
Background: Chromatin is a dynamic but highly regulated structure. DNA-binding proteins such as transcription factors, epigenetic and chromatin modifiers are responsible for regulating specific gene expression patterns and may result in different phenotypes. To reveal the identity of the proteins associated with a specific region on DNA, chromatin immunoprecipitation (ChIP) is the most widely used technique. ChIP assay followed by next generation sequencing (ChIP-seq) or microarray (ChIP-chip) is often used to study patterns of protein-binding profiles in different cell types and in cancer samples on a genome-wide scale. However, only a limited number of bioinformatics tools are available for ChIP dataset analysis.
Results: We present ChIPseek, a web-based tool for ChIP data analysis providing summary statistics in graphs and offering several commonly demanded analyses. ChIPseek can provide a statistical summary of the dataset, including a histogram of peak length distribution, a histogram of distances to the nearest transcription start site (TSS), and a pie chart (or bar chart) of genomic locations, giving users a comprehensive view of the dataset for further analysis. For examining the potential functions of peaks, ChIPseek provides peak annotation, visualization of peak genomic location, motif identification, sequence extraction, and comparison between datasets. Beyond that, ChIPseek also offers users the flexibility to filter peaks and re-analyze the filtered subset of peaks. ChIPseek supports 20 different genome assemblies for 12 model organisms including human, mouse, rat, worm, fly, frog, zebrafish, chicken, yeast, fission yeast, Arabidopsis, and rice. We use demo datasets to demonstrate the usage and intuitive user interface of ChIPseek.
Conclusions: ChIPseek provides a user-friendly interface for biologists to analyze large-scale ChIP data without requiring any programming skills. All the results and figures produced by ChIPseek can be downloaded for further analysis. The analysis tools built into ChIPseek, especially the ones for selecting and examining a subset of peaks from ChIP data, provide invaluable help for exploring the high-throughput data from either ChIP-seq or ChIP-chip. ChIPseek is freely available at http://chipseek.cgu.edu.tw.
BMC Bioinformatics | 2010
Ting-wen Chen; Timothy H. Wu; Wailap Victor Ng; Wen-chang Lin
Background: Orthologs are genes derived from the same ancestor gene loci after speciation events. Orthologous proteins usually have similar sequences and perform comparable biological functions. Therefore, ortholog identification is useful in annotations of newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships between all genomes requires a lot of effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of the lower sequence similarity. Therefore, an efficient ortholog detection method that can deal with a large number of distantly related genomes is desired.
Results: An efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), is created on the basis of domain architectures in this study. Supported by domain composition, which is usually directly related to protein function, DODO could facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns protein groups according to their domain architectures and then identifies orthologs within those groups with much reduced complexity. Here DODO is shown to detect orthologs between two genomes in a considerably shorter period of time than traditional methods of reciprocal best hits, and the advantage is more significant when a large number of genomes are analyzed. The output results of DODO are highly comparable with other known ortholog databases.
Conclusions: DODO provides a new efficient pipeline for detection of orthologs in a large number of genomes. In addition, a database established with DODO is also easier to maintain and could be updated relatively effortlessly. The pipeline of DODO could be downloaded from http://140.109.42.19:16080/dodo_web/home.htm
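The first DODO step described in this abstract, assigning proteins to groups by domain architecture before searching for orthologs inside each group, can be illustrated with a minimal sketch. The function name, the input format (a protein ID mapped to an ordered list of domain identifiers) and the Pfam-style IDs are assumptions made for illustration; this is not the DODO pipeline's own code.

```python
from collections import defaultdict


def group_by_domain_architecture(proteins):
    """Group protein IDs that share the same ordered list of domains.

    `proteins` maps a protein ID to an ordered list of domain IDs
    (hypothetical input format, chosen only for this sketch).
    """
    groups = defaultdict(list)
    for protein_id, domains in proteins.items():
        # The ordered tuple of domains serves as the architecture key.
        groups[tuple(domains)].append(protein_id)
    return dict(groups)


# Made-up example: two proteins share one architecture, one stands alone.
example = {
    "human_P1": ["PF00069", "PF07714"],
    "mouse_P1": ["PF00069", "PF07714"],
    "human_P2": ["PF00046"],
}
groups = group_by_domain_architecture(example)
```

Ortholog candidates would then only need to be compared within each group, which is where the reduced complexity mentioned in the abstract would come from.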
BMC Genomics | 2011
Ting-wen Chen; Timothy H. Wu; Wailap Victor Ng; Wen-chang Lin
Background: Gene duplication provides resources for developing novel genes and new functions while retaining the original functions. In addition, alternative splicing could increase the complexity of expression at the transcriptome and proteome level without increasing the number of gene copies in the genome. Duplication and alternative splicing are thought to work together to provide the diverse functions or expression patterns for eukaryotes. Previously, it was believed that duplication and alternative splicing were negatively correlated and probably interchangeable.
Results: We look into the relationship between occurrence of alternative splicing and duplication at different times after duplication events. We found duplication and alternative splicing were indeed inversely correlated if only recently duplicated genes were considered, but they became positively correlated when we took ancient duplications into account. Specifically, for slightly or moderately duplicated genes with gene families containing 2 - 7 paralogs, genes were more likely to evolve alternative splicing and had on average a greater number of alternative splicing isoforms after long-term evolution compared to singleton genes. On the other hand, large gene families (containing at least 8 paralogs) had a lower proportion of alternative splicing, and fewer alternative splicing isoforms on average, even when ancient duplicated genes were taken into consideration. We also found that duplicated genes having alternative splicing were under tighter evolutionary constraints compared to those having no alternative splicing, and had an enrichment of genes that participate in molecular transducer activities.
Conclusions: We studied the association between occurrences of alternative splicing and gene duplication. Our results imply that there are key differences in functions and evolutionary constraints among singleton genes and duplicated genes with or without alternative splicing incidences. This implies that gene duplication and alternative splicing may have different functional significance in the evolution of speciation diversity.
Molecular & Cellular Proteomics | 2006
Rueichi R. Gan; Eugene C. Yi; Yulun Chiu; Hookeun Lee; Yu-chieh P. Kao; Timothy H. Wu; Ruedi Aebersold; David R. Goodlett; Wailap Victor Ng
To better understand the extremely halophilic archaeon Halobacterium species NRC-1, we analyzed its soluble proteome by two-dimensional liquid chromatography coupled to electrospray ionization tandem mass spectrometry. A total of 888 unique proteins were identified with a ProteinProphet probability (P) between 0.9 and 1.0. To evaluate the biochemical activities of the organism, the proteomic data were subjected to a biological network analysis using our BMSorter software. This allowed us to examine the proteins expressed in different biomodules and study the interactions between pertinent biomodules. Interestingly an integrated analysis of the enzymes in the amino acid metabolism and citrate cycle networks suggested that up to eight amino acids may be converted to oxaloacetate, fumarate, or oxoglutarate in the citrate cycle for energy production. In addition, glutamate and aspartate may be interconverted from other amino acids or synthesized from citrate cycle intermediates to meet the high demand for the acidic amino acids that are required to build the highly acidic proteome of the organism. Thus this study demonstrated that proteome analysis can provide useful information and help systems analyses of organisms.
BMC Genomics | 2012
Timothy H. Wu; Lichieh J Chu; Jian-Chiao Wang; Ting-Wen Chen; Yin-Jing Tien; Wen-chang Lin; Wailap Victor Ng
Background: Research has been conducted on the identification of differentially expressed genes (DEGs) by generating and mining cDNA expressed sequence tags (ESTs) for more than a decade. Although the availability of public databases makes possible the comprehensive mining of DEGs among the ESTs from multiple tissue types, existing studies usually employed statistics suitable only for two categories. Multi-class tests have been developed to enable the finding of tissue-specific genes, but the subsequent search for cancer genes involves a separate two-category test only on the ESTs of the tissue of interest. This restricts the amount of data used. On the other hand, simple pooling of cancer and normal genes from multiple tissue types runs the risk of Simpson's paradox. Here we present a different approach, which searched for multi-cancer DEG candidates by analyzing all pertinent ESTs in all categories and narrowing down the cancer biomarker candidates via integrative analysis with microarray data and selection of secretory and membrane protein genes, as well as incorporation of network analysis. Finally, the differential expression patterns of three selected cancer biomarker candidates were confirmed by real-time qPCR analysis.
Results: Seven hundred and twenty-three primary DEG candidates (p-value < 0.05 and lower bound of confidence interval of odds ratio ≥ 1.65) were selected from a curated EST database with the application of the Cochran-Mantel-Haenszel statistic (CMH). GeneGO analysis results indicated this set as neoplasm enriched. Cross-examination with microarray data further narrowed the list down to 235 genes, among which 96 had membrane or secretory annotations. After examining the candidates in a protein interaction network, public tissue expression databases, and the literature, we selected three genes for further evaluation by real-time qPCR with eight major normal and cancer tissues. The higher-than-normal tissue expression of COL3A1, DLG3, and RNF43 in some of the cancer tissues is in agreement with our in silico predictions.
Conclusions: Searching the digitized transcriptome using CMH enabled us to identify multi-cancer differentially expressed gene candidates. Our methodology demonstrated simultaneous analysis for cancer biomarkers of multiple tissue types with the EST data. With the revived interest in digitizing the transcriptomes by NGS, cancer biomarkers could be more precisely detected from the ESTs. The three candidates identified in this study, COL3A1, DLG3, and RNF43, are valuable targets for further evaluation with a larger sample size of normal and cancer tissue or serum samples.
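As a rough illustration of the Cochran-Mantel-Haenszel statistic named in this abstract, the sketch below computes the textbook CMH chi-square (without continuity correction) and the Mantel-Haenszel common odds ratio over a set of 2x2 tables, one table per tissue. The table layout (gene ESTs versus other ESTs, cancer versus normal), the function name and the counts are assumptions for illustration; the study's own filtering, including the confidence-interval bound on the odds ratio, is not reproduced here.

```python
import math


def cmh_test(strata):
    """Cochran-Mantel-Haenszel test over a list of 2x2 tables.

    Each stratum is (a, b, c, d), assumed here to mean:
        a = gene ESTs in cancer,  b = other ESTs in cancer,
        c = gene ESTs in normal,  d = other ESTs in normal.
    Returns (common_odds_ratio, chi_square, p_value).
    No continuity correction is applied.
    """
    diff_sum = var_sum = or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # stratum too small to contribute a variance term
        expected = (a + b) * (a + c) / n
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
        or_num += a * d / n
        or_den += b * c / n
    chi_square = diff_sum ** 2 / var_sum
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return or_num / or_den, chi_square, p_value


# Two hypothetical tissues with identical made-up counts.
odds_ratio, chi_square, p_value = cmh_test([(10, 5, 5, 10), (10, 5, 5, 10)])
```

Because each tissue keeps its own table, the statistic avoids the pooling across tissues that the abstract associates with Simpson's paradox. The sketch does not guard against a zero variance sum or a zero odds-ratio denominator.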
BMC Bioinformatics | 2016
Richie Ruei-Chi Gan; Ting-Wen Chen; Timothy H. Wu; Po-Jung Huang; Chi-Ching Lee; Yuan-Ming Yeh; Cheng-Hsun Chiu; Hsien-Da Huang; Petrus Tang
Background: Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared.
Results: Here, we propose a new analysis strategy and quantification methods for quantifying expression level which not only generate a virtual reference from sequencing data, but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against NCBI NR databases to find potential homolog sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For better understanding of the biological meaning of the comparison among transcriptomes, PARRoT further provides linkage between these virtual transcripts and their potential functions by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours.
Conclusions: In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers the opportunity to quantify and compare transcriptome profiles through a homolog-based virtual transcriptome reference. By using the homolog-based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw.
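The abstract does not define how eRPKM and eTPM are estimated, so the sketch below only shows the standard RPKM and TPM normalizations that those names build on, applied to read counts against a shared reference. The function names, the transcript lengths and the read counts are made up for illustration and are not taken from PARRoT.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [c * 1e9 / (length * total_reads)
            for c, length in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    rate_sum = sum(rates)
    return [r / rate_sum * 1e6 for r in rates]


# Hypothetical read counts and lengths (bp) for two virtual transcripts.
counts = [100, 200]
lengths = [1000, 2000]
rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

In this toy case the second transcript has twice the reads but also twice the length, so both transcripts receive the same normalized value under either measure.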
Scientific Reports | 2017
Ting-Wen Chen; Ruei Chi Gan; Yi Kai Fang; Kun Yi Chien; Wei Chao Liao; Chia Chun Chen; Timothy H. Wu; Ian Yi Feng Chang; Chi Yang; Po-Jung Huang; Yuan Ming Yeh; Cheng-Hsun Chiu; Tzu Wen Huang; Petrus Tang
Along with the constant improvement in high-throughput sequencing technology, an increasing number of transcriptome sequencing projects are carried out in organisms without decoded genome information and even on environmental biological samples. To study the biological functions of novel transcripts, the very first task is to identify their potential functions. We present a web-based annotation tool, FunctionAnnotator, which offers comprehensive annotations, including GO term assignment, enzyme annotation, domain/motif identification and predictions for subcellular localization. To accelerate the annotation process, we have optimized the computation processes and used parallel computing for all annotation steps. Moreover, FunctionAnnotator is designed to be versatile, and it generates a variety of useful outputs for facilitating other analyses. Here, we demonstrate how FunctionAnnotator can be helpful in annotating non-model organisms. We further illustrate that FunctionAnnotator can estimate the taxonomic composition of environmental samples and assist in the identification of novel proteins by combining RNA-Seq data with proteomics technology. In summary, FunctionAnnotator can efficiently annotate transcriptomes and greatly benefits studies focusing on non-model organisms or metatranscriptomes. FunctionAnnotator, a comprehensive annotation web-service tool, is freely available online at: http://fa.cgu.edu.tw/. This new web-based annotator will shed light on field studies involving organisms without a reference genome.
Journal of Proteomics | 2013
Rueyhung Roc Weng; Lichieh Julie Chu; Hung Wei Shu; Timothy H. Wu; Mengchieh Claire Chen; Yuwei Chang; Yihsuan Shannon Tsai; Michael C. Wilson; Yeou-Guang Tsay; David R. Goodlett; Wailap Victor Ng
Mass measurement and precursor mass assignment are independent processes in proteomic data acquisition. Due to misassignments to the C-13 peak, or for other reasons, extensive precursor mass shifts (i.e., deviations of the measured from calculated precursor neutral masses) in LC-MS/MS data obtained with the high-accuracy LTQ-Orbitrap mass spectrometers have been reported in previous studies. Although computational methods for post-acquisition reassignment to monoisotopic mass have been developed to curate the MS/MS spectra prior to database search, a simpler method for estimating the fraction of spectra with precursor mass shift, so as to determine whether the data require curation, remains desirable. Here, we provide evidence that an easy approach, which applies a large precursor tolerance (2.1 Da or higher) in a SEQUEST search against a forward and decoy protein sequence database and then filters the data with PeptideProphet peptide identification probability (p≥0.9), could detect most of the MS/MS spectra containing inaccurate precursor masses. Furthermore, through the implementation of artificial mass shifts on 4000 randomly selected MS/MS spectra, which originally had accurate precursor mass assigned by the mass spectrometers, we demonstrated that the accuracy of the precursor mass has almost negligible influence on the efficacy and fidelity of peptide identification.
Biological significance: Integral precursor mass shift is a known problem, and thus proteomic data should be handled and analyzed properly to avoid losing important protein identification and/or quantification information. A quick and easy approach for estimating the number of MS/MS spectra with inaccurate precursor mass assignments would be helpful for evaluating the performance of the instrument and for determining whether the data require curation prior to database search or should be searched with specific search parameter(s). Here we demonstrated that most of the MS/MS spectra with inaccurate mass assignments (integral or non-integral changes) could be easily identified by database search with large precursor tolerance windows.
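To make the notion of a precursor mass shift concrete, the sketch below takes the deviation of a measured from a calculated neutral mass and labels it as accurate, integral (close to a whole number of 13C-12C spacings, about 1.003355 Da) or non-integral. The function name, the 10 ppm tolerance and the example masses are assumptions for illustration; the study itself relied on SEQUEST searches with a wide precursor tolerance and PeptideProphet filtering, not on this classifier.

```python
C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C


def classify_precursor_shift(measured, calculated, tol_ppm=10.0):
    """Label the shift between measured and calculated neutral mass.

    Returns ("accurate", 0), ("integral", k) where k is the number of
    isotope spacings, or ("non-integral", None).
    `tol_ppm` is a hypothetical tolerance chosen for this sketch.
    """
    tol_da = calculated * tol_ppm / 1e6
    shift = measured - calculated
    if abs(shift) <= tol_da:
        return ("accurate", 0)
    k = round(shift / C13_C12_DELTA)
    if k != 0 and abs(shift - k * C13_C12_DELTA) <= tol_da:
        return ("integral", k)
    return ("non-integral", None)


# Made-up masses: one isotope-peak misassignment, one accurate, one odd shift.
shifted = classify_precursor_shift(1501.0034, 1500.0)
accurate = classify_precursor_shift(1500.001, 1500.0)
odd = classify_precursor_shift(1500.5, 1500.0)
```

A shift of one or two isotope spacings falls inside the 2.1 Da search tolerance mentioned in the abstract, which is consistent with such spectra still being matched by a wide-tolerance search.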
Genomics | 2012
Ting-Wen Chen; Richie Ruei-Chi Gan; Timothy H. Wu; Wen-Chang Lin; Petrus Tang
During the viral infection and replication processes, viral proteins are highly regulated and may interact with host proteins. However, the functions and interaction partners of many viral proteins have yet to be explored. Here, we compiled a VIral Protein domain DataBase (VIP DB) to associate viral proteins with putative functions and interaction partners. We systematically assign domains and infer the functions of proteins and their protein interaction partners from their domain annotations. A total of 2,322 unique domains identified from 2,404 viruses are used as a starting point to correlate GO classification, KEGG metabolic pathway annotation and domain-domain interactions. Of the unique domains, 42.7% have GO records, 39.6% have at least one domain-domain interaction record and 26.3% can also be found in either mammals or plants. This database provides a resource to help virologists identify potential roles for viral proteins. All of the information is available at http://vipdb.cgu.edu.tw.