Bill C. H. Chang
University of Melbourne
Publication
Featured research published by Bill C. H. Chang.
BMC Bioinformatics | 2006
A. K. M. A. Baten; Bill C. H. Chang; Saman K. Halgamuge; Jason Li
Background: Recent advances and automation in DNA sequencing technology have created a vast amount of DNA sequence data. This growth in sequence data demands better and more efficient analysis methods. Identifying genes in this newly accumulated data is an important issue in bioinformatics, and it requires the prediction of the complete gene structure. Accurate identification of splice sites in DNA sequences plays a central role in gene structure prediction in eukaryotes. Effective detection of splice sites requires knowledge of the characteristics, dependencies, and relationships of nucleotides in the region surrounding the splice site. A higher-order Markov model is generally regarded as a useful technique for modeling higher-order dependencies. However, its implementation requires estimating a large number of parameters, which is computationally expensive. Results: The proposed method for splice site detection consists of two stages: a first-order Markov model (MM1) is used in the first stage and a support vector machine (SVM) with a polynomial kernel is used in the second stage. The MM1 serves as a pre-processing step for the SVM and takes DNA sequences as its input. It models the compositional features and dependencies of nucleotides around splice site regions in terms of probabilistic parameters. The probabilistic parameters are then fed into the SVM, which combines them nonlinearly to predict splice sites. When the proposed MM1-SVM model is compared with other existing standard splice site detection methods, it shows superior performance in all cases. Conclusion: We proposed an effective pre-processing scheme for the SVM and applied it to the identification of splice sites. This is a simple yet effective splice site detection method, which shows better classification accuracy and computational speed than some other more complex methods.
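The pre-processing stage described above can be illustrated with a minimal sketch. It assumes aligned, equal-length DNA windows, position-specific transition probabilities, and add-one smoothing; none of these details are stated in the abstract, and the function names `fit_mm1` and `mm1_features` are hypothetical. The SVM stage is omitted: the resulting probability vectors would be the inputs to a polynomial-kernel classifier.

```python
ALPHABET = "ACGT"

def fit_mm1(sequences, pseudocount=1.0):
    """Estimate a position-specific first-order Markov model.

    Returns (first, trans), where first[a] approximates P(s_0 = a) and
    trans[i][(a, b)] approximates P(s_i = b | s_{i-1} = a) for i >= 1.
    The pseudocount is an assumption of this sketch, used to avoid zeros.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    first_counts = {a: pseudocount for a in ALPHABET}
    # Index 0 is unused; it keeps positions aligned with sequence indices.
    pair_counts = [
        {(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
        for _ in range(length)
    ]
    for s in sequences:
        first_counts[s[0]] += 1
        for i in range(1, length):
            pair_counts[i][(s[i - 1], s[i])] += 1

    total_first = sum(first_counts.values())
    first = {a: c / total_first for a, c in first_counts.items()}

    trans = [None]
    for i in range(1, length):
        probs = {}
        for a in ALPHABET:
            row_total = sum(pair_counts[i][(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                probs[(a, b)] = pair_counts[i][(a, b)] / row_total
        trans.append(probs)
    return first, trans

def mm1_features(seq, model):
    """Map a DNA window to its vector of Markov probabilities."""
    first, trans = model
    features = [first[seq[0]]]
    for i in range(1, len(seq)):
        features.append(trans[i][(seq[i - 1], seq[i])])
    return features

# Toy training windows (made up for illustration only).
model = fit_mm1(["AGGT", "AGGT", "CGGT"])
features = mm1_features("AGGT", model)
```

Each window becomes a fixed-length numeric vector, so the number of estimated parameters grows linearly with window length rather than exponentially with model order.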
Genetic Programming and Evolvable Machines | 2004
Bill C. H. Chang; Asanga Ratnaweera; Saman K. Halgamuge; Harry C. Watson
In this paper, a modified particle swarm optimisation algorithm is proposed for protein sequence motif discovery. Protein sequences are represented as chains of symbols, and a protein sequence motif is a short sequence that exists in most of the sequences of a protein family. Protein sequence symbols are converted into numbers using a one-to-one amino acid translation table. The simulation uses the EGF and C2H2 Zinc Finger protein families obtained from the PROSITE database. Simulation results show that the modified particle swarm optimisation algorithm is effective in obtaining globally optimal sequence patterns, achieving 96.9% and 99.5% classification accuracy in the EGF and C2H2 Zinc Finger protein families, respectively. A better true positive hit result is achieved when compared to the motifs published in the PROSITE database.
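The symbol-to-number conversion mentioned above can be sketched as follows. The actual translation table used in the paper is not given in the abstract, so the alphabetical numbering here is purely an assumption, as are the names `encode` and `decode`.

```python
# The 20 standard amino acids in one-letter code, alphabetical order.
# The numbering below is a hypothetical stand-in for the paper's table.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TO_NUMBER = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
TO_SYMBOL = {n: aa for aa, n in TO_NUMBER.items()}

def encode(sequence):
    """Convert a protein sequence into a list of integers."""
    return [TO_NUMBER[aa] for aa in sequence.upper()]

def decode(numbers):
    """Convert a list of integers back into a protein sequence."""
    return "".join(TO_SYMBOL[n] for n in numbers)

encoded = encode("CHKW")
restored = decode(encoded)
```

Because the mapping is one-to-one, any numeric pattern found by an optimiser can be translated back into an amino acid pattern without loss.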
Nature Communications | 2016
Pasi K. Korhonen; Edoardo Pozio; Giuseppe La Rosa; Bill C. H. Chang; Anson V. Koehler; Eric P. Hoberg; Peter R. Boag; Patrick Tan; Aaron R. Jex; Andreas Hofmann; Paul W. Sternberg; Neil D. Young; Robin B. Gasser
Trichinellosis is a globally important food-borne parasitic disease of humans caused by roundworms of the Trichinella complex. Extensive biological diversity is reflected in substantial ecological and genetic variability within and among Trichinella taxa, and major controversy surrounds the systematics of this complex. Here we report the sequencing and assembly of 16 draft genomes representing all 12 recognized Trichinella species and genotypes, define protein-coding gene sets and assess genetic differences among these taxa. Using thousands of shared single-copy orthologous gene sequences, we fully reconstruct, for the first time, a phylogeny and biogeography for the Trichinella complex, and show that encapsulated and non-encapsulated Trichinella taxa diverged from their most recent common ancestor ∼21 million years ago (mya), with taxon diversifications commencing ∼10 to 7 mya.
International Journal for Parasitology | 2014
Namitha Mohandas; Edoardo Pozio; Giuseppe La Rosa; Pasi K. Korhonen; Neil D. Young; Anson V. Koehler; Ross S. Hall; Paul W. Sternberg; Peter R. Boag; Aaron R. Jex; Bill C. H. Chang; Robin B. Gasser
In the present study we sequenced or re-sequenced, assembled and annotated 15 mitochondrial genomes representing the 12 currently recognised taxa of Trichinella using a deep sequencing-coupled approach. We then defined and compared the gene order in individual mitochondrial genomes (14 to 17.7 kb), evaluated genetic differences among species/genotypes and re-assessed the relationships among these taxa using the mitochondrial nucleic acid or amino acid sequence data sets. In addition, a rich source of mitochondrial genetic markers was defined that could be used in future systematic, epidemiological and population genetic studies of Trichinella. The sequencing-bioinformatic approach employed herein should be applicable to a wide range of eukaryotic parasites.
G3: Genes, Genomes, Genetics | 2014
Ya-Yi Huang; Chueh-Pai Lee; Jason L. Fu; Bill C. H. Chang; Antonius J. M. Matzke; Marjori Matzke
Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop.
Bioinformatics | 2015
Duleepa Jayasundara; Isaam Saeed; Suhinthan Maheswararajah; Bill C. H. Chang; Sen-Lin Tang; Saman K. Halgamuge
Motivation: The combined effect of a high replication rate and the low fidelity of the viral polymerase in most RNA viruses and some DNA viruses results in the formation of a viral quasispecies. Uncovering information about quasispecies populations significantly benefits the study of disease progression, antiviral drug design, vaccine design and viral pathogenesis. We present a new analysis pipeline called ViQuaS for viral quasispecies spectrum reconstruction using short next-generation sequencing reads. ViQuaS is based on a novel reference-assisted de novo assembly algorithm for constructing local haplotypes. A significantly extended version of an existing global strain reconstruction algorithm is also used. Results: Benchmarking results showed that ViQuaS outperformed three other previously published methods named ShoRAH, QuRe and PredictHaplo, with improvements of at least 3.1-53.9% in recall, 0-12.1% in precision and 0-38.2% in F-score in terms of strain sequence assembly and improvements of at least 0.006-0.143 in KL-divergence and 0.001-0.035 in root mean-squared error in terms of strain frequency estimation, over the next-best algorithm under various simulation settings. We also applied ViQuaS to a real read set derived from an in vitro human immunodeficiency virus (HIV)-1 population, two independent datasets of foot-and-mouth-disease virus derived from the same biological sample and a real HIV-1 dataset and demonstrated better results than other methods available.
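The benchmarking metrics named above can be sketched with their standard textbook definitions. The abstract does not specify how reconstructed strains were matched to true strains, so the exact-match set comparison below is an assumption, and all function names and toy inputs are hypothetical.

```python
import math

def precision_recall_f(true_strains, predicted_strains):
    """Score strain assembly by exact sequence match (an assumption)."""
    true_set, pred_set = set(true_strains), set(predicted_strains)
    hits = len(true_set & pred_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def rmse(p, q):
    """Root mean-squared error between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

# Toy example: two true strains, three reconstructed strains.
scores = precision_recall_f({"AAA", "AAC"}, {"AAA", "AAG", "AAT"})
divergence = kl_divergence([0.5, 0.5], [0.5, 0.5])
error = rmse([0.5, 0.5], [0.4, 0.6])
```

Precision, recall and F-score assess which strain sequences were recovered, while KL-divergence and root mean-squared error assess how closely the estimated strain frequencies track the true ones.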
International Symposium on Neural Networks | 2007
A. K. M. A. Baten; Saman K. Halgamuge; Bill C. H. Chang; N. Wickramarachchi
The growth of biological sequence data demands better and more efficient analysis methods. Effective detection of various regulatory signals in these sequences requires knowledge of the characteristics, dependencies, and relationships of nucleotides in the region surrounding the regulatory signals. A higher-order Markov model is generally regarded as a useful technique for modeling higher-order dependencies of the nucleotides. However, its implementation requires estimating a large number of parameters, which is computationally expensive. In this paper, we propose a hybrid method consisting of a first-order Markov model for sequence data preprocessing and a multilayer perceptron neural network for classification. The Markov model captures the compositional features and dependencies of nucleotides in terms of probabilistic parameters, which are used as inputs to the classifier. The classifier combines the Markov probabilities nonlinearly for signal detection. When applied to the splice site detection problem using three widely used data sets, the proposed hybrid method is able to model higher-order dependencies with better classification accuracy.
International Journal of Approximate Reasoning | 2003
Bill C. H. Chang; Saman K. Halgamuge
In protein sequences, two sequences that share similar substrings often have similar functional properties. Learning the characteristics and properties of an unknown protein is much easier if its likely functional properties can be predicted by finding substrings already known from other protein sequences. The sequence pattern search algorithm proposed in this paper searches for similar matches between a pattern and a sequence by using fuzzy logic and calculates the degree of similarity from a sequence inference step. Proteins from 11 domain families are used for simulation, and the results show that the proposed algorithm is capable of identifying sequences that have a pattern similar to their family protein motifs.
PLOS ONE | 2015
Jian-Zhi Huang; Chih-Peng Lin; Ting-Chi Cheng; Bill C. H. Chang; Shu-Yu Cheng; Yi-Wen Chen; Chen-Yu Lee; Shih-Wen Chin; Fure-Chyi Chen
Phalaenopsis has a zygomorphic floral structure, including three outer tepals, two lateral inner tepals and a highly modified inner median tepal called labellum or lip; however, the regulation of its organ development remains unelucidated. We generated RNA-seq reads with the Illumina platform for floral organs of the Phalaenopsis wild-type and peloric mutant with a lip-like petal. A total of 43,552 contigs were obtained after de novo assembly. We used differentially expressed gene profiling to compare the transcriptional changes in floral organs for both the wild-type and peloric mutant. Pair-wise comparison of sepals, petals and labellum between peloric mutant and its wild-type revealed 1,838, 758 and 1,147 contigs, respectively, with significant differential expression. PhAGL6a (CUFF.17763), PhAGL6b (CUFF.17763.1), PhMADS1 (CUFF.36625.1), PhMADS4 (CUFF.25909) and PhMADS5 (CUFF.39479.1) were significantly upregulated in the lip-like petal of the peloric mutant. We used real-time PCR analysis of lip-like petals, lip-like sepals and the big lip of peloric mutants to confirm the five genes’ expression patterns. PhAGL6a, PhAGL6b and PhMADS4 were strongly expressed in the labellum and significantly upregulated in lip-like petals and lip-like sepals of peloric-mutant flowers. In addition, PhAGL6b was significantly downregulated in the labellum of the big lip mutant, with no change in expression of PhAGL6a. We provide a comprehensive transcript profile and functional analysis of Phalaenopsis floral organs. PhAGL6a, PhAGL6b and PhMADS4 might play crucial roles in the development of the labellum in Phalaenopsis. Our study provides new insights into how the orchid labellum differs and why the petal or sepal converts to a labellum in Phalaenopsis floral mutants.
PeerJ | 2016
Jian-Zhi Huang; Chih-Peng Lin; Ting-Chi Cheng; Ya-Wen Huang; Yi-Jung Tsai; Shu-Yun Cheng; Yi-Wen Chen; Chueh-Pai Lee; Wan-Chia Chung; Bill C. H. Chang; Shih-Wen Chin; Chen-Yu Lee; Fure-Chyi Chen
The Phalaenopsis orchid is an important potted flower of high economic value around the world. We report the 3.1 Gb draft genome assembly of an important winter flowering Phalaenopsis ‘KHM190’ cultivar. We generated 89.5 Gb of RNA-seq data and 113 million sRNA-seq reads and used these data to identify 41,153 protein-coding genes and 188 miRNA families. We also generated a draft genome for Phalaenopsis pulcherrima ‘B8802,’ a summer flowering species, via resequencing. Comparison of genome data between the two Phalaenopsis cultivars allowed the identification of 691,532 single-nucleotide polymorphisms. In this study, we reveal that the key role of PhAGL6b in the regulation of labellum organ development involves alternative splicing in the big lip mutant. Overexpression of PhAGL6b in a petal or sepal leads to its conversion into a lip-like structure. We also discovered that the gibberellin pathway that regulates the expression of flowering time genes during the reproductive phase change is induced by cool temperature. Our work thus provides a valuable resource for flowering control, flower architecture development, and breeding of Phalaenopsis orchids.