Thomas K. F. Wong | Researchain

Publication

Featured research published by Thomas K. F. Wong.

Nature Methods | 2017

ModelFinder: fast model selection for accurate phylogenetic estimates

Subha Kalyaanamoorthy; Bui Quang Minh; Thomas K. F. Wong; Arndt von Haeseler; Lars S. Jermiin

Model-based molecular phylogenetics plays an important role in comparisons of genomic data, and model selection is a key step in all such analyses. We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.
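As a rough illustration of what a model-selection step involves, the sketch below ranks candidate substitution models by the Bayesian information criterion (BIC). This is a generic technique, not ModelFinder's actual procedure; the log-likelihoods, parameter counts (branch lengths omitted for simplicity), and the function names `bic` and `select_model` are all hypothetical.

```python
import math


def bic(log_likelihood, num_params, num_sites):
    """Bayesian information criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + num_params * math.log(num_sites)


def select_model(candidates, num_sites):
    """Return the name of the lowest-BIC model and the score of every candidate.

    candidates maps a model name to (log-likelihood, number of free parameters).
    """
    scored = {
        name: bic(lnl, k, num_sites) for name, (lnl, k) in candidates.items()
    }
    best = min(scored, key=scored.get)
    return best, scored


# Hypothetical fits of three nucleotide models to a 1000-site alignment.
candidates = {
    "JC": (-5200.0, 0),
    "HKY": (-5100.0, 4),
    "GTR": (-5098.0, 8),
}
best, scores = select_model(candidates, num_sites=1000)
print(best, scores)
```

With these made-up numbers HKY is selected: GTR fits slightly better, but not by enough to offset the penalty for its four extra parameters.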

Bioinformatics | 2012

SOAP3: ultra-fast GPU-based parallel alignment tool for short reads

Chi-Man Liu; Thomas K. F. Wong; Edward Wu; Ruibang Luo; Siu-Ming Yiu; Yingrui Li; Bingqiang Wang; Chang Yu; Xiaowen Chu; Kaiyong Zhao; Ruiqiang Li; Tak Wah Lam

SOAP3 is the first short read alignment tool that leverages the multi-processors in a graphics processing unit (GPU) to achieve a drastic improvement in speed. We adapted the compressed full-text index (BWT) used by SOAP2 in view of the advantages and disadvantages of the GPU. When tested with millions of Illumina HiSeq 2000 length-100 bp reads, SOAP3 takes < 30 s to align a million read pairs onto the human reference genome and is at least 7.5 and 20 times faster than BWA and Bowtie, respectively. For aligning reads with up to four mismatches, SOAP3 aligns slightly more reads than BWA and Bowtie; this is because SOAP3, unlike BWA and Bowtie, is not heuristic-based and always reports all answers.

PLOS ONE | 2013

SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner

Ruibang Luo; Thomas K. F. Wong; Jianqiao Zhu; Chi-Man Liu; Xiaoqian Zhu; Edward Wu; Lap-Kei Lee; Haoxiang Lin; Wenjuan Zhu; David W. Cheung; Hing-Fung Ting; Siu-Ming Yiu; Shaoliang Peng; Chang Yu; Yingrui Li; Ruiqiang Li; Tak Wah Lam

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, through leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

Systematic Biology | 2014

Mixture Models of Nucleotide Sequence Evolution that Account for Heterogeneity in the Substitution Process Across Sites and Across Lineages

Vivek Jayaswal; Thomas K. F. Wong; John Robinson; Leon Poladian; Lars S. Jermiin

Molecular phylogenetic studies of homologous sequences of nucleotides often assume that the underlying evolutionary process was globally stationary, reversible, and homogeneous (SRH), and that a model of evolution with one or more site-specific and time-reversible rate matrices (e.g., the GTR rate matrix) is enough to accurately model the evolution of data over the whole tree. However, an increasing body of data suggests that evolution under these conditions is an exception, rather than the norm. To address this issue, several non-SRH models of molecular evolution have been proposed, but they either ignore heterogeneity in the substitution process across sites (HAS) or assume it can be modeled accurately using the Γ distribution. As an alternative to these models of evolution, we introduce a family of mixture models that approximate HAS without the assumption of an underlying predefined statistical distribution. This family of mixture models is combined with non-SRH models of evolution that account for heterogeneity in the substitution process across lineages (HAL). We also present two algorithms for searching model space and identifying an optimal model of evolution that is less likely to over- or underparameterize the data. The performance of the two new algorithms was evaluated using alignments of nucleotides with 10 000 sites simulated under complex non-SRH conditions on a 25-tipped tree. The algorithms were found to be very successful, identifying the correct HAL model with a 75% success rate (the average success rate for assigning rate matrices to the tree's 48 edges was 99.25%) and, for the correct HAL model, identifying the correct HAS model with a 98% success rate. Finally, parameter estimates obtained under the correct HAL-HAS model were found to be accurate and precise.
The merits of our new algorithms were illustrated with an analysis of 42 337 second codon sites extracted from a concatenation of 106 alignments of orthologous genes encoded by the nuclear genomes of Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. castellii, S. kluyveri, S. bayanus, and Candida albicans. Our results show that second codon sites in the ancestral genome of these species contained 49.1% invariable sites, 39.6% variable sites belonging to one rate category (V1), and 11.3% variable sites belonging to a second rate category (V2). The ancestral nucleotide content was found to differ markedly across these three sets of sites, and the evolutionary processes operating at the variable sites were found to be non-SRH and best modeled by a combination of eight edge-specific rate matrices (four for V1 and four for V2). The number of substitutions per site at the variable sites also differed markedly, with sites belonging to V1 evolving slower than those belonging to V2 along the lineages separating the seven species of Saccharomyces. Finally, sites belonging to V1 appeared to have ceased evolving along the lineages separating S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, implying that they might have become so selectively constrained that they could be considered invariable sites in these species.

Journal of Computational Biology | 2011

Structural alignment of RNA with complex pseudoknot structure

Thomas K. F. Wong; Tak Wah Lam; Wing-Kin Sung; Brenda Wing-Yan Cheung; Siu-Ming Yiu

The secondary structure of an ncRNA molecule is known to play an important role in its biological functions. Aligning a known ncRNA to a target candidate to determine the sequence and structural similarity helps in identifying de novo ncRNA molecules that are in the same family as the known ncRNA. However, existing algorithms cannot handle the complex pseudoknot structures that are found in nature. In this article, we propose algorithms to handle two types of complex pseudoknots: simple non-standard pseudoknots and recursive pseudoknots. Although our methods are not designed for general pseudoknots, they already cover all known ncRNAs in both the Rfam and PseudoBase databases. An evaluation of our algorithms shows that they are useful for identifying ncRNA molecules in other species that are in the same family as a known ncRNA.

Scientific Reports | 2017

Mitochondrial DNA and trade data support multiple origins of Helicoverpa armigera (Lepidoptera, Noctuidae) in Brazil

Wee Tek Tay; Tom Walsh; Sharon Downes; Craig S. Anderson; Lars S. Jermiin; Thomas K. F. Wong; Melissa C. Piper; Ester Silva Chang; Isabella Barony Macedo; Cecilia Czepak; Gajanan T. Behere; Pierre Silvie; Miguel F. Soria; Marie Frayssinet; Karl H.J. Gordon

The Old World bollworm Helicoverpa armigera is now established in Brazil but efforts to identify incursion origin(s) and pathway(s) have met with limited success due to the patchiness of available data. Using international agricultural/horticultural commodity trade data and mitochondrial DNA (mtDNA) cytochrome oxidase I (COI) and cytochrome b (Cyt b) gene markers, we inferred the origins and incursion pathways into Brazil. We detected 20 mtDNA haplotypes from six Brazilian states, eight of which were new to our 97 global COI-Cyt b haplotype database. Direct sequence matches indicated five Brazilian haplotypes had Asian, African, and European origins. We identified 45 parsimoniously informative sites and multiple substitutions per site within the concatenated (945 bp) nucleotide dataset, implying that probabilistic phylogenetic analysis methods are needed. High diversity and signatures of uniquely shared haplotypes with diverse localities combined with the trade data suggested multiple incursions and introduction origins in Brazil. Increasing agricultural/horticultural trade activities between the Old and New Worlds represents a significant biosecurity risk factor. Identifying pest origins will enable resistance profiling that reflects countries of origin to be included when developing a resistance management strategy, while identifying incursion pathways will improve biosecurity protocols and risk analysis at biosecurity hotspots including national ports.

Journal of Computational Biology | 2012

Structural Alignment of RNA with Triple Helix Structure

Thomas K. F. Wong; Siu-Ming Yiu

Structural alignment is useful in identifying members of ncRNAs. Existing tools are all based on the secondary structures of the molecules. There is evidence showing that tertiary interactions (the interaction between a single-stranded nucleotide and a base-pair) in triple helix structures are critical in some functions of ncRNAs. In this article, we address the problem of structural alignment of RNAs with the triple helix. We provide a formal definition to capture a simplified model of a triple helix structure, then develop an algorithm of O(mn^3) time to align a query sequence (of length m) with known triple helix structure with a target sequence (of length n) with an unknown structure. The resulting algorithm is shown to be useful in identifying ncRNA members in a simulated genome.

BMC Genomics | 2010

Refining orthologue groups at the transcript level

Yizhen Jia; Thomas K. F. Wong; You-Qiang Song; Siu-Ming Yiu; David K. Smith

Background: Orthologues are genes in different species that are related through divergent evolution from a common ancestor and are expected to have similar functions. Many databases have been created to describe orthologous genes based on existing sequence data. However, alternative splicing (in eukaryotes) is usually disregarded in the determination of orthologue groups and the functional consequences of alternative splicing have not been considered. Most multi-exon genes can encode multiple protein isoforms which often have different functions and can be disease-related. Extending the definition of orthologue groups to take account of alternate splicing and the functional differences it causes requires further examination. Results: A subset of the orthologous gene groups between human and mouse was selected from the InParanoid database for this study. Each orthologue group was divided into sub-clusters, at the transcript level, using a method based on the sequence similarity of the isoforms. Transcript based sub-clusters were verified by functional signatures of the cluster members in the InterPro database. Functional similarity was higher within than between transcript-based sub-clusters of a defined orthologous group. In certain cases, cancer-related isoforms of a gene could be distinguished from other isoforms of the gene. Predictions of intrinsic disorder in protein regions were also correlated with the isoform sub-clusters within an orthologue group. Conclusions: Sub-clustering of orthologue groups at the transcript level is an important step to more accurately define functionally equivalent orthologue groups. This work appears to be the first effort to refine orthologous groupings of genes based on the consequences of alternative splicing on function. Further investigation and refinement of the methodology to classify and verify isoform sub-clusters is needed, particularly to extend the technique to more distantly related species.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

Memory Efficient Algorithms for Structural Alignment of RNAs with Pseudoknots

Thomas K. F. Wong; Y. S. Chiu; Tak Wah Lam; Siu-Ming Yiu

In this paper, we consider the problem of structural alignment of a target RNA sequence of length n and a query RNA sequence of length m with known secondary structure that may contain simple pseudoknots or embedded simple pseudoknots. The best known algorithm for solving this problem runs in O(mn^3) time for simple pseudoknots or O(mn^4) time for embedded simple pseudoknots, with space complexity of O(mn^3) for both structures, which requires too much memory, making it infeasible for comparing noncoding RNAs (ncRNAs) of length several hundred or more. We propose memory efficient algorithms to solve the same problem. We reduce the space complexity to O(n^3) for simple pseudoknots and O(mn^2 + n^3) for embedded simple pseudoknots while maintaining the same time complexity. We also show how to modify our algorithm to handle a restricted class of recursive simple pseudoknots, which is found to be abundant in real data, with space complexity of O(mn^2 + n^3) and time complexity of O(mn^4). Experimental results show that our algorithms are feasible for comparing ncRNAs of length more than 500.
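To give a feel for the space bounds quoted above, the following sketch evaluates the leading terms for a query and target of equal length. It is only an order-of-magnitude illustration: big-O notation hides constant factors and bytes per table cell, and the function name `table_cells` and the choice m = n = 500 are assumptions made for this example.

```python
def table_cells(m, n):
    """Leading-term table sizes for query length m and target length n."""
    return {
        "previous, O(mn^3)": m * n**3,
        "simple pseudoknot, O(n^3)": n**3,
        "embedded simple pseudoknot, O(mn^2 + n^3)": m * n**2 + n**3,
    }


# Sequence lengths of 500, the scale mentioned in the abstract.
cells = table_cells(m=500, n=500)
for label, count in cells.items():
    print(f"{label}: {count:.3e} cells")
```

For m = n = 500 the leading terms are about 6.25e10, 1.25e8, and 2.5e8 cells respectively, so the reduced bounds are smaller by a factor of a few hundred.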

Bioinformatics | 2011

RNASAlign: RNA Structural Alignment System

Thomas K. F. Wong; Kwok-Lung Wan; Bay-Yuan Hsu; Brenda Wing-Yan Cheung; Wing-Kai Hon; Tak Wah Lam; Siu-Ming Yiu

MOTIVATION: Structural alignment of RNA is found to be a useful computational technique for identifying non-coding RNAs (ncRNAs). However, existing tools do not handle structures with pseudoknots. Although algorithms exist that can handle structural alignment for different types of pseudoknots, no software tools are available and users have to determine the type of pseudoknots to select the appropriate algorithm to use, which limits the usage of structural alignment in identifying novel ncRNAs. RESULTS: We implemented the first web server, RNASAlign, which can automatically identify the pseudoknot type of a secondary structure and perform structural alignment of a folded RNA with every region of a target DNA/RNA sequence. Regions with high similarity scores and low e-values, together with the detailed alignments, will be reported to the user. Experiments on more than 350 ncRNA families show that RNASAlign is effective. AVAILABILITY: http://www.bio8.cs.hku.hk/RNASAlign.
