Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Thomas C. Conway is active.

Publication


Featured research published by Thomas C. Conway.


Journal of Logic Programming | 1996

The execution algorithm of mercury, an efficient purely declarative logic programming language

Zoltan Somogyi; Fergus Henderson; Thomas C. Conway

We introduce Mercury, a new purely declarative logic programming language designed to provide the support that groups of application programmers need when building large programs. Mercury's strong type, mode, and determinism systems improve program reliability by catching many errors at compile time. We present a new and relatively simple execution model that takes advantage of the information these systems provide, yielding very efficient code. The Mercury compiler uses this execution model to generate portable C code. Our benchmarking shows that the code generated by our implementation is significantly faster than the code generated by mature optimizing implementations of other logic programming languages.


Bioinformatics | 2011

Succinct data structures for assembling large genomes

Thomas C. Conway; Andrew J. Bromage

Motivation: Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide-scale biological variation, but within human biomedicine, it offers a direct way of observing both large-scale structural variation and fine-scale sequence variation. Unfortunately, improvements in the computational feasibility of de novo assembly have not matched the improvements in the gathering of sequence data, for two reasons: the inherent computational complexity of the problem and the in-practice memory requirements of tools.

Results: In this article, we use entropy-compressed or succinct data structures to create a practical representation of the de Bruijn assembly graph, which requires at least a factor of 10 less storage than the kinds of structures used by deployed methods. Moreover, because our representation is entropy compressed, in the presence of sequencing errors it has better asymptotic scaling behaviour than conventional approaches. We present results of a proof-of-concept assembly of a human genome performed on a modest commodity server.
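The core idea of a bit-indexed de Bruijn graph can be sketched in a few lines: mark each observed k-mer in a bit array of length 4^k, so that membership tests replace pointer-based adjacency lists. This is only an illustrative toy, not the paper's structure, which is entropy compressed and supports rank/select queries; all names below are our own.

```python
# Toy bitmap de Bruijn graph: one bit per possible k-mer.
# Illustrative only; the paper's representation is entropy compressed.

K = 4
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}

def encode(kmer):
    """Pack a k-mer into an integer index in [0, 4**K)."""
    n = 0
    for b in kmer:
        n = n * 4 + CODE[b]
    return n

def build_bitmap(reads, k=K):
    """Set one bit for each distinct k-mer observed in the reads."""
    bits = bytearray((4 ** k + 7) // 8)
    for read in reads:
        for i in range(len(read) - k + 1):
            idx = encode(read[i:i + k])
            bits[idx // 8] |= 1 << (idx % 8)
    return bits

def has_kmer(bits, kmer):
    idx = encode(kmer)
    return bool(bits[idx // 8] & (1 << (idx % 8)))

def successors(bits, kmer):
    """de Bruijn edges: drop the first base, try each one-base extension."""
    return [kmer[1:] + b for b in BASES if has_kmer(bits, kmer[1:] + b)]

bits = build_bitmap(["ACGTACGT"])
print(successors(bits, "ACGT"))  # → ['CGTA']
```

The bitmap costs a flat 4^k bits regardless of how many k-mers are present, which is exactly the waste that entropy compression of the (sparse, error-inflated) k-mer set removes.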


Bioinformatics | 2012

Xenome—a tool for classifying reads from xenograft samples

Thomas C. Conway; Jeremy Wazny; Andrew J. Bromage; Martin Tymms; Dhanya Sooraj; Elizabeth D. Williams; Bryan Beresford-Smith

Motivation: Shotgun sequence read data derived from xenograft material contains a mixture of reads arising from the host and reads arising from the graft. Classifying the read mixture to separate the two allows for more precise analysis to be performed.

Results: We present a technique, with an associated tool Xenome, which performs fast, accurate and specific classification of xenograft-derived sequence read data. We have evaluated it on RNA-Seq data from human, mouse and human-in-mouse xenograft datasets.

Availability: Xenome is available for non-commercial use from http://www.nicta.com.au/bioinformatics

Contact: [email protected]
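The classification task above can be illustrated with a k-mer membership sketch: a read is called host, graft, both, or neither depending on which reference k-mer sets it shares k-mers with. This is a simplification of the idea, not Xenome's actual algorithm (which also distinguishes an "ambiguous" category and uses carefully built marker k-mer sets); all names and the toy references are ours.

```python
# Hypothetical sketch of host/graft read classification by k-mer sharing.
# Not Xenome's algorithm; toy reference sequences stand in for genomes.

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(read, host_kmers, graft_kmers, k):
    """Four-way call based on shared k-mers with each reference set."""
    in_host = bool(kmers(read, k) & host_kmers)
    in_graft = bool(kmers(read, k) & graft_kmers)
    if in_host and in_graft:
        return "both"
    if in_host:
        return "host"
    if in_graft:
        return "graft"
    return "neither"

# Toy stand-ins for mouse (host) and human (graft) reference k-mer sets.
k = 5
host = kmers("TTTTGGGGCCCC", k)
graft = kmers("AAAACCCCGGGG", k)
print(classify("AAAACCC", host, graft, k))  # → graft
```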


PLOS Genetics | 2011

Epigenetic regulation of cell type-specific expression patterns in the human mammary epithelium

Reo Maruyama; Sibgat Choudhury; Adam Kowalczyk; Marina Bessarabova; Bryan Beresford-Smith; Thomas C. Conway; Antony Kaspi; Zhenhua Wu; Tatiana Nikolskaya; Vanessa F. Merino; Pang Kuo Lo; X. Shirley Liu; Yuri Nikolsky; Saraswati Sukumar; Izhak Haviv; Kornelia Polyak

Differentiation is an epigenetic program that involves the gradual loss of pluripotency and acquisition of cell type–specific features. Understanding these processes requires genome-wide analysis of epigenetic and gene expression profiles, which have been challenging in primary tissue samples due to limited numbers of cells available. Here we describe the application of high-throughput sequencing technology for profiling histone and DNA methylation, as well as gene expression patterns of normal human mammary progenitor-enriched and luminal lineage-committed cells. We observed significant differences in histone H3 lysine 27 tri-methylation (H3K27me3) enrichment and DNA methylation of genes expressed in a cell type–specific manner, suggesting their regulation by epigenetic mechanisms and a dynamic interplay between the two processes that together define developmental potential. The technologies we developed and the epigenetically regulated genes we identified will accelerate the characterization of primary cell epigenomes and the dissection of human mammary epithelial lineage-commitment and luminal differentiation.


BMC Genomics | 2012

Short read sequence typing (SRST): multi-locus sequence types from short reads

Michael Inouye; Thomas C. Conway; Justin Zobel; Kathryn E. Holt

Background: Multi-locus sequence typing (MLST) has become the gold standard for population analyses of bacterial pathogens. This method focuses on the sequences of a small number of loci (usually seven) to divide the population and is simple, robust and facilitates comparison of results between laboratories and over time. Over the last decade, researchers and population health specialists have invested substantial effort in building up public MLST databases for nearly 100 different bacterial species, and these databases contain a wealth of important information linked to MLST sequence types such as time and place of isolation, host or niche, serotype and even clinical or drug resistance profiles. Recent advances in sequencing technology mean it is increasingly feasible to perform bacterial population analysis at the whole genome level. This offers massive gains in resolving power and genetic profiling compared to MLST, and will eventually replace MLST for bacterial typing and population analysis. However, given the wealth of data currently available in MLST databases, it is crucial to maintain backwards compatibility with MLST schemes so that new genome analyses can be understood in their proper historical context.

Results: We present a software tool, SRST, for quick and accurate retrieval of sequence types from short read sets, using inputs easily downloaded from public databases. SRST uses read mapping and an allele assignment score incorporating sequence coverage and variability to determine the most likely allele at each MLST locus. Analysis of over 3,500 loci in more than 500 publicly accessible Illumina read sets showed SRST to be highly accurate at allele assignment. SRST output is compatible with common analysis tools such as eBURST, Clonal Frame or PhyloViz, allowing easy comparison between novel genome data and MLST data. Alignment, fastq and pileup files can also be generated for novel alleles.

Conclusions: SRST is a novel software tool for accurate assignment of sequence types using short read data. Several uses for the tool are demonstrated, including quality control for high-throughput sequencing projects, plasmid MLST and analysis of genomic data during outbreak investigation. SRST is open-source, requires Python, BWA and SamTools, and is available from http://srst.sourceforge.net.
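The allele-selection step described above can be sketched as follows. The real tool maps reads with BWA and derives its score from coverage and variability; here the mapping stage is faked and only a hypothetical selection rule remains, with made-up locus and allele names and numbers.

```python
# Hypothetical sketch of choosing the most likely MLST allele from
# read-mapping summaries. Not SRST's actual scoring function.

def best_allele(stats):
    """stats maps allele name -> (fraction covered, mean depth, mismatches).
    Prefer fully covered alleles, then deep and mismatch-free ones."""
    def score(name):
        covered, depth, mismatches = stats[name]
        return (covered, depth - mismatches)
    return max(stats, key=score)

# Toy mapping summaries for one locus (entirely invented numbers).
locus_stats = {
    "adk_1": (1.00, 35.2, 0),   # fully covered, no mismatching positions
    "adk_2": (1.00, 34.8, 5),   # fully covered but 5 mismatches
    "adk_3": (0.92, 60.0, 0),   # deep coverage but incomplete
}
print(best_allele(locus_stats))  # → adk_1
```

Penalising incomplete coverage before depth reflects the intuition that an allele missing positions cannot be the true allele no matter how deep the rest of it is covered.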


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

Iterative Dictionary Construction for Compression of Large DNA Data Sets

Shanika Kuruppu; Bryan Beresford-Smith; Thomas C. Conway; Justin Zobel

Genomic repositories increasingly include individual as well as reference sequences, which tend to share long identical and near-identical strings of nucleotides. However, the sequential processing used by most compression algorithms, and the volumes of data involved, mean that these long-range repetitions are not detected. An order-insensitive, disk-based dictionary construction method can detect this repeated content and use it to compress collections of sequences. We explore a dictionary construction method that improves repeat identification in large DNA data sets. Our adaptation, Comrad, of an existing disk-based method identifies exact repeated content in collections of sequences with similarities within and across the set of input sequences. Comrad compresses the data over multiple passes, which is an expensive process, but allows Comrad to compress large data sets within reasonable time and space. Comrad allows for random access to individual sequences and subsequences without decompressing the whole data set. Comrad has no competitor in terms of the size of data sets that it can compress (extending to many hundreds of gigabytes) and, even for smaller data sets, the results are competitive compared to alternatives; as an example, 39 S. cerevisiae genomes compressed to 0.25 bits per base.
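The general idea of dictionary-based factoring of repeated content can be shown with a single Re-Pair-style pass: find the most frequent adjacent symbol pair and replace it with a new dictionary symbol. This is a deliberately different, in-memory toy and not Comrad's disk-based, multi-pass exact-repeat detection; the function and symbol names are ours.

```python
# Re-Pair-style toy pass for dictionary compression of a sequence.
# Not Comrad's algorithm; illustrates replacing repeats with references.
from collections import Counter

def replace_pass(seq, next_sym):
    """Replace the most frequent adjacent pair with a new symbol.
    Returns the rewritten sequence and the dictionary rule (or None)."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return seq, None
    (a, b), count = pairs.most_common(1)[0]
    if count < 2:
        return seq, None  # no repeated content left to factor out
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
            out.append(next_sym)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out, (next_sym, (a, b))

seq, rule = replace_pass(list("ACGTACGTACGT"), "R0")
print(seq, rule)
```

Iterating such passes, with each new symbol recorded in a dictionary, yields a grammar-like representation; expanding the rules recovers the original sequence exactly, which is what makes random access to subsequences possible without full decompression.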


Cell Stem Cell | 2013

Molecular profiling of human mammary gland links breast cancer risk to a p27(+) cell population with progenitor characteristics.

Sibgat Choudhury; Vanessa Almendro; Vanessa F. Merino; Zhenhua Wu; Reo Maruyama; Ying Su; Filipe C. Martins; Mary Jo Fackler; Marina Bessarabova; Adam Kowalczyk; Thomas C. Conway; Bryan Beresford-Smith; Geoff Macintyre; Yu Kang Cheng; Zoila Lopez-Bujanda; Antony Kaspi; Rong Hu; Judith Robens; Tatiana Nikolskaya; Vilde D. Haakensen; Stuart J. Schnitt; Pedram Argani; Gabrielle Ethington; Laura Panos; Michael P. Grant; Jason Clark; William Herlihy; S. Joyce Lin; Grace L. Chew; Erik W. Thompson

Early full-term pregnancy is one of the most effective natural protections against breast cancer. To investigate this effect, we have characterized the global gene expression and epigenetic profiles of multiple cell types from normal breast tissue of nulliparous and parous women and carriers of BRCA1 or BRCA2 mutations. We found significant differences in CD44(+) progenitor cells, where the levels of many stem cell-related genes and pathways, including the cell-cycle regulator p27, are lower in parous women without BRCA1/BRCA2 mutations. We also noted a significant reduction in the frequency of CD44(+)p27(+) cells in parous women and showed, using explant cultures, that parity-related signaling pathways play a role in regulating the number of p27(+) cells and their proliferation. Our results suggest that pathways controlling p27(+) mammary epithelial cells and the numbers of these cells relate to breast cancer risk and can be explored for cancer risk assessment and prevention.


PLOS ONE | 2010

Reference-Free Validation of Short Read Data

Jan Schröder; James Bailey; Thomas C. Conway; Justin Zobel

Background: High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process, and the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked.

Results: We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve three different measures: analysis of base calls, analysis of k-mers, and analysis of distributions of k-mers. We apply our methodology to a wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others.

Conclusions: The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.
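Two of the three measures mentioned above can be sketched directly: per-position base frequencies (which expose positional biases) and the k-mer count spectrum (whose heavy tail flags overrepresented sequences such as poly-base runs). The function names and toy reads are ours, not the paper's code.

```python
# Sketch of two reference-free QC measures for a set of short reads.
from collections import Counter

def per_position_freqs(reads):
    """Fraction of each base at every read position; a strong skew at a
    position suggests a systematic bias there."""
    counts = [Counter() for _ in range(max(map(len, reads)))]
    for read in reads:
        for i, base in enumerate(read):
            counts[i][base] += 1
    return [{b: c[b] / sum(c.values()) for b in c} for c in counts]

def kmer_spectrum(reads, k):
    """Counts-of-counts of k-mers: how many distinct k-mers occur once,
    twice, and so on. A heavy tail of very frequent k-mers is suspicious."""
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    return Counter(kmer_counts.values())

reads = ["AAAA", "AAAC", "AACA", "ACAA"]
print(per_position_freqs(reads)[0])  # → {'A': 1.0}
print(kmer_spectrum(reads, 2))
```

Neither measure needs a reference genome: both are computed purely from the read set itself, which is the point of the paper's approach.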


Pathogenetics | 2014

WGS Analysis and Interpretation in Clinical and Public Health Microbiology Laboratories: What Are the Requirements and How Do Existing Tools Compare?

Kelly L. Wyres; Thomas C. Conway; Saurabh Kumar Garg; Carlos Queiroz; Matthias Reumann; Kathryn E. Holt; Laura I. Rusu

Recent advances in DNA sequencing technologies have the potential to transform the field of clinical and public health microbiology, and in the last few years numerous case studies have demonstrated successful applications in this context. Among other considerations, a lack of user-friendly data analysis and interpretation tools has been frequently cited as a major barrier to routine use of these techniques. Here we consider the requirements of microbiology laboratories for the analysis, clinical interpretation and management of bacterial whole-genome sequence (WGS) data. Then we discuss relevant, existing WGS analysis tools. We highlight many essential and useful features that are represented among existing tools, but find that no single tool fulfils all of the necessary requirements. We conclude that to fully realise the potential of WGS analyses for clinical and public health microbiology laboratories of all scales, we will need to develop tools specifically with the needs of these laboratories in mind.


Bioinformatics | 2012

Gossamer — a resource-efficient de novo assembler

Thomas C. Conway; Jeremy Wazny; Andrew J. Bromage; Justin Zobel; Bryan Beresford-Smith

Motivation: The de novo assembly of short-read high-throughput sequencing data poses significant computational challenges. The volume of data is huge, the reads are tiny compared to the underlying sequence, and there are significant numbers of sequencing errors. There are numerous software packages that allow users to assemble short reads, but most are either limited to relatively small genomes (e.g. bacteria), require large computing infrastructure, or employ greedy algorithms and thus often do not yield high-quality results.

Results: We have developed Gossamer, an implementation of the de Bruijn approach to assembly that requires close to the theoretical minimum of memory, but still allows efficient processing. Our results show that it is space efficient and produces high-quality assemblies.

Availability: Gossamer is available for non-commercial use from http://www.genomics.csse.unimelb.edu.au/product-gossamer.php.

Collaboration


Dive into Thomas C. Conway's collaborations.

Top Co-Authors

Andrew J. Bromage

Monash Institute of Medical Research


Justin Zobel

University of Melbourne


Adam Kowalczyk

Warsaw University of Technology


Jeremy Wazny

University of Melbourne


Justin Bedo

University of Melbourne
