Publication


Featured research published by Trevor I. Dix.


Information Processing Letters | 1986

A bit-string longest-common-subsequence algorithm

Lloyd Allison; Trevor I. Dix

A longest-common-subsequence algorithm is described which operates in terms of bit or bit-string operations. It offers a speedup of the order of the word-length on a conventional computer.
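
The bit-parallel idea admits a very compact implementation. The sketch below computes LCS length with the widely used (v + u) | (v - u) bit-vector recurrence, using Python integers as arbitrary-width bit strings; it illustrates the word-level speedup rather than reproducing the paper's exact 1986 formulation.

```python
def lcs_length(a: str, b: str) -> int:
    """Bit-parallel LCS length in O(len(a) * ceil(len(b)/w)) word ops.

    A sketch using the well-known (v + u) | (v - u) bit-vector
    recurrence -- not necessarily the paper's exact 1986 formulation.
    """
    n = len(b)
    mask = (1 << n) - 1
    # match[c]: bit j set iff b[j] == c
    match = {}
    for j, c in enumerate(b):
        match[c] = match.get(c, 0) | (1 << j)
    v = mask  # zero bits of v will mark the +1 steps of the final DP row
    for c in a:
        u = v & match.get(c, 0)
        v = ((v + u) | (v - u)) & mask
    return n - bin(v).count("1")

assert lcs_length("AGCAT", "GAC") == 2
```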


Data Compression Conference | 2007

A Simple Statistical Algorithm for Biological Sequence Compression

Minh Duc Cao; Trevor I. Dix; Lloyd Allison; Chris Mears

This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Experiments show that our algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
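
As a rough illustration of the expert-blending idea, the sketch below mixes two toy experts (adaptive order-0 and order-2 Markov models with add-one smoothing) using weights that track each expert's recent predictive success, and returns the total information content in bits. The expert panel, weighting scheme, and repeat experts in the paper differ; this only shows the shape of the computation.

```python
import math
from collections import defaultdict

def blended_code_length(seq: str, alphabet: str = "ACGT", decay: float = 0.95) -> float:
    """Bits to encode seq under a blend of two toy 'experts'
    (assumed stand-ins, not the paper's panel)."""
    k = len(alphabet)
    c0 = defaultdict(int)                        # order-0 counts
    c2 = defaultdict(lambda: defaultdict(int))   # context -> symbol counts
    w = [0.5, 0.5]                               # expert weights
    bits = 0.0
    for i, s in enumerate(seq):
        ctx = seq[max(0, i - 2):i]
        p0 = (c0[s] + 1) / (i + k)               # order-0 prediction
        t2 = sum(c2[ctx].values())
        p2 = (c2[ctx][s] + 1) / (t2 + k)         # order-2 prediction
        bits += -math.log2(w[0] * p0 + w[1] * p2)
        # reweight by (decayed) predictive success, then renormalize
        w = [w[0] * p0 ** decay, w[1] * p2 ** decay]
        tot = w[0] + w[1]
        w = [w[0] / tot, w[1] / tot]
        c0[s] += 1
        c2[ctx][s] += 1
    return bits
```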


Computational Biology and Chemistry | 2000

Sequence complexity for biological sequence analysis

Lloyd Allison; Linda Stern; Timothy Edgoose; Trevor I. Dix

A new statistical model for DNA considers a sequence to be a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of repeats do not need to match each other exactly. Both forward- and reverse-complementary repeats are allowed. The model has a small number of parameters which are fitted to the data. In general there are many explanations for a given sequence; it is shown how to compute the total probability of the data given the model. Computer algorithms are described for these tasks. The model can be used to compute the information content of a sequence, either in total or base by base. This amounts to looking at sequences from a data-compression point of view, and it is argued that this is a good way to tackle intelligent sequence analysis in general.
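
A minimal stand-in for such a model: the sketch below computes a per-base information sequence under a simple adaptive order-k Markov model, so dips in the series flag compressible, repeat-like regions. The paper's mixture of low-structure and approximate-repeat regions is far richer; this only illustrates the base-by-base information-content idea.

```python
import math
from collections import defaultdict

def information_sequence(seq: str, k: int = 3, alphabet: str = "ACGT") -> list:
    """Per-base information content (bits) under an adaptive order-k
    Markov model with add-one smoothing (an assumed stand-in)."""
    counts = defaultdict(lambda: defaultdict(int))
    info = []
    for i, s in enumerate(seq):
        ctx = seq[max(0, i - k):i]
        total = sum(counts[ctx].values())
        p = (counts[ctx][s] + 1) / (total + len(alphabet))
        info.append(-math.log2(p))   # bits needed for this base
        counts[ctx][s] += 1
    return info
```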


BMC Bioinformatics | 2007

Comparative analysis of long DNA sequences by per element information content using different contexts

Trevor I. Dix; David R. Powell; Lloyd Allison; Julie Bernal; Samira Jaeger; Linda Stern

Background: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences of the order of millions of bases. Compressing a sequence in isolation will include information on self-repetition, whereas compressing a sequence Y in the context of another sequence X can find what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models.

Results: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2.

Conclusion: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast, and the saved results are self-documenting.
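
The Y-in-the-context-of-X idea can be sketched by priming a model's statistics on X before charging bits for Y; bits_in_context(y) - bits_in_context(y, x) then estimates what X says about Y. The model below is an assumed adaptive order-k Markov stand-in, not the paper's repeat-aware model, which also handles reverse complements and approximate repeats.

```python
import math
from collections import defaultdict

def bits_in_context(y: str, x: str = "", k: int = 3, alphabet: str = "ACGT") -> float:
    """Total bits to encode y, optionally in the context of x.

    Priming on x makes material y shares with x cheap to encode.
    An assumed order-k Markov stand-in, not the paper's model.
    """
    counts = defaultdict(lambda: defaultdict(int))
    s = x + y
    bits = 0.0
    for i, c in enumerate(s):
        ctx = s[max(0, i - k):i]
        total = sum(counts[ctx].values())
        p = (counts[ctx][c] + 1) / (total + len(alphabet))
        if i >= len(x):        # only charge for y's symbols
            bits += -math.log2(p)
        counts[ctx][c] += 1
    return bits
```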


Molecular and Biochemical Parasitology | 2001

Discovering patterns in Plasmodium falciparum genomic DNA

Linda Stern; Lloyd Allison; Ross L. Coppel; Trevor I. Dix

A method has been developed for discovering patterns in DNA sequences. Loosely based on the well-known Lempel-Ziv model for text compression, the method detects repeated sequences in DNA. The repeats can be forward or inverted, and they need not be exact. The method is particularly useful for detecting distantly related sequences, and for finding patterns in sequences of biased nucleotide composition, where spurious patterns are often observed because the bias leads to coincidental nucleotide matches. We show here the utility of the method by applying it to genomic sequences of Plasmodium falciparum. A single scan of chromosomes 2 and 3 of P. falciparum, using our method and no other a priori information about the sequences, reveals regions of low complexity in both telomeric and central regions, long repeats in the subtelomeric regions, and shorter repeat areas in dense coding regions. Application of the method to a recently sequenced contig of chromosome 10 that has a particularly biased base composition detects a long internal repeat more readily than does the conventional dot matrix plot. Space requirements are linear, so the method can be used on large sequences. The observed repeat patterns may be related to large-scale chromosomal organization and control of gene expression. The method has general application in detecting patterns of potential interest in newly sequenced genomic material.
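
Stripped of its probabilistic machinery, the flavor of the approach is a greedy scan for the longest earlier match, forward or reverse-complemented. The naive sketch below finds only exact repeats in O(n^2) time; the paper's method tolerates mismatches, corrects for biased base composition, and runs in linear space.

```python
def lz_repeat_scan(seq: str, min_len: int = 8):
    """Greedy LZ-style scan for exact forward and inverted repeats.

    At each position, find the longest earlier substring matching the
    text ahead, directly or as a reverse complement. A naive sketch,
    not the paper's algorithm.
    """
    comp = str.maketrans("ACGT", "TGCA")
    hits, i = [], 0
    while i < len(seq):
        prefix = seq[:i]
        rc_prefix = prefix.translate(comp)[::-1]
        best, kind = 0, None
        for src, tag in ((prefix, "forward"), (rc_prefix, "inverted")):
            l = max(min_len, best + 1)
            while i + l <= len(seq) and seq[i:i + l] in src:
                best, kind = l, tag
                l += 1
        if best:
            hits.append((i, best, kind))   # (start, length, repeat type)
            i += best
        else:
            i += 1
    return hits
```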


Bioinformatics | 1988

Errors between sites in restriction site mapping

Trevor I. Dix; Dorota H. Kieronska

Restriction site mapping programs construct maps by generating permutations of fragments and checking for consistency. Unfortunately, many consistent maps are often obtained within the experimental error bounds, even though there is only one actual map. A particularly efficient algorithm is presented that aims to minimize error bounds between restriction sites. The method is generalized for linear and circular maps. The time complexity is derived and execution times are given for multiple enzymes and a range of error bounds.
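
A brute-force baseline makes the search space concrete: enumerate fragment orderings for each enzyme and keep those whose merged cut sites reproduce the double digest within an error tolerance. The sketch below is exponential and assumes a linear molecule; the paper's contribution is an efficient algorithm that prunes this search while minimizing error bounds between sites.

```python
from itertools import permutations

def consistent_maps(frags_a, frags_b, double, eps=0.05):
    """Exhaustive restriction mapping sketch (linear molecule).

    frags_a / frags_b: fragment lengths from two single digests;
    double: lengths from the double digest; eps: relative error bound.
    """
    def cut_sites(order):
        pos, sites = 0.0, []
        for f in order[:-1]:
            pos += f
            sites.append(pos)
        return sites

    close = lambda u, v: abs(u - v) <= eps * max(u, v)
    maps = []
    for pa in set(permutations(frags_a)):
        for pb in set(permutations(frags_b)):
            bounds = [0.0] + sorted(cut_sites(pa) + cut_sites(pb)) + [sum(frags_a)]
            dd = sorted(b - a for a, b in zip(bounds, bounds[1:]))
            if len(dd) == len(double) and all(
                    close(u, v) for u, v in zip(dd, sorted(double))):
                maps.append((pa, pb))     # a consistent pair of orderings
    return maps
```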


IEEE International Conference on Evolutionary Computation | 2006

Fuzzy Model for Gene Regulatory Network

Ramesh Ram; Madhu Chetty; Trevor I. Dix

Gene regulatory networks influence development and evolution in living organisms. The advent of microarray technology has challenged computer scientists to develop better algorithms for modeling the underlying regulatory relationships between genes. Recently, a fuzzy logic model has been proposed to search microarray datasets for activator/repressor regulatory relationships. We improve this model for searching regulatory triplets by predicting changes in the expression level of the target over interval time points from the input expression levels, and comparing them with the actual changes. This method eliminates possible false predictions from the classical fuzzy model, thereby allowing a wider search space for inferring regulatory relationships. We also introduce a novel pre-processing technique using fuzzy logic that can group genes having similar changes in expression profile over all available intervals in the microarray data. This technique eliminates redundant computation performed by the proposed model. The model was applied to Saccharomyces cerevisiae data, and 548 activator/repressor regulatory triplets were inferred. These improvements increase the feasibility of using fuzzy logic for understanding the relationships between genes using microarray technology.
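
A toy version of the predict-and-compare step might look like the sketch below: expression values scaled to [0, 1] are fuzzified into low/medium/high, a small rule base predicts the direction of the target's change from the activator and repressor levels, and the triplet is scored by agreement with the observed changes. The membership functions and rules here are illustrative assumptions, not the paper's.

```python
def fuzzify(x: float) -> dict:
    """Triangular low/medium/high memberships for x scaled to [0, 1]."""
    return {"low": max(0.0, 1 - 2 * x),
            "med": max(0.0, 1 - abs(2 * x - 1)),
            "high": max(0.0, 2 * x - 1)}

def triplet_score(act, rep, tgt) -> float:
    """Fraction of intervals where a toy fuzzy rule base predicts the
    target's direction of change correctly.

    Assumed rules: target rises when activator is high AND repressor
    is low, and falls in the reverse case.
    """
    agree = 0
    for t in range(len(tgt) - 1):
        a, r = fuzzify(act[t]), fuzzify(rep[t])
        up = min(a["high"], r["low"])     # fuzzy AND
        down = min(a["low"], r["high"])
        pred = up - down                  # signed predicted change
        actual = tgt[t + 1] - tgt[t]
        if pred * actual > 0 or (abs(pred) < 0.1 and abs(actual) < 0.1):
            agree += 1
    return agree / (len(tgt) - 1)
```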


Computational Intelligence in Bioinformatics and Computational Biology | 2006

Causal Modeling of Gene Regulatory Network

Ramesh Ram; Madhu Chetty; Trevor I. Dix

The analysis of high-throughput experimental data, such as microarray gene expression data, is currently seen as a promising way of finding regulatory relationships between genes. Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observed data. In this paper, we propose a network reconstruction technique to predict not only the structure but also the direction and sign of regulation using a genetic algorithm (GA). The networks consisting of nodes (genes), directed edges (gene-gene interactions) and dynamics of regulation are assigned scores using the presented causal model based on partial correlation. The highest scoring network best fits the expression data. As GAs are stochastic, the algorithm is repeated several times and the final network is reconstructed by combining the most significant connections identified from the high scoring networks. The presented technique is applied to the well known Saccharomyces cerevisiae microarray dataset and the reconstructed network is observed to be consistent with the results found in the literature.
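
The scoring ingredient can be sketched with first-order partial correlation. The toy fitness below sums edge strengths over a candidate edge set, which a GA would then mutate and recombine, keeping high scorers; the paper's score additionally models direction, sign, and regulatory dynamics, and the helpers here are assumptions for illustration.

```python
import numpy as np

def partial_corr(x, y, z) -> float:
    """First-order partial correlation of x and y controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def network_score(expr, edges) -> float:
    """Toy fitness for a candidate network: summed |partial correlation|
    over directed edges (src, dst, ctrl), where ctrl indexes the gene
    conditioned on. expr: genes x timepoints array."""
    return sum(abs(partial_corr(expr[i], expr[j], expr[k]))
               for i, j, k in edges)
```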


Information Processing Letters | 1999

A versatile divide and conquer technique for optimal string alignment

David R. Powell; Lloyd Allison; Trevor I. Dix

Common string alignment algorithms such as the basic dynamic programming algorithm (DPA) and the time efficient Ukkonen algorithm use quadratic space to determine an alignment between two strings. In this paper we present a technique that can be applied to these algorithms to obtain an alignment using only linear space, while having little or no effect on the time complexity. This new technique has several advantages over previous methods for determining alignments in linear space, such as: simplicity, the ability to use essentially the same technique when using different cost functions, and the practical advantage of easily being able to trade available memory for running time.
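
For context, the classic linear-space building block is computing one DP row at a time and splitting at an optimal midpoint, as in Hirschberg's divide and conquer; the sketch below shows that core for simple edit distance. It illustrates the family of techniques the paper improves on, not the paper's own method.

```python
def last_row(a: str, b: str) -> list:
    """Final row of the edit-distance DP for a vs b in O(len(b)) space."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(prev + (ca != cb),  # substitute/match
                                       row[j] + 1,         # delete
                                       row[j - 1] + 1)     # insert
    return row

def optimal_split(a: str, b: str):
    """Divide step: where to split b so the two halves of a align
    optimally. Recursing on the halves yields a full alignment in
    linear space."""
    mid = len(a) // 2
    left = last_row(a[:mid], b)
    right = last_row(a[mid:][::-1], b[::-1])
    costs = [l + r for l, r in zip(left, reversed(right))]
    k = min(range(len(costs)), key=costs.__getitem__)
    return k, costs[k]   # split point in b, total optimal cost
```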


Australasian Joint Conference on Artificial Intelligence | 2007

Building classification models from microarray data with tree-based classification algorithms

Peter J. Tan; David L. Dowe; Trevor I. Dix

Building classification models plays an important role in DNA microarray data analysis. An essential feature of DNA microarray data sets is that the number of input variables (genes) is far greater than the number of samples. As such, most classification schemes employ variable selection or feature selection methods to pre-process DNA microarray data. This paper investigates various aspects of building classification models from microarray data with tree-based classification algorithms by using Partial Least-Squares (PLS) regression as a feature selection method. Experimental results show that PLS regression is an appropriate feature selection method and tree-based ensemble models are capable of delivering high performance classification models for microarray data.
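
A hedged sketch of such a pipeline using scikit-learn: genes are ranked by the magnitude of their PLS loading weights, the top genes are kept, and a tree ensemble (a random forest standing in for the paper's tree-based learners) is cross-validated on the reduced matrix.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def pls_then_forest(X, y, n_genes=50, n_components=3) -> float:
    """Rank genes by PLS weight magnitude, keep the top n_genes, then
    cross-validate a tree ensemble. X: samples x genes, y: labels.

    Caveat: selecting genes outside the CV loop, as done here for
    brevity, optimistically biases the score; per-fold selection is
    the honest protocol.
    """
    pls = PLSRegression(n_components=n_components).fit(X, y)
    importance = np.abs(pls.x_weights_).sum(axis=1)   # per-gene weight mass
    top = np.argsort(importance)[-n_genes:]
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    return cross_val_score(clf, X[:, top], y, cv=5).mean()
```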

Collaboration


Trevor I. Dix's top co-authors.


Linda Stern

University of Melbourne


Madhu Chetty

Federation University Australia
