Publication


Featured research published by Ian A. Wood.


PLOS Genetics | 2014

Analysis of the genome and transcriptome of Cryptococcus neoformans var. grubii reveals complex RNA expression and microevolution leading to virulence attenuation.

Guilhem Janbon; Kate L. Ormerod; Damien Paulet; Edmond J. Byrnes; Vikas Yadav; Gautam Chatterjee; Nandita Mullapudi; Chung Chau Hon; R. Blake Billmyre; François Brunel; Yong Sun Bahn; Weidong Chen; Yuan Chen; Eve W. L. Chow; Jean Yves Coppée; Anna Floyd-Averette; Claude Gaillardin; Kimberly J. Gerik; Jonathan M. Goldberg; Sara Gonzalez-Hilarion; Sharvari Gujja; Joyce L. Hamlin; Yen-Ping Hsueh; Giuseppe Ianiri; Steven J.M. Jones; Chinnappa D. Kodira; Lukasz Kozubowski; Woei Lam; Marco A. Marra; Larry D. Mesner

Cryptococcus neoformans is a pathogenic basidiomycetous yeast responsible for more than 600,000 deaths each year. It occurs as two serotypes (A and D) representing two varieties (i.e. grubii and neoformans, respectively). Here, we sequenced the genome and performed an RNA-Seq-based analysis of the C. neoformans var. grubii transcriptome structure. We determined the chromosomal locations, analyzed the sequence/structural features of the centromeres, and identified origins of replication. The genome was annotated based on automated and manual curation. More than 40,000 introns populating more than 99% of the expressed genes were identified. Although most of these introns are located in the coding DNA sequences (CDS), over 2,000 introns in the untranslated regions (UTRs) were also identified. Poly(A)-containing reads were employed to locate the polyadenylation sites of more than 80% of the genes. Examination of the sequences around these sites revealed a new poly(A)-site-associated motif (AUGHAH). In addition, 1,197 miscRNAs were identified. These miscRNAs can be spliced and/or polyadenylated, but do not appear to have obvious coding capacities. Finally, this genome sequence enabled a comparative analysis of strain H99 variants obtained after laboratory passage. The spectrum of mutations identified provides insights into the genetics underlying the micro-evolution of a laboratory strain, and identifies mutations involved in stress responses, mating efficiency, and virulence.


Bioinformatics | 2007

Classification based upon gene expression data

Ian A. Wood; Peter M. Visscher; Kerrie Mengersen

MOTIVATION: Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. RESULTS: Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus, where possible, the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. AVAILABILITY: R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
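The two-level external cross-validation described in this abstract can be illustrated with a minimal sketch. It is not the authors' R/PAMR code: the k-nearest-neighbour classifier, the candidate values of k, and every function name below are assumptions chosen for illustration. The point it shows is that the tuning parameter is chosen by an inner cross-validation that only ever sees the outer training data, so the outer error estimate is not subject to optimization bias.

```python
import random


def make_folds(indices, n_folds, rng):
    """Shuffle the given indices and deal them into n_folds disjoint folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)


def count_errors(train, test, X, y, k):
    """Number of test samples misclassified by k-NN fitted on the train samples."""
    train_x = [X[i] for i in train]
    train_y = [y[i] for i in train]
    return sum(knn_predict(train_x, train_y, X[i], k) != y[i] for i in test)


def cv_error(indices, X, y, k, folds):
    """Single-level cross-validated error rate over the given folds."""
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        wrong += count_errors(train, fold, X, y, k)
    return wrong / len(indices)


def two_level_cv_error(X, y, candidates=(1, 3, 5), n_outer=5, n_inner=4, seed=0):
    """Outer loop estimates error; inner loop tunes k on outer-training data only."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    wrong = 0
    for test in make_folds(all_idx, n_outer, rng):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        inner_folds = make_folds(train, n_inner, rng)
        # The outer test fold plays no part in choosing k.
        best_k = min(candidates, key=lambda k: cv_error(train, X, y, k, inner_folds))
        wrong += count_errors(train, test, X, y, best_k)
    return wrong / len(X)
```

Calling `two_level_cv_error` on data whose labels have been randomly permuted gives a rough check in the spirit of the permutation idea in the abstract: on non-informative labels the estimate should sit near a baseline error rate rather than well below it.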


Molecular Biology and Evolution | 2012

Microevolution of Cryptococcus neoformans Driven by Massive Tandem Gene Amplification

Eve W. L. Chow; Carl A. Morrow; Julianne T. Djordjevic; Ian A. Wood; James A. Fraser

The subtelomeric regions of organisms ranging from protists to fungi undergo a much higher rate of rearrangement than is observed in the rest of the genome. While characterizing these ~40-kb regions of the human fungal pathogen Cryptococcus neoformans, we have identified a recent gene amplification event near the right telomere of chromosome 3 that involves a gene encoding an arsenite efflux transporter (ARR3). The 3,177-bp amplicon exists in a tandem array of 2-15 copies and is present exclusively in strains with the C. neoformans var. grubii subclade VNI A5 MLST profile. Strains bearing the amplification display dramatically enhanced resistance to arsenite that correlates with the copy number of the repeat; the origin of increased resistance was verified as transport-related by functional complementation of an arsenite transporter mutant of Saccharomyces cerevisiae. Subsequent experimental evolution in the presence of increasing concentrations of arsenite yielded highly resistant strains with the ARR3 amplicon further amplified to over 50 copies, accounting for up to ~1% of the whole genome and making the copy number of this repeat as high as that seen for the ribosomal DNA. The example described here therefore represents a rare evolutionary intermediate: an array that is currently in a state of dynamic flux, in dramatic contrast to relatively common, static relics of past tandem duplications that are unable to further amplify due to nucleotide divergence. Beyond identifying and engineering fungal isolates that are highly resistant to arsenite and describing the first reported instance of microevolution via massive gene amplification in C. neoformans, these results suggest that adaptation through gene amplification may be an important mechanism that C. neoformans employs in response to environmental stresses, perhaps including those encountered during infection. More importantly, the ARR3 array will serve as an ideal model for further molecular genetic analyses of how tandem gene duplications arise and expand.


IEEE Transactions on Biomedical Engineering | 2010

Nonlinear Features for Single-Channel Diagnosis of Sleep-Disordered Breathing Diseases

Suren I. Rathnayake; Ian A. Wood; Udantha R. Abeyratne; Craig Hukins

Studies have shown that algorithms based on single-channel airflow records are effective in screening for sleep-disordered breathing diseases (SDB). In this study, we investigate the diagnostic effectiveness of a classifier trained on a set of features derived from single-channel airflow measurements. The features considered are based on recurrence quantification analysis (RQA) of the measurement time series and are optionally augmented with single measurements of neck circumference and body mass index. The airflow measurement utilized is the nasal pressure (NP). The study used an overnight recording from each of 77 patients undergoing PSG testing. Mixture discriminant analysis was used to obtain a classifier, which predicts whether or not a measurement segment contains an SDB event. Patients were diagnosed as having SDB disease if the recording contained measurement segments predicted to include an SDB event at a rate exceeding a threshold value. A patient can be diagnosed as having SDB disease if the rate of SDB events per hour of sleep, the respiratory disturbance index (RDI), is ≥15 or sometimes ≥5. Here we trained and evaluated the classifier under each assumption, obtaining areas under receiver operating characteristic curves using fivefold cross-validation of 0.96 and 0.93, respectively. We used a two-layer structure to select the optimal operating point and assess the resulting classifier to avoid biased estimates. The resulting estimates for diagnostic sensitivity/specificity were 71.5%/89.5% for disease classification when RDI ≥ 15 and 63.3%/100% for RDI ≥ 5. These results were found assuming that the costs of misclassifying healthy and diseased subjects are equal, but we provide a framework to vary these costs. The results suggest that a classifier based on RQA features derived from NP measurements could be used in an automated SDB screening device.


Annals of Epidemiology | 2008

Investigation of the relationship between smoking and appendicitis in Australian twins.

Christopher Oldmeadow; Ian A. Wood; Kerrie Mengersen; Peter M. Visscher; Nicholas G. Martin; David L. Duffy

PURPOSE: Appendicitis is an inflammation of the appendix, the etiology of which is still poorly understood. Previous studies have shown an increased risk for cigarette smokers but did not account for the timing of exposure to smoking relative to appendectomy. METHODS: Based on questionnaire data, both cohort and co-twin case-control analyses were conducted to assess the effect of active cigarette smoking on appendectomy in 3808 Australian twin pairs. Smoking status was defined as a time-dependent covariate to account for differences in timing of smoking initiation and onset of appendicitis. RESULTS: The questionnaire had a 65% pairwise response rate. After controlling for sex, age, and year of birth, appendectomy risk in current smokers was statistically significantly increased by 65% relative to never-smokers. This was largely unchanged by the duration or intensity of smoking and was not affected by socioeconomic status or father's occupation. The effect was stronger in females. Among former smokers, increased time since quitting significantly reduced the odds ratio of appendectomy by 15% for every year since quitting. CONCLUSION: After adjustment for age and other confounders, there was an increase in risk of appendectomy among current smokers relative to never-smokers, particularly in females. This study adds to the body of knowledge on the effects of tobacco smoking on the gastrointestinal tract.


Genetics Selection Evolution | 2006

A meta-analytic assessment of a Thyroglobulin marker for marbling in beef cattle

Ian A. Wood; G. Moser; Daniel L. Burrell; Kerrie Mengersen; D. Jay S. Hetzel

A meta-analysis was undertaken reporting on the association between a polymorphism in the Thyroglobulin gene (TG5) and marbling in beef cattle. A Bayesian hierarchical model was adopted, with alternative representations assessed through sensitivity analysis. Based on the overall posterior means and posterior probabilities, there is substantial support for an additive association between the TG5 marker and marbling. The marker effect was also assessed across various breed groups, with each group displaying a high probability of positive association between the T allele and marbling. The WinBUGS program code used to simulate the model is included as an Appendix available online at http://www.edpsciences.org/gse.


Computational Statistics & Data Analysis | 2012

Approximating the tail of the Anderson-Darling distribution

Adam W. Grace; Ian A. Wood

The Anderson-Darling distribution plays an important role in the statistical testing of uniformity. However, it is difficult to evaluate, especially in its tail. We consider a new Monte Carlo approach to approximate the tail probabilities of the Anderson-Darling distribution. The estimates are compared with existing tables and recent numerical approximations, obtained via numerical inversion and naive Monte Carlo. Our results demonstrate improved accuracy over existing tables and approximating functions for small tail probabilities. We also present an approximating function for tail probabilities of less than 3 × 10^-2.
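For orientation, the naive Monte Carlo baseline mentioned in this abstract can be sketched as follows. This is not the paper's new method, only the simple approach it is compared against, and the function names, sample size, and replicate count are assumptions. The statistic is the standard Anderson-Darling A² for a fully specified Uniform(0, 1) null hypothesis.

```python
import math
import random


def anderson_darling(sample):
    """A^2 statistic for testing that `sample` is drawn from Uniform(0, 1)."""
    eps = 1e-12  # clamp values so that log(0) cannot occur at the end points
    u = sorted(min(max(v, eps), 1.0 - eps) for v in sample)
    n = len(u)
    # With 0-based i: weight (2i + 1), order statistic u[i], mirrored u[n - 1 - i].
    total = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
                for i in range(n))
    return -n - total / n


def naive_tail_probability(z, n=20, replicates=4000, seed=0):
    """Estimate P(A^2 > z) as the fraction of simulated uniform samples exceeding z."""
    rng = random.Random(seed)
    exceed = sum(anderson_darling([rng.random() for _ in range(n)]) > z
                 for _ in range(replicates))
    return exceed / replicates
```

The weakness of this baseline is visible in the code: a tail probability of order p needs on the order of 1/p replicates before even one exceedance is expected, which is why naive simulation becomes impractical far into the tail.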


Congress on Evolutionary Computation | 2007

Bayesian inference in estimation of distribution algorithms

Marcus Gallagher; Ian A. Wood; Jonathan M. Keith; George Y. Sofronov

Metaheuristics such as Estimation of Distribution Algorithms and the Cross-Entropy method use probabilistic modelling and inference to generate candidate solutions in optimization problems. The model fitting task in this class of algorithms has largely been carried out to date based on maximum likelihood. An alternative approach that is prevalent in statistics and machine learning is to use Bayesian inference. In this paper, we provide a framework for the application of Bayesian inference techniques in probabilistic model-based optimization. Based on this framework, a simple continuous Bayesian Estimation of Distribution Algorithm is described. We evaluate and compare this algorithm experimentally with its maximum likelihood equivalent, UMDA_c^G.
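The maximum likelihood baseline named in this abstract, a continuous univariate Gaussian EDA, can be sketched roughly as below. This is an illustrative reconstruction under assumptions, not the paper's code: the function name, population sizes, initial mean and spread, and truncation selection are all choices made here. The Bayesian variant the paper proposes is not shown.

```python
import random
import statistics


def umda_gaussian(objective, dim, pop_size=100, n_select=30,
                  generations=50, seed=0):
    """Minimise `objective` with an independent Gaussian model per dimension."""
    rng = random.Random(seed)
    mu = [0.0] * dim      # assumed initial mean
    sigma = [5.0] * dim   # assumed initial spread
    best = None
    for _ in range(generations):
        # Sample candidate solutions from the current model.
        pop = [[rng.gauss(mu[d], sigma[d]) for d in range(dim)]
               for _ in range(pop_size)]
        pop.sort(key=objective)
        if best is None or objective(pop[0]) < objective(best):
            best = pop[0]
        # Truncation selection, then a maximum likelihood refit of the model.
        elite = pop[:n_select]
        for d in range(dim):
            column = [x[d] for x in elite]
            mu[d] = statistics.mean(column)
            # pstdev divides by n, which is the maximum likelihood estimate.
            sigma[d] = max(statistics.pstdev(column), 1e-12)
    return best
```

For example, `umda_gaussian(lambda x: sum((v - 3.0) ** 2 for v in x), dim=2)` searches for the minimum of a shifted sphere function. The refit step is where a Bayesian approach would differ, replacing the point estimates of `mu` and `sigma` with inference that incorporates a prior.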


Genetics | 2014

Characterizing Uncertainty in High-Density Maps from Multiparental Populations

Daniel C. Ahfock; Ian A. Wood; Stuart Stephen; Colin Cavanagh; B. Emma Huang

Multiparental populations are of considerable interest in high-density genetic mapping due to their increased levels of polymorphism and recombination relative to biparental populations. However, errors in map construction can have significant impact on QTL discovery in later stages of analysis, and few methods have been developed to quantify the uncertainty attached to the reported order of markers or intermarker distances. Current methods are computationally intensive or limited to assessing uncertainty only for order or distance, but not both simultaneously. We derive the asymptotic joint distribution of maximum composite likelihood estimators for intermarker distances. This approach allows us to construct hypothesis tests and confidence intervals for simultaneously assessing marker-order instability and distance uncertainty. We investigate the effects of marker density, population size, and founder distribution patterns on map confidence in multiparental populations through simulations. Using these data, we provide guidelines on sample sizes necessary to map markers at sub-centimorgan densities with high certainty. We apply these approaches to data from a bread wheat Multiparent Advanced Generation Inter-Cross (MAGIC) population genotyped using the Illumina 9K SNP chip to assess regions of uncertainty and validate them against the recently released pseudomolecule for the wheat chromosome 3B.


International Conference on Data Mining | 2011

Approximate Record Matching Using Hash Grams

Mohammed Gollapalli; Xue Li; Ian A. Wood; Guido Governatori

Accurately identifying duplicate records between multiple data sources is a persistent problem that continues to plague organizations and researchers alike. Small inconsistencies between records can prevent two otherwise identical records from being detected as duplicates. In this paper, we present a new probabilistic h-gram (hash gram) record matching technique by extending traditional n-grams and utilizing scale-based hashing for equality testing. h-gram matching greatly reduces the number of comparisons to be performed for duplicate record detection and is applicable to a variety of data types and data sizes by transforming data into its equivalent numerical realities. One of the key features of h-gram matching is that it is highly extensible, providing more intuitive and flexible results. With the sampling technique in place, our method can be applied to databases of variable size to perform data linkage, and probabilistic results can be quickly obtained. We have extensively evaluated h-gram matching on large samples of real-world data, and the results show a higher level of accuracy as well as a reduction in required time when compared with existing techniques.

Collaboration


Top Co-Authors

Kerrie Mengersen

Queensland University of Technology

Tom Downs

University of Queensland

Xue Li

University of Queensland

Eve W. L. Chow

University of Queensland

Guido Governatori

Commonwealth Scientific and Industrial Research Organisation
