Aaron J. Mackey | Researchain

Publication

Featured research published by Aaron J. Mackey.

Nature | 2012

The Drosophila melanogaster Genetic Reference Panel

Trudy F. C. Mackay; Stephen Richards; Eric A. Stone; Antonio Barbadilla; Julien F. Ayroles; Dianhui Zhu; Sònia Casillas; Yi Han; Michael M. Magwire; Julie M. Cridland; Mark F. Richardson; Robert R. H. Anholt; Maite Barrón; Crystal Bess; Kerstin P. Blankenburg; Mary Anna Carbone; David Castellano; Lesley S. Chaboub; Laura H. Duncan; Zeke Harris; Mehwish Javaid; Joy Jayaseelan; Shalini N. Jhangiani; Katherine W. Jordan; Fremiet Lara; Faye Lawrence; Sandra L. Lee; Pablo Librado; Raquel S. Linheiro; Richard F. Lyman

A major challenge of biology is understanding the relationship between molecular genetic variation and variation in quantitative traits, including fitness. This relationship determines our ability to predict phenotypes from genotypes and to understand how evolutionary forces shape variation within and between species. Previous efforts to dissect the genotype–phenotype map were based on incomplete genotypic information. Here, we describe the Drosophila melanogaster Genetic Reference Panel (DGRP), a community resource for analysis of population genomics and quantitative traits. The DGRP consists of fully sequenced inbred lines derived from a natural population. Population genomic analyses reveal reduced polymorphism in centromeric autosomal regions and the X chromosome, evidence for positive and negative selection, and rapid evolution of the X chromosome. Many variants in novel genes, most at low frequency, are associated with quantitative traits and explain a large fraction of the phenotypic variance. The DGRP facilitates genotype–phenotype mapping using the power of Drosophila genetics.

Nature | 2004

The genome of Cryptosporidium hominis

Ping Xu; Giovanni Widmer; Yingping Wang; Luiz Shozo Ozaki; João M. P. Alves; Myrna G. Serrano; Daniela Puiu; Patricio Manque; Aaron J. Mackey; William R. Pearson; Paul H. Dear; Alan T. Bankier; Darrell L. Peterson; Mitchell S. Abrahamsen; Vivek Kapur; Saul Tzipori; Gregory A. Buck

Cryptosporidium species cause acute gastroenteritis and diarrhoea worldwide. They are members of the Apicomplexa—protozoan pathogens that invade host cells by using a specialized apical complex and are usually transmitted by an invertebrate vector or intermediate host. In contrast to other Apicomplexans, Cryptosporidium is transmitted by ingestion of oocysts and completes its life cycle in a single host. No therapy is available, and control focuses on eliminating oocysts in water supplies. Two species, C. hominis and C. parvum, which differ in host range, genotype and pathogenicity, are most relevant to humans. C. hominis is restricted to humans, whereas C. parvum also infects other mammals. Here we describe the eight-chromosome ∼9.2-million-base genome of C. hominis. The complement of C. hominis protein-coding genes shows a striking concordance with the requirements imposed by the environmental niches the parasite inhabits. Energy metabolism is largely from glycolysis. Both aerobic and anaerobic metabolisms are available, the former requiring an alternative electron transport system in a simplified mitochondrion. Biosynthesis capabilities are limited, explaining an extensive array of transporters. Evidence of an apicoplast is absent, but genes associated with apical complex organelles are present. C. hominis and C. parvum exhibit very similar gene complements, and phenotypic differences between these parasites must be due to subtle sequence divergence.

Proceedings of the National Academy of Sciences of the United States of America | 2010

Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus)

Jason E. Stajich; Sarah K. Wilke; Dag Ahrén; Chun Hang Au; Bruce W. Birren; Mark Borodovsky; Claire Burns; Björn Canbäck; Lorna A. Casselton; Chi Keung Cheng; Jixin Deng; Fred S. Dietrich; David C. Fargo; Mark L. Farman; Allen C. Gathman; Jonathan M. Goldberg; Roderic Guigó; Patrick J. Hoegger; James Hooker; Ashleigh Huggins; Timothy Y. James; Takashi Kamada; Sreedhar Kilaru; Chinnapa Kodira; Ursula Kües; Doris M. Kupfer; Hoi Shan Kwan; Alexandre Lomsadze; Weixi Li; Walt W. Lilly

The mushroom Coprinopsis cinerea is a classic experimental model for multicellular development in fungi because it grows on defined media, completes its life cycle in 2 weeks, produces some 10⁸ synchronized meiocytes, and can be manipulated at all stages in development by mutation and transformation. The 37-megabase genome of C. cinerea was sequenced and assembled into 13 chromosomes. Meiotic recombination rates vary greatly along the chromosomes, and retrotransposons are absent in large regions of the genome with low levels of meiotic recombination. Single-copy genes with identifiable orthologs in other basidiomycetes are predominant in low-recombination regions of the chromosome. In contrast, paralogous multicopy genes are found in the highly recombining regions, including a large family of protein kinases (FunK1) unique to multicellular fungi. Analyses of P450 and hydrophobin gene families confirmed that local gene duplications drive the expansions of paralogous copies and the expansions occur in independent lineages of Agaricomycotina fungi. Gene-expression patterns from microarrays were used to dissect the transcriptional program of dikaryon formation (mating). Several members of the FunK1 kinase family are differentially regulated during sexual morphogenesis, and coordinate regulation of adjacent duplications is rare. The genomes of C. cinerea and Laccaria bicolor, a symbiotic basidiomycete, share extensive regions of synteny. The largest syntenic blocks occur in regions with low meiotic recombination rates, no transposable elements, and tight gene spacing, where orthologous single-copy genes are overrepresented. The chromosome assembly of C. cinerea is an essential resource in understanding the evolution of multicellularity in the fungi.

Genome Biology | 2007

Creating a honey bee consensus gene set

Christine G. Elsik; Aaron J. Mackey; Justin T. Reese; Natalia V. Milshina; David S. Roos; George M. Weinstock

Background: We wished to produce a single reference gene set for honey bee (Apis mellifera). Our motivation was twofold. First, we wished to obtain an improved set of gene models with increased coverage of known genes, while maintaining gene model quality. Second, we wished to provide a single official gene list that the research community could further utilize for consistent and comparable analyses and functional annotation. Results: We created a consensus gene set for honey bee (Apis mellifera) using GLEAN, a new algorithm that uses latent class analysis to automatically combine disparate gene prediction evidence in the absence of known genes. The consensus gene models had increased representation of honey bee genes without sacrificing quality compared with any one of the input gene predictions. When compared with manually annotated gold standards, the consensus set of gene models was similar or superior in quality to each of the input sets. Conclusion: Most eukaryotic genome projects produce multiple gene sets because of the variety of gene prediction programs. Each of the gene prediction programs has strengths and weaknesses, and so the multiplicity of gene sets offers users a more comprehensive collection of genes to use than is available from a single program. On the other hand, the availability of multiple gene sets is also a cause for uncertainty among users as regards which set they should use. GLEAN proved to be an effective method to combine gene lists into a single reference set.

Molecular & Cellular Proteomics | 2002

Getting More from Less: Algorithms for Rapid Protein Identification with Multiple Short Peptide Sequences

Aaron J. Mackey; Timothy A. J. Haystead; William R. Pearson

We describe two novel sequence similarity search algorithms, FASTS and FASTF, that use multiple short peptide sequences to identify homologous sequences in protein or DNA databases. FASTS searches with peptide sequences of unknown order, as obtained by mass spectrometry-based sequencing, evaluating all possible arrangements of the peptides. FASTF searches with mixed peptide sequences, as generated by Edman sequencing of unseparated mixtures of peptides. FASTF deconvolutes the mixture, using a greedy heuristic that allows rapid identification of high scoring alignments while reducing the total number of explored alternatives. Both algorithms use the heuristic FASTA comparison strategy to accelerate the search but use alignment probability, rather than similarity score, as the criterion for alignment optimality. Statistical estimates are calculated using an empirical correction to a theoretical probability. These calculated estimates were accurate within a factor of 10 for FASTS and 1000 for FASTF on our test dataset. FASTS requires only 15–20 total residues in three or four peptides to robustly identify homologues sharing 50% or greater protein sequence identity. FASTF requires about 25% more sequence data than FASTS for equivalent sensitivity, but additional sequence data are usually available from mixed Edman experiments. Thus, both algorithms can identify homologues that diverged 100 to 500 million years ago, allowing proteomic identification from organisms whose genomes have not been sequenced.

Nucleic Acids Research | 2005

Composite genome map and recombination parameters derived from three archetypal lineages of Toxoplasma gondii

Asis Khan; Sonya Taylor; C. Su; Aaron J. Mackey; Jon P. Boyle; Robert H. Cole; Darius Glover; Keliang Tang; Ian T. Paulsen; Matthew Berriman; John C. Boothroyd; E.R. Pfefferkorn; J. P. Dubey; James W. Ajioka; David S. Roos; John C. Wootton; L. David Sibley

Toxoplasma gondii is a highly successful protozoan parasite in the phylum Apicomplexa, which contains numerous animal and human pathogens. T. gondii is amenable to cellular, biochemical, molecular and genetic studies, making it a model for the biology of this important group of parasites. To facilitate forward genetic analysis, we have developed a high-resolution genetic linkage map for T. gondii. The genetic map was used to assemble the scaffolds from a 10X shotgun whole genome sequence, thus defining 14 chromosomes with markers spaced at ∼300 kb intervals across the genome. Fourteen chromosomes were identified comprising a total genetic size of ∼592 cM and an average map unit of ∼104 kb/cM. Analysis of the genetic parameters in T. gondii revealed a high frequency of closely adjacent, apparent double crossover events that may represent gene conversions. In addition, we detected large regions of genetic homogeneity among the archetypal clonal lineages, reflecting the relatively few genetic outbreeding events that have occurred since their recent origin. Despite these unusual features, linkage analysis proved to be effective in mapping the loci determining several drug resistances. The resulting genome map provides a framework for analysis of complex traits such as virulence and transmission, and for comparative population genetic studies.
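
As a rough consistency check on the figures quoted in this abstract, multiplying the total genetic size by the average map unit gives the physical length implied by the map. This is back-of-the-envelope arithmetic on the rounded values above, not a number taken from the paper:

```latex
\begin{align*}
592\ \text{cM} \times 104\ \text{kb/cM} &= 61{,}568\ \text{kb} \approx 61.6\ \text{Mb} \\
61{,}568\ \text{kb} \,/\, 300\ \text{kb per interval} &\approx 205\ \text{marker intervals}
\end{align*}
```

Because all three inputs are approximate, the results should be read as order-of-magnitude estimates only.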

BMC Biology | 2005

The transcriptome of Toxoplasma gondii

Jay R. Radke; Michael S. Behnke; Aaron J. Mackey; Josh B. Radke; David S. Roos; Michael W. White

Background: Toxoplasma gondii gives rise to toxoplasmosis, among the most prevalent parasitic diseases of animals and man. Transformation of the tachyzoite stage into the latent bradyzoite-cyst form underlies chronic disease and leads to a lifetime risk of recrudescence in individuals whose immune system becomes compromised. Given the importance of tissue cyst formation, there has been intensive focus on the development of methods to study bradyzoite differentiation, although the molecular basis for the developmental switch is still largely unknown. Results: We have used serial analysis of gene expression (SAGE) to define the Toxoplasma gondii transcriptome of the intermediate-host life cycle that leads to the formation of the bradyzoite/tissue cyst. A broad view of gene expression is provided by >4-fold coverage from nine distinct libraries (~300,000 SAGE tags) representing key developmental transitions in primary parasite populations and in laboratory strains representing the three canonical genotypes. SAGE tags, and their corresponding mRNAs, were analyzed with respect to abundance, uniqueness, and antisense/sense polarity and chromosome distribution and developmental specificity. Conclusion: This study demonstrates that phenotypic transitions during parasite development were marked by unique stage-specific mRNAs that accounted for 18% of the total SAGE tags and varied from 1–5% of the tags in each developmental stage. We have also found that Toxoplasma mRNA pools have a unique parasite-specific composition with 1 in 5 transcripts encoding Apicomplexa-specific genes functioning in parasite invasion and transmission. Developmentally co-regulated genes were dispersed across all Toxoplasma chromosomes, as were tags representing each abundance class, and a variety of biochemical pathways indicating that trans-acting mechanisms likely control gene expression in this parasite. We observed distinct similarities in the specificity and expression levels of mRNAs in primary populations (Day-6 post-sporozoite infection) that occur prior to the onset of bradyzoite development that were uniquely shared with the virulent Type I-RH laboratory strain suggesting that development of RH may be arrested. By contrast, strains from Type II-Me49B7 and Type III-VEGmsj contain SAGE tags corresponding to bradyzoite genes, which suggests that priming of developmental expression likely plays a role in the greater capacity of these strains to complete bradyzoite development.

Molecular & Cellular Proteomics | 2007

Periplasmic Proteins of the Extremophile Acidithiobacillus ferrooxidans: A High Throughput Proteomics Analysis

An Chi; Lissette Valenzuela; Simon Beard; Aaron J. Mackey; Jeffrey Shabanowitz; Donald F. Hunt; Carlos A. Jerez

Acidithiobacillus ferrooxidans is a chemolithoautotrophic acidophile capable of obtaining energy by oxidizing ferrous iron or sulfur compounds such as metal sulfides. Some of the proteins involved in these oxidations have been described as forming part of the periplasm of this extremophile. The detailed study of the periplasmic components constitutes an important area to understand the physiology and environmental interactions of microorganisms. Proteomics analysis of the periplasmic fraction of A. ferrooxidans ATCC 23270 was performed by using high resolution linear ion trap-FT MS. We identified a total of 131 proteins in the periplasm of the microorganism grown in thiosulfate. When possible, functional categories were assigned to the proteins: 13.8% were transport and binding proteins, 14.6% were several kinds of cell envelope proteins, 10.8% were involved in energy metabolism, 10% were related to protein fate and folding, 10% were proteins with unknown functions, and 26.1% were proteins without homologues in databases. These last proteins are most likely characteristic of A. ferrooxidans and may have important roles yet to be assigned. The majority of the periplasmic proteins from A. ferrooxidans were very basic compared with those of neutrophilic microorganisms such as Escherichia coli, suggesting a special adaptation of the chemolithoautotrophic bacterium to its very acidic environment. The high throughput proteomics approach used here not only helps to understand the physiology of this extreme acidophile but also offers an important contribution to the functional annotation for the available genomes of biomining microorganisms such as A. ferrooxidans for which no efficient genetic systems are available to disrupt genes by procedures such as homologous recombination.

Bioinformatics | 2008

Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction

Qian Liu; Aaron J. Mackey; David S. Roos; Fernando Pereira

MOTIVATION The increasing diversity and variable quality of evidence relevant to gene annotation argues for a probabilistic framework that automatically integrates such evidence to yield candidate gene models. RESULTS Evigan is an automated gene annotation program for eukaryotic genomes, employing probabilistic inference to integrate multiple sources of gene evidence. The probabilistic model is a dynamic Bayes network whose parameters are adjusted to maximize the probability of observed evidence. Consensus gene predictions are then derived by maximum likelihood decoding, yielding n-best models (with probabilities for each). Evigan is capable of accommodating a variety of evidence types, including (but not limited to) gene models computed by diverse gene finders, BLAST hits, EST matches, and splice site predictions; learned parameters encode the relative quality of evidence sources. Since separate training data are not required (apart from the training sets used by individual gene finders), Evigan is particularly attractive for newly sequenced genomes where little or no reliable manually curated annotation is available. The ability to produce a ranked list of alternative gene models may facilitate identification of alternatively spliced transcripts. Experimental application to ENCODE regions of the human genome, and the genomes of Plasmodium vivax and Arabidopsis thaliana show that Evigan achieves better performance than any of the individual data sources used as evidence. AVAILABILITY The source code is available at http://www.seas.upenn.edu/~strctlrn/evigan/evigan.html.

Molecular & Cellular Proteomics | 2002

A Strategy for the Rapid Identification of Phosphorylation Sites in the Phosphoproteome

Justin A. MacDonald; Aaron J. Mackey; William R. Pearson; Timothy A. J. Haystead

Edman phosphate (³²P) release sequencing provides a high sensitivity means of identifying phosphorylation sites in proteins that complements mass spectrometry techniques. We have developed a bioinformatic assessment tool, the cleavage of radiolabeled protein (CRP) program, which enables experimental identification of phosphorylation sites via ³²P labeling and Edman degradation of cleaved proteins obtained at femtomole levels. By observing the Edman cycle(s) in which radioactivity is found, candidate phosphorylation sites are identified by determining which residues occur at the observed number of cycles downstream from a peptide cleavage site. In cases where more than one residue could be responsible for the observed radioactivity, additional experiments with cleavage reagents having alternative specificities may resolve the ambiguity. Given a protein sequence and a cleavage site, CRP performs these experiments in silico, identifying resolved sites based on user-supplied experimental data, as well as suggesting combinations of reagents for additional analyses. Analysis of the PhosphoBase protein sequence database suggests that CRP data from two cleavage experiments can be used to identify unambiguously 60% of known phosphorylation sites. Data from additional cleavage experiments may increase the overall coverage to 70% of known sites. By comparing theoretical data obtained from the CRP program with ³²P release data obtained from an Edman sequencer, a known phosphorylation site was identified unambiguously and correctly. In addition, our results show that in vivo phosphorylation sites can be determined routinely by differential proteolysis analysis and Edman cycling with less than 1 fmol of protein and 1000 cpm.
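
The cycle-counting idea in this abstract can be illustrated with a small sketch. It is not the CRP program itself: the function names, the toy sequence, and the simplified cleavage rules (cut after every listed residue, with no exceptions) are all assumptions made for illustration, and only Ser, Thr, and Tyr are treated as possible phosphorylation sites.

```python
# Hypothetical sketch of the cycle-counting idea; not the CRP program's code.
PHOSPHO_RESIDUES = set("STY")  # assumption: only Ser/Thr/Tyr are considered


def cleave(sequence, cleave_after):
    """Split a sequence after every residue in `cleave_after`.

    Returns (start_index, peptide) pairs, with 0-based start indices.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))  # C-terminal remainder
    return peptides


def candidate_sites(sequence, cleave_after, cycle):
    """0-based positions of S/T/Y residues found at Edman cycle `cycle`
    (1-based) of any peptide produced by the given cleavage rule."""
    sites = set()
    for start, peptide in cleave(sequence, cleave_after):
        if cycle <= len(peptide) and peptide[cycle - 1] in PHOSPHO_RESIDUES:
            sites.add(start + cycle - 1)
    return sites


# Toy example: radioactivity seen at cycle 1 after a trypsin-like cut
# (after K or R) and at cycle 2 after a cut following E.
toy = "AKSAERTSAKGS"
first = candidate_sites(toy, "KR", 1)   # ambiguous: two candidates
second = candidate_sites(toy, "E", 2)   # second reagent narrows it down
resolved = first & second
```

In this toy case the first experiment leaves two candidates and intersecting with the second leaves one, mirroring how a reagent with a different specificity can resolve an ambiguity.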
