Omar Al-Azzam
North Dakota State University
Publication
Featured research published by Omar Al-Azzam.
BMC Genomics | 2012
Ajay Kumar; Kristin Simons; Muhammad J. Iqbal; Monika Michalak de Jiménez; Filippo M. Bassi; Farhad Ghavami; Omar Al-Azzam; Thomas Drader; Yi Wang; Ming-Cheng Luo; Yong Q. Gu; Anne M. Denton; Gerard R. Lazo; Steven S. Xu; Jan Dvorak; Penny M.A. Kianian; Shahryar F. Kianian
Background: Development of a high quality reference sequence is a daunting task in crops like wheat, which has a large (~17 Gb), highly repetitive (>80%) and polyploid genome. To achieve complete sequence assembly of such genomes, development of a high quality physical map is a necessary first step. However, due to the lack of recombination in certain regions of the chromosomes, genetic mapping, which uses recombination frequency to map marker loci, alone is not sufficient to develop high quality marker scaffolds for a sequence ready physical map. Radiation hybrid (RH) mapping, which uses radiation induced chromosomal breaks, has proven to be a successful approach for developing marker scaffolds for sequence assembly in animal systems. Here, the development and characterization of an RH panel for the mapping of the D-genome of the wheat progenitor Aegilops tauschii is reported.
Results: Radiation dosages of 350 and 450 Gy were optimized for seed irradiation of a synthetic hexaploid (AABBDD) wheat with the D-genome of Ae. tauschii accession AL8/78. The surviving plants after irradiation were crossed to durum wheat (AABB) to produce pentaploid RH1s (AABBD), which allows the simultaneous mapping of the whole D-genome. A panel of 1,510 RH1 plants was obtained, of which 592 plants were generated from the mature RH1 seeds, and 918 plants were rescued through embryo culture due to poor germination (<3%) of mature RH1 seeds. This panel showed a homogenous marker loss (2.1%) after screening with SSR markers uniformly covering all the D-genome chromosomes. Different marker systems mostly detected different lines with deletions. Using markers covering known distances, the mapping resolution of this RH panel was estimated to be <140 kb. Analysis of only 16 RH lines carrying deletions on chromosome 2D resulted in a physical map with a cM/cR ratio of 1:5.2 and 15 distinct bins. Additionally, with this small set of lines, almost all the tested ESTs could be mapped. A set of the 399 most informative RH lines, with an average deletion frequency of ~10%, was identified for developing high density marker scaffolds of the D-genome.
Conclusions: The RH panel reported here is the first developed for any wild ancestor of a major cultivated plant species. The results provided insight into various aspects of RH mapping in plants, including the genetically effective cell number for wheat (for the first time) and the potential implementation of this technique in other plant species. This RH panel will be an invaluable resource for mapping gene based markers, developing a complete marker scaffold for the whole genome sequence assembly, fine mapping of markers and functional characterization of genes and gene networks present on the D-genome.
international conference on machine learning and applications | 2011
Omar Al-Azzam; Loai M. Alnemer; Charith Chitraranjan; Anne M. Denton; Ajay Kumar; Filippo M. Bassi; Muhammad J. Iqbal; Shahryar F. Kianian
Genome mapping, or the experimental determination of the ordering of DNA markers on a chromosome, is an important step in genome sequencing and the ultimate assembly of sequenced genomes. The presented research addresses the problem of identifying markers that cannot be placed reliably. If such markers are included in standard mapping procedures, they can result in an overall poor mapping. Traditional techniques for identifying markers that cannot be placed consistently are based on resampling, which requires an already computationally expensive process to be repeated for a large ensemble of resampled populations. We propose a network-based approach that uses pairwise similarities between markers and demonstrate that the results from this approach largely match those of the more computationally expensive conventional approaches. The proposed approach is evaluated on data from the radiation hybrid mapping of the wheat genome.
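The abstract does not specify the similarity measure or how the network is used, so the following is only a minimal sketch of the general idea, under stated assumptions: each marker is a 0/1 retention vector across RH lines, similarity is the fraction of lines on which two markers agree, and a marker with too few strong links is flagged as hard to place. The names `agreement` and `weakly_connected_markers`, the threshold, the degree criterion, and the toy data are all hypothetical.

```python
from itertools import combinations


def agreement(a, b):
    """Fraction of RH lines in which two markers have the same retention call."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weakly_connected_markers(markers, threshold=0.8, min_degree=1):
    """Return markers with fewer than `min_degree` neighbours in the similarity network.

    An edge links two markers whose agreement is at least `threshold`.
    Both parameters are illustrative assumptions, not values from the paper.
    """
    degree = {name: 0 for name in markers}
    for m1, m2 in combinations(markers, 2):
        if agreement(markers[m1], markers[m2]) >= threshold:
            degree[m1] += 1
            degree[m2] += 1
    return sorted(name for name, d in degree.items() if d < min_degree)


# Toy retention data (1 = marker retained in that RH line); not real wheat data.
markers = {
    "m1": [1, 1, 0, 0, 1, 0, 1, 1, 0, 0],
    "m2": [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    "m3": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    "mX": [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
}
flagged = weakly_connected_markers(markers)
```

The pairwise comparison is done once over the data, which is where the saving over resampling-based filtering would come from: no map has to be rebuilt for each resampled population.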
data mining in bioinformatics | 2013
Raed I. Seetan; Ajay Kumar; Anne M. Denton; M. Javed Iqbal; Omar Al-Azzam; Shahryar F. Kianian
The process of mapping markers from radiation hybrid mapping (RHM) experiments is equivalent to the traveling salesman problem and, thereby, has combinatorial complexity. As an additional problem, experiments typically result in some unreliable markers that reduce the overall quality of the map. We propose a clustering approach for addressing both problems efficiently by eliminating unreliable markers without the need for mapping the complete set of markers. Traditional approaches for eliminating markers use resampling of the full data set, which has an even higher computational complexity than the original mapping problem. In contrast, the proposed approach uses a divide and conquer strategy to construct framework maps based on clusters that exclude unreliable markers. Clusters are ordered using parallel processing and are then combined to form the complete map. Using an RHM data set of the human genome, we compare the framework maps from our proposed approaches with published physical maps and with the Carthagene tool. Overall, our approach has a very low computational complexity and produces solid framework maps with good chromosome coverage and high agreement with the physical map marker order.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2014
Raed I. Seetan; Anne M. Denton; Omar Al-Azzam; Ajay Kumar; M. Javed Iqbal; Shahryar F. Kianian
The process of mapping markers from radiation hybrid mapping (RHM) experiments is equivalent to the traveling salesman problem and, thereby, has combinatorial complexity. As an additional problem, experiments typically result in some unreliable markers that reduce the overall quality of the map. We propose a clustering approach for addressing both problems efficiently by eliminating unreliable markers without the need for mapping the complete set of markers. Traditional approaches for eliminating markers use resampling of the full data set, which has an even higher computational complexity than the original mapping problem. In contrast, the proposed approach uses a divide-and-conquer strategy to construct framework maps based on clusters that exclude unreliable markers. Clusters are ordered using parallel processing and are then combined to form the complete map. We present three algorithms that explore the trade-off between the number of markers included in the map and placement accuracy. Using an RHM data set of the human genome, we compare the framework maps from our proposed approaches with published physical maps and with the results of using the Carthagene tool. Overall, our approaches have a very low computational complexity and produce solid framework maps with good chromosome coverage and high agreement with the physical map marker order.
international conference on machine learning and applications | 2011
Charith Chitraranjan; Loai M. Alnemer; Omar Al-Azzam; Saeed Salem; Anne M. Denton; Muhammad J. Iqbal; Shahryar F. Kianian
We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physicochemically similar. Frequent substrings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. Pairwise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent substring-based SVM classifier performs better than the other classifiers on the sub-cellular localization datasets and performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.
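To make the feature construction concrete, here is a small sketch of the reduced-alphabet idea. It is a simplification rather than the paper's algorithm: it reduces sequences first and then counts fixed-length k-grams, whereas the abstract describes mining frequent substrings and then transforming them, and it stops before any SVM training. The six-group alphabet in `GROUPS`, the function names, and the `k` and `min_support` values are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical reduced alphabet: amino acids grouped by broad physicochemical class.
GROUPS = {
    "AVLIMC": "h",  # hydrophobic
    "FWY": "a",     # aromatic
    "STNQ": "p",    # polar
    "KRH": "+",     # positively charged
    "DE": "-",      # negatively charged
    "GP": "s",      # special conformations
}
REDUCE = {aa: code for group, code in GROUPS.items() for aa in group}


def reduce_sequence(seq):
    """Rewrite an amino acid sequence in the reduced alphabet ('x' for unknown residues)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq)


def kgrams(seq, k):
    """All length-k substrings of the reduced form of `seq`."""
    reduced = reduce_sequence(seq)
    return [reduced[i:i + k] for i in range(len(reduced) - k + 1)]


def frequent_substrings(seqs, k=3, min_support=2):
    """Reduced k-grams that occur in at least `min_support` training sequences."""
    support = Counter()
    for seq in seqs:
        support.update(set(kgrams(seq, k)))  # count each sequence at most once
    return {s for s, count in support.items() if count >= min_support}


def featurize(seq, vocab, k=3):
    """Count vector over the sorted vocabulary, usable as classifier input."""
    counts = Counter(kgrams(seq, k))
    return [counts[s] for s in sorted(vocab)]


vocab = frequent_substrings(["AKDG", "LREP"])
features = featurize("IKEG", vocab)
```

The two training sequences share no residues, yet both reduce to the same string, so their patterns pool into shared features. That pooling is the effect the reduced alphabet is meant to capture.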
international conference on bioinformatics | 2014
Raed I. Seetan; Anne M. Denton; Omar Al-Azzam; Ajay Kumar; M. Javed Iqbal; Shahryar F. Kianian
The large number of markers in high-resolution radiation hybrid (RH) maps increasingly necessitates the use of data mining techniques for reducing both the computational complexity and the impact of noise in the original data. Traditionally, the RH mapping process has been treated as equivalent to the traveling salesman problem, with the correspondingly high computational complexity. These techniques are also susceptible to noise, and unreliable markers can result in major disruptions of the overall order. In this paper, we propose a new approach that recognizes that the focus on nearest-neighbor distances that characterizes the traveling-salesman model is no longer appropriate for the large number of markers in modern high-resolution mapping experiments. The proposed approach splits the mapping process into two levels, where the higher level only operates on the most stable markers of the lower level. A divide and conquer strategy, which is applied at the lower level, removes much of the impact of noise. Because of the high density of markers, only the most stable representatives from the lower level are then used at the higher level. The groupings within the lower level are so small that exhaustive search can be used. Markers are then mapped iteratively, while excluding problematic markers. The results for an RH mapping dataset of the human genome show that the proposed approach can construct high-resolution maps with high agreement with the physical maps in a comparatively very short time.
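The lower-level step, exhaustive search within a small group, can be sketched as follows. The abstract does not state the ordering criterion, so this sketch assumes one: the best order minimizes the summed Hamming distance between adjacent markers' retention vectors. The function names and the toy data are hypothetical.

```python
from itertools import permutations


def hamming(a, b):
    """Number of RH lines in which two markers' retention calls differ."""
    return sum(x != y for x, y in zip(a, b))


def best_order(markers):
    """Exhaustively search all orders of a small marker group.

    The cost of an order is the summed Hamming distance between neighbours
    (an assumed criterion). A map and its mirror image are equivalent, so only
    orders whose first name sorts before the last are considered.
    """
    def cost(order):
        return sum(hamming(markers[p], markers[q]) for p, q in zip(order, order[1:]))

    candidates = (o for o in permutations(sorted(markers)) if o[0] <= o[-1])
    return min(candidates, key=cost)


# Toy group of four markers; feasible only because the group is tiny (4! = 24 orders).
group = {
    "m1": [1, 1, 0, 0],
    "m2": [1, 1, 1, 1],
    "m3": [1, 0, 0, 0],
    "m4": [1, 1, 1, 0],
}
order = best_order(group)
```

The number of candidate orders grows factorially with group size, which is why this only works when the divide and conquer step keeps each group small.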
Molecular Plant Pathology | 2012
Javier A. Delgado; Omar Al-Azzam; Anne M. Denton; Samuel G. Markell; Rubella S. Goswami
The goal of this study was to develop a tool specifically designed to identify iterative polyketide synthases (iPKSs) from predicted fungal proteomes. A fungi-based PKS prediction model, specifically for fungal iPKSs, was developed using profile hidden Markov models (pHMMs) based on two essential iPKS domains, the β-ketoacyl synthase (KS) domain and acyltransferase (AT) domain, derived from fungal iPKSs. This fungi-based PKS prediction model was initially tested on the well-annotated proteome of Fusarium graminearum, identifying 15 iPKSs that matched previous predictions and gene disruption studies. These fungi-based pHMMs were subsequently applied to the predicted fungal proteomes of Alternaria brassicicola, Fusarium oxysporum f.sp. lycopersici, Verticillium albo-atrum and Verticillium dahliae. The iPKSs predicted were compared against those predicted by the currently available mixed-kingdom PKS models that include both bacterial and fungal sequences. These mixed-kingdom models have been proven previously by others to be better in predicting true iPKSs from non-iPKSs compared with other available models (e.g. Pfam and TIGRFAM). The fungi-based model was found to perform significantly better on fungal proteomes than the mixed-kingdom PKS model in accuracy, sensitivity, specificity and precision. In addition, the model was capable of predicting the reducing nature of fungal iPKSs by comparison of the bit scores obtained from two separate reducing and nonreducing pHMMs for each domain, which was confirmed by phylogenetic analysis of the KS domain. Biological confirmation of the predictions was obtained by polymerase chain reaction (PCR) amplification of the KS and AT domains of predicted iPKSs from V. dahliae using domain-specific primers and genomic DNA, followed by sequencing of the PCR products. It is expected that the fungi-based PKS model will prove to be a useful tool for the identification and annotation of fungal PKSs from predicted proteomes.
international conference on machine learning and applications | 2011
Loai M. Alnemer; Omar Al-Azzam; Charith Chitraranjan; Anne M. Denton; Filippo M. Bassi; Muhammad J. Iqbal; Shahryar F. Kianian
In data mining applications it is common to have more than one data source available to describe the same record. For example, in biological sciences, the same genes may be characterized through many types of experiments. Which of the data sources proves to be most reliable in predictions may depend on the record in question. For some records, pieces of information may be unavailable because an experiment has not yet been done, or certain types of inference may not be applicable, such as when a gene does not have a homologue in some species. We demonstrate how multi-classifier systems can allow classification in cases where any individual source is too scarce or unreliable to provide an accurate prediction model by itself. We propose a method to predict a class label using the statistical significance of individual classification results. We show that the proposed approach increases the accuracy of results compared with conventional techniques in a problem related to gene mapping in wheat.
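One simple way to combine sources by significance is sketched below. The abstract does not give the combination rule, so this assumes one: each available source contributes a predicted label with a p-value, sources with no information are skipped, and the most significant prediction is used if it passes a cutoff. The function name, the source names, the labels, and the `alpha` value are hypothetical.

```python
def predict_by_significance(results, alpha=0.05):
    """Pick the label from the source whose prediction is most significant.

    `results` maps a source name to a (label, p_value) pair, or to None when
    that source has no information for the record. Returns None if no source
    is available or none reaches the (assumed) significance cutoff `alpha`.
    """
    available = [r for r in results.values() if r is not None]
    if not available:
        return None
    label, p_value = min(available, key=lambda r: r[1])
    return label if p_value <= alpha else None


# Toy record described by three sources; one source has no data for it.
record = {
    "expression": ("2D", 0.20),
    "homology": None,
    "mapping": ("5D", 0.01),
}
label = predict_by_significance(record)
```

Deciding per record lets a different source dominate each time, in line with the observation that the most reliable source may depend on the record in question.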
Archive | 2014
Omar Al-Azzam; Jianfei Wu; Loai Al-Nimer; Charith Chitraranjan; Anne M. Denton
A large part of scientific knowledge is confined to the text of publications. An algorithm is presented for distinguishing those pieces of information that can be predicted from the text of publication abstracts from those for which successes in prediction are spurious. The significance of relationships between textual data and information that is represented in standardized ontologies and protein domains is evaluated using a density-based approach. The approach also integrates a weighting system to account for many-to-many relationships between the abstracts and the genes they represent as well as between genes and the items that describe them. We evaluate the approach using data from the model species yeast, and show that our results are in better agreement with biological expectations than those of a comparison algorithm.
international conference on computational advances in bio and medical sciences | 2013
Raed I. Seetan; Anne M. Denton; Omar Al-Azzam; Ajay Kumar; M. Javed Iqbal; Shahryar F. Kianian
Radiation Hybrid Mapping (RHM) provides a means of ordering markers on a chromosome. The process of mapping markers is equivalent to the traveling salesman problem, and thereby has combinatorial complexity. Addressing this computational complexity for a large number of markers is one problem addressed in the presented research. A related problem is that the quality of mapping information differs across markers. If unreliable markers are included in the mapping process, the overall accuracy of the map is decreased. A common approach to the latter problem, or to both, is to start by building a framework map that only uses the most reliable markers. However, we will show that commonly used framework building techniques have neither the necessary chromosome coverage nor the necessary map quality, and most are prohibitively slow. The proposed approaches use a divide and conquer strategy that makes the mapping more computationally efficient and reduces the effect of unreliable markers on the map construction. Data from RHM of the human genome are used to test and evaluate the proposed approaches by comparing the generated framework maps with physical maps and other framework maps. The proposed maps show good coverage of the chromosomes and high agreement with the physical map marker order.
Collaboration
International Center for Agricultural Research in the Dry Areas