Publication


Featured research published by Paul D. Yoo.


IEEE Transactions on Nanobioscience | 2008

DomNet: Protein Domain Boundary Prediction Using Enhanced General Regression Network and New Profiles

Paul D. Yoo; Abdur R. Sikder; Javid Taheri; Bing Bing Zhou; Albert Y. Zomaya

The accurate and stable prediction of protein domain boundaries is an important avenue for the prediction of protein structure, function, evolution, and design. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques. In this paper, we propose a new machine-learning-based domain predictor, DomNet, that shows more accurate and stable predictive performance than existing state-of-the-art models. DomNet is trained using a novel compact domain profile, secondary structure, solvent accessibility information, and an interdomain linker index to detect possible domain boundaries for a target sequence. The performance of the proposed model was compared to nine different machine learning models on the Benchmark_2 dataset in terms of accuracy, sensitivity, specificity, and correlation coefficient. DomNet achieved the best performance, with 71% accuracy for domain boundary identification in multidomain proteins. On the CASP7 benchmark dataset, it again demonstrated superior performance to contemporary domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut, and DomainDiscovery.
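The four evaluation measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: it assumes that each residue is labelled 1 (boundary) or 0 (non-boundary) and that "correlation coefficient" means the Matthews correlation coefficient, which the abstract does not state. The function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = boundary)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def boundary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and MCC as a dict.

    Assumption: the 'correlation coefficient' is taken to be the
    Matthews correlation coefficient (MCC).
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # MCC is conventionally reported as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 1]
    print(boundary_metrics(truth, predicted))
```

Because boundary residues are typically far rarer than non-boundary residues, accuracy alone can be misleading, which is one reason to report sensitivity, specificity, and a correlation measure alongside it.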


IEEE Transactions on Vehicular Technology | 2014

Opportunistic Spectrum Access in Cognitive Radio Networks Under Imperfect Spectrum Sensing

Omar Altrad; Sami Muhaidat; Arafat J. Al-Dweik; Abdallah Shami; Paul D. Yoo

In this paper, we investigate the effect of imperfect sensing on the performance of opportunistic spectrum access (OSA) in cognitive radio (CR) networks. We consider a system modeled as a continuous-time Markov chain (CTMC), and then evaluate its performance in terms of the probabilities of users being blocked or dropped. Our results demonstrate that the performance of the underlying system significantly degrades when imperfect sensing is considered; thus, there is a pressing need for a reliable spectrum sensing scheme to improve the overall quality of service in practical scenarios. A simulation study is presented to corroborate the analytical results and to demonstrate the performance of OSA under imperfect sensing conditions.
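To illustrate how a blocking probability falls out of a continuous-time Markov chain, the sketch below uses the textbook M/M/c/c loss system, a birth-death CTMC whose state is the number of busy channels. This is a deliberately simplified stand-in and not the model analysed in the paper: it ignores primary users, sensing errors, and dropping. All names and parameter values are hypothetical.

```python
def stationary_distribution(arrival_rate, service_rate, channels):
    """Stationary distribution of an M/M/c/c birth-death CTMC.

    State n is the number of busy channels. Detailed balance gives
    pi[n] proportional to (arrival_rate / service_rate) ** n / n!.
    """
    if arrival_rate < 0 or service_rate <= 0 or channels < 0:
        raise ValueError("rates must be positive and channels non-negative")
    load = arrival_rate / service_rate
    weights = [1.0]
    for n in range(1, channels + 1):
        weights.append(weights[-1] * load / n)
    total = sum(weights)
    return [w / total for w in weights]


def erlang_b(offered_load, channels):
    """Blocking probability via the numerically stable Erlang-B recursion."""
    if offered_load < 0 or channels < 0:
        raise ValueError("offered load and channels must be non-negative")
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # An arrival finding all channels busy is blocked, so the blocking
    # probability equals the stationary probability of the full state.
    pi = stationary_distribution(arrival_rate=2.0, service_rate=1.0, channels=2)
    print(pi[-1], erlang_b(2.0, 2))
```

In a model with imperfect sensing, additional transitions (for example, false alarms and missed detections) would change the transition rates and add dropping events, but the principle of reading performance measures off the stationary distribution stays the same.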


IEEE Transactions on Systems, Man, and Cybernetics | 2014

Sample Subset Optimization Techniques for Imbalanced and Ensemble Learning Problems in Bioinformatics Applications

Pengyi Yang; Paul D. Yoo; Juanita Fernando; Bing Bing Zhou; Zili Zhang; Albert Y. Zomaya

Data sampling is a widely used technique in a broad range of machine learning problems. Traditional sampling approaches generally rely on random resampling from a given dataset. However, these approaches do not take into consideration additional information, such as sample quality and usefulness. We recently proposed a data sampling technique, called sample subset optimization (SSO). The SSO technique relies on a cross-validation procedure for identifying and selecting the most useful samples as subsets. In this paper, we describe the application of SSO techniques to imbalanced and ensemble learning problems, respectively. For imbalanced learning, the SSO technique is employed as an under-sampling technique for identifying a subset of highly discriminative samples in the majority class. In ensemble learning, the SSO technique is utilized as a generic ensemble technique where multiple optimized subsets of samples from each class are selected for building an ensemble classifier. We demonstrate the utilities and advantages of the proposed techniques on a variety of bioinformatics applications where class imbalance, small sample size, and noisy data are prevalent.


BMC Bioinformatics | 2008

Improved general regression network for protein domain boundary prediction

Paul D. Yoo; Abdur R. Sikder; Bing Bing Zhou; Albert Y. Zomaya

Background: Protein domains present some of the most useful information that can be used to understand protein structure and function. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques, such as Artificial Neural Networks and Support Vector Machines. In this study, we propose a new machine learning model (IGRN) that can achieve accurate and reliable classification with significantly reduced computation. The IGRN was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information, and an inter-domain linker index to detect possible domain boundaries for a target sequence.

Results: The proposed model achieved an average prediction accuracy of 67% on the Benchmark_2 dataset for domain boundary identification in multi-domain proteins and showed superior predictive performance and generalisation ability among the most widely used neural network models. On the CASP7 benchmark dataset, it also demonstrated performance comparable to existing domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut, and DomainDiscovery, with 70.10% prediction accuracy.

Conclusion: The proposed model compares favourably with other existing machine-learning-based methods, as well as with widely known domain boundary predictors, on two benchmark datasets, and excels in the identification of domain boundaries in terms of model bias, generalisation, and computational requirements.


IEEE Transactions on Systems, Man, and Cybernetics | 2016

Data Randomization and Cluster-Based Partitioning for Botnet Intrusion Detection

Omar Y. Al-Jarrah; Omar Alhussein; Paul D. Yoo; Sami Muhaidat; Kamal Taha; Kwangjo Kim

Botnets, which consist of remotely controlled compromised machines called bots, provide a distributed platform for several threats against cyber world entities and enterprises. An intrusion detection system (IDS) provides an efficient countermeasure against botnets. It continually monitors and analyzes network traffic for potential vulnerabilities and the possible existence of active attacks. A payload-inspection-based IDS (PI-IDS) identifies active intrusion attempts by inspecting the payloads of transmission control protocol and user datagram protocol packets and comparing them with previously seen attack signatures. However, the ability of a PI-IDS to detect intrusions might be incapacitated by packet encryption. A traffic-based IDS (T-IDS) alleviates the shortcomings of the PI-IDS, as it does not inspect packet payloads; instead, it analyzes packet headers to identify intrusions. As network traffic grows rapidly, not only is the detection rate critical, but the efficiency and scalability of the IDS also become more significant. In this paper, we propose a state-of-the-art T-IDS built on a novel randomized data partitioned learning model (RDPLM), relying on a compact network feature set and feature selection techniques, simplified subspacing, and a multiple randomized meta-learning technique. The proposed model achieved 99.984% accuracy and a training time of 21.38 s on a well-known benchmark botnet dataset. Experimental results demonstrate that the proposed methodology outperforms other well-known machine-learning models used in the same detection task, namely, sequential minimal optimization, deep neural network, C4.5, reduced error pruning tree, and randomTree.


BMC Bioinformatics | 2008

SiteSeek: Post-translational modification analysis using adaptive locality-effective kernel methods and new profiles

Paul D. Yoo; Yung Shwen Ho; Bing Bing Zhou; Albert Y. Zomaya

Background: Post-translational modifications have a substantial influence on the structure and functions of proteins. Post-translational phosphorylation is one of the most common modifications that occur in intracellular proteins. Accurate prediction of protein phosphorylation sites is of great importance for the understanding of diverse cellular signalling processes in both the human body and in animals. In this study, we propose a new machine-learning-based protein phosphorylation site predictor, SiteSeek. SiteSeek is trained using a novel compact evolutionary and hydrophobicity profile to detect possible protein phosphorylation sites for a target sequence. The newly proposed method proves to be more accurate and exhibits much more stable predictive performance than currently existing phosphorylation site predictors.

Results: The performance of the proposed model was compared to nine different existing machine learning models and four widely known phosphorylation site predictors on the newly proposed PS-Benchmark_1 dataset to contrast their accuracy, sensitivity, specificity, and correlation coefficient. SiteSeek showed better predictive performance, with 86.6% accuracy, 83.8% sensitivity, 92.5% specificity, and a correlation coefficient of 0.77 on the four main kinase families (CDK, CK2, PKA, and PKC).

Conclusion: The newly proposed methods used in SiteSeek were shown to be useful for the identification of protein phosphorylation sites, as SiteSeek performed much better than widely known predictors on the newly built PS-Benchmark_1 dataset.


BMC Genomics | 2010

Hierarchical kernel mixture models for the prediction of AIDS disease progression using HIV structural gp120 profiles

Paul D. Yoo; Yung Shwen Ho; Jason W. P. Ng; Michael A. Charleston; Nitin K. Saksena; Pengyi Yang; Albert Y. Zomaya

Changes to the glycosylation profile on HIV gp120 can influence viral pathogenesis and alter AIDS disease progression. The characterization of glycosylation differences at the sequence level is inadequate, as the placement of carbohydrates is structurally complex. However, no structural framework is available to date for the study of HIV disease progression. In this study, we propose a novel machine-learning-based framework for the prediction of AIDS disease progression in three stages (RP, SP, and LTNP) using the HIV structural gp120 profile. This new intelligent framework proves to be accurate and provides an important benchmark for predicting AIDS disease progression computationally. The model is trained using a novel HIV gp120 glycosylation structural profile to detect possible stages of AIDS disease progression for the target sequences of HIV+ individuals. The performance of the proposed model was compared to seven different existing machine-learning models on the newly proposed gp120-Benchmark_1 dataset in terms of error rate (MSE), accuracy (CCI), stability (STD), and complexity (TBM). The novel framework showed better predictive performance, with 67.82% CCI, 30.21 MSE, 0.8 STD, and 2.62 TBM on the three stages of AIDS disease progression of 50 HIV+ individuals. This framework is an invaluable bioinformatics tool that will be useful for the clinical assessment of viral pathogenesis.


BMC Genomics | 2009

A modular kernel approach for integrative analysis of protein domain boundaries

Paul D. Yoo; Bing Bing Zhou; Albert Y. Zomaya

Background: In this paper, we introduce a novel inter-range interaction integrated approach for protein domain boundary prediction. It involves (1) the design of a modular kernel algorithm, which is able to effectively exploit the information of non-local interactions in amino acids, and (2) the development of a novel profile that can provide suitable information to the algorithm. One of the key features of this profiling technique is the use of multiple structural alignments of remote homologues to create an extended sequence profile that combines the structural information with suitable chemical information that plays an important role in protein stability. This profile can capture the sequence characteristics of an entire structural superfamily and extend a range of profiles generated from sequence similarity alone.

Results: Our novel profile, which combines homology information with hydrophobicity from the SARAH1 scale, was successful in providing more structural and chemical information. In addition, the modular approach adopted in our algorithm proved to be effective in capturing information from non-local interactions. Our approach achieved accuracies of 82.1%, 50.9%, and 31.5% for one-domain, two-domain, and three-or-more-domain proteins, respectively.

Conclusion: The experimental results in this study are encouraging; however, more work is needed to extend the approach to a broader range of applications. We are currently developing a novel interactive (human-in-the-loop) profiling technique that can provide information from more distantly related homologues. This approach will further enhance the current study.


IEEE Transactions on Industrial Informatics | 2015

Simplified Subspaced Regression Network for Identification of Defect Patterns in Semiconductor Wafer Maps

Fatima Adly; Omar Alhussein; Paul D. Yoo; Yousof Al-Hammadi; Kamal Taha; Sami Muhaidat; Young-Seon Jeong; Ui-Hyoung Lee; Mohammed Ismail

Wafer defects, which are primarily defective chips on a wafer, are one of the key challenges facing semiconductor manufacturing companies, as they could increase yield losses to hundreds of millions of dollars. Fortunately, these wafer defects leave unique patterns due to their spatial dependence across wafer maps. It is thus possible to identify and predict them in order to find the point of failure in the manufacturing process accurately. This paper introduces a novel simplified subspaced regression framework for the accurate and efficient identification of defect patterns in semiconductor wafer maps. It can achieve a test error comparable to or better than state-of-the-art machine-learning (ML)-based methods, while maintaining a low computational cost when dealing with large-scale wafer data. The effectiveness and utility of the proposed approach have been demonstrated by our experiments on real wafer defect datasets, achieving a detection accuracy of 99.884% and an R² of 99.905%, which are far better than those of any existing methods reported in the literature.


Global Communications Conference | 2014

A unified approach for representing wireless channels using EM-based finite mixture of gamma distributions

Omar Alhussein; Sami Muhaidat; Jie Liang; Paul D. Yoo

We present a unified framework to evaluate the error rate performance of wireless networks over generalized fading channels. In particular, we propose a new approach to represent different fading distributions by a mixture of Gamma distributions. The new approach relies on the expectation-maximization (EM) algorithm in conjunction with the so-called Newton-Raphson maximization algorithm. We show that our model provides performance similar to other existing state-of-the-art models in both accuracy and simplicity, where accuracy is analyzed by means of the mean square error (MSE). In addition, we demonstrate that this algorithm may potentially approximate any fading channel, and thus we utilize it to model both composite and non-composite fading models. We derive a novel closed-form expression for the raw moments of a dual-hop fixed-gain cooperative network. We also study the effective capacity of the end-to-end SNR in such networks. Numerical simulation results are provided to corroborate the analytical findings.
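As a small illustration of what a finite mixture of Gamma distributions looks like once its parameters are known, the sketch below evaluates the mixture density and its raw moments. It does not implement the EM or Newton-Raphson fitting step described in the abstract, and it is not the paper's closed-form result for dual-hop networks. The (weight, shape, rate) parametrization and all function names are assumptions made for this example.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density with the given shape and rate, evaluated in log space."""
    if x < 0:
        return 0.0
    if x == 0:
        # Boundary value depends on the shape parameter.
        if shape == 1:
            return rate
        return 0.0 if shape > 1 else math.inf
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)


def mixture_pdf(x, components):
    """Density of a mixture; components is a list of (weight, shape, rate)."""
    return sum(w * gamma_pdf(x, a, b) for w, a, b in components)


def mixture_raw_moment(order, components):
    """Raw moment E[X**order] of the mixture.

    Uses the standard Gamma result
    E[X**n] = Gamma(shape + n) / (Gamma(shape) * rate**n) per component.
    """
    return sum(
        w * math.exp(math.lgamma(a + order) - math.lgamma(a) - order * math.log(b))
        for w, a, b in components
    )


if __name__ == "__main__":
    # Hypothetical two-component mixture; weights should sum to 1.
    mix = [(0.5, 2.0, 1.0), (0.5, 1.0, 1.0)]
    print(mixture_pdf(1.0, mix), mixture_raw_moment(1, mix))
```

A single component with shape 1 reduces to an exponential distribution, which is the SNR distribution under Rayleigh fading, so that case is a convenient sanity check for the density and the moments.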

Collaboration


Dive into Paul D. Yoo's collaborations.

Top Co-Authors

Abdallah Shami

University of Western Ontario
