Alok Sharma

University of the South Pacific

Publication

Featured research published by Alok Sharma.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

A Top-r Feature Selection Algorithm for Microarray Gene Expression Data

Alok Sharma; Seiya Imoto; Satoru Miyano

Most of the conventional feature selection algorithms have a drawback whereby a weakly ranked gene that could perform well in terms of classification accuracy with an appropriate subset of genes will be left out of the selection. Considering this shortcoming, we propose a feature selection algorithm in gene expression data analysis of sample classifications. The proposed algorithm first divides genes into subsets, the sizes of which are relatively small (roughly of size h), then selects informative smaller subsets of genes (of size r < h) from a subset and merges the chosen genes with another gene subset (of size r) to update the gene subset. We repeat this process until all subsets are merged into one informative subset. We illustrate the effectiveness of the proposed algorithm by analyzing three distinct gene expression data sets. Our method shows promising classification accuracy for all the test data sets. We also show the relevance of the selected genes in terms of their biological functions.
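
The divide, select, and merge loop described in this abstract can be sketched as follows. This is only one possible reading of the abstract, not the authors' implementation: the helper names (`best_subset`, `top_r_select`, `toy_score`), the exhaustive subset search, and the toy scoring function are all assumptions made for illustration. In practice the score would be something like cross-validated classification accuracy on the candidate gene subset.

```python
from itertools import combinations


def best_subset(genes, r, score):
    """Exhaustively pick the size-r subset of `genes` with the highest score."""
    if len(genes) <= r:
        return tuple(genes)
    return max(combinations(genes, r), key=score)


def top_r_select(genes, h, r, score):
    """Keep r genes, merge them with the next block of h genes, and repeat."""
    if not 0 < r < h:
        raise ValueError("need 0 < r < h")
    blocks = [genes[i:i + h] for i in range(0, len(genes), h)]
    kept = ()
    for block in blocks:
        # Re-select r genes from the survivors plus the new block.
        kept = best_subset(list(kept) + list(block), r, score)
    return set(kept)


# Hypothetical per-gene weights: gene 9 is the weakest gene on its own.
WEIGHTS = {0: 5, 1: 3, 2: 2, 3: 1, 4: 4, 5: 2, 6: 1, 7: 1, 8: 3, 9: 0, 10: 2, 11: 1}


def toy_score(subset):
    """Toy subset score: summed weights plus a bonus when genes 0 and 9 co-occur."""
    total = sum(WEIGHTS[g] for g in subset)
    if 0 in subset and 9 in subset:
        total += 10
    return total


selected = top_r_select(list(range(12)), h=4, r=2, score=toy_score)
print(selected)
```

Because subsets are scored jointly rather than gene by gene, the individually weakest gene (9) ends up in the final selection once it meets its partner (0), which is the behaviour the abstract argues that rank-based selection misses.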

Journal of Theoretical Biology | 2013

A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition

Alok Sharma; James Lyons; Abdollah Dehzangi; Kuldip Kumar Paliwal

Discovering a three dimensional structure of a protein is a challenging task in biological science. Classifying a protein into one of its folds is an intermediate step for deciphering the three dimensional protein structure. The protein fold recognition can be done by developing feature extraction techniques to accurately extract all the relevant information from a protein sequence and then by employing a suitable classifier to label an unknown protein. Several feature extraction techniques have been developed in the past but with limited recognition accuracy only. In this work, we have developed a feature extraction technique which is based on bi-grams computed directly from Position Specific Scoring Matrices and demonstrated its effectiveness on a benchmark dataset. The proposed technique exhibits an absolute improvement of around 10% compared with existing feature extraction techniques.
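
As a rough illustration of bi-grams computed directly from a PSSM, the sketch below accumulates, for every pair of columns (m, n), the product of the column-m entry at position i and the column-n entry at position i + 1, summed over the sequence. The abstract does not spell out the formula, so this form, the function name `pssm_bigram`, and the three-column toy matrix (a real PSSM has 20 columns, giving a 400-dimensional feature vector) are assumptions.

```python
def pssm_bigram(pssm):
    """Return the bi-gram matrix B with B[m][n] = sum_i pssm[i][m] * pssm[i+1][n]."""
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over consecutive sequence positions (rows i and i + 1).
    for row, next_row in zip(pssm, pssm[1:]):
        for m, p in enumerate(row):
            for n, q in enumerate(next_row):
                bigram[m][n] += p * q
    return bigram


# Toy "PSSM" with 3 positions and 3 columns, for illustration only.
toy_pssm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
B = pssm_bigram(toy_pssm)
features = [value for row in B for value in row]  # flattened feature vector
print(B)
```

The resulting feature vector has a fixed length regardless of the protein's length, which is what allows it to be passed to a standard classifier.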

International Journal of Machine Learning and Cybernetics | 2012

Null space based feature selection method for gene expression data

Alok Sharma; Seiya Imoto; Satoru Miyano; Vandana Sharma

Feature selection is quite an important process in gene expression data analysis. Feature selection methods discard unimportant genes from several thousands of genes for finding important genes or pathways for the target biological phenomenon like cancer. The obtained gene subset is used for statistical analysis for prediction such as survival as well as functional analysis for understanding biological characteristics. In this paper we propose a null space based feature selection method for gene expression data in terms of supervised classification. The proposed method discards the redundant genes by applying the information of null space of scatter matrices. We derive the method theoretically and demonstrate its effectiveness on several DNA gene expression datasets. The method is easy to implement and computationally efficient.

Data and Knowledge Engineering | 2008

Cancer classification by gradient LDA technique using microarray gene expression data

Alok Sharma; Kuldip Kumar Paliwal

Cancer classification is one of the major applications of the microarray technology. When standard machine learning techniques are applied for cancer classification, they face the small sample size (SSS) problem of gene expression data. The SSS problem is inherited from the large dimensionality of the feature space (due to the large number of genes) compared to the small number of samples available. In order to overcome the SSS problem, the dimensionality of the feature space is reduced either through feature selection or through feature extraction. Linear discriminant analysis (LDA) is a well-known technique for feature extraction-based dimensionality reduction. However, this technique cannot be applied for cancer classification because of the singularity of the within-class scatter matrix due to the SSS problem. In this paper, we use the Gradient LDA technique, which avoids the singularity problem associated with the within-class scatter matrix, and show its usefulness for cancer classification. The technique is applied on three gene expression datasets; namely, acute leukemia, small round blue-cell tumour (SRBCT) and lung adenocarcinoma. This technique achieves lower misclassification error as compared to several other previous techniques.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013

A Combination of Feature Extraction Methods with an Ensemble of Different Classifiers for Protein Structural Class Prediction Problem

Abdollah Dehzangi; Kuldip Kumar Paliwal; Alok Sharma; Omid Dehzangi; Abdul Sattar

Better understanding of the structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.

Pattern Recognition | 2012

A new perspective to null linear discriminant analysis method and its fast implementation using random matrix multiplication with scatter matrices

Alok Sharma; Kuldip Kumar Paliwal

The null linear discriminant analysis (LDA) method is a popular dimensionality reduction method for solving the small sample size problem. The implementation of the null LDA method is, however, computationally very expensive. In this paper, we theoretically derive the null LDA method from a different perspective and present a computationally efficient implementation of this method. Eigenvalue decomposition (EVD) of ST^+SB (where SB is the between-class scatter matrix and ST^+ is the pseudoinverse of the total scatter matrix ST) is shown here to be a sufficient condition for the null LDA method. As EVD of ST^+SB is computationally expensive, we show that the utilization of a random matrix together with ST^+SB is also a sufficient condition for the null LDA method. This condition is used here to derive a computationally fast implementation of the null LDA method. We show that the computational complexity of the proposed implementation is significantly lower than the other implementations of the null LDA method reported in the literature. This result is also confirmed by conducting classification experiments on several datasets.
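
One way to read the random-matrix condition is to form the orientation matrix as W = ST^+ SB Y for a random matrix Y and check that W lies in the null space of the within-class scatter matrix. The sketch below tests this numerically on synthetic data. It is an illustrative assumption rather than the paper's implementation: the function name `null_lda_random`, the Gaussian choice for Y, the SVD-based pseudoinverse, and the toy data are all hypothetical, and the check assumes more features than samples with linearly independent samples.

```python
import numpy as np


def null_lda_random(X, y, seed=0):
    """Return W = pinv(S_T) @ S_B @ Y for a random Gaussian matrix Y.

    X has shape (n_samples, n_features); y holds integer class labels.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    mean = X.mean(axis=0)
    H = (X - mean).T  # d x n factor, so that S_T = H @ H.T
    # d x c factor, so that S_B = HB @ HB.T
    HB = np.stack(
        [np.sqrt((y == c).sum()) * (X[y == c].mean(axis=0) - mean) for c in classes],
        axis=1,
    )
    # Pseudoinverse of S_T via the thin SVD of H, dropping near-zero singular values.
    U, s, _ = np.linalg.svd(H, full_matrices=False)
    keep = s > 1e-10 * s.max()
    U, s = U[:, keep], s[keep]
    Y = rng.standard_normal((X.shape[1], len(classes) - 1))
    # pinv(S_T) @ S_B @ Y, evaluated right to left so no d x d matrix is formed.
    W = U @ ((U.T @ (HB @ (HB.T @ Y))) / (s ** 2)[:, None])
    return W, HB


# Synthetic small-sample-size data: 12 samples, 50 features, 3 classes.
rng = np.random.default_rng(1)
X = rng.standard_normal((12, 50))
y = np.repeat([0, 1, 2], 4)
W, HB = null_lda_random(X, y)

# d x n factor of the within-class scatter: S_W = HW @ HW.T
HW = np.concatenate([(X[y == c] - X[y == c].mean(axis=0)).T for c in np.unique(y)], axis=1)
within = np.abs(HW.T @ W).max()   # ~0 when W lies in the null space of S_W
between = np.abs(HB.T @ W).max()  # stays away from 0: class means remain separated
print(within, between)
```

Evaluating the product right to left avoids forming any feature-by-feature matrix explicitly, which is where a saving over a full eigenvalue decomposition would come from in this reading.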

IEEE Transactions on Nanobioscience | 2014

A Tri-Gram Based Feature Extraction Technique Using Linear Probabilities of Position Specific Scoring Matrix for Protein Fold Recognition

Kuldip Kumar Paliwal; Alok Sharma; James Lyons; Abdollah Dehzangi

In biological sciences, the deciphering of a three dimensional structure of a protein sequence is considered to be an important and challenging task. The identification of protein folds from primary protein sequences is an intermediate step in discovering the three dimensional structure of a protein. This can be done by utilizing a feature extraction technique to accurately extract all the relevant information, followed by employing a suitable classifier to label an unknown protein. In the past, several feature extraction techniques have been developed but with limited recognition accuracy only. In this study, we have developed a feature extraction technique based on tri-grams computed directly from Position Specific Scoring Matrices. The effectiveness of the feature extraction technique has been shown on two benchmark datasets. The proposed technique exhibits up to 4.4% improvement in protein fold recognition accuracy compared to the state-of-the-art feature extraction techniques.

BMC Bioinformatics | 2013

A strategy to select suitable physicochemical attributes of amino acids for protein fold recognition

Alok Sharma; Kuldip Kumar Paliwal; Abdollah Dehzangi; James Lyons; Seiya Imoto; Satoru Miyano

Background: Assigning a protein into one of its folds is a transitional step for discovering three dimensional protein structure, which is a challenging task in bimolecular (biological) science. The present research focuses on: 1) the development of classifiers, and 2) the development of feature extraction techniques based on syntactic and/or physicochemical properties. Results: Apart from the above two main categories of research, we have shown that the selection of physicochemical attributes of the amino acids is an important step in protein fold recognition and has not been explored adequately. We have presented a multi-dimensional successive feature selection (MD-SFS) approach to systematically select attributes. The proposed method is applied on protein sequence data and an improvement of around 24% in fold recognition has been noted when selecting attributes appropriately. Conclusion: The MD-SFS has been applied successfully in selecting physicochemical attributes of the amino acids. The selected attributes show improved protein fold recognition performance.

Machine Vision and Applications | 2014

A feature selection method using improved regularized linear discriminant analysis

Alok Sharma; Kuldip Kumar Paliwal; Seiya Imoto; Satoru Miyano

Investigation of genes, using data analysis and computer-based methods, has gained widespread attention in solving the human cancer classification problem. DNA microarray gene expression datasets are readily utilized for this purpose. In this paper, we propose a feature selection method using an improved regularized linear discriminant analysis technique to select important genes, crucial for the human cancer classification problem. The experiment is conducted on several DNA microarray gene expression datasets and promising results are obtained when compared with several other existing feature selection methods.

IEEE Transactions on Nanobioscience | 2015

Predict Gram-Positive and Gram-Negative Subcellular Localization via Incorporating Evolutionary Information and Physicochemical Features Into Chou's General PseAAC

Ronesh Sharma; Abdollah Dehzangi; James Lyons; Kuldip Kumar Paliwal; Tatsuhiko Tsunoda; Alok Sharma

In this study, we used structural and evolutionary based features to represent the sequences of gram-positive and gram-negative subcellular localizations. To do this, we proposed a normalization method to construct a normalized Position Specific Scoring Matrix (PSSM) using the information from the original PSSM. To investigate the effectiveness of the proposed method, we computed feature vectors from the normalized PSSM and, by applying a support vector machine (SVM) and a naïve Bayes classifier, respectively, compared the achieved results with the previously reported results. We also computed features from the original PSSM and the normalized PSSM and compared their results. The achieved results show enhancement in gram-positive and gram-negative subcellular localizations. Evaluating localization for each feature, our results indicate that employing SVM and concatenating features (amino acid composition feature, Dubchak feature (physicochemical-based features), normalized PSSM based auto-covariance feature and normalized PSSM based bigram feature) gives higher accuracy, while employing the naïve Bayes classifier with the normalized PSSM based auto-covariance feature proves to have high sensitivity for both benchmarks. Our reported results are an overall locative accuracy of 84.8% and an overall absolute accuracy of 85.16% for the gram-positive dataset and, for the gram-negative dataset, an overall locative accuracy of 85.4% and an overall absolute accuracy of 86.3%.
