Karthikeyan Subbiah
Banaras Hindu University
Publication
Featured research published by Karthikeyan Subbiah.
Computers in Biology and Medicine | 2013
Abhigyan Nath; Radha Chaube; Karthikeyan Subbiah
Antifreeze proteins (AFPs) prevent the growth of ice crystals, enabling certain organisms to survive in sub-zero surroundings. AFPs have evolved from different types of proteins and share no significant structural or sequence similarity. Nevertheless, all AFPs perform the same antifreeze function and are a classic example of convergent evolution. We analyzed fish AFPs at the sequence level, the residue level and the level of physicochemical property group composition to discover the molecular basis of this convergent evolution. The amino acid distribution alone did not reveal any distinctive feature among AFPs, but comparing AFPs with their close non-AFP homologs on the basis of physicochemical property groups yielded useful information. In particular, (a) fish AFP subtypes II, III and IV share a similar pattern of amino acid avoidance and preference: aromatic residues are avoided whereas small residues are preferred; (b) AFPs share the preference/avoidance pattern of other psychrophilic proteins for most residues, except Ile, Leu and Arg; and (c) most of the amino acids in the computed preferred list are the key functional residues identified in the model previously predicted by Doxey et al. This study is the first to reveal common patterns of avoidance/preference in fish AFP subtypes II, III and IV. These avoidance/preference lists can further facilitate the identification of key functional residues and shed more light on the mechanism of antifreeze function.
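The preference/avoidance comparison between AFPs and their non-AFP homologs can be illustrated with a small sketch. It assumes, purely for illustration, that a residue's preference is scored as the log2 ratio of its frequency in AFPs to its frequency in the homologs; the `threshold` cut-off, the pseudocount and the toy sequences are hypothetical, and the paper's actual statistic may differ.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs):
    """Relative frequency of each of the 20 standard amino acids over a set of sequences."""
    counts = Counter(aa for s in seqs for aa in s if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def preference_lists(afp_seqs, homolog_seqs, threshold=1.0, pseudo=1e-6):
    """Split residues into preferred / avoided lists by log2(freq_AFP / freq_homolog).

    threshold is a hypothetical cut-off in log2 units (1.0 = two-fold change);
    pseudo is a small pseudocount that avoids division by zero.
    """
    afp = composition(afp_seqs)
    hom = composition(homolog_seqs)
    scores = {aa: math.log2((afp[aa] + pseudo) / (hom[aa] + pseudo)) for aa in AMINO_ACIDS}
    preferred = sorted(aa for aa, s in scores.items() if s >= threshold)
    avoided = sorted(aa for aa, s in scores.items() if s <= -threshold)
    return preferred, avoided, scores


preferred, avoided, _ = preference_lists(["AAGSTA", "GAASTT"], ["FWYAGSLK", "YFWLKAGS"])
print(preferred, avoided)  # ['A', 'T'] ['F', 'K', 'L', 'W', 'Y']
```

In this toy input the small residues A and T are enriched in the "AFP" set while the aromatic residues F, W and Y fall into the avoided list, which mirrors the kind of pattern the abstract reports.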
Computational Biology and Chemistry | 2015
Abhigyan Nath; Karthikeyan Subbiah
Lipocalins are short proteins that perform several important biological functions. Their paralogs share less than 20% sequence similarity. Identifying them experimentally is expensive and time-consuming, and computational methods that assign putative members to this family by sequence similarity are of limited use because of the low similarity among family members. Machine learning methods that take sequence- or structure-derived features as input therefore become a viable alternative. Ideally, a machine learning prediction method should be trained on all possible variations of the input feature vector (all sub-class input patterns) to achieve perfect learning. Near-perfect learning can be achieved by training the model on diverse input instances drawn from different regions of the entire input space. Prediction performance can also be improved by balancing the training set, because imbalanced data sets tend to bias predictions towards the majority class and its sub-classes. This paper aims to achieve (i) high generalization ability without classification bias, through diversified and balanced training sets, and (ii) enhanced prediction accuracy, by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we first used the unsupervised K-means clustering algorithm to create diverse clusters of input patterns, and then built a diversified and balanced training set by selecting an equal number of patterns from each cluster.
Finally, a probability-based classifier fusion scheme was applied to a boosted random forest (which gave higher sensitivity) and a K-nearest neighbour classifier (which gave higher specificity), achieving better predictive performance than either base classifier alone. Models trained on the K-means-preprocessed training set performed far better than those trained on randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set, and a sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second, blind test set. These results establish that diversifying the training set improves the performance of predictive models through superior generalization ability, and that balancing the training set improves prediction accuracy. For smaller data sets, unsupervised K-means-based sampling can be a more effective way to improve generalization than the usual random splitting.
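The K-means-based construction of a diversified training set can be sketched as follows: cluster the input patterns, then draw the same number of patterns from every cluster. This is a minimal stdlib sketch under stated assumptions: the farthest-first initialisation, the `per_cluster` parameter and the toy 2-D points are illustrative choices, not details taken from the paper. For a balanced set, the same routine would be run separately on the positive and the negative class.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialisation: start at points[0], then keep adding the point
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def diversified_sample(points, k, per_cluster, seed=0):
    """Cluster the patterns, then draw up to per_cluster patterns from every cluster."""
    labels = kmeans(points, k)
    rng = random.Random(seed)
    sample = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return sample
```

Unlike a random split, which can over-draw from a dense region of the input space, this picks the same number of patterns from each region the clustering finds.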
Computational Biology and Chemistry | 2014
Abhigyan Nath; Karthikeyan Subbiah
Organisms that thrive in extremely cold surroundings are called psychrophiles, and they offer a wealth of information about the sequence adjustments that occurred in proteins during adaptation to low temperatures. In this paper, we propose a new cascading model to investigate the basis of psychrophilicity. In this model, the best-performing classifier is first used to discriminate psychrophilic from mesophilic protein sequences, and the PART rule-generating algorithm is then applied to the input instances that the classifier classifies correctly, to generate human-interpretable rules. These rules were further validated on a structural dataset and finally analyzed to uncover the underlying biological basis of psychrophilicity. As input features, we used global patterns of amino acid composition, one of the key features that allow psychrophilic proteins to remain functional at extremely cold temperatures. The rotation forest classifier outperformed all other classifiers, with a maximum accuracy of 70.5% and a maximum AUC of 0.78. The effect of sequence length on classification accuracy was also investigated. Analysis of the derived rules revealed some interesting phenomena: for example, the amino acids A, D, G, F and S are over-represented, and T is under-represented, in psychrophilic proteins. These findings augment existing domain knowledge of psychrophilic sequence features.
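The first stage of the cascade (compute global amino acid composition, then keep only the instances the classifier gets right) can be sketched as below. The `predict` callable is a hypothetical stand-in for the trained classifier (rotation forest in the paper), and the PART rule learner that consumes the filtered instances is not implemented here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """20-dimensional global amino acid composition (fractions), ignoring non-standard letters."""
    standard = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(standard)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [standard.count(aa) / n for aa in AMINO_ACIDS]


def correctly_classified(instances, labels, predict):
    """Cascade stage 1: keep only (features, label) pairs that the classifier predicts correctly.

    predict is any callable mapping a feature vector to a class label
    (a hypothetical stand-in for the trained classifier).
    """
    return [(x, y) for x, y in zip(instances, labels) if predict(x) == y]
```

The retained pairs would then be passed to a rule learner, so that the rules are derived only from instances the classifier separates cleanly.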
Computers in Biology and Medicine | 2016
Abhigyan Nath; Karthikeyan Subbiah
Bioluminescence plays an important role in nature; for example, it is used for intracellular chemical signalling in bacteria. It is also a useful reagent in analytical research methods ranging from cellular imaging to gene expression analysis. However, identifying and annotating bioluminescent proteins is difficult because they share little sequence similarity. In this paper, we present a novel approach to within-class and between-class balancing and diversification of a training dataset that combines the unsupervised K-Means algorithm with the Synthetic Minority Oversampling Technique (SMOTE), in order to achieve the true performance of the prediction model. We also varied the ratio of positive to negative data in the training dataset to find the optimal class distribution, i.e. the one that produces the best prediction accuracy. The appropriately balanced and diversified training set resulted in near-complete learning with greater generalization on the blind test datasets. The results strongly support the view that an optimal class distribution with a high degree of diversity is essential to achieve near-perfect learning. Using random forest as the weak learner in boosting and training it on the optimally balanced and diversified training dataset, we achieved an overall accuracy of 95.3% in tenfold cross-validation, and an accuracy of 91.7%, sensitivity of 89.3% and specificity of 91.8% on a holdout test set. The general framework discussed in this work could plausibly be applied to other biological datasets to deal effectively with class imbalance and incomplete learning.
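The between-class balancing step can be illustrated with a SMOTE-style oversampler driven by a target positive-to-negative ratio. This sketch covers only the SMOTE and ratio part (the K-means diversification is omitted), and `oversample_to_ratio`, its `ratio` argument and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of base within the minority class, excluding base itself.
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


def oversample_to_ratio(minority, majority, ratio, k=5, seed=0):
    """Add synthetic minority samples until minority:majority is approximately ratio (1.0 = 1:1)."""
    target = round(ratio * len(majority))
    needed = max(0, target - len(minority))
    return minority + smote(minority, needed, k=k, seed=seed)
```

Sweeping `ratio` over several values and re-evaluating the classifier each time is one way to search for the optimal class distribution described in the abstract.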
International Conference on Computing, Communication and Automation | 2015
Sunil Kumar; Manish Kumar Pandey; Abhigyan Nath; Karthikeyan Subbiah; Manoj Kumar Singh
This is the era of Internet computing, and computing delivered as a service over the Internet is called cloud computing. Three main services, SaaS (applications), PaaS and IaaS, are accessed over the Internet on demand, on a pay-per-use basis. For service providers, Quality of Service (QoS), with both user-dependent and user-independent QoS parameters, is the main issue in Internet-based computing. In this work, we compared different machine learning algorithms for predicting response-time and throughput QoS values from past usage data. Bagging and support vector machines performed better than the other learning algorithms.
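Bagging, one of the two best-performing methods above, can be sketched with a least-squares line as the base learner. The single numeric input (for example a hypothetical load measure taken from past usage data) and the choice of base learner are assumptions; the paper does not state which base learner it bagged.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; falls back to the mean when x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return lambda x: my
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregating: fit one base model per bootstrap resample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n indices drawn with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)
```

Averaging over bootstrap resamples reduces the variance of the base learner, which helps when past QoS observations are noisy.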
Computer and Information Technology | 2016
Manish Kumar Pandey; Karthikeyan Subbiah
Analytics on health big data are crucial for providing cost-effective, high-quality health care. In recent years, extracting insights from these very large data sets to improve health services has become a challenging task. This enormous volume of data, generated continuously over long periods, puts considerable stress on write performance and on scalability. Efficient storage and meaningful processing of these data are a further challenge. Traditional relational databases, previously used to store health data, can no longer handle it because of its massive and varied nature. These databases also have inherent weaknesses in scalability and in storing varied data formats. A new kind of data storage management system is therefore needed. This paper proposes a new big data storage architecture, consisting of an application cluster and a storage cluster, to speed up reads, writes and updates and to optimize data storage. The application cluster provides users with efficient storage and retrieval functions, while the storage cluster provides the storage services.
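The abstract does not say how the application cluster distributes records across the storage cluster. As one hypothetical illustration of the two-tier split, the sketch below routes each record to a storage node by hashing its key; the class names, the SHA-256 hash partitioning and the record format are all assumptions, not the paper's design.

```python
import hashlib


class StorageNode:
    """One node of the (hypothetical) storage cluster, holding key -> record pairs in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class StorageCluster:
    def __init__(self, n_nodes):
        self.nodes = [StorageNode(f"node-{i}") for i in range(n_nodes)]

    def node_for(self, key):
        """Hash partitioning (an assumption): the same key always maps to the same node."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]


class ApplicationCluster:
    """User-facing tier: exposes store/retrieve/update and delegates persistence to the storage tier."""

    def __init__(self, storage):
        self.storage = storage

    def store(self, key, record):
        self.storage.node_for(key).data[key] = record

    def retrieve(self, key):
        return self.storage.node_for(key).data.get(key)

    def update(self, key, **fields):
        self.storage.node_for(key).data.setdefault(key, {}).update(fields)
```

Keeping the user-facing logic separate from the storage nodes lets each tier be scaled independently, which is the motivation the abstract gives for splitting the architecture into two clusters.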
Journal of Theoretical Biology | 2016
Abhigyan Nath; Karthikeyan Subbiah
Piezophiles are organisms that can survive under extreme pressure. However, the molecular basis of piezophilic adaptation is still poorly understood. Analyzing the protein sequence adjustments that took place during evolution can help reveal the sequence adaptation parameters responsible for the functional and structural adaptation of proteins to such high pressures. In this work, we used an SVM classifier to filter strong instances and generated human-interpretable rules from these strong instances with the PART algorithm. These rules were analyzed to gain insight into the molecular signature patterns present in piezophilic proteins. For a detailed comparative study, experiments were performed on piezophilic groups from three temperature ranges, namely psychrophilic-piezophilic, mesophilic-piezophilic and thermophilic-piezophilic. Classification results improved as we moved up the temperature range from psychrophilic-piezophilic to thermophilic-piezophilic. Using a physicochemical classification of amino acids and feature-ranking algorithms, we found that hydrophilic and polar amino acid groups have higher discriminative ability for the psychrophilic-piezophilic and mesophilic-piezophilic groups, while hydrophobic and nonpolar amino acids are also discriminative for the thermophilic-piezophilic group. We also observed an over-representation of polar, hydrophilic and small amino acid groups in the discriminatory rules of all three temperature-range piezophiles, along with aliphatic, nonpolar and hydrophobic groups in the mesophilic-piezophilic and thermophilic-piezophilic groups.
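Features based on the physicochemical classification of amino acids can be computed as the fraction of a sequence's residues that fall into each group. The group memberships below follow one common convention and are an assumption; the paper's exact grouping may differ, and the groups deliberately overlap, so the fractions need not sum to 1. The SVM filtering and PART rule generation are not shown.

```python
STANDARD = "ACDEFGHIKLMNPQRSTVWY"

# One common (assumed) physicochemical grouping; conventions vary between studies.
GROUPS = {
    "polar": "CDEHKNQRSTY",
    "nonpolar": "AFGILMPVW",
    "hydrophobic": "ACFILMVW",
    "hydrophilic": "DEHKNQRST",
    "aliphatic": "AILV",
    "small": "ACDGNPSTV",
}


def group_composition(seq):
    """Fraction of standard residues in seq that belong to each physicochemical group."""
    residues = [aa for aa in seq.upper() if aa in STANDARD]
    n = len(residues)
    return {
        group: (sum(aa in members for aa in residues) / n if n else 0.0)
        for group, members in GROUPS.items()
    }
```

These group fractions can be ranked by a feature-ranking algorithm in the same way as single-residue compositions.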
International Conference on Internet of Vehicles | 2016
Manish Kumar Pandey; Karthikeyan Subbiah
Devices are becoming ubiquitous and interconnected owing to rapid advances in computing and communication technology. The Internet of Vehicles (IoV) is one such example: it consists of vehicles that communicate with each other and with public networks through V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian) and V2I (vehicle-to-infrastructure) communications. The social relationships among vehicles form a social network whose participants are intelligent objects rather than human beings, leading to the emergence of the Social Internet of Vehicles (SIoV). The big data generated by these device networks must be processed intelligently to make these systems smart. Security and privacy issues such as authentication and recognition attacks, accessibility attacks, privacy attacks, routing attacks and data genuineness attacks must be addressed to make these cyber-physical network systems reliable. This paper presents a comprehensive survey of SIoV and proposes a novel social recommendation model that links social networking with SIoV for reliable information exchange, and that analyzes the exchanged information intelligently to draw sound conclusions and make correct assessments. On this basis, future intelligent IoV systems capable of learning about and exploring the cyber-physical system could be designed.
Archive | 2018
Manish Kumar Pandey; Karthikeyan Subbiah
There is immense concern about using state-of-the-art technology to control the spread of pandemics such as Ebola, Zika and H1N1. Epidemic dynamics become very complex as outbreaks sweep through populations. Efficient descriptive, predictive, preventive and prescriptive analyses of the huge data generated by SMAC (social, mobile, analytics and cloud) technologies are crucial for effective planning and responsive tactics. In this paper, we propose machine learning techniques for time-series forecasting of Ebola casualties and evaluate their performance. In experiments without lag creation, the Random Tree classifier gave the best results: an MAE of 7.85%, an RMSE of 61.14% and a direction accuracy of 85.99%. We conclude that using these models to forecast epidemic spread and to develop public health policies can help health authorities take appropriate actions to control an outbreak.
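The forecasting terms used above can be made concrete with a short sketch: `make_lags` shows what lag creation means (the reported best results were obtained without it), and the three error measures are computed on actual versus predicted values. The definition of direction accuracy here is an assumption: it counts the steps where the predicted change from the previous actual value has the same sign as the actual change.

```python
import math


def make_lags(series, n_lags):
    """Lag creation: pair each value with the n_lags values that precede it."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def sign(v):
    return (v > 0) - (v < 0)


def direction_accuracy(actual, predicted):
    """Fraction of steps t >= 1 where sign(predicted[t] - actual[t-1]) == sign(actual[t] - actual[t-1])."""
    steps = range(1, len(actual))
    hits = sum(sign(predicted[t] - actual[t - 1]) == sign(actual[t] - actual[t - 1]) for t in steps)
    return hits / len(steps)


actual, predicted = [1, 2, 3, 2], [1, 3, 2, 1]
print(mae(actual, predicted), direction_accuracy(actual, predicted))  # 0.75 0.6666666666666666
```

Direction accuracy rewards getting the trend right (rising or falling casualties) even when the magnitude is off, which is often what matters for planning a response.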
Archive | 2018
Anoop Kumar Tiwari; Shivam Shreevastava; Karthikeyan Subbiah; Tanmoy Som
In this paper, we investigate the learning performance of different machine learning algorithms after applying the fuzzy-rough feature selection (FRFS) technique to optimally balanced training and testing sets of piezophilic and nonpiezophilic proteins. Using FRFS followed by the Synthetic Minority Over-sampling Technique (SMOTE) at optimal balancing ratios, the random forest algorithm gave the best results, with a sensitivity of 79.60%, specificity of 74.50%, average accuracy of 77.10%, AUC of 0.841 and MCC of 0.542. We also present a ranking of the input features by their ability to differentiate piezophilic from nonpiezophilic proteins, obtained with a fuzzy-rough attribute evaluator. The results show that the performance of classification algorithms can be improved by using reduced, optimally balanced training and testing sets, obtained by selecting relevant, non-redundant features from the training sets with FRFS and then suitably modifying the class distribution.
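The threshold metrics reported above all derive from the binary confusion matrix. The sketch below computes them; "average accuracy" is not defined in the abstract, so it is assumed here to be the mean of sensitivity and specificity (balanced accuracy). AUC needs ranked scores rather than a single confusion matrix and is omitted.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, average accuracy (assumed: balanced accuracy) and MCC."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    average_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # convention: 0 when a margin is empty
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "average_accuracy": average_accuracy,
        "mcc": mcc,
    }
```

MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced, which is why it is reported alongside sensitivity and specificity.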