Siddhaling Urolagin
Massachusetts Institute of Technology
Publication
Featured research published by Siddhaling Urolagin.
PLOS ONE | 2015
Abhishek Niroula; Siddhaling Urolagin; Mauno Vihinen
More reliable and faster prediction methods are needed to interpret enormous amounts of data generated by sequencing and genome projects. We have developed a new computational tool, PON-P2, for classification of amino acid substitutions in human proteins. The method is a machine learning-based classifier and groups the variants into pathogenic, neutral and unknown classes, on the basis of random forest probability score. PON-P2 is trained using pathogenic and neutral variants obtained from VariBench, a database for benchmark variation datasets. PON-P2 utilizes information about evolutionary conservation of sequences, physical and biochemical properties of amino acids, GO annotations and if available, functional annotations of variation sites. Extensive feature selection was performed to identify 8 informative features among altogether 622 features. PON-P2 consistently showed superior performance in comparison to existing state-of-the-art tools. In 10-fold cross-validation test, its accuracy and MCC are 0.90 and 0.80, respectively, and in the independent test, they are 0.86 and 0.71, respectively. The coverage of PON-P2 is 61.7% in the 10-fold cross-validation and 62.1% in the test dataset. PON-P2 is a powerful tool for screening harmful variants and for ranking and prioritizing experimental characterization. It is very fast making it capable of analyzing large variant datasets. PON-P2 is freely available at http://structure.bmc.lu.se/PON-P2/.
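The three-class grouping and the reported metrics (coverage, accuracy, MCC) can be illustrated with a minimal sketch. The probability band `low`/`high` used to assign the "unknown" class is a hypothetical stand-in; the abstract does not state how PON-P2 derives its unknown class from the random forest score, so this is not the tool's actual rule.

```python
import math

def classify(prob_pathogenic, low=0.4, high=0.6):
    """Map a probability score to pathogenic / neutral / unknown.

    The [low, high] band is an assumed uncertainty zone for illustration only.
    """
    if prob_pathogenic >= high:
        return "pathogenic"
    if prob_pathogenic <= low:
        return "neutral"
    return "unknown"

def evaluate(probs, truths):
    """Return (coverage, accuracy, mcc); accuracy and MCC use classified cases only."""
    tp = tn = fp = fn = 0
    for prob, truth in zip(probs, truths):
        label = classify(prob)
        if label == "unknown":
            continue  # unclassified variants lower coverage, not accuracy
        if label == "pathogenic":
            if truth == "pathogenic":
                tp += 1
            else:
                fp += 1
        else:
            if truth == "neutral":
                tn += 1
            else:
                fn += 1
    classified = tp + tn + fp + fn
    coverage = classified / len(probs) if probs else 0.0
    accuracy = (tp + tn) / classified if classified else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coverage, accuracy, mcc
```

Here coverage is the fraction of variants that receive a pathogenic or neutral label, which is one plausible reading of the coverage figures quoted in the abstract.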
Human Mutation | 2014
Heidi Ali; Siddhaling Urolagin; Omer Gurarslan; Mauno Vihinen
Many proteins contain intrinsically disordered regions, which may be crucial for function, but on the other hand be related to the pathogenicity of variants. Prediction programs have been developed to detect disordered regions from sequences and used to predict the consequences of variants, although their performance for this task has not been assessed. We tested the performance of protein disorder prediction programs in detecting changes to disorder caused by amino acid substitutions. We assessed the performance of 29 protein disorder predictors and versions with 101 amino acid substitutions, whose effects have been experimentally validated. Disorder predictors detected the true positives with at most a 6% success rate and true negatives with a 34% rate for variants. The corresponding rates for the wild‐type forms are 7% and 90%, respectively. The analysis revealed that disorder programs cannot reliably predict the effects of substitutions; consequently, the tested methods, and possibly similar programs, cannot be recommended for variant analysis without other information indicating the relevance of disorder. These results inspired us to develop a new method, PON‐Diso (http://structure.bmc.lu.se/PON‐Diso), for disorder‐related amino acid substitutions. With a 50% success rate for the independent test set and a 70.5% rate in cross‐validation, it outperforms the evaluated methods.
International Conference on Advanced Computing | 2011
Siddhaling Urolagin; K. V. Prema; N. V. Subba Reddy
In any real-world application, the performance of an Artificial Neural Network (ANN) depends mostly on its generalization capability. Generalization is the ability of the ANN to handle unseen data. The generalization capability of the network is determined mostly by system complexity and by how the network is trained. Poor generalization is observed when the network is over-trained or when the system complexity (the degrees of freedom) is large relative to the amount of training data. A smaller network that can fit the data will have good generalization ability. Network parameter pruning is one of the promising methods to reduce the degrees of freedom of a network and hence improve its generalization. In recent years various pruning methods have been developed and found effective in real-world applications. It is therefore important to estimate the improvement in generalization, and the rate of that improvement, as pruning is incorporated into the network. A method is developed in this research to evaluate generalization capability and the rate of convergence towards generalization. Using the proposed method, experiments have been conducted to evaluate a Multi-Layer Perceptron neural network with pruning incorporated for handwritten numeral recognition.
International Conference on Industrial and Information Systems | 2010
Siddhaling Urolagin; K. V. Prema; N. V. Subba Reddy
An OCR system plays an important role in the automatic identification of the script in a given document image, which enables important applications. In a country like India, where most people use more than one language in their day-to-day life, an OCR system is essential. Little work on developing OCR systems for South Indian languages such as Kannada has been reported in the literature. Recognition of Kannada characters is complex and challenging because the script has a large character set, characters share many similar properties, and characters belonging to the same class vary considerably across fonts. Moreover, Kannada characters are formed by combining basic symbols; a natural approach to recognition is to segment characters into basic symbols and then recognize each symbol. A character-level segmentation method is therefore highly desirable. Precise segmentation will also reduce the number of classes to recognize. The naïve use of character images for segmentation may not yield accurate results. Because previous studies have confirmed that multi-channel Gabor decomposition is an excellent tool for image segmentation and texture analysis, we propose a novel character segmentation method using Gabor filters. Compared with manually segmented benchmark data, we obtained an overall accuracy of 93.82%.
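To make "multi-channel Gabor decomposition" concrete, the sketch below builds a small bank of real-valued Gabor kernels at several orientations using the standard textbook formula (a Gaussian envelope multiplied by a cosine carrier). The kernel size, wavelength, and sigma values are illustrative assumptions and are not taken from the paper.

```python
import math

def gabor_kernel(size, theta, wavelength, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor filter as a size x size list of lists.

    size is assumed odd; theta is the orientation in radians.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_bank(size, n_orientations, wavelength, sigma):
    """One kernel per orientation, evenly spaced over [0, pi)."""
    return [
        gabor_kernel(size, k * math.pi / n_orientations, wavelength, sigma)
        for k in range(n_orientations)
    ]
```

Convolving an image with each kernel in the bank gives one response channel per orientation; how those channels are turned into segmentation boundaries is specific to the paper and is not reproduced here.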
Computational Intelligence | 2007
Siddhaling Urolagin; K.V. Prema; N.V.S. Reddy
The project of converting Indian language documents to Bharti Braille script has many challenges. Illumination-invariant character recognition is one such challenge, and it is addressed in this paper. Gabor features provide illumination invariance to a certain extent, but recent developments, such as the local binary pattern and binarizing the response of directional filters before computing features from it, have made features highly tolerant to lighting changes. In this context we propose the binarized Gabor feature: binarize the Gabor response, then compute directional features using a grid structure. To binarize the Gabor response we propose a threshold such that the most vital part of the response is highlighted in its binary form. We demonstrate the feature extraction technique for numeral recognition. The database consists of 1260 numeral images scanned with different scanning parameters and 12000 generated numeral images with varying intensity. The binarized Gabor features are compared with Gabor features on the basis of the classification rates obtained. In all our experiments, better classification rates are observed for the proposed method.
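The two steps named in the abstract, binarizing a Gabor response and then computing grid-based features, can be sketched as follows. The threshold rule (a fixed fraction of the peak magnitude) and the choice of cell density as the feature are assumptions made for illustration; the abstract does not specify the paper's actual threshold or feature definition.

```python
def binarize(response, fraction=0.5):
    """Set a pixel to 1 where |response| reaches a fraction of the peak magnitude.

    A peak-relative threshold is an assumed rule, chosen because it is
    unaffected by a uniform scaling of the response.
    """
    peak = max(abs(v) for row in response for v in row)
    cutoff = fraction * peak
    return [
        [1 if peak > 0 and abs(v) >= cutoff else 0 for v in row]
        for row in response
    ]

def grid_features(binary, rows, cols):
    """Fraction of 'on' pixels in each cell of a rows x cols grid, row-major."""
    height, width = len(binary), len(binary[0])
    features = []
    for r in range(rows):
        for c in range(cols):
            r0, r1 = r * height // rows, (r + 1) * height // rows
            c0, c1 = c * width // cols, (c + 1) * width // cols
            cell = [binary[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features
```

With this assumed threshold, halving every value in the response (a crude model of dimmer lighting) leaves the binary map, and therefore the features, unchanged.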
International Journal of Computer Applications | 2012
Siddhaling Urolagin
Neural networks have found many applications in the real world. One of the important issues in designing a neural network is the size of the architecture. Dynamic learning algorithms aim to determine an appropriate network size during the learning phase. Dynamic learning by pruning removes network elements such as nodes, weights or biases to reduce the network's size and make it appropriate for the problem at hand. In this paper two dynamic learning by pruning methods are integrated with a multilayer feed-forward neural network. The Optimal Brain Damage (OBD) method prunes connections (weights or biases), and the Bottom Up Freezing (BUF) method freezes and prunes nodes. The experiments are conducted on the MNIST handwritten digit database. The learning behavior of the multilayer feed-forward neural network integrated with the OBD and BUF methods is analyzed.
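As a rough illustration of connection pruning in the Optimal Brain Damage style, the sketch below ranks parameters by the commonly cited OBD saliency, half the diagonal Hessian entry times the squared weight, and zeroes the least salient ones. The diagonal Hessian values are supplied by the caller here; how they are computed, and how pruning is scheduled during training in the paper, is not covered.

```python
def obd_saliencies(weights, hessian_diag):
    """Saliency s_k = 0.5 * h_kk * w_k^2 for each parameter."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]

def prune_lowest(weights, hessian_diag, n_prune):
    """Zero out the n_prune parameters with the smallest saliency.

    Returns the pruned weight list and the sorted indices that were removed.
    """
    saliencies = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    dropped = set(order[:n_prune])
    pruned = [0.0 if k in dropped else w for k, w in enumerate(weights)]
    return pruned, sorted(dropped)
```

Note that saliency is not the same as magnitude: a large weight sitting in a flat direction of the error surface (small `h_kk`) can be pruned before a smaller weight in a sharply curved direction.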
International Conference on Industrial and Information Systems | 2010
Siddhaling Urolagin; K. V. Prema; N. V. Subba Reddy
In recent years, Gabor filters have been found effective for feature extraction because they are tunable to a specific orientation, spectrally localized and spatially localized, among other properties. In this paper, a rotation-invariant object recognition system is proposed using Gabor filters. A set of Gabor filters is considered and directional features are extracted from an image. A Gabor Vector Set is created from an unknown image sample, which may be rotated. A combined classification approach using a K-Nearest Neighbor classifier and a Minimum Distance classifier is developed to predict the class label of the unknown sample. Experiments are conducted on images of electric components rotated by angles between 0° and 360°. An overall recognition rate of 96.02% is observed on a database of 3971 images.
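The abstract does not say how the K-Nearest Neighbor and Minimum Distance classifiers are combined, so the sketch below shows only one plausible arrangement, invented for illustration: accept the KNN vote when it is sufficiently one-sided, and otherwise fall back to the nearest class mean. The `min_share` parameter and the fallback rule are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, sample, k=3):
    """train is a list of (vector, label). Returns (label, share of the k votes)."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], sample))[:k]
    label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, votes / len(nearest)

def min_distance_predict(train, sample):
    """Assign the label whose class mean is closest to the sample."""
    groups = {}
    for vec, lbl in train:
        groups.setdefault(lbl, []).append(vec)
    means = {
        lbl: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lbl, vecs in groups.items()
    }
    return min(means, key=lambda lbl: euclidean(means[lbl], sample))

def combined_predict(train, sample, k=3, min_share=1.0):
    """Hypothetical combination: trust KNN only when its vote share is high enough."""
    label, share = knn_predict(train, sample, k)
    if share >= min_share:
        return label
    return min_distance_predict(train, sample)
```

In practice the feature vectors would be Gabor-based descriptors; plain 2-D points work equally well for trying the functions out.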
International Journal of Molecular Sciences | 2018
Yang Yang; Siddhaling Urolagin; Abhishek Niroula; Xuesong Ding; Bairong Shen; Mauno Vihinen
Several methods have been developed to predict effects of amino acid substitutions on protein stability. Benchmark datasets are essential for method training and testing and have numerous requirements including that the data is representative for the investigated phenomenon. Available machine learning algorithms for variant stability have all been trained with ProTherm data. We noticed a number of issues with the contents, quality and relevance of the database. There were errors, but also features that had not been clearly communicated. Consequently, all machine learning variant stability predictors have been trained on biased and incorrect data. We obtained a corrected dataset and trained a random forests-based tool, PON-tstab, applicable to variants in any organism. Our results highlight the importance of the benchmark quality, suitability and appropriateness. Predictions are provided for three categories: stability decreasing, increasing and those not affecting stability.
International Conference on Control and Automation | 2017
Siddhaling Urolagin
Social media and networking now act as key platforms for sharing information and opinions. Many people share ideas and express their viewpoints and opinions on topics of interest. Social media text carries rich information about companies, their products and the services they offer. In this research we focus on exploring the association between the sentiment of social media text and the stock price of a company. Tweets about several companies were extracted, and sentiment classification was performed using a Naïve Bayes classifier and an SVM classifier. To perform the classification, N-gram based feature vectors are constructed from the important words of the tweets. Further, the pattern of association between the number of positive or negative tweets and stock prices is explored. Motivated by this association, tweet-related features such as the numbers of positive, negative and neutral tweets and the total number of tweets are used to predict the stock market status with a Support Vector Machine classifier.
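The two feature sets mentioned in the abstract can be sketched in a few lines: N-gram count vectors for sentiment classification, and per-period tweet counts for the stock-status predictor. Whitespace tokenization, the fixed vocabulary argument, and the label strings are simplifying assumptions; the classifiers themselves (Naïve Bayes, SVM) are not implemented here.

```python
from collections import Counter

def ngrams(text, n=2):
    """Word-level n-grams from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_vector(text, vocabulary, n=2):
    """Count vector over a fixed list of n-grams (the vocabulary is assumed given)."""
    counts = Counter(ngrams(text, n))
    return [counts[gram] for gram in vocabulary]

def tweet_count_features(labels):
    """[positive, negative, neutral, total] for one period's classified tweets."""
    counts = Counter(labels)
    return [counts["positive"], counts["negative"], counts["neutral"], len(labels)]
```

The output of `tweet_count_features` for each trading period would form one input row for the downstream stock-status classifier.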
IEEE International Conference on Intelligent Systems and Control | 2015
Siddhaling Urolagin; Anusha Anigol
Script identification is an important step in the success of multilingual OCR with a specialized OCR for each script. A language like Kannada has a wide variety of font styles, and an OCR for Kannada should handle all font types. A multi-OCR with a specialized recognizer for each font type is well suited to Kannada script. Font type identification is a key step in such a solution. We propose a font identification technique using Gabor features at the sub-image level. Representatives of the Gabor features are formed, and a confidence measure based on Euclidean distance is used as the closeness measure. A bin keeps track of the highest confidence occurring at the word level, and the font type of a document is identified from the maximum bin count. Experiments conducted on scanned Kannada documents gave a font type identification rate of 100% at the document level.
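The word-level voting described above might look roughly like the following. The confidence mapping `1 / (1 + distance)`, the data layout (each word as a list of sub-image feature vectors), and the font names are assumptions for illustration; the abstract only states that confidence is based on Euclidean distance to per-font representatives.

```python
import math
from collections import Counter

def confidence(feature, representative):
    """Closeness derived from Euclidean distance; this mapping is an assumed choice."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, representative)))
    return 1.0 / (1.0 + distance)

def identify_font(words, representatives):
    """Vote once per word for the font with the highest sub-image confidence.

    words: list of words, each a list of sub-image feature vectors.
    representatives: dict mapping font name to its representative vector.
    Returns (winning font, bin counts).
    """
    bins = Counter()
    for sub_images in words:
        best_font, best_conf = None, -1.0
        for feature in sub_images:
            for font, rep in representatives.items():
                conf = confidence(feature, rep)
                if conf > best_conf:
                    best_font, best_conf = font, conf
        if best_font is not None:
            bins[best_font] += 1
    return bins.most_common(1)[0][0], dict(bins)
```

The function expects at least one non-empty word; an empty document would need separate handling.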