T. Sobha Rani
University of Hyderabad
Publication
Featured research published by T. Sobha Rani.
Bioinformatics | 2007
T. Sobha Rani; S. Durga Bhavani; Raju S. Bapi
MOTIVATION: Patterns in the promoter sequences within a species are known to be conserved, but there exist many exceptions to this rule, which makes promoter recognition a complex problem. Although many complex feature extraction schemes coupled with several classifiers have been proposed for promoter recognition in the current literature, the problem is still open. RESULTS: A dinucleotide global feature extraction method is proposed for the recognition of sigma-70 promoters in Escherichia coli in this article. The positive data set consists of sigma-70 promoters with known transcription starting points which are part of the RegulonDB and PromEC databases. Four different kinds of negative data sets are considered, two of them biological sets (Gordon et al., 2003) and the other two synthetic data sets. Our results reveal that a single-layer perceptron using dinucleotide features is able to achieve an accuracy of 80% against a background of biological non-promoters and 96% for random data sets. A scheme for locating the promoter regions in a given genome sequence is proposed. A deeper analysis of the data set shows that there is a bifurcation of the data set into two distinct classes, a majority class and a minority class. Our results point out that the majority class, constituting the majority promoter and the majority non-promoter signal, is linearly separable. The minority class is also linearly separable. We further show that the feature extraction and classification methods proposed in the paper are generic enough to be applied to the more complex problem of eucaryotic promoter recognition. We present Drosophila promoter recognition as a case study. AVAILABILITY: http://202.41.85.117/htmfiles/faculty/tsr/tsr.html.
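To illustrate what "linearly separable" means for a single-layer perceptron, here is a minimal sketch of the classic perceptron learning rule on toy data. The two-dimensional feature vectors and the names `train_perceptron` and `predict` are hypothetical stand-ins chosen for illustration; they are not the paper's dinucleotide features or its implementation.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Classic perceptron rule; labels are +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # a full pass with no mistakes: data are separated
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical toy features: +1 = "promoter", -1 = "non-promoter".
X = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.8), (0.1, 0.9)]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

If the two classes can be separated by a hyperplane, this rule is guaranteed to stop making mistakes after a finite number of updates; on data that are not linearly separable it would simply run until `epochs` is exhausted.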
in Silico Biology | 2009
T. Sobha Rani; Raju S. Bapi
Promoter prediction is an important and complex problem. Pattern recognition algorithms typically require features that could capture this complexity. A special bias towards certain combinations of base pairs in the promoter sequences may be possible. In order to determine these biases n-grams are usually extracted and analyzed. An n-gram is a selection of n contiguous characters from a given character stream, DNA sequence segments in this case. Here a systematic study is made to discover the efficacy of n-grams for n = 2, 3, 4, 5 in promoter prediction. A study of n-grams as features for a neural network classifier for E. coli and Drosophila promoters is made. In case of E. coli n=3 and in case of Drosophila n=4 seem to give optimal prediction values. Using the 3-gram features, promoter prediction in the genome sequence of E. coli is done. The results are encouraging in positive identification of promoters in the genome compared to software packages such as BPROM, NNPP, and SAK. Whole genome promoter prediction in Drosophila genome was also performed but with 4-gram features.
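The n-gram definition above can be sketched as a small feature extractor. This is a minimal sketch assuming the features are normalized frequencies over the alphabet A, C, G, T with a sliding window of stride one; the function name `ngram_features` and the normalization are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def ngram_features(seq, n):
    """Normalized n-gram frequencies over ACGT, in lexicographic order.

    The vector has 4**n entries: 16 for n=2, 64 for n=3, 256 for n=4.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    keys = ["".join(p) for p in product("ACGT", repeat=n)]
    total = sum(counts[k] for k in keys)  # ignores n-grams with other symbols
    return [counts[k] / total if total else 0.0 for k in keys]


features_2 = ngram_features("ACGTACGT", 2)
features_3 = ngram_features("ACGTACGT", 3)
```

The feature dimension grows as 4^n, which is one reason a study over n = 2, 3, 4, 5 has to balance expressiveness against the amount of training data available.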
international conference on intelligent sensing and information processing | 2004
S. Sharma; V. Kumar; T. Sobha Rani; S. Durga Bhavani; S. Bapi Raju
Protein sequence classification is modelled as a binary classification problem where an unlabeled protein sequence is checked to see if it belongs to a known set of protein superfamilies or not. In this paper we used multilayer perceptrons with a supervised learning algorithm to learn the binary classification. The training data consists of two sets: a positive set belonging to an identified protein superfamily and a negative set comprising sequences from other superfamilies. When applying neural networks the first problem to be addressed is feature extraction. In this paper we used the new feature extraction techniques proposed by Wang et al. Simulations reveal that the neural network is able to classify with good precision for myosin and photochrome superfamilies in the data set that we have chosen as positive. The results for the globin superfamily are also good, thus validating the methodology of feature extraction and the application of neural networks for protein sequence classification as suggested by Wang et al. For the Actin and Ribonuclease superfamilies, however, the network showed poor performance. One possible reason for this may be that the choice of sequences in the negative data set is not optimal. We conclude from this work that the classification performance depends upon a proper selection of sequences for positive and negative data sets.
international conference information processing | 2011
A Sankara Rao; S. Durga Bhavani; T. Sobha Rani; Raju S. Bapi; G. Narahari Sastry
ZINC is a freely available chemical database which contains 27 million compounds, including Drug-like, Natural Products, FDA etc., along with 9 molecular features. In this paper we first compute an additional 49 molecular features and represent the entire chemical space in the 58-length fingerprint space. The Tanimoto metric, a popular similarity measure, is used to mine the chemical space for extracting similar and diverse fingerprints. One of the important issues is that of choosing a proper reference string. Experiments with different reference strings are carried out to assess the appropriateness of a reference string. A fingerprint which is constituted by mandating non-trivial presence of each feature is found to be the best. Further, a method which is independent of the reference string is proposed using pairwise distribution, but this raises the time complexity from linear to quadratic. A subgoal of this paper is also to propose a scheme that extracts a small sample data set that reflects the similarity and diversity of the population. Towards this, we conduct stratified sampling of the Natural Products Database (NPD), which has 90,000 chemical compounds, by dividing the space along strata representing distinct structures (rings) and then compute the pairwise similarity profile. This scheme can be extended to other databases that reside in ZINC.
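As a rough illustration of the similarity measure, the sketch below computes the Tanimoto coefficient for binary fingerprints, that is, the number of features present in both divided by the number present in either. Treating the fingerprints as binary and using an all-ones reference string are assumptions made here for illustration; the abstract does not spell out the exact encoding, and the 6-bit vectors are toy stand-ins for the 58-length fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical by convention here.
    return both / either if either else 1.0


reference = [1, 1, 1, 1, 1, 1]    # hypothetical reference: every feature present
fingerprint = [1, 0, 1, 1, 0, 0]  # hypothetical compound fingerprint
score = tanimoto(reference, fingerprint)
```

Comparing every compound to one reference string costs one pass over the data, whereas the reference-free pairwise profile needs a comparison for every pair of compounds, which is the linear-to-quadratic jump the abstract mentions.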
international conference on contemporary computing | 2013
T. Sobha Rani; P. V. Soujanya
Nearly all aspects of cell life and death are controlled by the phosphorylation of proteins, which is catalyzed by kinases. Malfunctioning of kinases results in cell disorders causing cancers and other diseases. The present study deals with the identification of predominant features present in the inhibitors targeting these enzymes and the classification of kinase and non-kinase inhibitors using machine learning algorithms. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets in which the classes differ markedly in size. The second is concept complexity (closely related minority and majority data sets in the feature space). Our approach deals with the binary classification of approved human inhibitors present in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified using an ensemble of several classification models. Classification is done in two levels, and weighted voting is used after each level. An overall accuracy of 80% is obtained after the two levels of classification. Thus we established a new type of approach for the classification of unbalanced data sets and of data sets in which there is an overlap between instances belonging to different classes. Finally, we established a signature specific to kinase inhibitors.
international conference on contemporary computing | 2012
A. Poorna Chandrasekhar; T. Sobha Rani
Storing and querying are two important issues that need to be addressed while designing an information retrieval system for a large and high-dimensional data set. In this work, we discuss how to tackle such data, specifically nearest neighbour search and an efficient storage layout. The data set used in the current work has been taken from an online source called ZINC, a repository for drug-like chemical structures. Processing high-dimensional data is a tough task, hence dimensionality reduction should be employed. Here dimensionality reduction is achieved through a filter-based feature selection method based on the correlation fractal dimension (CFD) discrimination measure. Using the correlation fractal dimension, the number of dimensions is reduced from 58 to 7. To identify the nearest neighbours for a given chemical structure, the Tanimoto similarity coefficient is used with this reduced set of features. The nearest neighbours identified using the Tanimoto measure are stored in a storage layout known as a modified inverted file. Nearest neighbours for a query can be retrieved from the storage layout with just one read operation from the data file, thereby reducing the retrieval time.
international conference on distributed computing and internet technology | 2018
Y. Divya Brahmani; T. Sobha Rani; S. Durga Bhavani
Interactions between proteins in a cell can be modeled as a graphical network. The problem addressed in this paper is to model the network evolution in biological networks in order to understand the underlying mechanism that morphs a normal cell into a disease (cancer) cell. In this paper, concepts from social networks are utilized for this purpose. Though many models for network evolution exist in the literature, they have not been applied in the context of evolution of normal cell into a disease state. In this work, target network is evolved in two ways: (i) starting from common subgraph of the normal and cancer networks and (ii) using a divide and conquer approach, the network is grown from communities using preferential attachment models. Triadic model yields good performance with respect to the global characteristics, but actual edge prediction performance is very low when applied on the entire network. In the case of community approach, the results of edge prediction for two dense communities are satisfactory with precision of 62% and recall 62%. Since edge prediction is a challenging problem, the approach needs to be refined further so that it works for small and sparse communities as well before it can become a full-fledged algorithm.
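For readers unfamiliar with preferential attachment, the following is a minimal sketch of a generic Barabási-Albert style growth process, in which each new node links to `m` existing nodes chosen with probability proportional to their degree. It is not the paper's community-based or triadic model; the function name, the complete-graph seed, and the parameters are assumptions for illustration only.

```python
import random


def preferential_attachment(n_nodes, m, seed=0):
    """Grow a graph to n_nodes; each new node attaches to m existing nodes."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes.
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks a node with probability proportional to degree.
    stubs = [v for e in sorted(edges) for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.add((t, new))
            stubs.extend((t, new))
    return edges


graph = preferential_attachment(n_nodes=10, m=2)
```

Edge prediction can then be scored by comparing the generated edge set against the target network's edges, which is how precision and recall figures such as those quoted above are typically obtained.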
Archive | 2015
V. Hema Madhuri; T. Sobha Rani
Organizing and searching data involves detecting groups of objects that exhibit similar properties. As the dimensionality d increases, the space in which data is represented grows rapidly and therefore the available data becomes sparse. When d is high, all objects appear to be sparse and dissimilar in many ways. Here, a study is made to reduce the number of dimensions using a biclustering method to rank the features/dimensions. Classification rate is used as the validation criterion for the selection of appropriate dimensions. Ranking algorithms such as Relief F, Symmetrical uncertainty and Information gain are compared with the proposed ranking using biclustering. It is found that for large data sets with a large number of dimensions, ranking using biclustering achieves comparable classification rates with fewer features than the other ranking algorithms.
international conference on contemporary computing | 2014
D Sandhya Rani; T. Sobha Rani; S. Durga Bhavani
Dimensionality reduction continues to be a challenging problem with huge amounts of data being generated in the domains of bio-informatics, social networks etc. We propose a novel dimensionality reduction algorithm based on the idea of consensus clustering using genetic algorithms. Classification is used as validation and the algorithm is evaluated on benchmark data sets of dimensionality ranging from 8 to 617 features. The results are on par with the latest approaches proposed in the literature.
international conference on contemporary computing | 2012
P. Manasa; M. R. Prasad; T. Sobha Rani
Tremendous growth in traffic is witnessed over the Internet, where backbone links of several gigabits per second are commonly deployed. In order to handle these gigabit-per-second traffic rates, backbone routers must forward millions of packets per second on each of their ports. Routing tables of the core routers consist of IP addresses of the order of 200,000-500,000 and change dynamically. A major challenge is to determine the next-hop address with as few accesses to the routing table as possible. IP address lookup in the routers uses the packet's destination address to determine the next hop for each packet and is therefore crucial to achieving the required packet forwarding rates. IP address lookup is difficult because it requires a longest common prefix (LCP) match search. In the last couple of years, various algorithms for high-performance IP address lookup have been proposed. The objective of this paper is to use a specific data structure and develop a lookup algorithm that meets demands such as fast lookup, memory efficiency and fast incremental updates. We have used a novel data structure, the y-fast trie, for the routing table in this work. We adapted the algorithm for predecessor/successor search in the x-fast trie via the dynamic perfect hashing technique to find the longest common prefix between the incoming packet's destination address and the next-hop address. By looking at this longest common prefix, we identify the next-hop address. As an improvement over this method, we have also used indirection using balanced BSTs (y-fast trie). On average, routing table creation takes 51703 μsec for 100000 IP addresses in the method using indirection. The average lookup time using the dynamic perfect hashing table is 0.83 μsec.
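To make the longest-prefix-match requirement concrete, here is a minimal sketch that keeps one hash table per prefix length and probes from the longest length to the shortest. This is a deliberately simple stand-in, not the x-fast or y-fast trie described above; the class name `PrefixTable`, its methods, and the routes are hypothetical, and the sketch handles IPv4 only.

```python
import ipaddress


class PrefixTable:
    """IPv4 routing table with one hash table per prefix length (0..32)."""

    def __init__(self):
        self.tables = {length: {} for length in range(33)}

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        self.tables[net.prefixlen][int(net.network_address)] = next_hop

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        # Probe the longest prefixes first, so the first hit is the
        # longest matching prefix.
        for length in range(32, -1, -1):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = self.tables[length].get(addr & mask)
            if hop is not None:
                return hop
        return None


table = PrefixTable()
table.insert("0.0.0.0/0", "default-gateway")
table.insert("10.0.0.0/8", "router-A")
table.insert("10.1.0.0/16", "router-B")
hop_specific = table.lookup("10.1.2.3")
hop_general = table.lookup("10.2.3.4")
hop_default = table.lookup("192.168.1.1")
```

This linear scan performs up to 33 hash probes per lookup. Structures such as the x-fast trie reduce that by binary searching over prefix lengths, which is the kind of saving in routing-table accesses the abstract is after.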