Venu Dasigi

Southern Polytechnic State University

Publication

Featured research published by Venu Dasigi.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005

Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships: A Comparative Study of Algorithms

Ying Liu; Shamkant B. Navathe; Jorge Civera; Venu Dasigi; Ashwin Ram; Brian J. Ciliax; Raymond Dingledine

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and the self-organizing map. Whereas BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clearer cluster boundaries than hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
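
The purity and entropy measures named in this abstract can be illustrated with a minimal Python sketch. It uses the common textbook definitions, which may differ in detail from those used in the paper; the function names and the toy gene labels are hypothetical, and clusters are assumed to be non-empty.

```python
import math
from collections import Counter

def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(
        Counter(labels[g] for g in c).most_common(1)[0][1] for c in clusters
    )
    return majority / total

def entropy(clusters, labels):
    """Size-weighted average class entropy (in bits) over all clusters."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        counts = Counter(labels[g] for g in c)
        h = -sum((n / len(c)) * math.log2(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result

# Hypothetical toy data: four genes with known groups A and B.
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
clusters = [["g1", "g2", "g3"], ["g4"]]
print(purity(clusters, labels))   # higher is better
print(entropy(clusters, labels))  # lower is better
```

A clustering that puts each known group in its own cluster gives purity 1 and entropy 0 under these definitions.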

Computational Systems Bioinformatics | 2004

Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering

Ying Liu; Brian J. Ciliax; Karin Borges; Venu Dasigi; Ashwin Ram; Shamkant B. Navathe; Raymond Dingledine

One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
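
As a rough illustration of TFIDF keyword weighting, the following Python sketch treats the pooled abstract text for each gene as one document. The abstract does not state the exact TFIDF variant, so the formula (relative term frequency times the natural log of inverse document frequency), the function name, and the toy gene data are all assumptions.

```python
import math
from collections import Counter

def tfidf_weights(words_by_gene):
    """Map each gene to {keyword: tf * idf} over the pooled words per gene."""
    n = len(words_by_gene)
    # Document frequency: in how many genes' texts does each word occur?
    df = Counter()
    for words in words_by_gene.values():
        df.update(set(words))
    weights = {}
    for gene, words in words_by_gene.items():
        tf = Counter(words)
        weights[gene] = {
            w: (c / len(words)) * math.log(n / df[w]) for w, c in tf.items()
        }
    return weights

# Hypothetical toy input: already tokenized, stop-listed and stemmed words.
words_by_gene = {
    "geneA": ["kinase", "cell", "cell"],
    "geneB": ["receptor", "cell"],
}
for gene, w in tfidf_weights(words_by_gene).items():
    print(gene, w)
```

A word that occurs for every gene ("cell" here) gets weight zero, which is why such a scheme favors keywords that discriminate between genes.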

Technical Symposium on Computer Science Education | 2001

Striving for mathematical thinking

Peter B. Henderson; Doug Baldwin; Venu Dasigi; Marcel Dupras; Jane M. Fritz; David Ginat; Don Goelman; John Hamer; Lewis E. Hitchner; Will Lloyd; Bill Marion; Charles Riedesel; Henry M. Walker

Computer science and software engineering are young, maturing disciplines. As with other mathematically based disciplines, such as the natural sciences, economics, and engineering, it takes time for the mathematical roots to grow and flourish. For computer science and software engineering, others have planted these seeds over many years, and it is our duty to nurture them. This working group is dedicated to promoting mathematics as an important tool for problem-solving and conceptual understanding in computing.

European Conference on Information Retrieval | 2002

Text Categorization: An Experiment Using Phrases

Madhusudhan Kongovi; Juan Carlos Guzmán; Venu Dasigi

Typical text classifiers learn from example and training documents that have been manually categorized. In this research, our experiment dealt with the classification of news wire articles using category profiles. We built these profiles by selecting feature words and phrases from the training documents. For our experiments we used the Reuters-21578 text corpus. We used precision and recall to measure the effectiveness of our classifier. Though our experiments with words yielded good results, we found instances where the phrase-based approach was more effective. This could be because a word considered together with its adjoining word - a phrase - can be a good discriminator when building a category profile. This tight packaging of word pairs could bring in some semantic value. The packing of word pairs also filters out words that occur frequently in isolation and do not bear much weight towards characterizing that category.
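
The idea of building a category profile from adjacent word pairs can be sketched in Python as follows. The abstract does not say how feature phrases were selected, so ranking by raw frequency, the function names, and the toy documents are assumptions made only for illustration.

```python
from collections import Counter

def bigrams(tokens):
    """Return adjacent word pairs (phrases) from a token list."""
    return list(zip(tokens, tokens[1:]))

def category_profile(docs, top_k=3):
    """Pick the top_k most frequent word pairs across a category's documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(bigrams(tokens))
    return [phrase for phrase, _ in counts.most_common(top_k)]

# Hypothetical training documents for one category, already tokenized.
docs = [
    ["crude", "oil", "prices"],
    ["crude", "oil", "exports"],
]
print(category_profile(docs, top_k=1))
```

Here the pair ("crude", "oil") is selected because it recurs across documents, while the single words "prices" and "exports" contribute no repeated phrase.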

Data Mining in Bioinformatics | 2006

Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes

Ying Liu; Shamkant B. Navathe; Alex Pivoshenko; Venu Dasigi; Raymond Dingledine; Brian J. Ciliax

One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.

Information Technology | 1998

Information fusion experiments for text classification

Venu Dasigi

We summarize our experiments and results in employing information fusion for automatic classification of free text documents into a given number of categories. We try to characterize this information fusion work in terms of the Joint Directors of Laboratories scheme. The text used in the experiments is taken from the Reuters-22173 collection, which not only comes pre-analyzed, but facilitates training of the neural networks, as well as evaluation of the classification decisions. We use different kinds of feature extractors to derive information from documents, and use neural networks for both learning and fusion. We compare the effectiveness of individual feature extractors in classifying the text with that of information fusion from different interesting combinations of feature extractors. The results indicate that information fusion almost always performs better than the individual feature extractors, and certain combinations seem to do better than the others. Additional parameters can have varying degrees of effectiveness, and remain to be investigated.

National Aerospace and Electronics Conference | 1991

On the relationship between parsimonious covering and Boolean minimization

Venu Dasigi; Krishnaprasad Thirunarayan

The authors explain some of the relationships of the Boolean minimization problem (BMP) to a formalization of abductive inference called parsimonious covering (PC). Abductive inference often occurs in diagnostic problems such as finding the causes of circuit faults or determining the disease causing the symptoms reported by a patient. Parsimonious covering involves covering all observed facts by means of a parsimonious set of explanations that can account for the observation. It is shown that only the prime implicants of a given Boolean function in a BMP, rather than any general product terms, are considered analogous to disorders in a PC problem.
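
To make the covering idea concrete, here is a small brute-force Python sketch that enumerates irredundant covers, meaning sets of disorders that explain every observed manifestation and have no proper subset that also does. Irredundancy is only one possible parsimony criterion, and the abstract does not say which one the paper uses; the function names and the toy diagnostic data are hypothetical, and the observed set is assumed to be non-empty.

```python
from itertools import combinations

def covers(disorders, causes, observed):
    """True if the chosen disorders together account for every observation."""
    covered = set()
    for d in disorders:
        covered |= causes[d]
    return observed <= covered

def irredundant_covers(causes, observed):
    """Enumerate covers that contain no smaller cover (exponential search)."""
    names = sorted(causes)
    found = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            candidate = set(combo)
            # Skip supersets of a cover already found: they are redundant.
            if any(f <= candidate for f in found):
                continue
            if covers(candidate, causes, observed):
                found.append(candidate)
    return found

# Hypothetical causal associations: disorder -> manifestations it can cause.
causes = {
    "flu": {"fever", "cough"},
    "cold": {"cough"},
    "malaria": {"fever"},
}
print(irredundant_covers(causes, {"fever", "cough"}))
```

For this toy input the search yields two explanations: "flu" alone, or "cold" together with "malaria".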

Data and Text Mining in Bioinformatics | 2009

LITSEEK: public health literature search by metadata enhancement with external knowledge bases

Priyanka Sharad Prabhu; Shamkant B. Navathe; Stephen Tyler; Venu Dasigi; Neha Narkhede; Balaji Palanisamy

Biomedical literature is an important source of information in any researcher's investigation of genes, risk factors, diseases and drugs. Often the information sought by public health researchers is distributed across multiple disparate sources that may include publications from PubMed, genomic, proteomic and pathway databases, gene expression and clinical resources and biomedical ontologies. The unstructured nature of this information makes it difficult to find the relevant parts manually, and comprehensive knowledge is even more difficult to synthesize automatically. In this paper we report on LITSEEK (LITerature Search by metadata Enhancement with External Knowledgebases), a system we have developed for the benefit of researchers at the Centers for Disease Control (CDC) to enable them to search the HuGE (Human Genome for Epidemiology) database of PubMed articles from a pharmacogenomic perspective. Besides analyzing text using TFIDF ranking and indexing of the important terms, the proposed system incorporates an automatic consultation with PharmGKB, a human-curated knowledge base about drugs, related diseases and genes, as well as with the Gene Ontology, a human-curated, well-accepted ontology. We highlight the main components of our approach and illustrate how the search is enhanced by incorporating additional concepts in terms of genes/drugs/diseases (called metadata for ease of reference) from PharmGKB. Various measurements are reported with respect to the addition of these metadata terms. Preliminary results in terms of precision based on expert user feedback from CDC are encouraging. Further evaluation of the search procedure by actual researchers is under way.

ACM Symposium on Applied Computing | 1998

An experiment in medical information retrieval

Venu Dasigi

This work represents a feasibility study of effective conceptual information retrieval for medical / health care information. We focused on the OHSUMED collection because it supports the algorithms chosen in the study. Much of the focus has been on adopting the method of combining Latent Semantic Indexing (LSI) [3] with neural network learning, which showed substantial effectiveness in the task of classification. A related technique, that of using LSI in a straightforward way, has also been studied. Our conclusion is that the learning approach, which worked well for classification, has not done significantly better than chance, because of inadequate training data. However, the LSI method has done better than expected, and better than many other algorithms. The precision obtained using the LSI method is in the neighborhood of 40%. We conclude with a few lessons learned and future research directions.

International Journal of Intelligent Systems | 1994

Logical Form Generation as Abduction - Part I. Representation of Linguistic Concepts

Venu Dasigi

For some time, researchers have become increasingly aware that some aspects of natural language processing can be viewed as abductive inference. This article describes knowledge representation in dual‐route parsimonious covering theory, based on an existing diagnostic abductive inference model, extended to address issues specific to logical form generation. The two routes of covering deal with syntactic and semantic aspects of language, and are integrated by attributing both syntactic and semantic facets to each “open class” concept. Such extensions reflect some fundamental differences between the two task domains. The syntactic aspect of covering is described to show the differences, and some interesting properties are established. The semantic associations are characterized in terms of how they can be used in an abductive model. A major significance of this work is that it paves the way for a nondeductive inference method for word sense disambiguation and logical form generation, exploiting the associative linguistic knowledge. This approach sharply contrasts with others, where knowledge has usually been laboriously encoded into pattern‐action rules that are hard to modify. Further, this work represents yet another application for the general principle of parsimonious covering.
