Saket S. R. Mengle | Researchain

Publication

Featured research published by Saket S. R. Mengle.

ACM Symposium on Applied Computing | 2008

Using ambiguity measure feature selection algorithm for support vector machine classifier

Saket S. R. Mengle; Nazli Goharian

With the ever-increasing number of documents on the web, in digital libraries, news sources, etc., the need for a text classifier that can classify massive amounts of data is becoming more critical and more difficult to meet. The major problem in text classification is the high dimensionality of the feature space. The Support Vector Machine (SVM) classifier has been shown to perform consistently better than other text classification algorithms. However, training an SVM model takes longer than training other algorithms. We explore the use of the Ambiguity Measure (AM) feature selection method, which uses only the most unambiguous keywords to predict the category of a document. Our analysis shows that AM reduces training time by more than 50% compared with using no feature selection, while maintaining classifier accuracy equivalent to or better than that obtained with the whole feature set. We empirically show the effectiveness of our approach in outperforming seven different feature selection methods on two standard benchmark datasets.
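The abstract does not give the AM formula, so the following is only a minimal sketch under a stated assumption: a term's score is taken to be the largest share of its occurrences that falls in a single category, and terms scoring at or above a cutoff are kept as features. The function names, the toy documents, and the 0.8 threshold are hypothetical.

```python
from collections import Counter, defaultdict


def ambiguity_measure(docs):
    """Score each term by how concentrated it is in one category.

    docs: iterable of (tokens, category) pairs.
    Returns {term: (score, dominant_category)}, where score is the
    fraction of the term's occurrences found in its dominant category
    (an assumed definition, not quoted from the abstract).
    """
    per_category = defaultdict(Counter)  # term -> category -> count
    total = Counter()                    # term -> overall count
    for tokens, category in docs:
        for term in tokens:
            per_category[term][category] += 1
            total[term] += 1

    scores = {}
    for term, counts in per_category.items():
        category, count = counts.most_common(1)[0]
        scores[term] = (count / total[term], category)
    return scores


def select_features(scores, threshold=0.8):
    """Keep only terms whose score reaches the (hypothetical) threshold."""
    return {term for term, (score, _) in scores.items() if score >= threshold}


docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "election", "vote"], "politics"),
    (["goal", "score", "team"], "sports"),
]
scores = ambiguity_measure(docs)
selected = select_features(scores, threshold=0.8)
```

In this toy data "team" appears in both categories and is dropped, while "goal" appears only in sports documents and is kept. The reduced vocabulary would then be passed to whichever classifier is being trained.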

ACM Symposium on Applied Computing | 2008

Discovering relationships among categories using misclassification information

Saket S. R. Mengle; Nazli Goharian; Alana Platt

Knowledge of relationships among categories is of interest in different domains such as text classification, content analysis, and text mining. We propose and evaluate approaches to effectively identify relationships among document categories. Our proposed novel method capitalizes on the misclassification results of a text classifier to identify potential relationships among categories. We demonstrate that our system detects such relationships, even those that assessors failed to identify in manual evaluation. Furthermore, we favorably compare the effectiveness of our methods with the state-of-the-art method and demonstrate a significant improvement in precision (34%) and recall (5%).
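One way to read "capitalizes on the misclassification results" is sketched below. It assumes that a category pair is a candidate relationship when a sufficiently large fraction of one category's documents is misclassified into the other. The function name, the `min_rate` cutoff, and the labels are hypothetical; the abstract does not specify the actual scoring.

```python
from collections import Counter


def related_categories(true_labels, predicted_labels, min_rate=0.2):
    """Return {(true_cat, predicted_cat): misclassification_rate}.

    Only off-diagonal confusions whose rate (relative to the number of
    documents truly in true_cat) reaches min_rate are reported.
    """
    totals = Counter(true_labels)
    confusions = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    pairs = {}
    for (t, p), count in confusions.items():
        rate = count / totals[t]
        if rate >= min_rate:
            pairs[(t, p)] = rate
    return pairs


true_labels = ["autos", "autos", "autos", "autos",
               "motorcycles", "motorcycles", "religion", "religion"]
predicted = ["autos", "motorcycles", "motorcycles", "autos",
             "motorcycles", "autos", "religion", "religion"]
pairs = related_categories(true_labels, predicted)
```

Here "autos" and "motorcycles" are confused with each other in both directions, so they surface as a candidate related pair, while "religion" is never confused and yields nothing.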

Intelligence and Security Informatics | 2007

FACT: Fast Algorithm for Categorizing Text

Saket S. R. Mengle; Nazli Goharian; Alana Platt

With the ever-increasing number of digital documents, the ability to automatically classify those documents both quickly and accurately is becoming more critical and more difficult to achieve. We present the Fast Algorithm for Categorizing Text (FACT), a statistical multi-way classifier built on our proposed feature selection method, Ambiguity Measure (AM), which uses only the most unambiguous keywords to predict the category of a document. Our empirical results show that FACT outperforms the best results obtained with Odds Ratio, the best-performing feature selection method for the Naive Bayes classifier. We show this improvement over Odds Ratio on four benchmark datasets, with statistical significance at the 99% confidence level. Furthermore, the performance of FACT is comparable to or better than that of current non-statistical classifiers.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Context aware query classification using dynamic query window and relationship net

Nazli Goharian; Saket S. R. Mengle

The context of the user queries preceding a given query is utilized to improve the effectiveness of query classification. Earlier efforts utilize a fixed number of preceding queries to derive such context information. We propose and evaluate an approach (DQW) that identifies a set of unambiguous preceding queries in a dynamically determined window to utilize in classifying an ambiguous query. Furthermore, utilizing a relationship-net (R-net) that represents relationships among known categories, we improve the classification effectiveness for those ambiguous queries whose predicted category in this relationship-net is related to the category of a query within the window. Our results indicate that the hybrid approach (DQW+R-net) statistically significantly improves on the Conditional Random Field (CRF) query classification approach that uses static query windowing and a hierarchical taxonomy (SQW+Tax), in terms of precision (10.8%), recall (13.2%), and F1 measure (11.9%).

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

On document splitting in passage detection

Nazli Goharian; Saket S. R. Mengle

Passages can be hidden within a text to circumvent restrictions on their transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. We explore a methodology to detect such hidden passages within a document. A document is divided into passages using various document splitting techniques, and a text classifier is used to categorize each passage. We present a novel document splitting technique called dynamic windowing, which significantly improves precision, recall, and F1 measure.
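The split-then-classify pipeline can be illustrated with a simple fixed-size overlapping window. This is a generic baseline splitter chosen for illustration, not the dynamic windowing technique the abstract proposes, whose details are not given here. The window size, step, and the keyword-based `toy_classify` function are hypothetical stand-ins for a trained text classifier.

```python
def split_into_windows(tokens, size=5, step=3):
    """Split a token list into overlapping fixed-size windows.

    The final window may be shorter than `size` so that trailing
    tokens are still covered.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return windows


def classify_passages(tokens, classify, size=5, step=3):
    """Apply a classifier callable to every window of the document."""
    return [(w, classify(w)) for w in split_into_windows(tokens, size, step)]


def toy_classify(window):
    # Hypothetical stand-in for a trained text classifier.
    return "restricted" if "launch" in window and "code" in window else "public"


tokens = "the weather is fine today the launch code is alpha".split()
results = classify_passages(tokens, toy_classify)
labels = [label for _, label in results]
```

The document as a whole looks mostly innocuous, but the second and third windows are labeled "restricted", which is the kind of localized signal that passage-level classification is meant to expose.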

ACM Symposium on Applied Computing | 2010

Mining temporal relationships among categories

Saket S. R. Mengle; Nazli Goharian

Temporal text mining deals with discovering temporal patterns in text over a period of time. A Theme Evolution Graph (TEG) is used to visualize when new themes are created and how they evolve with respect to time. TEG, however, does not represent relationships among themes (or categories) that share the same timestamp. We focus on identifying such relationships and represent them in a Relationship Evolution Graph (REG). We favorably compare passage misclassification and association rule mining with three existing approaches, namely KL divergence (KLD), consistent bipartite spectral co-partitioning graph (CBSCG), and document misclassification. Our evaluations indicate that the association rule mining approach statistically significantly (99% confidence) outperforms the existing approaches, while the passage misclassification approach is the second most effective.

ACM Symposium on Applied Computing | 2007

On using user query sequence to detect off-topic search

Alana Platt; Nazli Goharian; Saket S. R. Mengle

Retrieving documents that are off-topic with respect to a user's pre-defined area of interest via a search engine is potentially a violation of access rights and is a concern to every private, commercial, and governmental organization. We improve content-based off-topic search detection approaches by using a sequence of user queries rather than individual queries. In this approach, we reevaluate how off-topic a query is, based on the sequence of queries that preceded it. Our empirical results show that using the information from the queries in a given query window reduces the false alarm rate by a statistically significant amount.
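The idea of reevaluating a query in light of its predecessors can be sketched as follows, assuming each query already has a per-query off-topic score in [0, 1] and that the reevaluated score is the mean over a trailing window. The averaging rule, window length, threshold, and scores are all hypothetical; the abstract does not say how the sequence is actually combined.

```python
def windowed_scores(scores, window=3):
    """Replace each score with the mean of itself and up to
    `window - 1` immediately preceding scores."""
    smoothed = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def flag_off_topic(scores, window=3, threshold=0.5):
    """Flag a query when its windowed score exceeds the threshold."""
    return [s > threshold for s in windowed_scores(scores, window)]


# Hypothetical per-query off-topic scores for one user session.
scores = [0.1, 0.2, 0.9, 0.1, 0.8, 0.9, 0.7]
per_query_flags = [s > 0.5 for s in scores]
sequence_flags = flag_off_topic(scores)
```

Judged alone, the third query (0.9) would raise an alarm. Judged within its window it does not, because the surrounding queries are on-topic, while the sustained run at the end of the session is still flagged.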

International World Wide Web Conferences | 2011

Networked hierarchies for web directories

Nazli Goharian; Saket S. R. Mengle

The hierarchical nature of existing Web directories, ontologies, and folksonomies is known to provide meaningful information that guides users and applications. We hypothesize that such hierarchical structures provide richer information if they are further enriched with additional links besides those to parents and siblings, namely links between non-sibling nodes. We call such a structure a networked hierarchy. Our empirical results indicate that a networked hierarchy introduces interesting links between non-sibling nodes that are otherwise not evident in a hierarchical structure.

ACM Symposium on Applied Computing | 2009

Improving classification based off-topic search detection via category relationships

Alana Platt; Saket S. R. Mengle; Nazli Goharian

The illegitimate access of documents by insiders (also known as off-topic search) is an increasingly prevalent and largely ignored problem. We propose an approach that uses text classification for off-topic search detection. Our empirical results indicate that off-topic search detection effectiveness improves by considering only a subset of documents that are retrieved for a given user query. Furthermore, we also show that the effectiveness of off-topic search detection improves by using the ontological information of document categories. Our empirical results demonstrate that utilizing sibling relationship information and relationships derived from misclassification information statistically significantly improves the results over the baseline in most cases.

Journal of the Association for Information Science and Technology | 2009

Ambiguity measure feature-selection algorithm

Saket S. R. Mengle; Nazli Goharian

Collaboration

Top Co-Authors

Nazli Goharian

Georgetown University

Alana Platt

Illinois Institute of Technology
