April Kontostathis
Ursinus College
Publication
Featured research published by April Kontostathis.
Information Processing and Management | 2006
April Kontostathis; William M. Pottenger
In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term-by-dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second-order term co-occurrence and the values produced by the Singular Value Decomposition (SVD) algorithm that forms the foundation for LSI. We also present a mathematical proof that the SVD algorithm encapsulates term co-occurrence information.
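To make "second-order term co-occurrence" concrete, here is a minimal Python sketch. It assumes documents are given as lists of terms and uses a common definition: two terms are second-order co-occurrent when they never appear in the same document but each appears with some shared third term. The toy corpus and function names are hypothetical illustrations, not the paper's framework or data.

```python
from itertools import combinations

def first_order(docs):
    """Return the set of (a, b) term pairs, a < b, that share at least one document."""
    pairs = set()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs.add((a, b))
    return pairs

def second_order(docs):
    """Return term pairs that never co-occur directly but share a co-occurring neighbor."""
    direct = first_order(docs)
    neighbors = {}
    for a, b in direct:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    result = set()
    for a, b in combinations(sorted(neighbors), 2):
        # Not linked directly, but linked through at least one third term.
        if (a, b) not in direct and neighbors[a] & neighbors[b]:
            result.add((a, b))
    return result

docs = [
    ["car", "engine"],
    ["automobile", "engine"],
    ["car", "wheel"],
]
print(sorted(second_order(docs)))  # → [('automobile', 'car'), ('engine', 'wheel')]
```

Here "car" and "automobile" never appear together, yet they are linked through "engine"; this is the kind of indirect relationship the abstract correlates with the values SVD produces.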
Archive | 2004
April Kontostathis; Leon M. Galitsky; William M. Pottenger; Soma Roy; Daniel J. Phelps
In this chapter we describe several systems that detect emerging trends in textual data. Some of the systems are semiautomatic, requiring user input to begin processing, and others are fully automatic, producing output from the input corpus without guidance. For each Emerging Trend Detection (ETD) system we describe components including linguistic and statistical features, learning algorithms, training and test set generation, visualization, and evaluation. We also provide a brief overview of several commercial products capable of detecting trends in textual data, followed by an industrial viewpoint describing the importance of trend detection tools, and an overview of how such tools are used.
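As a rough illustration of the kind of statistical feature such systems compute, the sketch below flags terms whose relative frequency grows between an older and a newer time period. It is a deliberately simple, hypothetical heuristic: the growth ratio, the add-one smoothing, and the `min_count` and `threshold` parameters are assumptions, and it does not describe any specific ETD system surveyed in the chapter.

```python
from collections import Counter

def emerging_terms(old_docs, new_docs, min_count=2, threshold=2.0):
    """Flag terms whose relative frequency grows by at least `threshold`
    from the old period to the new period."""
    old = Counter(t for doc in old_docs for t in doc)
    new = Counter(t for doc in new_docs for t in doc)
    vocab_size = len(set(old) | set(new))
    old_total, new_total = sum(old.values()), sum(new.values())
    trends = []
    for term, count in new.items():
        if count < min_count:  # ignore terms too rare to judge
            continue
        # Add-one smoothing keeps terms unseen in the old period from dividing by zero.
        old_rate = (old[term] + 1) / (old_total + vocab_size)
        new_rate = count / new_total
        growth = new_rate / old_rate
        if growth >= threshold:
            trends.append((term, round(growth, 2)))
    # Strongest growth first; ties broken alphabetically.
    return sorted(trends, key=lambda pair: (-pair[1], pair[0]))

old_docs = [["xml", "database"], ["database", "query"]]
new_docs = [["xml", "web", "service"], ["web", "service", "database"]]
print(emerging_terms(old_docs, new_docs))  # → [('service', 3.0), ('web', 3.0)]
```

Real ETD systems layer linguistic features, learning algorithms, and user feedback on top of counts like these; the sketch only shows the frequency-growth signal in isolation.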
International Conference on Machine Learning and Applications | 2011
Kelly Reynolds; April Kontostathis; Lynne Edwards
Cyber bullying is the use of technology as a medium to bully someone. Although it has been an issue for many years, the recognition of its impact on young people has recently increased. Social networking sites provide a fertile medium for bullies, and teens and young adults who use these sites are vulnerable to attacks. Through machine learning, we can detect language patterns used by bullies and their victims, and develop rules to automatically detect cyber bullying content. The data we used for our project was collected from the website Formspring.me, a question-and-answer formatted website that contains a high percentage of bullying content. The data was labeled using a web service, Amazon's Mechanical Turk. We used the labeled data, in conjunction with machine learning techniques provided by the Weka toolkit, to train a computer to recognize bullying content. Both a C4.5 decision tree learner and an instance-based learner were able to identify the true positives with 78.5% accuracy.
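The sketch below shows what an instance-based learner does with labeled posts: each post becomes a bag of words, and a new post receives the label of the most similar training post (1-nearest neighbor with cosine similarity). It is written in plain Python rather than with Weka, and the toy posts, labels, and function names are hypothetical; the study's actual features and Weka configuration are not reproduced here.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized post."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest_neighbor(train, post):
    """Return the label of the training post most similar to `post` (1-NN)."""
    vec = vectorize(post)
    best_label, best_sim = None, -1.0
    for text, label in train:
        sim = cosine(vec, vectorize(text))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

train = [
    ("you are so ugly and stupid", "bully"),
    ("nobody likes you loser", "bully"),
    ("what is your favorite movie", "clean"),
    ("have a great day at school", "clean"),
]
print(nearest_neighbor(train, "you are a stupid loser"))  # → bully
```

A decision tree learner such as C4.5 would instead induce explicit attribute tests from the same labeled vectors; the instance-based approach simply stores the training posts and compares at prediction time.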
International Journal of Electronic Commerce | 2011
India McGhee; Jennifer Bayzick; April Kontostathis; Lynne Edwards; Alexandra McBride; Emma Jakubowski
This work integrates communication theories and computer science algorithms to create a program that can detect the occurrence of sexual predation in an online social setting. Although much work has discussed social media in general, this particular aspect of online social interaction remains largely unexplored. In previous work we developed phrase-matching and rule-based approaches to classify and label lines of chat logs. In the current work we expand these techniques and use machine learning algorithms to classify posts. Our machine learning system leveraged the phrase-matching and rule-based systems to identify appropriate attributes for our supervised learning algorithms. Our machine learning experiments confirmed that the rules we developed are adequate to identify the coding rules. Neither decision trees nor instance-based learning algorithms were able to significantly improve upon the 68 percent accuracy we were able to achieve using the rule-based methods employed by a software program called ChatCoder 2, as described here.
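A phrase-matching labeler of the general kind the abstract mentions can be sketched as a dictionary lookup applied to each chat line. The category names and phrase lists below are hypothetical placeholders; ChatCoder 2's actual coding categories, dictionaries, and rules are not given in this abstract.

```python
import re

# Hypothetical category -> trigger phrases; not ChatCoder 2's real dictionaries.
PHRASES = {
    "personal_info": ["how old are you", "where do you live"],
    "secrecy": ["don't tell", "our secret"],
}

def label_line(line):
    """Return the sorted list of categories whose phrases occur in a chat line."""
    # Lowercase and collapse runs of whitespace so phrase matching is less brittle.
    text = re.sub(r"\s+", " ", line.lower())
    return sorted(cat for cat, phrases in PHRASES.items()
                  if any(p in text for p in phrases))

chat = [
    "hey how old are you?",
    "this is our secret, don't tell your mom",
    "lol that game was fun",
]
for line in chat:
    print(label_line(line), "|", line)
```

Per-line labels like these could then be counted per conversation and supplied as attributes to a decision tree or instance-based learner, which is the role the abstract describes for the rule-based output.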
Web Science | 2013
April Kontostathis; Kelly Reynolds; Andy Garron; Lynne Edwards
In this paper we describe a close analysis of the language used in cyberbullying. We take as our corpus a collection of posts from Formspring.me. Formspring.me is a social networking site where users can ask questions of other users. It appeals primarily to teens and young adults, and the cyberbullying content on the site is dense: between 7% and 14% of the posts we have analyzed contain cyberbullying content. The results presented in this article are twofold. Our first experiments were designed to develop an understanding of both the specific words that are used by cyberbullies and the context surrounding these words. We have identified the most commonly used cyberbullying terms and have developed queries that can be used to detect cyberbullying content. Five of our queries achieve an average precision of 91.25% at rank 100. In our second set of experiments we extended this work by using a supervised machine learning approach for detecting cyberbullying. The machine learning experiments identified additional terms that are consistent with cyberbullying content, as well as an additional querying technique that was able to accurately assign scores to posts from Formspring.me. The posts with the highest scores are shown to have a high density of cyberbullying content.
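One plausible reading of "average precision at rank 100" is precision at rank k (the fraction of the top k ranked posts judged to be cyberbullying) averaged across queries. The sketch below computes that reading; the toy relevance judgments and the cutoff k=5 are illustrative assumptions, whereas the paper reports k=100.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k results that are relevant (True).

    Missing results (fewer than k returned) count as non-relevant.
    """
    top = ranked_labels[:k]
    return sum(top) / k

def mean_precision_at_k(runs, k):
    """Average precision@k over several queries' ranked result lists."""
    return sum(precision_at_k(r, k) for r in runs) / len(runs)

# Each list holds relevance judgments (True = cyberbullying) in rank order.
runs = [
    [True, True, False, True, True],
    [True, True, True, True, True],
]
print(mean_precision_at_k(runs, 5))  # → 0.9
```

If the paper instead means rank-aware average precision (AP) with a cutoff of 100, the per-query computation would weight each relevant hit by the precision at its rank; the averaging step across queries stays the same.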
Text Mining: Applications and Theory | 2010
April Kontostathis; Lynne Edwards; Amanda Leatherman
This chapter describes the state of technology for studying Internet crimes against children, specifically sexual predation and cyberbullying. We begin by presenting a survey of relevant research articles that are related to the study of cybercrime. This survey includes a discussion of our work on the classification of chat logs that contain bullying or predatory behavior. Many commercial enterprises have developed parental control software to monitor these behaviors, and the latest versions of some of these tools provide features that claim to protect children against predators and bullies. The chapter concludes with a discussion of these products and offers suggestions for continued research in this interesting and timely subfield of text mining.
Hawaii International Conference on System Sciences | 2007
April Kontostathis
Latent semantic indexing (LSI) is commonly used to match queries to documents in information retrieval applications. LSI has been shown to improve retrieval performance for some, but not all, collections, when compared to traditional vector space retrieval. In this paper, we first develop a model for understanding which values in the reduced dimensional space contain the term relationship (latent semantic) information. We then test this model by developing a modified version of LSI that captures this information, essential dimensions of LSI (EDLSI). EDLSI significantly improves retrieval performance on corpora that previously did not benefit from LSI, and offers improved runtime performance when compared with traditional LSI. Traditional LSI requires the use of a dimensionality reduction parameter that must be tuned for each collection. Applying our model, we have also shown that a small, fixed dimensionality reduction parameter (k=10) can be used to capture the term relationship information in a corpus.
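EDLSI, as summarized here, keeps only a small number of SVD dimensions. A common way to realize this, and the one assumed in the NumPy sketch below, is to blend the rank-k LSI score with the ordinary vector space score through a weight `x`. That weighting scheme, the value `x=0.2`, the toy term-by-document matrix, and k=2 (chosen to fit the toy data instead of the abstract's k=10) are assumptions for illustration, not details stated in this abstract. Scores are unnormalized dot products for brevity.

```python
import numpy as np

def edlsi_scores(A, q, k=2, x=0.2):
    """Score documents (columns of term-by-document matrix A) against query q.

    Blends a rank-k LSI score with a plain vector space score:
        score = x * lsi + (1 - x) * vsm
    """
    # Vector space model: dot product of the query with each document column.
    vsm = q @ A
    # Truncated SVD: A_k = U_k S_k V_k^T keeps the k largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    lsi = q @ A_k
    return x * lsi + (1 - x) * vsm

# Rows are terms, columns are documents (raw counts, no term weighting).
A = np.array([
    [1, 0, 1, 0],   # "car"
    [0, 1, 0, 0],   # "automobile"
    [1, 1, 0, 0],   # "engine"
    [0, 0, 1, 1],   # "wheel"
], dtype=float)
q = np.array([0, 1, 0, 0], dtype=float)  # query: "automobile"

print(np.round(edlsi_scores(A, q), 3))
```

With `x=0` the function reduces to plain vector space retrieval, and with `x=1` it is pure rank-k LSI, so the weight controls how much of the latent term-relationship information from the few retained dimensions is mixed into the ranking.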
International Journal on Artificial Intelligence Tools | 2004
Lars E. Holzman; Todd A. Fisher; Leon M. Galitsky; April Kontostathis; William M. Pottenger
Few tools exist that address the challenges facing researchers in the Textual Data Mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a Textual Data Mining Infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments – as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. A brief tutorial is provided on the use of TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at .
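TMI's optimization support is described only at a high level. The sketch below illustrates the general idea of automating a parameter search with an exhaustive grid search; the `evaluate` objective, the parameter names `threshold` and `k`, and the candidate values are hypothetical stand-ins, and TMI's actual optimization interface is not shown.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every combination in `grid` and return (best_params, best_score).

    `grid` maps each parameter name to a list of candidate values;
    `evaluate` takes keyword arguments and returns a score to maximize.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical objective standing in for a text mining run scored by,
# e.g., an F-measure; it peaks at threshold=0.3, k=10.
def evaluate(threshold, k):
    return 1.0 - (threshold - 0.3) ** 2 - (k - 10) ** 2 / 100

print(grid_search(evaluate, {"threshold": [0.1, 0.3, 0.5], "k": [5, 10, 50]}))
# → ({'threshold': 0.3, 'k': 10}, 1.0)
```

Exhaustive search is the simplest strategy and grows multiplicatively with each parameter; a framework could swap in a smarter search without changing the `evaluate` contract.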
International Conference on Tools with Artificial Intelligence | 2003
Lars E. Holzman; Todd A. Fisher; Leon M. Galitsky; April Kontostathis; William M. Pottenger
Few tools exist that address the challenges facing researchers in the textual data mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a textual data mining infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments; as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at hddi.cse.lehigh.edu.
Information Processing and Management | 2003
April Kontostathis; William M. Pottenger