William Eberle
Tennessee Technological University
Publication
Featured research published by William Eberle.
Knowledge Based Systems | 2013
Chih-Fong Tsai; William Eberle; Chi-Yuan Chu
Feature selection and instance selection are two important data preprocessing steps in data mining, where the former is aimed at removing some irrelevant and/or redundant features from a given dataset and the latter at discarding the faulty data. Genetic algorithms have been widely used for these tasks in related studies. However, these two data preprocessing tasks are generally considered separately in the literature. It is unknown how classification performance differs between performing feature and instance selection together and performing either one individually. Therefore, the aim of this study is to perform feature selection and instance selection based on genetic algorithms using different priorities to examine the classification performances over different domain datasets. The experimental results obtained from four small and large scale datasets containing various numbers of features and data samples show that performing both feature and instance selection usually makes the classifiers (i.e., support vector machines and k-nearest neighbor) perform slightly worse than feature selection or instance selection individually. However, while there is not a significant difference in classification accuracy between these different data preprocessing methods, the combination of feature and instance selection largely reduces the computational effort of training the classifiers, as opposed to performing feature and instance selection individually. Considering both classification effectiveness and efficiency, we demonstrate that performing feature selection first and instance selection second is the optimal solution for data preprocessing in data mining. Both SVM and k-NN classifiers provide similar classification accuracy to the baselines (i.e., those without data preprocessing). The decisions regarding which data preprocessing task to perform for different dataset scales are also discussed.
international conference on data mining | 2007
William Eberle; Lawrence B. Holder
The ability to mine data represented as a graph has become important in several domains for detecting various structural patterns. One important area of data mining is anomaly detection, particularly for fraud, but less work has been done in terms of detecting anomalies in graph-based data. While there has been some work that has used statistical metrics and conditional entropy measurements, the results have been limited to certain types of anomalies and specific domains. In this paper we present graph-based approaches to uncovering anomalies in domains where the anomalies consist of unexpected entity/relationship deviations that resemble non-anomalous behavior. Using synthetic and real-world data, we evaluate the effectiveness of these algorithms at discovering anomalies in a graph-based representation of data.
Journal of Applied Security Research | 2010
William Eberle; Jeffrey A. Graves; Lawrence B. Holder
The authors present the use of graph-based approaches to discovering anomalous instances of structural patterns in data that represent insider threat activity. The approaches presented search for activities that appear to match normal transactions, but in fact are structurally different. The authors show the usefulness of applying graph theoretic approaches to discovering suspicious insider activity in domains such as social network communications, business processes, and cybercrime. The authors present some performance results to show the effectiveness of their approaches, and then conclude with some ongoing research that combines numerical analysis with structure analysis, analyzes multiple normative patterns, and extends to dynamic graphs.
intelligence and security informatics | 2009
William Eberle; Lawrence B. Holder
The ability to mine data represented as a graph has become important in several domains for detecting various structural patterns. One important area of data mining is anomaly detection, but little work has been done in terms of detecting anomalies in graph-based data. In this paper we present graph-based approaches to uncovering anomalies in applications containing information representing possible insider threat activity: e-mail, cell-phone calls, and order processing.
2009 Cybersecurity Applications & Technology Conference for Homeland Security | 2009
William Eberle; Lawrence B. Holder
Protecting our nation's cyber infrastructure and securing sensitive information are critical challenges for homeland security and require the research, development and deployment of new technologies that can be transitioned into the field for combating cyber security risks. Particular areas of concern are the deliberate and intended actions associated with malicious exploitation, theft or destruction of data, or the compromise of networks, communications or other IT resources, of which the most harmful and difficult to detect threats are those propagated by an insider. However, current efforts to identify unauthorized access to information, such as what is found in document control and management systems, are limited in scope and capabilities. In order to address this issue, this effort involves performing further research and development on the existing graph-based anomaly detection (GBAD) system. GBAD discovers anomalous instances of structural patterns in data that represent entities, relationships and actions. Input to GBAD is a labeled graph in which entities are represented by labeled vertices and relationships or actions are represented by labeled edges between entities. Using the minimum description length (MDL) principle to identify the normative pattern that minimizes the number of bits needed to describe the input graph after being compressed by the pattern, GBAD implements algorithms for identifying the three possible changes to a graph: modifications, insertions and deletions. Each algorithm discovers those substructures that match the closest to the normative pattern without matching exactly.
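The idea in this abstract (choose the pattern that best compresses the graph, then look for near-misses) can be illustrated with a toy sketch. It is not the GBAD implementation: it assumes the input is already split into small transactions, each a set of labeled edges, it replaces bit-level description length with a crude vertex-plus-edge count, and it compares edge sets instead of doing subgraph matching. All labels and function names are hypothetical.

```python
from collections import Counter

def size(edges):
    """Crude stand-in for description length: vertex count plus edge count."""
    vertices = {v for src, _, dst in edges for v in (src, dst)}
    return len(vertices) + len(edges)

def normative_pattern(transactions):
    """Pick the transaction structure that compresses the input the most."""
    counts = Counter(transactions)

    def compressed_size(pattern):
        # Store the pattern once, one placeholder vertex per exact instance,
        # and every non-matching transaction in full.
        rest = sum(size(t) for t in transactions if t != pattern)
        return size(pattern) + counts[pattern] + rest

    return min(counts, key=compressed_size)

def anomalies(transactions, max_changes=2):
    """Report transactions that are close to the normative pattern but not equal.

    An inserted or deleted edge changes the edge set by 1; a modified label
    shows up as one edge removed plus one added, so it counts as 2.
    """
    pattern = normative_pattern(transactions)
    found = []
    for index, edges in enumerate(transactions):
        changes = len(edges ^ pattern)  # symmetric difference of edge sets
        if 0 < changes <= max_changes:
            found.append((index, changes))
    return pattern, found

# Hypothetical order-processing data: (source label, edge label, target label).
normal = frozenset({
    ("order", "placed_by", "customer"),
    ("order", "approved_by", "manager"),
    ("order", "shipped_from", "warehouse"),
})
missing_approval = normal - {("order", "approved_by", "manager")}
transactions = [normal] * 8 + [missing_approval] + [normal]

pattern, found = anomalies(transactions)
print(found)
```

Here the ninth transaction lacks the approval edge, so it is reported as one change away from the normative pattern, which corresponds to the deletion case.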
2014 IEEE Symposium on Computational Intelligence Applications in Smart Grid (CIASG) | 2014
Vitaly Ford; Ambareen Siraj; William Eberle
Energy fraud detection is a critical aspect of smart grid security and privacy preservation. Machine learning and data mining have been widely used by researchers for extensive intelligent analysis of data to recognize normal patterns of behavior such that deviations can be detected as anomalies. This paper discusses a novel application of a machine learning technique for examining the energy consumption data to report energy fraud using artificial neural networks and smart meter fine-grained data. Our approach achieves a higher energy fraud detection rate than similar works in this field. The proposed technique successfully identifies diverse forms of fraudulent activities resulting from unauthorized energy usage.
cyber security and information intelligence research workshop | 2009
William Eberle; Lawrence B. Holder
This work presents the use of graph-based approaches to discovering anomalous instances of structural patterns in data that represent entities, relationships and actions. Using the minimum description length (MDL) principle to first identify the normative pattern, the algorithms presented in this paper identify the three possible changes to a graph: modifications, insertions and deletions. Each algorithm discovers those substructures that match the closest to the normative pattern without matching exactly. As a result, this proposed approach searches for those activities that appear to match normal (or legitimate) transactions, but in fact are structurally different. After briefly presenting the three algorithms, we then show the usefulness of applying these graph theoretic approaches to discovering illegal activity for a simulated insider threat within a passport processing scenario.
Journal of Systems and Software | 2015
Wei-Chao Lin; Chih-Fong Tsai; Shih-Wen Ke; Chia-Wen Hung; William Eberle
Instance selection is an important data pre-processing step in the knowledge discovery process. However, the dataset sizes of various domain problems are usually very large, and some are even non-stationary, composed of both old data and a large amount of new data samples. Current algorithms for solving this type of scalability problem have certain limitations, meaning they require a very high computational cost over very large scale datasets during instance selection. To this end, we introduce the ReDD (Representative Data Detection) approach, which is based on outlier pattern analysis and prediction. First, a machine learning model, or detector, is used to learn the patterns of (un)representative data selected by a specific instance selection method from a small amount of training data. Then, the detector can be used to detect the rest of the large amount of training data, or newly added data. We empirically evaluate ReDD over 50 domain datasets to examine the effectiveness of the learned detector, using four very large scale datasets for validation. The experimental results show that ReDD not only reduces the computational cost by nearly two to three times compared with three baselines, but also maintains the final classification accuracy.
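The two-stage idea described here (run a costly instance selection method on a small seed set, learn a detector from its decisions, then apply only the detector to the remaining data) might look roughly like the sketch below. Both components are stand-ins chosen for brevity, not the methods used in the paper: the selection rule keeps a sample when its nearest neighbour shares its label, and the detector is a 1-nearest-neighbour lookup over the seed features. All names and data are hypothetical.

```python
import math

def nearest(sample, pool):
    """Return the pool entry whose feature vector is closest to the sample's."""
    return min(pool, key=lambda other: math.dist(sample[0], other[0]))

def select_instances(data):
    """Stand-in instance selection (assumed): a sample is representative
    when its nearest neighbour has the same class label. Cost is O(n^2)."""
    flags = []
    for i, sample in enumerate(data):
        others = data[:i] + data[i + 1:]
        flags.append(nearest(sample, others)[1] == sample[1])
    return flags

def train_detector(seed):
    """Run the costly selection once on the small seed set and keep its flags."""
    return list(zip(seed, select_instances(seed)))

def detect(detector, sample):
    """Stand-in detector (assumed): copy the flag of the closest seed sample,
    comparing features only."""
    closest = min(detector, key=lambda entry: math.dist(sample[0], entry[0][0]))
    return closest[1]

# Each sample is (feature tuple, class label); the "b" at 3.5 is a noisy point.
seed = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.5,), "b"),
        ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b")]
detector = train_detector(seed)

# The remaining (or newly arriving) data is filtered by the detector alone.
new_data = [((0.4,), "a"), ((3.4,), "b"), ((10.6,), "b")]
kept = [sample for sample in new_data if detect(detector, sample)]
print(kept)
```

The quadratic selection step runs only on the seed set; each later sample costs one pass over the seed, which is where the savings in this toy version come from.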
computational intelligence and data mining | 2009
William Eberle; Lawrence B. Holder
Protecting and securing sensitive information are critical challenges for businesses. Deliberate and intended actions such as malicious exploitation, theft or destruction of data, are not only harmful and difficult to detect, but frequently these threats are propagated by an insider. Unfortunately, current efforts to identify unauthorized access to information such as what is found in document control and management systems are limited in scope and capabilities. This paper presents an approach to detecting anomalies in business transactions and processes using a graph representation. In our graph-based anomaly detection (GBAD) approach, anomalous instances of structural patterns are discovered in data that represent entities, relationships and actions. A definition of graph-based anomalies and a brief description of the GBAD algorithms are presented, followed by empirical results using a discrete-event simulation of real-world business transactions and processes.
Archive | 2009
William Eberle; Lawrence B. Holder; Diane J. Cook
Much of the data collected during the monitoring of cyber and other infrastructures is structural in nature, consisting of various types of entities and relationships between them. The detection of threatening anomalies in such data is crucial to protecting these infrastructures. We present an approach to detecting anomalies in a graph-based representation of such data that explicitly represents these entities and relationships. The approach consists of first finding normative patterns in the data using graph-based data mining and then searching for small, unexpected deviations to these normative patterns, assuming illicit behavior tries to mimic legitimate, normative behavior. The approach is evaluated using several synthetic and real-world datasets. Results show that the approach has high true-positive rates, low false-positive rates, and is capable of detecting complex structural anomalies in real-world domains including email communications, cell-phone calls and network traffic.