Joshua Neil
Los Alamos National Laboratory
Publications
Featured research published by Joshua Neil.
Journal of Computer Virology and Hacking Techniques | 2011
Blake Anderson; Daniel Quist; Joshua Neil; Curtis B. Storlie; Terran Lane
We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains, where the vertices are the instructions and the transition probabilities are estimated by the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs. The resulting graph kernel measures similarity between graphs on both local and global levels. Finally, the similarity matrix is sent to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but rather use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.
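To make the pipeline concrete, here is a minimal sketch of the data representation: estimating a Markov transition matrix from an instruction trace, comparing graphs with a kernel, and classifying with a support vector machine on the precomputed similarity matrix. The traces, labels, and the simple Gaussian kernel are illustrative assumptions, not the specific kernel combination used in the paper.

import numpy as np
from sklearn.svm import SVC

def markov_graph(trace, alphabet):
    """Estimate a Markov transition matrix from an instruction trace."""
    idx = {ins: i for i, ins in enumerate(alphabet)}
    counts = np.zeros((len(alphabet), len(alphabet)))
    for a, b in zip(trace, trace[1:]):
        counts[idx[a], idx[b]] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def graph_kernel(P, Q, sigma=1.0):
    """Similarity between two transition matrices (a simple Gaussian stand-in kernel)."""
    return np.exp(-np.linalg.norm(P - Q) ** 2 / (2 * sigma ** 2))

# Hypothetical instruction traces; real traces come from dynamic instrumentation.
traces = [["mov", "add", "mov", "jmp"], ["mov", "mov", "add", "ret"],
          ["push", "call", "pop", "ret"], ["push", "push", "call", "ret"]]
labels = [0, 0, 1, 1]  # toy labels: 0 = benign, 1 = malware
alphabet = sorted({ins for t in traces for ins in t})
graphs = [markov_graph(t, alphabet) for t in traces]

# Precomputed kernel (similarity) matrix fed to a support vector machine.
K = np.array([[graph_kernel(P, Q) for Q in graphs] for P in graphs])
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.predict(K))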
Technometrics | 2013
Joshua Neil; Curtis L. Hash; Alexander William Brugh; Mike Fisk; Curtis B. Storlie
We introduce a computationally scalable method for detecting small anomalous areas in a large, time-dependent computer network, motivated by the challenge of identifying intruders operating inside enterprise-sized computer networks. Time-series of communications between computers are used to detect anomalies, and are modeled using Markov models that capture the bursty, often human-caused behavior that dominates a large subset of the time-series. Anomalies in these time-series are common, and the network intrusions we seek involve coincident anomalies over multiple connected pairs of computers. We show empirically that each time-series is nearly always independent of the time-series of other pairs of communicating computers. This independence is used to build models of normal activity in local areas from the models of the individual time-series, and these local areas are designed to detect the types of intrusions we are interested in. We define a locality statistic calculated by testing for deviations from historic behavior in each local area, and then define a scan statistic as the maximum deviation score over all local areas. We show that identifying these local anomalies is sufficient to correctly identify anomalies of various relevant shapes in the network. Supplementary materials, including additional details and simulation code, are provided online.
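The scan-statistic idea can be sketched in a few lines: score each local area (for example, a path of edges) by summing per-edge deviation scores against historic behavior, then take the maximum over all local areas. The Poisson-style deviation score and the toy counts below are illustrative assumptions, not the Markov-model tests developed in the paper.

import numpy as np

def edge_deviation(observed, expected):
    """Per-edge deviation from historic behavior (Poisson-style log-likelihood ratio)."""
    observed = max(observed, 1e-9)  # avoid log(0)
    return observed * np.log(observed / expected) - (observed - expected)

def locality_statistic(local_area, obs, exp):
    """Sum of deviation scores over the edges in one local area (e.g., a path or star)."""
    return sum(edge_deviation(obs[e], exp[e]) for e in local_area)

def scan_statistic(local_areas, obs, exp):
    """Scan statistic: the maximum locality statistic over all local areas."""
    return max(locality_statistic(a, obs, exp) for a in local_areas)

# Toy example: edges are (src, dst) pairs with observed and expected counts.
obs = {("a", "b"): 40, ("b", "c"): 35, ("c", "d"): 5}
exp = {("a", "b"): 10, ("b", "c"): 10, ("c", "d"): 6}
paths = [[("a", "b"), ("b", "c")], [("b", "c"), ("c", "d")]]
print(scan_statistic(paths, obs, exp))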
Computers & Security | 2015
Alexander D. Kent; Lorie M. Liebrock; Joshua Neil
User authentication over the network builds a foundation of trust within large-scale computer networks. The collection of this network authentication activity provides valuable insight into user behavior within an enterprise network. Representing this authentication data as a set of user-specific graphs and graph features, including time-constrained attributes, enables novel and comprehensive analysis opportunities. We show graph-based approaches to user classification and intrusion detection with practical results. We also show a method for assessing network authentication trust risk and cyber attack mitigation within an enterprise network using bipartite authentication graphs. We demonstrate the value of these graph-based approaches on a real-world authentication data set collected from an enterprise network.
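A minimal sketch of the representation described above: build one graph per user from authentication events and extract simple graph features that could feed a classifier or anomaly detector. The event tuples and feature choices are hypothetical placeholders for the richer, time-constrained attributes used in the paper.

import networkx as nx

# Hypothetical authentication events: (user, source computer, destination computer, time).
events = [
    ("alice", "C1", "C7", 1), ("alice", "C1", "C7", 2), ("alice", "C7", "C9", 3),
    ("bob",   "C2", "C7", 1), ("bob",   "C2", "C8", 5),
]

# Build one directed graph per user from that user's authentication events.
user_graphs = {}
for user, src, dst, t in events:
    g = user_graphs.setdefault(user, nx.DiGraph())
    if g.has_edge(src, dst):
        g[src][dst]["count"] += 1
    else:
        g.add_edge(src, dst, count=1)

# Simple per-user graph features.
for user, g in user_graphs.items():
    features = {
        "n_computers": g.number_of_nodes(),
        "n_edges": g.number_of_edges(),
        "n_auth_events": sum(d["count"] for _, _, d in g.edges(data=True)),
        "max_out_degree": max(dict(g.out_degree()).values()),
    }
    print(user, features)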
signal-image technology and internet-based systems | 2014
Aric Hagberg; Nathan Lemons; Alexander D. Kent; Joshua Neil
Modern enterprise computer networks rely on centrally managed authentication schemes that allow users to easily communicate access credentials to many computer systems and applications. The authentication events typically consist of a user connecting to a computer with an authorized credential. These credentials are often cached on the application servers, which creates a risk that they may be stolen and used to hop between computers in the network. We examine computer network risk associated with credential hopping by creating and studying the structure of the authentication graph, a bipartite graph built from authentication events. We assume that an authentication graph with many short paths between computers represents a network that is more vulnerable to such attacks. Under this natural assumption, we use a measure of graph connectivity, namely the size of the largest connected component, to give a quantitative indicator of the network's susceptibility to such attacks. Motivated by graph theoretical results for component sizes in random intersection graphs, we propose a mitigation strategy, and perform experiments simulating an implementation using data from a large enterprise network. The results lead to realistic, actionable risk reduction strategies. To facilitate continued research opportunities, we are also providing our authentication bipartite graph data set spanning 9 months and 708 million time-series edge records.
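The connectivity indicator is straightforward to compute: build the bipartite credential-to-computer graph and measure the largest connected component, whose computer side bounds how far a stolen cached credential could carry an attacker. The event pairs below are hypothetical; only the largest-connected-component measure follows the paper.

import networkx as nx

# Hypothetical authentication events: (user credential, computer) pairs.
auth_events = [
    ("u1", "C1"), ("u1", "C2"), ("u2", "C2"), ("u2", "C3"),
    ("u3", "C4"),
]

# Bipartite authentication graph: credentials on one side, computers on the other.
G = nx.Graph()
for user, comp in auth_events:
    G.add_node(("user", user), bipartite=0)
    G.add_node(("comp", comp), bipartite=1)
    G.add_edge(("user", user), ("comp", comp))

# Size of the largest connected component, restricted to computers: a rough
# indicator of how many machines are reachable by credential hopping.
largest = max(nx.connected_components(G), key=len)
reachable_computers = [n for n in largest if n[0] == "comp"]
print(len(reachable_computers), "computers in the largest connected component")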
Statistical Analysis and Data Mining | 2015
Joseph Sexton; Curtis B. Storlie; Joshua Neil
A targeted network intrusion typically evolves through multiple phases, termed the attack chain. When appropriate data are monitored, these phases will generate multiple events across the attack chain on a compromised host. It is shown empirically that events in different parts of the attack chain are largely independent under nonattack conditions. This suggests that a powerful detector can be constructed by combining across events spanning the attack. This article describes the development of such a detector for a large network. To construct events that span the attack chain, multiple data sources are used, and the detector combines across events observed on the same machine, across local neighborhoods of machines linked by network communications, as well as across events observed on multiple computers. A probabilistic approach for evaluating the combined events is developed, and empirical investigations support the underlying assumptions. The detection power of the approach is studied by inserting plausible attack scenarios into observed network and host data, and an application to a real-world intrusion is given.
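The benefit of combining across (approximately) independent attack-chain stages can be illustrated with a generic p-value combination rule such as Fisher's method; the per-stage p-values below are made up, and this is a stand-in for, not a reproduction of, the probabilistic approach developed in the article.

import numpy as np
from scipy import stats

def fisher_combine(p_values):
    """Combine p-values from approximately independent attack-chain stages.
    Under the null, -2 * sum(log p) is chi-squared with 2k degrees of freedom."""
    p_values = np.asarray(p_values, dtype=float)
    statistic = -2 * np.sum(np.log(p_values))
    return stats.chi2.sf(statistic, df=2 * len(p_values))

# Hypothetical per-stage p-values for one host: delivery, persistence, lateral movement.
host_stage_pvalues = [0.08, 0.05, 0.10]
print(fisher_combine(host_stage_pvalues))  # smaller than any individual p-value (~0.016)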
2013 6th International Symposium on Resilient Control Systems (ISRCS) | 2013
Joshua Neil; Benjamin Uphoff; Curtis Hash; Curtis B. Storlie
This paper focuses on several important topics related to subgraph anomaly detection for computer networks. First, we briefly discuss a graph-based view of a computer network consisting of nodes (computers) and edges (time-series of communications between computers), and how stochastic models of groups of edges can be used to identify local anomalous areas of the network indicating the traversal of attackers. Next, the concept of a new edge, an edge between two computers that have never communicated before, is introduced, and a model for establishing the probability of such an event is provided. We follow this with a discussion of exponentially weighted moving averages for updating models of edges. Next, as measuring network data for the purposes of anomaly detection is difficult, we discuss a host agent designed specifically to gather this type of data. Finally, the performance of anomaly detection using this host agent to collect data is compared with that obtained using DNS data.
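The exponentially weighted moving average step is the simplest piece to illustrate: each edge's rate estimate is updated as new counts arrive, giving a baseline against which later observations can be judged. The smoothing constant and toy counts are assumptions for illustration only, and this sketch covers just the EWMA updating, not the new-edge probability model.

def ewma_update(current_estimate, new_observation, alpha=0.1):
    """Exponentially weighted moving average update for a per-edge rate estimate.
    Smaller alpha weights history more heavily; larger alpha adapts faster."""
    return alpha * new_observation + (1 - alpha) * current_estimate

# Toy example: per-minute communication counts on one edge, updated as data arrive.
rate = 2.0  # historical estimate
for count in [3, 2, 0, 1, 15]:  # the 15 would stand out against the smoothed rate
    rate = ewma_update(rate, count)
    print(round(rate, 2))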
Intelligent Data Analysis | 2014
Melissa J. Turcotte; Nicholas A. Heard; Joshua Neil
Temporal monitoring of computer network data for statistical anomalies provides a means for detecting malicious intruders. The high volumes of traffic typically flowing through these networks can make detecting important changes in structure extremely challenging. In this article, agile algorithms which readily scale to large networks are provided, assuming conditionally independent node and edge-based statistical models. As a first stage, changes in the data streams arising from edges (pairs of hosts) in the network are detected. A second stage analysis combines any anomalous edges to identify more general anomalous substructures in the network. The method is demonstrated on the entire internal computer network of Los Alamos National Laboratory, comprising approximately 50,000 hosts, using a data set which contains a real, sophisticated cyber attack. This attack is quickly identified from amongst the huge volume of data being processed.
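The two-stage structure can be sketched directly: stage one produces a p-value per edge from its own statistical model, and stage two links the anomalous edges and reports connected substructures, which is where an intruder traversing hosts would appear. The per-edge p-values and threshold below are hypothetical, and connected components are used here as a simple stand-in for the substructure analysis described in the article.

import networkx as nx

# Hypothetical stage-one output: per-edge p-values from conditionally independent edge models.
edge_pvalues = {
    ("h1", "h2"): 0.001, ("h2", "h3"): 0.004, ("h4", "h5"): 0.40,
    ("h3", "h6"): 0.002, ("h7", "h8"): 0.03,
}
threshold = 0.01

# Stage two: connect the anomalous edges and report the resulting substructures.
anomalous = nx.Graph()
anomalous.add_edges_from(e for e, p in edge_pvalues.items() if p < threshold)
for component in nx.connected_components(anomalous):
    print(sorted(component))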
2013 6th International Symposium on Resilient Control Systems (ISRCS) | 2013
Joseph Sexton; Curtis B. Storlie; Joshua Neil; Alexander D. Kent
Anomaly-based network intruder detection is considered. In particular, we view anomaly detection as a statistical hypothesis testing problem. The null hypothesis associated with each host is that it is acting normally, while the alternative is that the host is acting abnormally. When considered in relation to the network traffic, these host-level hypotheses form a graphically structured hypothesis testing problem. Some network intrusions will form linked regions in this graph where the null hypotheses are false. This will be the case when an intruder traverses the network, or when a coordinated attack is performed targeting the same set of machines. Other network intrusions can lead to multiple unrelated hosts acting abnormally, such as when multiple attackers are acting more or less independently. We consider model-based approaches for detecting these different types of disruptions to the network activity. For instance, network traversal is modeled as a random walk through the network stringing together multiple abnormally acting machines. A coordinated attack targeting a single machine is modeled as multiple anomalous hosts connecting to a randomly selected target. The advantage of modeling the attacker patterns is that, under ideal conditions, this defines an optimal detector of the intruders. This optimal detector depends on unknown parameters, and is therefore less attractive for practical use. We describe pragmatic approaches that, in simulations, achieve close to optimal detection rates. The methodology is applied to a real-world network intrusion, clearly identifying the attack.
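A toy version of the traversal idea: score candidate paths through the network by combining the host-level evidence along them, so a chain of hosts that all look abnormal scores highly. The per-host p-values, candidate paths, and additive log-score are illustrative assumptions, not the optimal or pragmatic detectors derived in the paper.

import numpy as np

# Hypothetical per-host p-values from the host-level "acting normally" hypothesis tests.
host_pvalues = {"h1": 0.02, "h2": 0.01, "h3": 0.60, "h4": 0.03}

# Candidate traversal paths, e.g., generated by following observed network connections.
paths = [["h1", "h2", "h4"], ["h1", "h3"]]

def path_score(path, pvalues):
    """Score a candidate traversal path by summing -log p over its hosts.
    Higher scores indicate a chain of hosts that all look abnormal."""
    return -sum(np.log(pvalues[h]) for h in path)

best = max(paths, key=lambda p: path_score(p, host_pvalues))
print(best, round(path_score(best, host_pvalues), 2))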
IEEE Conference on Mass Storage Systems and Technologies | 2001
Joshua Neil
One of the primary mass storage systems in use at the Los Alamos National Laboratory (LANL) is the Common File System, or CFS. CFS went into production in 1979, servicing supercomputer environments, and later was expanded for use with a broader networked workstation environment. It is now used by a very large user population at LANL. It can be used by any employee for storage purposes, and is used by all of the large supercomputers at LANL. CFS is being phased out for the supercomputing environment due to the need for a more scalable mass storage system design. To our benefit, records have been kept for the last seven years of all activity on CFS. A statistical analysis of these records has been performed to understand how the mass storage system was used over a long period of time. Example usage statistics include maximum and average file sizes, data rates, and bytes moved for each month. Trends and observations about these usage statistics will be presented. The paper will also present some study of the effects of environmental changes and their implications for CFS. An example of an environmental change question is: how does new media technology affect the usage or management of the system? Study of the performance of the storage system over this long period of time will also be presented. We characterize performance and examine how migration, environmental factors, and usage affected data rate performance as well as time-to-first-byte performance. Some conclusions about usage and its effect on planning, design, and operation of mass storage systems will be discussed. It is hoped that the analysis of actual long-run usage data of a mass storage system in a demanding supercomputing environment will provide interesting lessons that can be applied to the planning, design, and operation of future mass storage systems.
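The monthly usage statistics mentioned above can be computed with a simple aggregation over per-transfer records; the record schema and values below are hypothetical stand-ins for the actual CFS activity logs.

import pandas as pd

# Hypothetical per-transfer records resembling the CFS activity logs described above.
records = pd.DataFrame({
    "timestamp": pd.to_datetime(["1995-01-03", "1995-01-20", "1995-02-02", "1995-02-15"]),
    "bytes": [2_000_000, 50_000_000, 10_000_000, 750_000],
    "duration_s": [4.0, 60.0, 12.0, 1.5],
})
records["rate_MB_per_s"] = records["bytes"] / records["duration_s"] / 1e6

# Monthly usage statistics: maximum and mean file size, total bytes moved, mean data rate.
monthly = records.groupby(records["timestamp"].dt.to_period("M")).agg(
    max_bytes=("bytes", "max"),
    mean_bytes=("bytes", "mean"),
    bytes_moved=("bytes", "sum"),
    mean_rate_MB_per_s=("rate_MB_per_s", "mean"),
)
print(monthly)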
Archive | 2013
Joshua Neil