Sean Whalen | Researchain

Publication

Featured research published by Sean Whalen.

Insider Threats in Cyber Security | 2010

A Risk Management Approach to the 'Insider Threat'

Matt Bishop; Sophie Engle; Deborah A. Frincke; Carrie Gates; Frank L. Greitzer; Sean Peisert; Sean Whalen

Recent surveys indicate that the financial impact and operating losses due to insider intrusions are increasing. But these studies often disagree on what constitutes an “insider”; indeed, many define it only implicitly. In theory, appropriate selection and enforcement of properly specified security policies should prevent legitimate users from abusing their access to computer systems, information, and other resources. However, even if policies could be expressed precisely, mapping the natural-language expression of a security policy onto a form that can be implemented on a computer system or network creates gaps in enforcement. This paper defines “insider” precisely, in terms of these gaps, and explores an access-based model for analyzing threats that include those usually termed “insider threats.” This model enables an organization to order its resources based on the business value of each resource and of the information it contains. By identifying those users with access to high-value resources, we obtain an ordered list of users who can cause the greatest amount of damage. Concurrently, we examine psychological indicators to determine which users are at the greatest risk of acting inappropriately. We conclude by examining how to merge this model with one of forensic logging and auditing.
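The ordering step described above (rank resources by value, then rank users by what they can reach) can be sketched in a few lines. This is a minimal illustration, not the paper's model: the resource names, values, and access sets are hypothetical, and scoring a user by the *sum* of reachable resource values is an assumption (a maximum or weighted score would fit the description equally well).

```python
from typing import Dict, List, Set, Tuple


def rank_users_by_exposure(
    resource_value: Dict[str, float],
    access: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Order users by the total business value of the resources they can access.

    Summing values is an assumption made for illustration only.
    """
    scores = {
        user: sum(resource_value.get(r, 0.0) for r in resources)
        for user, resources in access.items()
    }
    # Highest potential damage first; ties are broken alphabetically for stability.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical resources and access rights.
    values = {"payroll_db": 100.0, "wiki": 5.0, "source_repo": 60.0}
    access = {
        "alice": {"wiki"},
        "bob": {"payroll_db", "wiki"},
        "carol": {"source_repo", "wiki"},
    }
    for user, score in rank_users_by_exposure(values, access):
        print(user, score)
```

The resulting list would then be combined with the psychological indicators the abstract mentions; that combination is not modeled here.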

IEEE Transactions on Dependable and Secure Computing | 2012

A Taxonomy of Buffer Overflow Characteristics

Matt Bishop; Sophie Engle; Damien Howard; Sean Whalen

Significant work on vulnerabilities focuses on buffer overflows, in which data exceeding the bounds of an array is loaded into the array. The loading continues past the array boundary, causing variables and state information located adjacent to the array to change. As the process is not programmed to check for these additional changes, the process acts incorrectly. The incorrect action often places the system in a nonsecure state. This work develops a taxonomy of buffer overflow vulnerabilities based upon characteristics, or preconditions that must hold for an exploitable buffer overflow to exist. We analyze several software and hardware countermeasures to validate the approach. We then discuss alternate approaches to ameliorating this vulnerability.
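The mechanism in the abstract (writes that run past an array's end and silently change adjacent state) can be simulated safely with a byte array standing in for process memory. This is a hedged sketch under an assumed layout: an 8-byte buffer followed directly by a 1-byte `is_admin` flag. The unchecked copy shows one precondition for an exploitable overflow, namely the absence of a bounds check. The taxonomy itself is not encoded here.

```python
# Hypothetical memory layout: bytes 0..7 are a buffer, byte 8 is an is_admin flag.
BUFFER_OFFSET = 0
BUFFER_SIZE = 8
FLAG_OFFSET = 8


def unchecked_copy(memory: bytearray, dest: int, data: bytes) -> None:
    """Copy data starting at dest with no bounds check, in the spirit of strcpy."""
    for i, byte in enumerate(data):
        memory[dest + i] = byte


def checked_copy(memory: bytearray, dest: int, capacity: int, data: bytes) -> None:
    """Refuse any write that would run past the destination buffer's capacity."""
    if len(data) > capacity:
        raise ValueError("input exceeds buffer bounds")
    memory[dest:dest + len(data)] = data


if __name__ == "__main__":
    memory = bytearray(BUFFER_SIZE + 1)  # flag starts at 0 (not admin)
    unchecked_copy(memory, BUFFER_OFFSET, b"AAAAAAAA\x01")  # 9 bytes into an 8-byte buffer
    print("is_admin after unchecked copy:", memory[FLAG_OFFSET])
```

The checked version removes the precondition by rejecting oversized input, which is the idea behind many of the software countermeasures such a taxonomy is meant to evaluate.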

IEEE International Conference on High Performance Computing Data and Analytics | 2012

Network-theoretic classification of parallel computation patterns

Sean Whalen; Sophie Engle; Sean Peisert; Matt Bishop

Parallel computation in a high-performance computing environment can be characterized by the distributed memory access patterns of the underlying algorithm. During execution, networks of compute nodes exchange messages that indirectly exhibit these access patterns. Identifying the algorithm underlying these observable messages is the problem of latent class analysis over information flows in a computational network. Towards this end, our work applies methods from graph and network theory to classify parallel computations solely from network communication patterns. Pattern classification has applications to several areas including anomaly detection, performance analysis, and automated algorithm replacement. We discuss the difficulties encountered by previous efforts, introduce two new approximate matching techniques, and compare these approaches using massive datasets collected at Lawrence Berkeley National Laboratory.
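Classifying computations "solely from network communication patterns" starts with turning a message log into a graph over compute ranks. The following is a minimal sketch of that first step. It assumes a log of `(source, destination, bytes)` records, and the three summary features are hypothetical choices. It does not reproduce the paper's approximate matching techniques.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (source rank, destination rank, bytes sent).
Message = Tuple[int, int, int]


def communication_features(messages: Iterable[Message], num_ranks: int) -> Dict[str, float]:
    """Summarize a message log as a directed graph over compute ranks."""
    volume: Dict[Tuple[int, int], int] = defaultdict(int)
    for src, dst, nbytes in messages:
        if src != dst:  # ignore self-messages
            volume[(src, dst)] += nbytes
    edges = list(volume)
    possible = num_ranks * (num_ranks - 1)
    out_degree: Dict[int, int] = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    nearest = sum(1 for src, dst in edges if abs(src - dst) == 1)
    return {
        # Fraction of all possible rank-to-rank links that carried traffic.
        "density": len(edges) / possible if possible else 0.0,
        # A large maximum suggests hub-style (e.g. master/worker) communication.
        "max_out_degree": float(max(out_degree.values(), default=0)),
        # Share of links between adjacent ranks, as in a 1-D domain decomposition.
        "nearest_neighbor_fraction": nearest / len(edges) if edges else 0.0,
    }


if __name__ == "__main__":
    ring = [(0, 1, 8), (1, 2, 8), (2, 3, 8), (3, 0, 8)]
    print(communication_features(ring, num_ranks=4))
```

Feature vectors like these give a classifier something to compare across runs. Graph-matching approaches instead compare the communication graphs directly.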

Pattern Recognition Letters | 2013

Multiclass classification of distributed memory parallel computations

Sean Whalen; Sean Peisert; Matt Bishop

High Performance Computing (HPC) is a field concerned with solving large-scale problems in science and engineering. However, the computational infrastructure of HPC systems can also be misused as demonstrated by the recent commoditization of cloud computing resources on the black market. As a first step towards addressing this, we introduce a machine learning approach for classifying distributed parallel computations based on communication patterns between compute nodes. We first provide relevant background on message passing and computational equivalence classes called dwarfs and describe our exploratory data analysis using self-organizing maps. We then present our classification results across 29 scientific codes using Bayesian networks and compare their performance against Random Forest classifiers. These models, trained with hundreds of gigabytes of communication logs collected at Lawrence Berkeley National Laboratory, perform well without any a priori information and address several shortcomings of previous approaches.
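As a hedged illustration of the Bayesian-network side of this work, here is a Gaussian naive Bayes classifier. It is the simplest Bayesian network structure: a class node with conditionally independent feature children. It is a stand-in, not the paper's networks. The feature values (for example, graph density and nearest-neighbor fraction) and the class labels `stencil` and `all_to_all` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Class node with independent Gaussian feature children (a naive Bayes network)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.priors: Dict[str, float] = {c: len(rows) / len(X) for c, rows in grouped.items()}
        self.stats: Dict[str, Tuple[List[float], List[float]]] = {}
        for c, rows in grouped.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # A small variance floor avoids division by zero on constant features.
            variances = [
                max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                for col, m in zip(columns, means)
            ]
            self.stats[c] = (means, variances)
        return self

    def predict(self, row: Sequence[float]) -> str:
        def log_posterior(c: str) -> float:
            means, variances = self.stats[c]
            total = math.log(self.priors[c])
            for x, m, v in zip(row, means, variances):
                # Log density of a univariate Gaussian.
                total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            return total

        # Pick the class with the highest unnormalized log posterior.
        return max(self.stats, key=log_posterior)
```

A Random Forest baseline, as in the abstract, would be trained on the same feature vectors for comparison.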

PLOS Computational Biology | 2012

Structural drift: The population dynamics of sequential learning

James P. Crutchfield; Sean Whalen

We introduce a theory of sequential causal inference in which learners in a chain estimate a structural model from their upstream “teacher” and then pass samples from the model to their downstream “student”. It extends the population dynamics of genetic drift, recasting Kimura's selectively neutral theory as a special case of a generalized drift process using structured populations with memory. We examine the diffusion and fixation properties of several drift processes and propose applications to learning, inference, and evolution. We also demonstrate how the organization of drift process space controls fidelity, facilitates innovations, and leads to information loss in sequential learning with and without memory.
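The teacher-to-student chain can be illustrated with the memoryless special case the abstract relates to neutral drift. Each learner sees a fixed number of coin flips from its teacher's model, estimates the bias by maximum likelihood, and becomes the next teacher. This is a minimal sketch under that assumption. The structured models with memory studied in the paper are not represented, and `samples_per_generation` is a hypothetical parameter name.

```python
import random
from typing import List, Optional


def drift_chain(p0: float, samples_per_generation: int, generations: int,
                seed: Optional[int] = None) -> List[float]:
    """Simulate a chain of learners, each estimating a coin bias from its
    teacher's samples and then teaching the next learner."""
    rng = random.Random(seed)
    history = [p0]
    p = p0
    for _ in range(generations):
        heads = sum(rng.random() < p for _ in range(samples_per_generation))
        p = heads / samples_per_generation  # maximum-likelihood estimate
        history.append(p)
        if p in (0.0, 1.0):  # fixation: later learners can never recover variation
            break
    return history


if __name__ == "__main__":
    print(drift_chain(0.5, samples_per_generation=10, generations=1000, seed=1))
```

Each estimate is unbiased given the previous one, yet every chain eventually fixes at 0 or 1, which is a simple form of the information loss the abstract describes.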

Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop | 2014

Model Aggregation for Distributed Content Anomaly Detection

Sean Whalen; Nathaniel Gordon Boggs; Salvatore J. Stolfo

Cloud computing offers a scalable, low-cost, and resilient platform for critical applications. Securing these applications against attacks targeting unknown vulnerabilities is an unsolved challenge. Network anomaly detection addresses such zero-day attacks by modeling attributes of attack-free application traffic and raising alerts when new traffic deviates from this model. Content anomaly detection (CAD) is a variant of this approach that models the payloads of such traffic instead of higher-level attributes. Zero-day attacks then appear as outliers to properly trained CAD sensors. In the past, CAD was unsuited to cloud environments due to the relative overhead of content inspection and the dynamic routing of content paths to geographically diverse sites. We challenge this notion and introduce new methods for efficiently aggregating content models to enable scalable CAD in dynamically-pathed environments such as the cloud. These methods eliminate the need to exchange raw content, drastically reduce network and CPU overhead, and offer varying levels of content privacy. We perform a comparative analysis of our methods using Random Forest, Logistic Regression, and Bloom Filter-based classifiers for operation in the cloud or other distributed settings such as wireless sensor networks. We find that content model aggregation offers statistically significant improvements over non-aggregate models with minimal overhead, and that distributed and non-distributed CAD have statistically indistinguishable performance. Thus, these methods enable the practical deployment of accurate CAD sensors in a distributed attack detection infrastructure.
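One way a Bloom-filter content model can be aggregated without exchanging raw payloads is to OR together filters that share the same size and hash functions. This is a hedged sketch of that idea, not the paper's aggregation methods. The n-gram length, filter size, hash construction, and scoring rule (fraction of unseen n-grams) are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterable, List


class NgramBloomFilter:
    """Content model: a Bloom filter over byte n-grams of attack-free payloads."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3, n: int = 5):
        self.num_bits, self.num_hashes, self.n = num_bits, num_hashes, n
        self.bits = bytearray(num_bits // 8)

    def _positions(self, gram: bytes) -> Iterable[int]:
        # Derive independent-looking hash positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + gram).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def _ngrams(self, payload: bytes) -> List[bytes]:
        return [payload[i:i + self.n] for i in range(len(payload) - self.n + 1)]

    def train(self, payload: bytes) -> None:
        for gram in self._ngrams(payload):
            for pos in self._positions(gram):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, gram: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(gram))

    def anomaly_score(self, payload: bytes) -> float:
        """Fraction of the payload's n-grams never seen in training (0.0 = normal)."""
        grams = self._ngrams(payload)
        if not grams:
            return 0.0
        return sum(not self.contains(g) for g in grams) / len(grams)

    def merge(self, other: "NgramBloomFilter") -> "NgramBloomFilter":
        """Aggregate two sites' models by OR-ing bit arrays; no raw content is shared."""
        if (self.num_bits, self.num_hashes, self.n) != (other.num_bits, other.num_hashes, other.n):
            raise ValueError("filters must share size, hash count, and n-gram length")
        merged = NgramBloomFilter(self.num_bits, self.num_hashes, self.n)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged
```

Only the bit arrays travel between sites. This is what lets a filter-based model be aggregated cheaply, at the cost of the usual Bloom-filter false positives.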

BMC Bioinformatics | 2017

Unboxing cluster heatmaps

Sophie Engle; Sean Whalen; Alark Joshi; Katherine S. Pollard

Background: Cluster heatmaps are commonly used in biology and related fields to reveal hierarchical clusters in data matrices. This visualization technique has high data density and reveals clusters better than unordered heatmaps alone. However, cluster heatmaps have known issues that make them both time-consuming to use and prone to error. We hypothesize that visualization techniques without the rigid grid constraint of cluster heatmaps will perform better at clustering-related tasks.

Results: We developed an approach to “unbox” the heatmap values and embed them directly in the hierarchical clustering results, allowing us to use standard hierarchical visualization techniques as alternatives to cluster heatmaps. We then tested our hypothesis by conducting a survey of 45 practitioners to determine how cluster heatmaps are used, prototyping alternatives to cluster heatmaps using pair analytics with a computational biologist, and evaluating those alternatives with hour-long interviews of 5 practitioners and an Amazon Mechanical Turk user study with approximately 200 participants. We found statistically significant performance differences for most clustering-related tasks, and in the number of perceived visual clusters. Visit git.io/vw0t3 for our results.

Conclusions: The optimal technique varied by task. However, gapmaps were preferred by the interviewed practitioners and outperformed or performed as well as cluster heatmaps for clustering-related tasks. Gapmaps are similar to cluster heatmaps, but relax the heatmap grid constraints by introducing gaps between rows and/or columns that are not closely clustered. Based on these results, we recommend users adopt gapmaps as an alternative to cluster heatmaps.
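The gapmap idea (insert gaps between rows that are not closely clustered) reduces to a layout rule once you know, for each pair of adjacent leaves in display order, the dendrogram height at which they join. Those heights are the cophenetic distances, which SciPy's `scipy.cluster.hierarchy.cophenet` can supply. This is a minimal sketch: the linear gap rule, `threshold`, and `gap_scale` are hypothetical choices, not the paper's exact gap encoding.

```python
from typing import List, Sequence


def gapmap_positions(merge_heights: Sequence[float], threshold: float,
                     row_height: float = 1.0, gap_scale: float = 1.0) -> List[float]:
    """Return the top y-coordinate of each row in a gapmap layout.

    merge_heights[i] is the dendrogram height at which leaf i and leaf i + 1
    (in display order) first join the same cluster.
    """
    positions = [0.0]
    for height in merge_heights:
        # Closely clustered neighbours stay flush; loosely clustered ones get a gap
        # that grows linearly with how far their merge height exceeds the threshold.
        gap = gap_scale * (height - threshold) if height > threshold else 0.0
        positions.append(positions[-1] + row_height + gap)
    return positions


if __name__ == "__main__":
    # Four rows: rows 0-1 and 2-3 are tight pairs, joined to each other only at height 2.0.
    print(gapmap_positions([0.1, 2.0, 0.2], threshold=0.5))
```

The same rule applied to columns gives the two-dimensional gapmap. Gap size then carries the cluster separation that a plain cluster heatmap shows only in its dendrogram.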

bioRxiv | 2018

Most regulatory interactions are not in linkage disequilibrium

Sean Whalen; Katherine S. Pollard

Linkage disequilibrium (LD) and genomic proximity are commonly used to map non-coding variants to genes, despite increasing examples of causal variants outside the LD block of the gene they regulate. We compared chromatin contacts in 22 cell types to LD across billions of pairs of loci in the human genome and found no concordance, even at genomic distances below 25 kilobases where both tend to be high. Gene expression and ontology data suggest that chromatin contacts identify regulatory variants more reliably than do LD and genomic proximity. We conclude that the genomic architectures of genetic and physical interactions are independent, with important implications for gene regulatory evolution and precision medicine.

bioRxiv | 2018

Massively parallel dissection of human accelerated regions in human and chimpanzee neural progenitors

Hane Ryu; Fumitaka Inoue; Sean Whalen; Alex H. Williams; Martin Kircher; Beth Martin; Beatriz Alvarado; Md. Abul Hassan Samee; Kathleen Keough; Sean Thomas; Arnold R. Kriegstein; Jay Shendure; Alex A. Pollen; Nadav Ahituv; Katherine S. Pollard

How mutations in gene regulatory elements lead to evolutionary changes remains largely unknown. Human accelerated regions (HARs) are ideal for exploring this question, because they are associated with human-specific traits and contain multiple human-specific variants at sites conserved across mammals, suggesting that they alter or compensate to preserve function. We performed massively parallel reporter assays on all human and chimpanzee HAR sequences in human and chimpanzee iPSC-derived neural progenitors at two differentiation stages. Forty-three percent (306/714) of HARs function as neuronal enhancers, with two-thirds (204/306) showing consistent changes in activity between human and chimpanzee sequences. These changes were almost all sequence dependent and not affected by cell species or differentiation stage. We tested all evolutionary intermediates between human and chimpanzee sequences of seven HARs, finding variants that interact both positively and negatively. This study shows that variants acquired during human evolution interact to buffer and amplify changes to enhancer function.
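Testing "all evolutionary intermediates" between a human and a chimpanzee HAR sequence means enumerating every combination of human or chimpanzee alleles at the sites where they differ. For k differing sites this gives 2^k sequences, including both endpoints. This is a minimal sketch that assumes the two sequences are aligned to equal length and differ only by substitutions. Real HAR alignments may contain indels, which this does not handle.

```python
from itertools import product
from typing import List


def evolutionary_intermediates(human: str, chimp: str) -> List[str]:
    """Enumerate every sequence taking either the human or the chimpanzee base
    at each position where the two aligned sequences differ."""
    if len(human) != len(chimp):
        raise ValueError("sequences must be aligned to equal length")
    diffs = [i for i, (h, c) in enumerate(zip(human, chimp)) if h != c]
    variants = []
    # Each choice vector says, per differing site, whether to take the human base.
    for choice in product((0, 1), repeat=len(diffs)):
        seq = list(chimp)
        for pos, take_human in zip(diffs, choice):
            if take_human:
                seq[pos] = human[pos]
        variants.append("".join(seq))
    return variants


if __name__ == "__main__":
    for s in evolutionary_intermediates("ACGT", "AGGA"):
        print(s)
```

Because the count doubles with every differing site, exhaustive testing is practical only for HARs with a modest number of human-specific variants.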

bioRxiv | 2015

Protein binding and methylation on looping chromatin accurately predict distal regulatory interactions

Sean Whalen; Rebecca M. Truty; Katherine S. Pollard

Identifying the gene targets of distal regulatory sequences is a challenging problem with the potential to illuminate the causal underpinnings of complex diseases. However, current experimental methods to map enhancer-promoter interactions genome-wide are limited by their cost and complexity. We present TargetFinder, a computational method that reconstructs a cell’s three-dimensional regulatory landscape from two-dimensional genomic features. TargetFinder achieves outstanding predictive accuracy across diverse cell lines with a false discovery rate up to fifteen times smaller than common heuristics, and reveals that distal regulatory interactions are characterized by distinct signatures of protein interactions and epigenetic marks on the DNA loop between an active enhancer and targeted promoter. Much of this signature is shared across cell types, shedding light on the role of chromatin organization in gene regulation and establishing TargetFinder as a method to accurately map long-range regulatory interactions using a small number of easily acquired datasets.
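The abstract's key observation is that predictive signal lies in three places: the enhancer, the promoter, and the DNA loop between them. The following sketch builds features for one enhancer-promoter pair from those three regions. It is loosely modeled on that description and is not the released TargetFinder code. The peak format `(start, end, signal)`, the overlap-weighted summation, and the mark name `CTCF` in the usage are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]      # half-open [start, end) on a single chromosome
Peak = Tuple[int, int, float]   # (start, end, signal) for one mark or protein


def overlap(a: Interval, b: Interval) -> int:
    """Length of the intersection of two half-open intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def pair_features(enhancer: Interval, promoter: Interval,
                  peaks_by_mark: Dict[str, List[Peak]]) -> Dict[str, float]:
    """Overlap-weighted signal per mark in the enhancer, promoter, and window."""
    left, right = sorted([enhancer, promoter])
    window = (left[1], right[0])  # DNA between the two elements (the loop)
    regions = {"enhancer": enhancer, "promoter": promoter, "window": window}
    features: Dict[str, float] = {}
    for mark, peaks in peaks_by_mark.items():
        for region_name, region in regions.items():
            # Each peak contributes its signal scaled by the fraction inside the region.
            features[f"{mark}_{region_name}"] = sum(
                signal * overlap((start, end), region) / (end - start)
                for start, end, signal in peaks
            )
    return features


if __name__ == "__main__":
    peaks = {"CTCF": [(150, 250, 2.0), (500, 600, 4.0)]}
    print(pair_features((100, 200), (1000, 1100), peaks))
```

Feature vectors like these, computed for known interacting and non-interacting pairs, would then train a classifier. The choice of learner is not specified in the abstract.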
