Publication


Featured research published by Thomas M. Kroeger.


Workshop on Hot Topics in Operating Systems | 1999

The case for efficient file access pattern modeling

Thomas M. Kroeger; Darrell D. E. Long

Most modern I/O systems treat each file access independently. However, events in a computer system are driven by programs. Thus, accesses to files occur in consistent patterns and are by no means independent. The result is that modern I/O systems ignore useful information. Using traces of file system activity we show that file accesses are strongly correlated with preceding accesses. In fact, a simple last-successor model (one that predicts each file access will be followed by the same file that followed the last time it was accessed) successfully predicted the next file 72% of the time. We examine the ability of two previously proposed models for file access prediction in comparison to this baseline model and see a stark contrast in accuracy and high overheads in state space. We then enhance one of these models to address the issues of model space requirements. This new model is able to improve an additional 10% on the accuracy of the last-successor model, while working within a state space that is within a constant factor (relative to the number of files) of the last-successor model. While this work was motivated by the use of file relationships for I/O prefetching, information regarding the likelihood of file access patterns has several other uses such as disk layout and file clustering for disconnected operation.
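The last-successor model described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class name, the `hit_rate` helper, and the use of plain strings as file names are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, Optional


class LastSuccessorPredictor:
    """Predicts that a file is followed by whatever followed it last time."""

    def __init__(self) -> None:
        # One table entry per file seen, so state grows linearly with files.
        self._successor: Dict[Hashable, Hashable] = {}
        self._previous: Optional[Hashable] = None

    def predict(self) -> Optional[Hashable]:
        """Return the predicted next file, or None if there is no history."""
        if self._previous is None:
            return None
        return self._successor.get(self._previous)

    def observe(self, name: Hashable) -> None:
        """Record an access and update the successor of the previous file."""
        if self._previous is not None:
            self._successor[self._previous] = name
        self._previous = name


def hit_rate(trace: Iterable[Hashable]) -> float:
    """Fraction of accesses after the first that were predicted correctly."""
    predictor = LastSuccessorPredictor()
    hits = 0
    total = 0
    for i, name in enumerate(trace):
        if i > 0:
            total += 1
            if predictor.predict() == name:
                hits += 1
        predictor.observe(name)
    return hits / total if total else 0.0
```

On a short repeating trace such as `["a", "b", "c", "a", "b", "c", "a", "b"]`, the first pass through the cycle misses because no successors are known yet, and every later access is predicted correctly.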


Modeling, Analysis and Simulation on Computer and Telecommunication Systems | 2016

RESAR: Reliable Storage at Exabyte Scale

Thomas J. E. Schwarz; Ahmed Amer; Thomas M. Kroeger; Ethan L. Miller; Darrell D. E. Long; Jehan-Francois Paris

Stored data needs to be protected against device failure and irrecoverable sector read errors, yet doing so at exabyte scale can be challenging given the large number of failures that must be handled. We have developed RESAR (Robust, Efficient, Scalable, Autonomous, Reliable) storage, an approach to storage system redundancy that only uses XOR-based parity and employs a graph to lay out data and parity. The RESAR layout offers greater robustness and higher flexibility for repair at the same overhead as a declustered version of RAID 6. For instance, a RESAR-based layout with 16 data disklets per stripe has about 50 times lower probability of suffering data loss in the presence of a fixed number of failures than a corresponding RAID 6 organization. RESAR uses a layer of virtual storage elements to achieve better manageability, a broader potential for energy savings, as well as easier adoption of heterogeneous storage devices.
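The XOR-based parity that this abstract relies on can be illustrated with a single stripe. The sketch below shows only generic single-parity XOR over equal-length blocks; it does not reproduce RESAR's graph-based layout, and the function names are assumptions made for the example.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block of a stripe from survivors and parity."""
    return xor_blocks(list(surviving) + [parity])
```

Because XOR is its own inverse, combining the parity block with all surviving data blocks of a stripe yields the one block that was lost.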


Symposium on Principles of Database Systems | 2016

Anti-Persistence on Persistent Storage: History-Independent Sparse Tables and Dictionaries

Michael A. Bender; Jonathan W. Berry; Rob Johnson; Thomas M. Kroeger; Samuel McCauley; Cynthia A. Phillips; Bertrand Simon; Shikha Singh; David Zage

We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build on the way: a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take amortized O(log^2 N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history-independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(log_B N) I/Os; inserts and deletes take O((log^2 N)/B + log_B N) amortized I/Os with high probability; and range queries returning k elements take O(log_B N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(log_B N) I/Os for point queries and amortized O(log_B N) I/Os for inserts/deletes. Range queries returning k elements run in O(log_B N + k/B) I/Os. In contrast, the best possible high-probability bound for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Θ(log N) I/Os. This is no better than the bounds one gets from running an in-memory skip list in external memory.


Petascale Data Storage Workshop | 2013

Fourier-assisted machine learning of hard disk drive access time models

Adam Crume; Carlos Maltzahn; Lee Ward; Thomas M. Kroeger; Matthew L. Curry; Ron A. Oldfield

Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. Others have created behavioral models of hard disk drive performance, but none have shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We show how hard disk drive access times can be predicted to within 0.83 ms using a neural net after these frequencies are found using Fourier analysis.


2013 6th International Symposium on Resilient Control Systems (ISRCS) | 2013

The case for distributed data archival using secret splitting with Percival

Thomas M. Kroeger; Joel Cameron Frank; Ethan L. Miller

Most encryption used today obfuscates data behind a secret key or a problem believed to be computationally complex. One can fundamentally think of it as delayed release for a determined adversary. This approach is not well suited for long-term archival of sensitive data. Additionally, issues such as key rotation, and lost or exposed keys, make keeping such archives up to date very difficult. As a result, most become static and unable to respond to attacks. Once hacked, such systems offer little to no protection for data privacy and leave open uncertainty about data integrity. Given the increasing frequency of major cyber events these days, it is clear that any secure long-term archive needs to be able to address maintaining data privacy and integrity throughout a compromise event. In spite of these needs, most data archives today still use central storage servers and encryption. In this paper we make the case for secure data archival based on secret splitting and distributed data repositories. We present Percival, one example of a research project focused on long-term data archival using Shamir's secret splitting and distributed data repositories. We examine how this approach can continue secure operations in the presence of adversarial compromise. We discuss how this distributed model significantly increases the attacker's burden by requiring the compromise of many sites. Additionally, this approach increases the resilience to insider threat and provides stronger assurances of data integrity and confidentiality. Finally, we discuss current research to create new capabilities that enable blinded search across such an archive.
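Shamir's secret splitting, which this abstract builds on, can be sketched as polynomial evaluation and Lagrange interpolation over a prime field. The sketch below is a textbook version under stated assumptions, not Percival's implementation: the field prime `PRIME`, the function names, and the encoding of the secret as a single integer are all choices made for the example. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets
from typing import List, Sequence, Tuple

# Assumed field modulus: the Mersenne prime 2**127 - 1.
# Secrets must be integers smaller than this value.
PRIME = 2**127 - 1

Share = Tuple[int, int]


def split_secret(secret: int, threshold: int, num_shares: int) -> List[Share]:
    """Split `secret` so that any `threshold` shares can rebuild it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coefficients):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: Sequence[Share]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        term = yi * numerator * pow(denominator, -1, PRIME)
        secret = (secret + term) % PRIME
    return secret
```

With a threshold of 3 out of 5, each share would be held at a different site, and any three sites together can reconstruct the value while fewer than three learn nothing about it.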


2017 International Conference on Computing, Networking and Communications (ICNC) | 2017

Locally Operated Cooperative Key Sharing (LOCKS)

Michael Bierma; Aaron Brown; Troy DeLano; Thomas M. Kroeger; Howard Poston

Malicious actors are increasingly using TLS to evade deep packet inspection (DPI). In response, vendors and enterprises have turned to man-in-the-middle (MITM) proxies to enable security monitoring of encrypted traffic. This approach not only breaks the end-to-end authentication component of TLS but requires clients to trust a root certificate that allows the proxy to masquerade as any domain. This paper presents Locally Operated Cooperative Key Sharing (LOCKS), a novel system that enables local clients to share their TLS session keys with the enterprise security monitoring system, facilitating DPI without subverting authentication. We tested the performance and impact of our new approach to enterprise communications security. Specifically, we conducted tests on browser latency, user experience, and packet loss at the network security monitors. Latency change was statistically indistinguishable from normal network variation. While the workload of decrypting TLS added overhead to our security monitors, the impact was within manageable limits. Additionally, we deployed LOCKS in a real-world environment and performed initial alpha testing. A user study demonstrated no negative impact on usability.


IEEE Conference on Mass Storage Systems and Technologies | 2015

Percival: A searchable secret-split datastore

Joel Cameron Frank; Shayna M. Frank; Lincoln Thurlow; Thomas M. Kroeger; Ethan L. Miller; Darrell D. E. Long

Maintaining information privacy is challenging when sharing data across a distributed long-term datastore. In such applications, secret splitting the data across independent sites has been shown to be a superior alternative to fixed-key encryption; it improves reliability, reduces the risk of insider threat, and removes the issues surrounding key management. However, the inherent security of such a datastore normally precludes it from being directly searched without reassembling the data; this, however, is neither computationally feasible nor without risk since reassembly introduces a single point of compromise. As a result, the secret-split data must be pre-indexed in some way in order to facilitate searching. Previously, fixed-key encryption has also been used to securely pre-index the data, but in addition to key management issues, it is not well suited for long-term applications. To meet these needs, we have developed Percival: a novel system that enables searching a secret-split datastore while maintaining information privacy. We leverage salted hashing, performed within hardware security modules, to access prerecorded queries that have been secret split and stored in a distributed environment; this keeps the bulk of the work on each client, and the data custodians blinded to both the contents of a query and its results. Furthermore, Percival does not rely on the datastore's exact implementation. The result is a flexible design that can be applied to both new and existing secret-split datastores. When testing Percival on a corpus of approximately one million files, it was found that the average search operation completed in less than one second.
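The idea of reaching pre-indexed results through a salted hash can be sketched as follows. This is only an illustration of the general pattern: the abstract does not give Percival's hash function, salt handling, or storage interface, so the salted SHA-256, the `blinded_key` function, and the `CustodianIndex` class are all hypothetical, and the hardware security module is replaced here by an ordinary function call.

```python
import hashlib
from typing import Dict, Optional


def blinded_key(salt: bytes, term: str) -> str:
    """Salted SHA-256 of a normalized search term, as a hex string.

    In the described system this step would run inside a hardware security
    module; here it is a plain function, purely for illustration.
    """
    normalized = term.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


class CustodianIndex:
    """Hypothetical custodian-side store that only sees opaque keys and shares."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def put(self, key: str, share: bytes) -> None:
        """Store one share of a prerecorded query result under an opaque key."""
        self._entries[key] = share

    def get(self, key: str) -> Optional[bytes]:
        """Return the stored share for a key, or None if nothing matches."""
        return self._entries.get(key)
```

A client that knows the salt can derive the same opaque key for a term and fetch its share, while the custodian sees neither the term nor the reassembled result.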


IEEE Conference on Mass Storage Systems and Technologies | 2014

Automatic generation of behavioral hard disk drive access time models

Adam Crume; Carlos Maltzahn; Lee Ward; Thomas M. Kroeger; Matthew L. Curry

Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. While previous research has created black-box models of hard disk drive performance, none have shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We identify these high frequencies with Fourier analysis and include them explicitly as input to the model. In this paper we focus on the simulation of access times for random read workloads within a single zone. We are able to automatically generate and tune request-level access time models with mean absolute error less than 0.15 ms. To our knowledge this is the first time such fidelity has been achieved with modern disk drives using machine learning. We are confident that our approach forms the core for automatic generation of access time models that include other workloads and span across entire disk drives, but more work remains.
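The step of finding a periodic component with Fourier analysis and feeding it to a model as an explicit input can be sketched with a small discrete Fourier transform. This is a simplified illustration, not the authors' pipeline: it assumes evenly spaced samples over one window, uses a naive O(n^2) transform from the standard library, and the function names are invented for the example.

```python
import cmath
import math
from typing import List, Sequence


def dominant_frequency(samples: Sequence[float]) -> int:
    """Return the nonzero DFT bin (cycles per window) with the largest magnitude."""
    n = len(samples)
    if n < 4:
        raise ValueError("need at least four samples")
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_bin, best_magnitude = 1, -1.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(centered)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin


def periodic_features(t: int, k: int, n: int) -> List[float]:
    """Sine and cosine at the detected frequency, usable as extra model inputs."""
    angle = 2 * math.pi * k * t / n
    return [math.sin(angle), math.cos(angle)]
```

Supplying the sine and cosine of the detected frequency as inputs spares a learner from having to discover a high-frequency oscillation on its own, which is the barrier the abstract describes.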


2012 International Green Computing Conference (IGCC) | 2012

A distributed approach to taming peak demand

Michael Sabolish; Ahmed Amer; Thomas M. Kroeger

A significant portion of all energy capacity is wasted in over-provisioning to meet peak demand. The current state of the art in reducing peak demand requires central authorities to limit device usage directly and is generally reactive. We apply techniques drawn from established distributed computing principles to propose a novel and proactive solution to decentralize management of demand and to provide a more scalable and resilient approach to reducing overall peak demand. We demonstrate that such a system approaches the performance of an ideal centralized control authority, and experimentally demonstrate a 10-25% reduction in peak energy demand under conservative assumptions. Under worst-case demand scenarios, our approach has the potential to reduce peak demand by 65-85%.


arXiv: Cryptography and Security | 2016

Secure distributed membership tests via secret sharing: How to hide your hostile hosts: Harnessing Shamir secret sharing

David John Zage; Helen Xu; Thomas M. Kroeger; Bridger Hahn; Nolan P. Donoghue; Thomas R. Benson

Data security and availability for operational use are frequently seen as conflicting goals. Research on searchable encryption and homomorphic encryption is a start, but these approaches typically build from encryption methods that, at best, provide protections based on problems assumed to be computationally hard. By contrast, data encoding methods such as secret sharing provide information-theoretic data protections. Archives that distribute data using secret sharing can provide data protections that are resilient to malicious insiders, compromised systems, and untrusted components. In this paper, we create the Serial Interpolation Filter, a method for storing and interacting with sets of data that are secured and distributed using secret sharing. We provide the ability to operate over set-oriented data distributed across multiple repositories without exposing the original data. Furthermore, we demonstrate the security of our method under various attacker models and provide protocol extensions to handle colluding attackers. The Serial Interpolation Filter provides information-theoretic protections from a single attacker and computationally hard protections from colluding attackers.

Collaboration


Dive into Thomas M. Kroeger's collaborations.

Top Co-Authors

Bridger Hahn
Sandia National Laboratories

Helen Xu
Massachusetts Institute of Technology

Nolan P. Donoghue
Sandia National Laboratories

Adam Crume
University of California

Matthew L. Curry
Sandia National Laboratories