Craig D. Ulmer | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Craig D. Ulmer is active.

Explore More

Publication

Featured researches published by Craig D. Ulmer.

IEEE Computer | 2008

Hardware Technologies for High-Performance Data-Intensive Computing

Maya Gokhale; Jonathan D. Cohen; Andy Yoo; William Marcus Miller; Arpith C. Jacob; Craig D. Ulmer; Roger A. Pearce

Data-intensive problems challenge conventional computing architectures with demanding CPU, memory, and I/O requirements. Experiments with three benchmarks suggest that emerging hardware technologies can significantly boost performance of a wide range of applications by increasing compute cycles and bandwidth and reducing latency.

conference on high performance computing (supercomputing) | 2006

Architectures and APIs: assessing requirements for delivering FPGA performance to applications

Keith D. Underwood; Karl Scott Hemmert; Craig D. Ulmer

Reconfigurable computing leveraging field programmable gate arrays (FPGAs) is one of many accelerator technologies that are being investigated for application to high performance computing (HPC). Like most accelerators, FPGAs are very efficient at both dense matrix multiplication and FFT computations, but two important aspects of how to deliver that performance to applications have received too little attention. First, the standard API for important compute kernels hides parallelism from the system. Second, the issue of system architecture is virtually never addressed. This paper explores both issues and their implications for applications. We find that high bandwidth, low latency connectivity can be important, but the right API can be even more important

applied reconfigurable computing | 2006

An FPGA-based network intrusion detection system with on-chip network interfaces

Christopher R. Clark; Craig D. Ulmer; David E. Schimmel

Network intrusion detection systems (NIDS) are critical network security tools that help protect computer installations from malicious users. Traditional software-based NIDS architectures are becoming strained as network data rates increase and attacks intensify in volume and complexity. In recent years, researchers have proposed using FPGAs to perform the computationally-intensive components of intrusion detection analysis. In this work, we present a new NIDS architecture that integrates the network interface hardware and packet analysis hardware into a single FPGA chip. This integration enables a higher performance and more flexible NIDS platform. To demonstrate the benefits of this technique, we have implemented a complete and functional NIDS in a Xilinx Virtex II Pro FPGA that performs in-line packet analysis and filtering on multiple Gigabit Ethernet links using rules from the open-source Snort attack database.

Journal of Parallel and Distributed Computing | 2011

Massively parallel acceleration of a document-similarity classifier to detect web attacks

Craig D. Ulmer; Maya Gokhale; Brian Gallagher; Philip Top; Tina Eliassi-Rad

This paper describes our approach to adapting a text document similarity classifier based on the Term Frequency Inverse Document Frequency (TFIDF) metric to two massively multi-core hardware platforms. The TFIDF classifier is used to detect web attacks in HTTP data. In our parallel hardware approaches, we design streaming, real time classifiers by simplifying the sequential algorithm and manipulating the classifiers model to allow decision information to be represented compactly. Parallel implementations on the Tilera 64-core System on Chip and the Xilinx Virtex 5-LX FPGA are presented. For the Tilera, we employ a reduced state machine to recognize dictionary terms without requiring explicit tokenization, and achieve throughput of 37 MB/s at a slightly reduced accuracy. For the FPGA, we have developed a set of software tools to help automate the process of converting training data to synthesizable hardware and to provide a means of trading off between accuracy and resource utilization. The Xilinx Virtex 5-LX implementation requires 0.2% of the memory used by the original algorithm. At 166 MB/s (80X the software) the hardware implementation is able to achieve Gigabit network throughput at the same accuracy as the original algorithm.

ACM Transactions on Reconfigurable Technology and Systems | 2009

From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing

Keith D. Underwood; K. Scott Hemmert; Craig D. Ulmer

The field of high performance computing (HPC) currently abounds with excitement about the potential of a broad class of things called accelerators. And, yet, few accelerator based systems are being deployed in general purpose HPC environments. Why is that? This article explores the challenges that accelerators face in the HPC world, with a specific focus on FPGA based systems. We begin with an overview of the characteristics and challenges of typical HPC systems and applications and discuss why FPGAs have the potential to have a significant impact. The bulk of the article is focused on twelve specific areas where FPGA researchers can make contributions to hasten the adoption of FPGAs in HPC environments.

ieee international symposium on parallel distributed processing workshops and phd forum | 2010

A configurable-hardware document-similarity classifier to detect web attacks

Craig D. Ulmer; Maya Gokhale

This paper describes our approach to adapting a text document similarity classifier based on the Term Frequency Inverse Document Frequency (TFIDF) metric [11] to reconfigurable hardware. The TFIDF classifier is used to detect web attacks in HTTP data. In our reconfigurable hardware approach, we design a streaming, real-time classifier by simplifying an existing sequential algorithm and manipulating the classifiers model to allow decision information to be represented compactly. We have developed a set of software tools to help automate the process of converting training data to synthesizable hardware and to provide a means of trading off between accuracy and resource utilization. The Xilinx Virtex 5-LX implementation requires two orders of magnitude less memory than the original algorithm. At 166MB/s (80X the software) the hardware implementation is able to achieve Gigabit network throughput at the same accuracy as the original algorithm.

scientific cloud computing | 2018

Faodel: Data Management for Next-Generation Application Workflows

Craig D. Ulmer; Shyamali Mukherjee; Gary Templet; Scott Levy; Jay F. Lofstead; Patrick M. Widener; Todd Kordenbrock; Margaret Lawson

Composition of computational science applications, whether into ad hoc pipelines for analysis of simulation data or into well-defined and repeatable workflows, is becoming commonplace. In order to scale well as projected system and data sizes increase, developers will have to address a number of looming challenges. Increased contention for parallel filesystem bandwidth, accomodating in situ and ex situ processing, and the advent of decentralized programming models will all complicate application composition for next-generation systems. In this paper, we introduce a set of data services, Faodel, which provide scalable data management for workflows and composed applications. Faodel allows workflow components to directly and efficiently exchange data in semantically appropriate forms, rather than those dictated by the storage hierarchy or programming model in use. We describe the architecture of Faodel and present preliminary performance results demonstrating its potential for scalability in workflow scenarios.

european conference on parallel processing | 2013

Investigating the Integration of Supercomputers and Data-Warehouse Appliances

Ron A. Oldfield; George S. Davidson; Craig D. Ulmer; Andrew T. Wilson

Two decades of experience with massively parallel supercomputing has given insight into the problem domains where these architectures are cost effective. Likewise experience with database machines and more recently massively parallel database appliances has shown where these architectures are valuable. Combining both architectures to simultaneously solve problems has received much less attention. In this paper, we describe a motivating application for economic modeling that requires both HPC and database capabilities. Then we discuss hardware and software integration issues related to a direct integration of a Cray XT supercomputer and a Netezza database appliance.

Archive | 2006

FPGAs in High Perfomance Computing: Results from Two LDRD Projects.

Keith D. Underwood; Craig D. Ulmer; David C. Thompson; Karl Scott Hemmert

Field programmable gate arrays (FPGAs) have been used as alternative computational de-vices for over a decade; however, they have not been used for traditional scientific com-puting due to their perceived lack of floating-point performance. In recent years, there hasbeen a surge of interest in alternatives to traditional microprocessors for high performancecomputing. Sandia National Labs began two projects to determine whether FPGAs wouldbe a suitable alternative to microprocessors for high performance scientific computing and,if so, how they should be integrated into the system. We present results that indicate thatFPGAs could have a significant impact on future systems. FPGAs have thepotentialtohave order of magnitude levels of performance wins on several key algorithms; however,there are serious questions as to whether the system integration challenge can be met. Fur-thermore, there remain challenges in FPGA programming and system level reliability whenusing FPGA devices.4 AcknowledgmentArun Rodrigues provided valuable support and assistance in the use of the Structural Sim-ulation Toolkit within an FPGA context. Curtis Janssen and Steve Plimpton provided valu-able insights into the workings of two Sandia applications (MPQC and LAMMPS, respec-tively).5

Proposed for publication in the International Journal of Electronics. | 2005