Michael Steffen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michael Steffen is active.

Explore More

Publication

Featured researches published by Michael Steffen.

International Journal of Information Security | 2011

A case study in hardware Trojan design and implementation

Alex Baumgarten; Michael Steffen; Matthew Clausman; Joseph Zambreno

As integrated circuits (ICs) continue to have an overwhelming presence in our digital information-dominated world, having trust in their manufacture and distribution mechanisms is crucial. However, with ever-shrinking transistor technologies, the cost of new fabrication facilities is becoming prohibitive, pushing industry to make greater use of potentially less reliable foreign sources for their IC supply. The 2008 Computer Security Awareness Week (CSAW) Embedded Systems Challenge at the Polytechnic Institute of NYU highlighted some of the vulnerabilities of the IC supply chain in the form of a hardware hacking challenge. This paper explores the design and implementation of our winning entry.

international symposium on microarchitecture | 2010

Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels

Michael Steffen; Joseph Zambreno

Wide Single Instruction, Multiple Thread (SIMT)architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application kernel. Individual thread branching is supported by executing all control ¿ow paths for threads in a thread group and only committing the results of threads on the current control path. While convergence algorithms are used to maximize processorefficiency during branching operations, applications requiring complex control ¿ow often result in low processor efficiency due to the length and quantity of control paths. Global rendering algorithms are an example of a class of application that can be accelerated using a large number of independent parallel threads that each require complex control ¿ow, resulting in comparatively low efficiency on SIMT processors. To improve processor utilization for global rendering algorithms, we introduce a SIMT architecture that allows for threads to be created dynamically at runtime. Large application kernels are broken down into smaller code blocks we call µ-kernels that dynamically created threads can execute. These runtime µ-kernels allow for the removal of branching statements that would cause divergence within a thread group, and result in new threads being created and grouped with threads beginning execution of the same µ-kernel. In our evaluation of SIMT processor efficiency for a global rendering algorithms, dynamicµ-kernels improved processor performance by an average of1.4×.

reconfigurable computing and fpgas | 2008

A Reconfigurable Platform for Frequent Pattern Mining

Song Sun; Michael Steffen; Joseph Zambreno

In this paper, a new hardware architecture for frequent pattern mining based on a systolic tree structure is proposed. The goal of this architecture is to mimic the internal memory layout of the original FP-growth algorithm while achieving a much higher throughput. We also describe an embedded platform implementation of this architecture along with detailed analysis of area requirements and performance results for different configurations. Our results show that with an appropriate selection of tree size, the reconfigurable platform can be several orders of magnitude faster than the FP-growth algorithm.

symposium on application specific processors | 2010

A hardware pipeline for accelerating ray traversal algorithms on streaming processors

Michael Steffen; Joseph Zambreno

Ray Tracing is a graphics rendering method that uses rays to trace the path of light in a computer model. To accelerate the processing of rays, scenes are typically compiled into smaller spatial boxes using a tree structure and rays then traverse the tree structure to determine relevant spatial boxes. This allows computations involving rays and scene objects to be limited to only objects close to the ray and does not require processing all elements in the computer model. We present a ray traversal pipeline designed to accelerate ray tracing traversal algorithms using a combination of currently used programmable graphics processors and a new fixed hardware pipeline. Our fixed hardware pipeline performs an initial traversal operation that quickly identifies a smaller sized, fixed granularity spatial bounding box from the original scene. This spatial box can then be traversed further to identify subsequently smaller spatial bounding boxes using any user-defined acceleration algorithm. We show that our pipeline allows for an expected level of user programmability, including development of custom data structures, and can support a wide range of processor architectures. The performance of our pipeline is evaluated for ray traversal and intersection stages using a kd-tree ray tracing algorithm and a custom simulator modeling a generic streaming processor architecture. Experimental results show that our pipeline reduces the number of executed instructions on a graphics processor for the traversal operation by 2.15X for visible rays. The memory bandwidth required for traversal is also reduced by a factor of 1.3X for visible rays.

TPCG | 2009

Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure

Michael Steffen; Joseph Zambreno

The increase in graphics card performance and processor core count has allowed significant performance acceleration for ray tracing applications. Future graphics architectures are expected to continue increasing the number of processor cores, further improving performance by exploiting data parallelism. However, current ray tracing implementations are based on recursive searches which involve multiple memory reads. Consequently, software implementations are used without any dedicated hardware acceleration. In this paper, we introduce a ray tracing method designed around hierarchical space subdivision schemes that reduces memory operations. In addition, parts of this traversal method can be performed in fixed hardware running in parallel with programmable graphics processors. We used a custom performance simulator that uses our traversal method, based on a kd-tree, to compare against a conventional kd-tree. The system memory requirements and system memory reads are analyzed in detail for both acceleration structures. We simulated six benchmark scenes and show a reduction in the number of memory reads of up to 70 percent compared to current recursive methods for scenes with over 100,000 polygons.

Archive | 2012