Allan D. Knies
Intel
Publications
Featured research published by Allan D. Knies.
Symposium on Operating Systems Principles | 2009
Mihai Dobrescu; Norbert Egi; Katerina J. Argyraki; Byung-Gon Chun; Kevin R. Fall; Gianluca Iannaccone; Allan D. Knies; Maziar Manesh; Sylvia Ratnasamy
We revisit the problem of scaling software routers, motivated by recent advances in server technology that enable high-speed parallel processing, a feature router workloads appear ideally suited to exploit. We propose a software router architecture that parallelizes router functionality both across multiple servers and across multiple cores within a single server. By carefully exploiting parallelism at every opportunity, we demonstrate a 35 Gbps parallel router prototype; this router capacity can be linearly scaled through the use of additional servers. Our prototype router is fully programmable using the familiar Click/Linux environment and is built entirely from off-the-shelf, general-purpose server hardware.
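The abstract does not say how packets are divided among servers and cores. Purely as a hypothetical illustration, one common approach in parallel packet processing is to hash each packet's flow identifier so that every packet of a flow lands on the same core, which avoids reordering within a flow. The sketch assumes a 5-tuple flow key and CRC-32 hashing. `FlowKey` and `assign_worker` are illustrative names, not part of the prototype or of Click.

```python
import zlib
from typing import NamedTuple, Tuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def _flow_hash(key: FlowKey) -> int:
    # CRC-32 is deterministic across runs, unlike Python's salted str hash.
    raw = f"{key.src_ip}|{key.dst_ip}|{key.src_port}|{key.dst_port}|{key.proto}"
    return zlib.crc32(raw.encode())


def assign_worker(key: FlowKey, num_servers: int, cores_per_server: int) -> Tuple[int, int]:
    """Map a flow to a (server, core) pair so all its packets share one core."""
    if num_servers <= 0 or cores_per_server <= 0:
        raise ValueError("need at least one server and one core per server")
    slot = _flow_hash(key) % (num_servers * cores_per_server)
    # Split the flat slot index into a server index and a core index.
    return divmod(slot, cores_per_server)
```

Adding servers enlarges `num_servers * cores_per_server`, so the same function spreads flows over more workers. Plain modulo hashing does remap most flows when the worker count changes, which a real design would need to handle.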
Programmable Routers for Extensible Services of Tomorrow | 2008
Katerina J. Argyraki; Salman A. Baset; Byung-Gon Chun; Kevin R. Fall; Gianluca Iannaccone; Allan D. Knies; Eddie Kohler; Maziar Manesh; Sergiu Nedevschi; Sylvia Ratnasamy
Software routers can lead us from a network of special-purpose hardware routers to one of general-purpose extensible infrastructure, provided they can scale to high speeds. We identify the challenges in achieving this scalability and propose a solution: a cluster-based router architecture that uses an interconnect of commodity server platforms to build software routers that are both incrementally scalable and fully programmable.
International Symposium on Microarchitecture | 2001
Young-Soo Choi; Allan D. Knies; Luke Gerke; Tin-Fook Ngai
The research community has studied if-conversion for many years. However, due to the lack of existing hardware, studies were conducted by simulating code generated by experimental compilers. This paper presents the first comprehensive study of the use of predication to implement if-conversion on production hardware with a near-production compiler. To better understand trends in the measurements, we generated binaries at three increasing levels of if-conversion aggressiveness. For each level, we gathered data regarding the global runtime effects of if-conversion on overall execution time, register pressure, code size, and branch behavior. Furthermore, we studied the inherent characteristics of program control-flow structure related to branching to help determine fundamental limits of if-conversion. Our results show that on the Itanium™ processor if-conversion could potentially remove 29% of the branch mispredictions in SPEC CINT2000, but that this accounts for a substantially smaller overall program speedup than previously reported.
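To make if-conversion concrete, the toy model below (an illustration only, not Itanium assembly and not the compiler studied in the paper) replaces the branch in `if (r1 > r2) r3 = r1 - r2; else r3 = r2 - r1;` with a compare that sets a pair of complementary predicates, followed by two guarded instructions. Both arms occupy the instruction stream on every execution, which is why if-conversion can remove mispredictions while adding work. The register names and the `run_predicated` helper are hypothetical, and the compare that writes two predicates is split into two steps for simplicity.

```python
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, object]
# (guard predicate name or None for "always", destination register, operation)
Instr = Tuple[Optional[str], str, Callable[[Regs], object]]


def run_predicated(program: List[Instr], regs: Regs) -> Regs:
    """Run straight-line predicated code with no branches.

    Instructions are processed in order; one whose guard predicate is
    false is nullified, meaning it does not update the register file.
    """
    regs = dict(regs)  # leave the caller's register file untouched
    for guard, dest, op in program:
        if guard is None or regs[guard]:
            regs[dest] = op(regs)
    return regs


# If-converted form of: if (r1 > r2) r3 = r1 - r2; else r3 = r2 - r1;
ABS_DIFF: List[Instr] = [
    (None, "p1", lambda r: r["r1"] > r["r2"]),  # compare sets p1 ...
    (None, "p2", lambda r: not r["p1"]),        # ... and its complement p2
    ("p1", "r3", lambda r: r["r1"] - r["r2"]),  # (p1) r3 = r1 - r2
    ("p2", "r3", lambda r: r["r2"] - r["r1"]),  # (p2) r3 = r2 - r1
]
```

For example, `run_predicated(ABS_DIFF, {"r1": 3, "r2": 7})["r3"]` takes the `p2` arm and yields 4 without any control transfer.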
International Symposium on Microarchitecture | 2008
Dong Hyuk Woo; Hsien-Hsin S. Lee; Joshua B. Fryman; Allan D. Knies; Marsha Eng
To build a future many-core processor, industry must address the challenges of energy consumption and performance scalability. A 3D-integrated broad-purpose accelerator architecture called parallel-on-demand (POD) integrates a specialized SIMD-based die layer on top of a CISC superscalar processor to accelerate a variety of data-parallel applications. It also maintains binary compatibility and facilitates extensibility by virtualizing the acceleration capability.
ACM Transactions on Architecture and Code Optimization | 2010
Dong Hyuk Woo; Joshua B. Fryman; Allan D. Knies; Hsien-Hsin Sean Lee
Heterogeneous multicore processors have emerged as an energy- and area-efficient architectural solution to improving performance for domain-specific applications such as those with a plethora of data-level parallelism. These processors typically contain a large number of small, compute-centric cores for acceleration while keeping one or two high-performance ILP cores on the die to guarantee single-thread performance. Although a major portion of the transistors is occupied by the acceleration cores, these resources sit idle when running unparallelized legacy code or the sequential part of an application. To address this underutilization, in this article we introduce Chameleon, a flexible heterogeneous multicore architecture that virtualizes these resources to enhance memory performance when running sequential programs. The Chameleon architecture can dynamically virtualize the idle acceleration cores into a last-level cache, a data prefetcher, or a hybrid of these two techniques. In addition, Chameleon can operate in an adaptive mode that dynamically switches the acceleration cores between the hybrid mode and the prefetch-only mode by monitoring the effectiveness of the Chameleon cache mode. In our evaluation with the SPEC CPU2006 benchmark suite, the performance improvement of each mode varied from application to application. In the adaptive mode, Chameleon improves the performance of SPECint06 and SPECfp06 by 31% and 15% on average, respectively. When considering only memory-intensive applications, Chameleon improves system performance by 50% and 26% for SPECint06 and SPECfp06, respectively.
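The abstract says the adaptive mode switches between hybrid and prefetch-only operation by monitoring the effectiveness of the cache mode, but it does not give the policy. The sketch below is a hypothetical epoch-based controller. It leaves hybrid mode when the virtualized cache's hit rate over an epoch falls below an assumed threshold. It also re-probes hybrid mode periodically, since cache effectiveness cannot be observed while the cache is off. The threshold and probe interval are assumptions, not values from the article.

```python
from enum import Enum


class Mode(Enum):
    HYBRID = "hybrid"                # acceleration cores act as cache + prefetcher
    PREFETCH_ONLY = "prefetch-only"  # acceleration cores act only as prefetcher


class AdaptiveController:
    """Hypothetical per-epoch policy for choosing the Chameleon-style mode."""

    def __init__(self, hit_rate_threshold: float = 0.2, probe_every: int = 4):
        self.threshold = hit_rate_threshold  # assumed minimum useful hit rate
        self.probe_every = probe_every       # assumed epochs before re-trying hybrid
        self.mode = Mode.HYBRID
        self._epochs_in_prefetch = 0

    def end_epoch(self, cache_hits: int, cache_accesses: int) -> Mode:
        """Update the mode from the statistics of the epoch that just ended."""
        if self.mode is Mode.HYBRID:
            hit_rate = cache_hits / cache_accesses if cache_accesses else 0.0
            if hit_rate < self.threshold:
                # The virtualized cache is not paying off: stop using it.
                self.mode = Mode.PREFETCH_ONLY
                self._epochs_in_prefetch = 0
        else:
            # Cache statistics are meaningless while the cache is off,
            # so periodically switch back to measure it again.
            self._epochs_in_prefetch += 1
            if self._epochs_in_prefetch >= self.probe_every:
                self.mode = Mode.HYBRID
        return self.mode
```

The periodic probe is one simple way to avoid getting stuck in prefetch-only mode after a program phase change. A hardware design might instead sample a small portion of the cache continuously.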
Archive | 2010
Gianluca Iannaccone; Sylvia Ratnasamy; Maziar Manesh; Katerina Argyraki; Byung-Gon Chun; Kevin R. Fall; Allan D. Knies; Norbert Egi; Mihai Dobrescu; Salman A. Baset
Archive | 2009
Norbert Egi; Mihai Dobrescu; Jiaqing Du; Katerina Argyraki; Byung-Gon Chun; Kevin R. Fall; Gianluca Iannaccone; Allan D. Knies; Maziar Manesh; Laurent Mathy; Sylvia Ratnasamy
Journal of Engineering Education | 1997
Seth Abraham; Allan D. Knies; Kristen L. Kukral; Thomas E. Willis
Archive | 2012
Masha Lipshits; Lihu Rappaport; Shantanu R. Gupta; Franck Sala; Naveen Kumar; Allan D. Knies
Archive | 2014
Joshua B. Fryman; Allan D. Knies