Andrew DeOrio
University of Michigan
Publication
Featured research published by Andrew DeOrio.
design, automation, and test in europe | 2009
David Fick; Andrew DeOrio; Gregory K. Chen; Valeria Bertacco; Dennis Sylvester; David T. Blaauw
Current trends in technology scaling foreshadow worsening transistor reliability as well as greater numbers of transistors in each system. The combination of these factors will soon make long-term product reliability extremely difficult in complex modern systems such as systems on a chip (SoC) and chip multiprocessor (CMP) designs, where even a single device failure can cause fatal system errors. Resiliency to device failure will be a necessary condition at future technology nodes. In this work, we present a network-on-chip (NoC) routing algorithm that boosts the robustness of interconnect networks by reconfiguring them to avoid faulty components while maintaining connectivity and correct operation. This distributed algorithm can be implemented in hardware with fewer than 300 gates per network router. Experimental results over a broad range of 2D-mesh and 2D-torus networks demonstrate 99.99% reliability on average when 10% of the interconnect links have failed.
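The sketch below illustrates the core reconfiguration idea in a centralized form: after links fail, each router recomputes its next-hop table over the surviving topology so that every still-connected pair of nodes can communicate. It is a minimal Python model, not the paper's distributed hardware implementation, and the mesh helpers are hypothetical.

```python
# Hypothetical sketch: BFS-based route reconfiguration on a 2D mesh.
# This emulates the outcome of a distributed reconfiguration centrally.
from collections import deque

def mesh_links(width, height):
    """All bidirectional links of a width x height 2D mesh."""
    links = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.add(((x, y), (x + 1, y)))
            if y + 1 < height:
                links.add(((x, y), (x, y + 1)))
    return links

def neighbors(node, links):
    for a, b in links:
        if a == node:
            yield b
        elif b == node:
            yield a

def rebuild_routes(src, links):
    """BFS from src over surviving links; returns dest -> next hop."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in neighbors(node, links):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    next_hop = {}
    for dest in parent:
        if dest == src:
            continue
        hop = dest
        while parent[hop] != src:      # walk back to the child of src
            hop = parent[hop]
        next_hop[dest] = hop
    return next_hop

links = mesh_links(4, 4)
links.discard(((1, 0), (2, 0)))        # mark one link as failed
routes = rebuild_routes((0, 0), links)
print(routes[(3, 0)])                  # traffic detours around the failed link
```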
design automation conference | 2009
David Fick; Andrew DeOrio; Jin Hu; Valeria Bertacco; David T. Blaauw; Dennis Sylvester
Process scaling has given designers billions of transistors to work with. As feature sizes near the atomic scale, extensive variation and wearout inevitably make margining uneconomical or impossible. The ElastIC project seeks to address this by creating a large-scale chip multiprocessor that can self-diagnose, adapt, and heal. Creating large, flexible designs in this environment naturally lends itself to the repetitive structure of a network-on-chip (NoC), but in a conventional NoC the loss of a single link or router will result in complete network failure. In this work we present Vicis, an ElastIC-style NoC that can tolerate the loss of many network components due to wearout-induced hard faults. Vicis uses the inherent redundancy in the network and its routers to maintain correct operation while incurring a much lower area overhead than previously proposed N-modular redundancy (NMR) based solutions. Each router has a built-in self-test (BIST) unit that diagnoses the locations of hard faults and runs a number of algorithms to best use ECC, port swapping, and a crossbar bypass bus to mitigate them. The routers also cooperate to run distributed algorithms that solve network-wide problems, protecting the network against critical failures in individual routers. We show that with stuck-at fault rates as high as 1 in 2,000 gates, Vicis continues to operate with approximately half of its routers still functional and communicating.
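As a rough illustration of the repair-selection step, the sketch below maps diagnosed fault counts to mitigations; the component names and rules are assumptions for illustration, not Vicis's actual BIST algorithms.

```python
# Illustrative sketch: given fault locations reported by a router's
# built-in self-test, pick mitigations that keep the router in service.
def select_mitigations(faults):
    """faults: component name -> number of diagnosed stuck-at faults."""
    plan = []
    if faults.get("datapath", 0) == 1:
        plan.append("ECC corrects the single datapath fault")
    if faults.get("input_port", 0) > 0:
        plan.append("swap the faulty input port with a healthy one")
    if faults.get("crossbar", 0) > 0:
        plan.append("route affected traffic over the crossbar bypass bus")
    return plan or ["no mitigation needed; router fully operational"]

print(select_mitigations({"datapath": 1, "crossbar": 2}))
```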
design automation conference | 2009
Debapriya Chatterjee; Andrew DeOrio; Valeria Bertacco
Logic simulation is a critical component of the design tool flow in modern hardware development efforts. It is used widely, from high-level descriptions down to gate-level ones, to validate several aspects of the design, particularly functional correctness. Despite development houses investing vast resources in the simulation task, particularly at the gate level, it is still far from meeting the performance demands of validating complex modern designs. In this work, we propose the first event-driven logic simulator accelerated by a parallel, general-purpose graphics processor (GPGPU). Our simulator leverages a gate-level event-driven design to exploit the benefits of the low switching activity that is typical of large hardware designs. We developed novel algorithms for circuit netlist partitioning, optimized for a highly parallel GPGPU host. Moreover, our flow is structured to extract the best simulation performance from the target hardware platform. We found that our experimental prototype could handle large, industrial-scale designs comprising millions of gates and deliver a 13x speedup on average over current commercial event-driven simulators.
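The sketch below shows the event-driven evaluation scheme in minimal CPU-side Python, using a hypothetical two-gate netlist: only gates whose inputs actually toggled are re-evaluated, which is why the low switching activity of large designs pays off.

```python
# Hypothetical two-gate netlist: n3 = AND(n1, n2), n4 = NOT(n3).
GATES = {"n3": ("AND", ["n1", "n2"]), "n4": ("NOT", ["n3"])}
FANOUT = {"n1": ["n3"], "n2": ["n3"], "n3": ["n4"]}
OPS = {"AND": lambda a, b: a & b, "NOT": lambda a: a ^ 1}

def simulate(values, changed_nets):
    """Propagate events until the netlist is quiescent."""
    frontier = set(changed_nets)
    while frontier:
        dirty = {g for net in frontier for g in FANOUT.get(net, [])}
        frontier = set()
        for out in dirty:
            op, ins = GATES[out]
            new = OPS[op](*(values[i] for i in ins))
            if values[out] != new:     # schedule an event only on a toggle
                values[out] = new
                frontier.add(out)
    return values

# n2 toggles to 1: the event cascades through n3 and n4 only.
print(simulate({"n1": 1, "n2": 1, "n3": 0, "n4": 1}, {"n2"}))
```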
international conference on parallel architectures and compilation techniques | 2011
Konstantinos Aisopos; Andrew DeOrio; Li Shiuan Peh; Valeria Bertacco
Extreme transistor technology scaling is causing increasing concerns in device reliability: the expected lifetime of individual transistors in complex chips is quickly decreasing, and the problem is expected to worsen at future technology nodes. With complex designs increasingly relying on Networks-on-Chip (NoCs) for on-chip data transfers, a NoC must continue to operate even in the face of many transistor failures. Specifically, it must be able to reconfigure and reroute packets around faults to enable continued operation, i.e., generate new routing paths to replace the old ones upon a failure. In addition to these reliability requirements, NoCs must maintain low latency and high throughput within a very low area budget. In this work, we propose a distributed reconfiguration solution named Ariadne, targeting large, aggressively scaled, unreliable NoCs. Ariadne utilizes up*/down* routing for fast, high-bandwidth operation, and upon any number of concurrent network failures in any location, it reconfigures to discover new resilient paths connecting the surviving nodes. Experimental results show that Ariadne provides a 40%-140% latency improvement (when subject to 50 faults in a 64-node NoC) over other state-of-the-art on-chip fault-tolerant solutions, while meeting the low area budget of on-chip routers with an overhead of just 1.97%.
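The sketch below illustrates the up*/down* rule that Ariadne reconfigures around, under simplifying assumptions (ties between same-level nodes are ignored): a BFS from a root node orients every surviving link, and a route is deadlock-free only if it never takes an up hop after a down hop.

```python
# Illustrative sketch of the up*/down* legality rule, not the paper's
# hardware implementation.
from collections import deque

def bfs_levels(root, adjacency):
    """Assign each node its BFS distance from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                queue.append(nbr)
    return level

def is_legal_updown(path, level):
    """Up hops (toward the root) must all precede down hops."""
    seen_down = False
    for a, b in zip(path, path[1:]):
        going_up = level[b] < level[a]
        if going_up and seen_down:
            return False          # down hop then up hop: deadlock risk
        if not going_up:
            seen_down = True
    return True

adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}  # a 2x2 mesh
level = bfs_levels(0, adjacency)
print(is_legal_updown([1, 0, 2], level))     # up then down: legal
print(is_legal_updown([1, 3, 2], level))     # down then up: illegal
```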
design, automation, and test in europe | 2009
Debapriya Chatterjee; Andrew DeOrio; Valeria Bertacco
In recent years, the verification of digital designs has become one of the most challenging, time consuming and critical tasks in the entire hardware development process. Within this area, the vast majority of the verification effort in industry relies on logic simulation tools. However, logic simulators deliver limited performance when faced with vastly complex modern systems, especially synthesized netlists. The consequences are poor design coverage, delayed product releases and bugs that escape into silicon. Thus, we developed a novel GPU-accelerated logic simulator, called GCS, optimized for large structural netlists. By leveraging the vast parallelism offered by GP-GPUs and a novel netlist balancing algorithm tuned for the target architecture, we can attain an order-of-magnitude performance improvement on average over commercial logic simulators, and simulate large industrial-size designs, such as the OpenSPARC processor core design.
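The sketch below shows netlist levelization, the structural pass a balancing algorithm of this kind starts from (illustrative Python, not GCS's actual algorithm): gates at the same level do not depend on one another, so each level can be evaluated as one parallel step on the GPU.

```python
# Hypothetical netlist levelization for a combinational DAG.
def levelize(gate_inputs, primary_inputs):
    """gate_inputs: output net -> list of input nets."""
    level = {net: 0 for net in primary_inputs}
    remaining = dict(gate_inputs)
    while remaining:
        for out, ins in list(remaining.items()):
            if all(i in level for i in ins):       # all inputs resolved
                level[out] = 1 + max(level[i] for i in ins)
                del remaining[out]
    return level

netlist = {"n3": ["a", "b"], "n4": ["n3", "c"], "n5": ["a", "c"]}
print(levelize(netlist, ["a", "b", "c"]))  # n3, n5 at level 1; n4 at level 2
```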
high-performance computer architecture | 2009
Andrew DeOrio; Ilya Wagner; Valeria Bertacco
The number of functional errors escaping design verification and being released into final silicon is growing, due to the increasing complexity and shrinking production schedules of modern processor designs. Recent trends towards chip multiprocessors (CMPs) are exacerbating the problem because of their complex and sometimes non-deterministic memory subsystems, prone to subtle but devastating bugs. This deteriorating situation calls for high-efficiency, high-coverage functional validation, results that can be achieved by leveraging the performance of post-silicon validation, that is, verification tasks executed directly on prototype hardware. The orders-of-magnitude faster testing in post-silicon enables designers to achieve much higher coverage before customer release, but only if the limitations of this technology in diagnosis and internal node observability can be overcome. In this work, we unlock the full performance of post-silicon validation through Dacota, a new high-coverage solution for validating memory operation ordering in CMPs. When activated, Dacota reconfigures a portion of the cache storage to log memory accesses using a compact data-coloring scheme. Logs are periodically aggregated and checked by a distributed algorithm running in situ on the CMP to verify correct memory operation ordering. When the design is ready for customer shipment, Dacota can be deactivated, releasing all cache storage and leaving only a small silicon area footprint, less than 0.01% (three orders of magnitude smaller than previous solutions). We found experimentally that Dacota is effective in exposing memory subsystem bugs, and it delivers its high coverage at a 26% performance slowdown (incurred only during validation) for real-world applications.
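The toy sketch below conveys the checking idea, assuming a simplified log format rather than Dacota's compact encoding: each write stamps its address with an incrementing color, each core logs the colors it observes in program order, and a cycle in the combined program-order/color-order graph flags an ordering violation.

```python
# Toy ordering check over colored access logs; a simplification, not
# Dacota's actual distributed algorithm.
def ordering_violation(core_logs):
    """core_logs: for each core, a list of (addr, color) in program order."""
    nodes = [(c, i) for c, log in enumerate(core_logs) for i in range(len(log))]
    edges = {n: set() for n in nodes}
    for c, log in enumerate(core_logs):                # program-order edges
        for i in range(len(log) - 1):
            edges[(c, i)].add((c, i + 1))
    for a in nodes:                                    # color-order edges
        for b in nodes:
            addr_a, col_a = core_logs[a[0]][a[1]]
            addr_b, col_b = core_logs[b[0]][b[1]]
            if addr_a == addr_b and col_a < col_b:
                edges[a].add(b)
    WHITE, GRAY, BLACK = 0, 1, 2                       # DFS cycle detection
    state = dict.fromkeys(nodes, WHITE)
    def has_cycle(n):
        state[n] = GRAY
        for m in edges[n]:
            if state[m] == GRAY or (state[m] == WHITE and has_cycle(m)):
                return True
        state[n] = BLACK
        return False
    return any(state[n] == WHITE and has_cycle(n) for n in nodes)

# Core 0 observes X's colors as 1 then 2; core 1 observes 2 then 1:
# no interleaving explains both logs, so a violation is reported.
print(ordering_violation([[("X", 1), ("X", 2)], [("X", 2), ("X", 1)]]))
```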
vlsi test symposium | 2012
Amirali Ghofrani; Ritesh Parikh; Saeed Shamshiri; Andrew DeOrio; Kwang-Ting Cheng; Valeria Bertacco
We propose a comprehensive yet low-cost solution for online detection and diagnosis of permanent faults in on-chip networks. By collecting error syndromes and counting packets and flits, it achieves high-resolution defect diagnosis in both the datapath and the control logic of the on-chip network, without injecting any test traffic or incurring significant performance overhead.
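A minimal sketch of the flit-counting intuition follows (the counter format is an assumption, not the paper's syndrome encoding): when a link's upstream and downstream flit counters disagree, the mismatch localizes a datapath fault to that link without any test traffic.

```python
def diagnose(link_counters):
    """link_counters: link name -> (flits sent upstream, flits received)."""
    return [link for link, (sent, recv) in link_counters.items() if sent != recv]

counters = {
    "r0->r1": (1042, 1042),
    "r1->r2": (988, 973),    # 15 flits lost: suspect this link's datapath
    "r2->r3": (501, 501),
}
print(diagnose(counters))    # ['r1->r2']
```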
international conference on computer design | 2008
Andrew DeOrio; Adam Bauserman; Valeria Bertacco
Modern processor designs are extremely complex and difficult to validate during development, causing a growing portion of the verification effort to shift to post-silicon, after the first few hardware prototypes become available. Extremely slow simulation speeds during pre-silicon verification result in functional errors escaping into silicon, a problem that is further exacerbated by the growing complexity of the memory subsystem in multi-core platforms. In this work we present CoSMa, a novel technology offering high-coverage functional post-silicon validation of cache coherence protocols in multi-core systems. It enables the detection and diagnosis of functional errors in the memory subsystem by recording at runtime a compact encoding of the operations occurring at each cache line and checking their correctness at regular intervals. We leverage the system's existing memory resources to store the required activity, thus minimizing area overhead. When the system is finally ready for customer shipment, CoSMa can be completely disabled, eliminating any performance or memory overhead. We reproduce in our experiments a set of coherence protocol bugs based on published errata documents of commercial multi-core designs, and show that CoSMa is highly effective in detecting them.
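The toy sketch below shows interval checking against a MESI-style protocol; the state names are standard MESI, but the event-log format and transition table are illustrative stand-ins for CoSMa's compact encoding.

```python
# Toy transition table for one cache's view of a line (illustrative).
LEGAL = {
    ("I", "read_miss"): "S",
    ("I", "write_miss"): "M",
    ("S", "write_hit"): "M",
    ("S", "invalidate"): "I",
    ("E", "write_hit"): "M",
    ("M", "evict"): "I",
}

def check_line(events, state="I"):
    """Replay a line's logged events; return the first illegal step, if any."""
    for step, event in enumerate(events):
        nxt = LEGAL.get((state, event))
        if nxt is None:
            return f"illegal: {event!r} in state {state!r} at step {step}"
        state = nxt
    return "log is consistent"

print(check_line(["read_miss", "write_hit", "evict"]))       # consistent
print(check_line(["read_miss", "invalidate", "write_hit"]))  # write in I: bug
```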
design automation conference | 2011
Andrew DeOrio; Konstantinos Aisopos; Valeria Bertacco; Li-Shiuan Peh
As transistor dimensions continue to scale deep into the nanometer regime, silicon reliability is becoming a chief concern. At the same time, transistor counts are scaling up, enabling the design of highly integrated chips with many cores and a complex interconnect fabric, often a network on chip (NoC). Particularly problematic is the case when the accumulation of permanent hardware faults leads to disconnected cores in the system. In order to maintain correct system operation, it is necessary to salvage the data from these isolated nodes. In this work, we introduce a recovery mechanism targeting precisely this issue: DRAIN (Distributed Recovery Architecture for Inaccessible Nodes) provides system-level recovery from permanent failures. When an error disconnects a node from the network, DRAIN uses emergency links to transfer architectural state and cached data from disconnected nodes to nearby connected caches. DRAIN incurs zero performance penalty during normal operation, and is compatible with any cache coherence protocol, interconnect topology or routing protocol. Experimental results show that DRAIN is able to provide complete state recovery within several milliseconds, on average, for a 1GHz 64-node CMP at an area overhead of only a few thousand gates.
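The simplified sketch below models the recovery flow; the function and field names are hypothetical, and the real mechanism operates on architectural state and cache lines in hardware: a disconnected node's dirty data and register state are pushed over an emergency link into a still-connected neighbor's cache.

```python
def drain(disconnected, emergency_neighbors, caches, arch_state):
    """Salvage each isolated node's state into a connected neighbor's cache."""
    for node in disconnected:
        target = next(n for n in emergency_neighbors[node]
                      if n not in disconnected)
        caches[target].update(caches[node])            # salvage cached data
        caches[target][f"arch_state_{node}"] = arch_state[node]
        caches[node].clear()                           # node is now offline
    return caches

caches = {0: {"0xA0": 7}, 1: {}}
print(drain(disconnected=[0],
            emergency_neighbors={0: [1]},
            caches=caches,
            arch_state={0: {"pc": 0x400}}))
```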
design, automation, and test in europe | 2013
Andrew DeOrio; Qingkun Li; Matthew Burgess; Valeria Bertacco
The exponentially growing complexity of modern processors intensifies verification challenges. Traditional pre-silicon verification covers less and less of the design space, resulting in increasing post-silicon validation effort. A critical challenge is the manual debugging of intermittent failures on prototype chips, where multiple executions of the same test do not yield a consistent outcome. We leverage the power of machine learning to support automatic diagnosis of these difficult, inconsistent bugs. During post-silicon validation, lightweight hardware logs a compact measurement of observed signal activity over multiple executions of the same test: some may pass, some may fail. Our novel algorithm applies anomaly detection techniques, similar to those used to detect credit card fraud, to identify the approximate cycle of a bug's occurrence and a set of candidate root-cause signals. Compared against other state-of-the-art solutions in this space, our new approach can locate the time of a bug's occurrence with nearly 4x better accuracy when applied to the complex OpenSPARC T2 design.
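The sketch below uses plain z-scores as a stand-in for the paper's actual anomaly-detection algorithm: signal activity from passing runs defines what is normal at each cycle, and the failing run's largest deviation points at a candidate bug cycle and signal.

```python
import statistics

def localize(passing_runs, failing_run):
    """Each run is a list (one entry per cycle) of dicts: signal -> toggles."""
    best = (0.0, None, None)                  # (score, cycle, signal)
    for cycle, observed in enumerate(failing_run):
        for signal, value in observed.items():
            samples = [run[cycle][signal] for run in passing_runs]
            mu = statistics.mean(samples)
            sigma = statistics.pstdev(samples) or 1e-9   # avoid divide-by-zero
            score = abs(value - mu) / sigma
            if score > best[0]:
                best = (score, cycle, signal)
    return best

passing = [
    [{"req": 3, "ack": 3}, {"req": 2, "ack": 2}],
    [{"req": 3, "ack": 3}, {"req": 2, "ack": 2}],
]
failing = [{"req": 3, "ack": 3}, {"req": 2, "ack": 9}]   # 'ack' misbehaves
score, cycle, signal = localize(passing, failing)
print(cycle, signal)   # flags cycle 1, signal 'ack'
```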