Michel A. Kinsy
Boston University
Publications
Featured research published by Michel A. Kinsy.
IEEE Transactions on Computers | 2016
Pengju Ren; Xiaowei Ren; Sudhanshu Sane; Michel A. Kinsy; Nanning Zheng
To improve the reliability of on-chip network based systems, we design a deadlock-free routing technique that is more resilient to component failures and guarantees a higher degree of node connectivity. The routing methodology consists of three key steps. First, we determine the maximal connected subgraph of the faulty network by checking whether the defective components happen to be cut vertices or bridges of the network topology; a precise fault diagnosis mechanism is used to identify partially defective routers. Second, we construct an acyclic channel dependency graph that breaks all cycles and preserves the connectivity of the maximal connected subgraph, using the cycle-breaking and connectivity guaranteed (CBCG) algorithm. Finally, we introduce a fault-tolerant adaptive routing scheme that can be used with or without virtual channels for network congestion avoidance and high-throughput routing. Simulation results show both the effectiveness and the robustness of the proposed approach. For an 8 × 8 2D-Mesh with 40 percent of links damaged, full connectivity and deadlock freedom are still achieved without disabling any fault-free router in 98.18 percent of the simulations. For a 2D-Torus, the success rate is even higher (99.93 percent). The hardware overhead for supporting these features is minimal: an on-line implementation of CBCG using the TSMC 65nm library has only 0.966 and 1.139 percent area overhead for the 8 × 8 and 16 × 16 2D-Meshes, respectively.
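The first step above hinges on whether a failed router or link is a cut vertex or a bridge of the topology, since removing either one necessarily splits the network. A minimal sketch, assuming an undirected n × n mesh and link failures only (router faults and the partial-fault diagnosis are not modeled), finds both with Tarjan's classic DFS low-link method. The helper names `mesh_graph` and `cut_vertices_and_bridges` are hypothetical, not taken from the paper.

```python
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[int, int]

def mesh_graph(n: int, failed_links: Iterable[Tuple[Node, Node]] = ()) -> Dict[Node, Set[Node]]:
    """Adjacency sets of an n x n 2D mesh with the given undirected links removed."""
    failed = {frozenset(link) for link in failed_links}
    adj: Dict[Node, Set[Node]] = {(x, y): set() for x in range(n) for y in range(n)}
    for x in range(n):
        for y in range(n):
            for nb in ((x + 1, y), (x, y + 1)):
                if nb in adj and frozenset({(x, y), nb}) not in failed:
                    adj[(x, y)].add(nb)
                    adj[nb].add((x, y))
    return adj

def cut_vertices_and_bridges(adj: Dict[Node, Set[Node]]):
    """Tarjan's DFS: a tree edge u-v is a bridge if low[v] > disc[u];
    a non-root u is a cut vertex if some DFS child v has low[v] >= disc[u]."""
    disc: Dict[Node, int] = {}
    low: Dict[Node, int] = {}
    cuts, bridges = set(), set()
    counter = 0

    def dfs(u: Node, parent) -> None:
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(frozenset({u, v}))
                if parent is not None and low[v] >= disc[u]:
                    cuts.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])  # back edge to an ancestor
        # The DFS root is a cut vertex only if it has two or more DFS children.
        if parent is None and children > 1:
            cuts.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return cuts, bridges

# A healthy 3 x 3 mesh has no single point of failure.
print(cut_vertices_and_bridges(mesh_graph(3)) == (set(), set()))  # True
# After link (0,0)-(0,1) fails, router (1,0) and link (0,0)-(1,0) become critical.
cuts, bridges = cut_vertices_and_bridges(mesh_graph(3, failed_links=[((0, 0), (0, 1))]))
print(cuts == {(1, 0)}, bridges == {frozenset({(0, 0), (1, 0)})})  # True True
```

If a subsequent fault hits one of the reported cut vertices or bridges, the network is disconnected and the maximal connected subgraph is simply the largest remaining component.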
IEEE Transactions on Computers | 2016
Pengju Ren; Michel A. Kinsy; Nanning Zheng
Routing algorithm design for on-chip networks (OCNs) has become increasingly challenging due to the high level of integration and complexity of modern systems-on-chip (SoCs). The inherent unreliability of components, embedded oversized IP blocks, and fine-grained voltage-frequency island (VFI) management, among other factors, raise several challenges in OCNs: (a) network topologies become irregular or asymmetric, making circular route dependencies that lead to deadlock hard to detect; and (b) routing algorithms that lack strong load-balancing properties often saturate prematurely. To address these deadlock and load-balancing problems, we propose the traffic balancing oblivious routing (TBOR) algorithm. It is a two-phase routing algorithm consisting of: (1) construction of a weighted acyclic channel dependency graph (CDG) for the OCN to efficiently maximize available resource utilization; and (2) channel ordering across turn models, which keeps the underlying CDG cycle-free and thereby guarantees deadlock freedom using one or more turn models. Channel bandwidth utilization and traffic balancing are achieved through static virtual channel allocation according to the residual bandwidth of healthy links. In addition, we introduce two fault detection and analysis schemes of different granularity, and we guarantee in-order packet delivery by assigning a unique path to each flow. Extensive experiments demonstrate that the proposed routing methodology outperforms previous algorithms.
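Both routing papers above depend on keeping the channel dependency graph (CDG) acyclic, because a cycle-free CDG guarantees deadlock freedom. A minimal sketch, assuming channels are plain string labels and dependencies (turns) are offered one at a time: the hypothetical helper `add_if_acyclic` accepts a dependency only if it cannot close a cycle, and `is_acyclic` verifies a CDG with Kahn's topological sort. This is a generic greedy construction, not TBOR's weighted channel ordering.

```python
from collections import deque
from typing import Dict, Set

CDG = Dict[str, Set[str]]

def reachable(cdg: CDG, src: str, dst: str) -> bool:
    """Depth-first search: can channel src reach channel dst through dependencies?"""
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in cdg.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def add_if_acyclic(cdg: CDG, u: str, v: str) -> bool:
    """Add dependency u -> v unless v already reaches u (which would close a cycle)."""
    if reachable(cdg, v, u):
        return False
    cdg.setdefault(u, set()).add(v)
    return True

def is_acyclic(cdg: CDG) -> bool:
    """Kahn's algorithm: acyclic iff every channel can be placed in a topological order."""
    nodes = set(cdg) | {v for succ in cdg.values() for v in succ}
    indeg = {n: 0 for n in nodes}
    for succ in cdg.values():
        for v in succ:
            indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    ordered = 0
    while queue:
        u = queue.popleft()
        ordered += 1
        for v in cdg.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return ordered == len(nodes)

# Clockwise channels around one 2x2 mesh square: c0 east, c1 north, c2 west, c3 south.
ring = [("c0", "c1"), ("c1", "c2"), ("c2", "c3"), ("c3", "c0")]
cdg: CDG = {}
accepted = [add_if_acyclic(cdg, u, v) for u, v in ring]
print(accepted)         # [True, True, True, False]: the turn closing the cycle is refused
print(is_acyclic(cdg))  # True
```

A greedy pass like this depends on the order in which dependencies are offered; a weighted construction would instead decide which turns to keep based on link utilization.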
Great Lakes Symposium on VLSI | 2017
Michel A. Kinsy; Shreeya Khadka; Mihailo Isakov
In this paper, we introduce PreNoc, a neural-network-based predictive routing algorithm for on-chip networks that uses anticipated global network state and congestion information to route network traffic efficiently. The core of the algorithm is a multi-layer neural network whose inputs are the occupancy level of virtual channels, the average latency for a particular router to be selected for route computation, the probability of virtual channel allocation, and the probability of winning switch arbitration at the crossbar. The algorithm lends itself to both node routing and source routing. To evaluate PreNoc, we simulate both synthetic traffic and real application traces using a cycle-accurate simulator. In most test cases, the proposed approach outperforms current deterministic and adaptive routing techniques in terms of latency and throughput. The hardware overhead for supporting the new routing algorithm is minimal.
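As a rough illustration of how a learned model could drive port selection, the sketch below scores each candidate output port with a tiny two-layer network over the four inputs listed above and routes toward the lowest predicted cost. The feature order, the hidden-layer size, the hand-picked weights (a real design would train them), and the names `predict_cost` and `choose_port` are all assumptions for illustration.

```python
from typing import Dict, List

Vector = List[float]

def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]

def predict_cost(features: Vector, params: dict) -> float:
    """features = [vc_occupancy, avg_latency, p_vc_alloc, p_switch_win] (assumed order)."""
    hidden = relu(dense(features, params["W1"], params["b1"]))
    return dense(hidden, params["W2"], params["b2"])[0]

def choose_port(candidates: Dict[str, Vector], params: dict) -> str:
    """Pick the candidate output port with the lowest predicted cost."""
    return min(candidates, key=lambda port: predict_cost(candidates[port], params))

# Hand-picked (untrained) weights: hidden unit 0 grows with occupancy and latency,
# hidden unit 1 grows with the chance of winning VC allocation and switch arbitration.
PARAMS = {
    "W1": [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, -1.0]],
    "b2": [0.0],
}

candidates = {
    "east": [0.2, 4.0, 0.9, 0.8],    # lightly loaded, likely to win arbitration
    "north": [0.8, 10.0, 0.3, 0.4],  # congested
}
print(choose_port(candidates, PARAMS))  # east
```

The same scoring function could run per hop (node routing) or once at the source over whole candidate paths (source routing).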
IEEE High Performance Extreme Computing Conference | 2014
Michel A. Kinsy; Srinivas Devadas
The increasing complexity of embedded systems is accelerating the use of multicore processors in these systems. This trend gives rise to new problems, such as the sharing of on-chip network resources between hard real-time and normal best-effort data traffic. We propose a network-on-chip router that provides predictable and deterministic communication latency for hard real-time data traffic while maintaining high concurrency and throughput for best-effort/general-purpose traffic with minimal hardware overhead. The proposed router requires less area than non-interfering networks and provides better Quality of Service (QoS), in terms of predictability and determinism, to hard real-time traffic than priority-based routers. We present a deadlock-free algorithm for decoupled routing of the two types of traffic. We compare the area and power estimates of three different router architectures with various QoS schemes using the IBM 45-nm SOI CMOS technology cell library. Performance is evaluated using three realistic benchmark applications: a hybrid electric vehicle application, a utility-grid-connected photovoltaic converter system, and a variable-speed induction motor drive application.
IEEE High Performance Extreme Computing Conference | 2014
Michel A. Kinsy; Srinivas Devadas
In this paper, we present an Integer Linear Programming (ILP) formulation and two non-iterative heuristics for scheduling a task-based application onto a heterogeneous many-core architecture. Our ILP formulation can handle different application performance targets (e.g., low execution time, low memory miss rate) and different architectural features (e.g., cache sizes). For large problems, where the ILP convergence time may be too long, we propose a simple mapping algorithm that tries to spread tasks onto as many processing units as possible, and a more elaborate heuristic that shows good mapping performance compared to the ILP formulation. We use two realistic power electronics applications to evaluate our mapping techniques on full RTL many-core systems consisting of eight different types of processor cores.
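The simple spreading heuristic can be pictured as a single greedy pass that places each task on the least-loaded processing unit whose core type can run it. This is a sketch under assumptions that do not come from the paper: tasks are mapped in list order, each task declares a set of compatible core types, load is counted in tasks rather than execution time, and ties break by unit name. The function name `spread_map` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

def spread_map(tasks: List[Tuple[str, Set[str]]], units: Dict[str, str]) -> Dict[str, str]:
    """tasks: (task name, compatible core types); units: unit name -> core type.
    Assign each task to the compatible unit that currently holds the fewest tasks."""
    load = {unit: 0 for unit in units}
    mapping: Dict[str, str] = {}
    for task, ok_types in tasks:
        candidates = [u for u, core in units.items() if core in ok_types]
        if not candidates:
            raise ValueError(f"no processing unit can run {task}")
        best = min(candidates, key=lambda u: (load[u], u))  # fewest tasks, then name
        mapping[task] = best
        load[best] += 1
    return mapping

units = {"p0": "risc", "p1": "risc", "p2": "dsp"}
tasks = [("t0", {"risc", "dsp"}), ("t1", {"risc"}), ("t2", {"dsp"}), ("t3", {"risc", "dsp"})]
print(spread_map(tasks, units))  # {'t0': 'p0', 't1': 'p1', 't2': 'p2', 't3': 'p0'}
```

Because it never iterates or backtracks, a pass like this runs in time proportional to tasks × units, which is what makes spreading attractive when an exact ILP would take too long to converge.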
Great Lakes Symposium on VLSI | 2018
Lake Bu; Michel A. Kinsy
The Advanced Encryption Standard (AES) enables secure transmission of confidential messages. Since its introduction, many attacks against the scheme have been proposed; for example, an attacker can inject errors or faults to recover the encryption keys. It has been shown that the AES algorithm itself does not provide protection against these types of attacks. Therefore, additional techniques such as error control codes (ECCs) have been proposed to detect active attacks. However, not all of the proposed solutions are adequately effective. For instance, linear ECCs have critical limitations, especially when the injected errors are beyond their fault detection or tolerance capabilities. In this paper, we propose a new method based on a non-linear code to protect all four internal stages of the AES hardware implementation. With this method, the protected AES system is able to (a) detect errors of any multiplicity with high probability and (b) correct them if the errors follow certain patterns or frequencies. Results show that the proposed method provides much higher security and reliability to the AES hardware implementation with minimal overhead.
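To illustrate why a non-linear code helps, the sketch below uses the cubic robust code, a well-known non-linear construction in which a byte x carries the check symbol x^3 computed in GF(2^8). The abstract does not name the specific code, so treat this one as an assumed example; it covers only encoding and checking a single byte, not predicting check symbols through the AES stages. For an error e that flips data bits, (x ⊕ e)^3 ⊕ x^3 = x^2·e ⊕ x·e^2 ⊕ e^3 is a quadratic in x with leading coefficient e ≠ 0, so any given error pattern is masked for at most 2 of the 256 data values. With a linear code, by contrast, an error that equals a codeword escapes for every data value.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES field polynomial

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8), reduced by the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result

def signature(x: int) -> int:
    """Non-linear check symbol: x^3 in GF(2^8)."""
    return gf_mul(gf_mul(x, x), x)

def encode(x: int) -> tuple:
    return x, signature(x)

def check(x: int, r: int) -> bool:
    """True if the (data, check) pair is a valid codeword."""
    return signature(x) == r

def masked_count(e_data: int, e_check: int) -> int:
    """How many byte values x let the error (e_data, e_check) go undetected."""
    return sum(check(x ^ e_data, signature(x) ^ e_check) for x in range(256))

data, chk = encode(0x01)
print(check(data, chk))         # True
print(check(data ^ 0x02, chk))  # False: 0x03^3 = 0x0F, not 0x01
print(max(masked_count(e, 0) for e in range(1, 256)))  # at most 2
```

Errors that hit only the check symbol are always caught, because the data byte, and hence its expected signature, is unchanged.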
Ad Hoc Networks | 2018
Lake Bu; Mihailo Isakov; Michel A. Kinsy
In Internet of Things (IoT) systems with security demands, there is often a need to distribute sensitive information (such as encryption keys, digital signatures, or login credentials) to the devices, so that it can be retrieved for confidential purposes at a later moment. However, this information cannot be entrusted to any individual device, since the malfunction of one device would jeopardize the security of the entire network. Even if the information is split among the devices, there is still a danger when attackers compromise a group of them. Therefore, we have designed and implemented a secure and robust scheme to facilitate the sharing of sensitive information in IoT networks. This solution provides two important features: 1) The scheme uses Threshold Secret Sharing (TSS) to split the information into shares kept by all devices in the system, so that the information can only be retrieved collaboratively by groups of devices. 2) The scheme ensures the privacy and integrity of the information even in the presence of a large number of sophisticated, colluding attackers who can hijack devices. It is able to identify all the compromised devices while keeping the secret unknown to, and unforgeable by, the attackers.
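Threshold secret sharing is commonly instantiated with Shamir's scheme: the secret is the constant term of a random polynomial of degree k - 1 over a prime field, each device holds one point of that polynomial, any k points recover the secret by Lagrange interpolation at x = 0, and fewer than k points reveal nothing about it. The sketch below assumes that instantiation and an arbitrarily chosen field (the Mersenne prime 2^127 - 1); it does not implement the identification of compromised devices described above, which requires additional verification machinery. It relies on `pow(den, -1, PRIME)` for modular inverses (Python 3.8+).

```python
import random
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime; secrets must be smaller than this
_rng = random.SystemRandom()  # OS-backed randomness for the polynomial coefficients

Share = Tuple[int, int]

def split(secret: int, n: int, k: int) -> List[Share]:
    """Split secret into n shares (x, f(x)); any k of them reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    coeffs = [secret] + [_rng.randrange(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares: List[Share]) -> int:
    """Lagrange interpolation of the shared polynomial at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split(123456789, n=5, k=3)
print(reconstruct(shares[:3]))   # 123456789
print(reconstruct(shares[-3:]))  # 123456789
```

With k = 3 of n = 5 devices, any two compromised devices learn nothing about the secret, while any three honest devices can still recover it.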
Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies - HEART 2018 | 2018
Alan Ehret; Peter Jamieson; Michel A. Kinsy
Quorum sensing in cells is a generalized framework for modeling and analyzing the local density of the bacterial population in a given biological environment. It has applications in biology and in medical and therapeutic domains, e.g., cancer cell research. Software-based simulations are generally slow and only provide a certain level of functional faithfulness or model fidelity. In this work, we introduce ABAQS (Agent Based Architecture for Quorum sensing Simulation), a scalable open-source architecture to accelerate bacterial quorum sensing simulations. The architecture allows researchers to create and launch new simulations by quickly incorporating custom cell models. It is highly modular, separates the functional model from the control logic, and has a simple interface that lets users readily connect their custom models to the simulation platform. To illustrate the architecture, we present the implementation details and results for a small-scale model representing up to 81 cells, which we have synthesized and configured on an FPGA. We also highlight some of the key features to be implemented in future versions of the architecture. The open-source license of this project will allow other researchers to contribute to and improve the architecture to (a) better fit their quorum sensing simulations and (b) give the community a flexible simulation acceleration tool.
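For readers unfamiliar with the workload, a toy agent-style model of quorum sensing is sketched below: bacteria on a 9 × 9 grid (81 sites, matching the model scale mentioned above) secrete an autoinducer that diffuses toward neighboring sites and decays, and a bacterium becomes quorate once its local concentration exceeds a threshold. The update rule, parameter values, and function names are assumptions chosen for illustration, not the ABAQS cell model.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]
Site = Tuple[int, int]

def step(conc: Grid, occupied: Set[Site], production: float,
         diffusion: float, decay: float) -> Grid:
    """One synchronous update: diffuse toward the neighbor mean, decay, then secrete."""
    n = len(conc)
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            nbrs = [conc[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < n and 0 <= b < n]
            mixed = conc[i][j] + diffusion * (sum(nbrs) / len(nbrs) - conc[i][j])
            new[i][j] = (1.0 - decay) * mixed + (production if (i, j) in occupied else 0.0)
    return new

def quorate_cells(n: int, occupied: Set[Site], steps: int, production: float = 1.0,
                  diffusion: float = 0.5, decay: float = 0.1,
                  threshold: float = 5.0) -> Set[Site]:
    """Run the model and return the bacteria whose local autoinducer level exceeds threshold."""
    conc = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        conc = step(conc, occupied, production, diffusion, decay)
    return {(i, j) for (i, j) in occupied if conc[i][j] > threshold}

dense_colony = {(i, j) for i in range(9) for j in range(9)}
print(len(quorate_cells(9, dense_colony, steps=50)))  # 81: high density triggers quorum
print(quorate_cells(9, {(4, 4)}, steps=50))           # set(): a lone cell stays below threshold
```

Each site's update reads only its four neighbors, which is the kind of local, uniform computation that maps naturally onto replicated hardware cell modules.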
IET Computers & Digital Techniques | 2018
Seyed Mohammad Sebt; Ahmad Patooghy; Hakem Beitollahi; Michel A. Kinsy
A hardware Trojan (HT) is extra circuitry inserted into a chip design with the malicious aim of altering functionality, degrading reliability, or leaking secret information. HT activation signals are normally very hard to find, since they are designed to fire only under very rare conditions on specific nets of the infected circuit. A security engineer would have to search among thousands of gates and modules to confirm that no design-time HTs exist in the circuit. The authors propose efficient net susceptibility metrics to significantly speed up functional-HT detection in gate-level digital designs. The proposed metrics perform a computationally low-overhead analysis of the controllability and observability parameters of each net of the circuit under HT test. Then, using a proposed net classifier method, a very small percentage of circuit nets is identified as HT-trigger-suspicious nets. To show the practicality and detection accuracy of the proposed metrics, gate-level circuits from the Trust-HUB benchmark suite are examined. Results confirm 100% HT trigger detection with a low false-positive rate relative to previous metrics. More importantly, unlike previously proposed methods, the authors' detection accuracy is totally independent of the switching probability of the circuit inputs.
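Controllability and observability are classically quantified with SCOAP (Sandia Controllability/Observability Analysis Program) measures. As a hedged illustration of the kind of per-net analysis involved, the sketch below computes SCOAP combinational controllabilities CC0/CC1 for a small AND/OR/NOT netlist and flags nets whose 0- and 1-controllability differ by at least a threshold, i.e., nets that are much harder to drive to one value than the other. The imbalance rule and threshold are hypothetical stand-ins, not the paper's susceptibility metrics or net classifier, and observability is omitted for brevity.

```python
from typing import Dict, List, Set, Tuple

Gate = Tuple[str, str, List[str]]  # (output net, gate type, input nets)

def scoap_controllability(primary_inputs: List[str],
                          gates: List[Gate]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """SCOAP CC0/CC1 for each net; gates must be listed in topological order."""
    cc0 = {pi: 1 for pi in primary_inputs}
    cc1 = {pi: 1 for pi in primary_inputs}
    for net, kind, ins in gates:
        c0 = [cc0[i] for i in ins]
        c1 = [cc1[i] for i in ins]
        if kind == "AND":    # output 0 needs one input at 0; output 1 needs all inputs at 1
            cc0[net], cc1[net] = min(c0) + 1, sum(c1) + 1
        elif kind == "OR":   # output 0 needs all inputs at 0; output 1 needs one input at 1
            cc0[net], cc1[net] = sum(c0) + 1, min(c1) + 1
        elif kind == "NOT":
            cc0[net], cc1[net] = c1[0] + 1, c0[0] + 1
        else:
            raise ValueError(f"unsupported gate type {kind}")
    return cc0, cc1

def suspicious_nets(cc0: Dict[str, int], cc1: Dict[str, int], threshold: int) -> Set[str]:
    """Hypothetical rule: flag nets whose CC1/CC0 imbalance reaches the threshold."""
    return {net for net in cc0 if abs(cc1[net] - cc0[net]) >= threshold}

# A 4-input AND chain: n3 is 1 only when a, b, c, and d are all 1 (a rare event).
gates = [("n1", "AND", ["a", "b"]), ("n2", "AND", ["n1", "c"]), ("n3", "AND", ["n2", "d"])]
cc0, cc1 = scoap_controllability(["a", "b", "c", "d"], gates)
print(cc0["n3"], cc1["n3"])                    # 2 7
print(suspicious_nets(cc0, cc1, threshold=4))  # {'n3'}
```

SCOAP values are computed structurally in one topological pass, without simulating input vectors, so a metric built on them does not depend on the switching probability of the circuit inputs.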
Cryptography | 2018
Michel A. Kinsy; Lake Bu; Mihailo Isakov; Miguel Mark
In current systems-on-chip (SoC) designs, processing elements, i.e., intellectual property (IP) cores, may come from different providers, and executable code may have varying levels of trust, all executing on the same compute platform and sharing resources. This creates a very fertile attack ground and represents the Achilles’ heel of heterogeneous SoC architectures and distributed connected devices. The general consensus today is that conventional approaches and software-only add-on schemes fail to provide sufficient security protection and trustworthiness. In this paper, we develop a secure heterogeneous SoC architecture named Hermes. It represents a new architectural model that integrates multiple processing elements (called tenants), both secure and non-secure cores, into the same chip design while: (a) maintaining individual tenant security; (b) preventing data leakage and corruption; (c) promoting collaboration among the tenants; and (d) tolerating untrusted tenants with potentially malicious purposes. The Hermes architecture is based on a programmable secure router interface and a trust-aware routing algorithm. Depending on the trust levels of computing nodes, it can virtually isolate them in different access modes to the memory blocks. With secure key management and join protocols, Hermes also functions properly when nodes request or grant memory access dishonestly. With 17% hardware overhead, it enables the implementation of processing-element-oblivious secure multicore systems with a programmable distributed group key management scheme. The Hermes architecture is meant to exemplify the design of secure heterogeneous multicore computing systems built from unsecured or untrusted components, using user-defined security policies to create hardware-level virtual zones that enforce these security and trust policies.
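One way to picture trust-based isolation is a check, performed at each tenant's router interface, that compares the tenant's trust level with a per-zone policy before a memory request is forwarded. The three trust levels, the per-zone read/write minimums, and the names `Trust`, `Zone`, and `permit` are hypothetical; the abstract does not specify Hermes's policy format, and its key management and join protocols are not modeled here.

```python
from dataclasses import dataclass
from enum import IntEnum

class Trust(IntEnum):
    UNTRUSTED = 0
    LOW = 1
    HIGH = 2

@dataclass(frozen=True)
class Zone:
    """A memory region with the minimum trust level required for each access mode."""
    name: str
    min_read: Trust
    min_write: Trust

def permit(tenant_trust: Trust, zone: Zone, op: str) -> bool:
    """Router-interface check: forward the request only if the zone policy allows it."""
    if op == "read":
        return tenant_trust >= zone.min_read
    if op == "write":
        return tenant_trust >= zone.min_write
    raise ValueError(f"unknown operation {op!r}")

shared = Zone("shared_buffer", min_read=Trust.LOW, min_write=Trust.HIGH)
print(permit(Trust.LOW, shared, "read"))        # True
print(permit(Trust.LOW, shared, "write"))       # False
print(permit(Trust.UNTRUSTED, shared, "read"))  # False
```

Placing such a check in the network interface, rather than in software on each core, means an untrusted tenant's requests can be dropped before they reach memory, regardless of the code running on that tenant.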