Ebrahim M. Songhori | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ebrahim M. Songhori is active.

Explore More

Publication

Featured researches published by Ebrahim M. Songhori.

ieee symposium on security and privacy | 2015

TinyGarble: Highly Compressed and Scalable Sequential Garbled Circuits

Ebrahim M. Songhori; Siam U. Hussain; Ahmad-Reza Sadeghi; Thomas Schneider; Farinaz Koushanfar

We introduce Tiny Garble, a novel automated methodology based on powerful logic synthesis techniques for generating and optimizing compressed Boolean circuits used in secure computation, such as Yaos Garbled Circuit (GC) protocol. Tiny Garble achieves an unprecedented level of compactness and scalability by using a sequential circuit description for GC. We introduce new libraries and transformations, such that our sequential circuits can be optimized and securely evaluated by interfacing with available garbling frameworks. The circuit compactness makes the memory footprint of the garbling operation fit in the processor cache, resulting in fewer cache misses and thereby less CPU cycles. Our proof-of-concept implementation of benchmark functions using Tiny Garble demonstrates a high degree of compactness and scalability. We improve the results of existing automated tools for GC generation by orders of magnitude, for example, Tiny Garble can compress the memory footprint required for 1024-bit multiplication by a factor of 4,172, while decreasing the number of non-XOR gates by 67%. Moreover, with Tiny Garble we are able to implement functions that have never been reported before, such as SHA-3. Finally, our sequential description enables us to design and realize a garbled processor, using the MIPS I instruction set, for private function evaluation. To the best of our knowledge, this is the first scalable emulation of a general purpose processor.

ieee international conference on pervasive computing and communications | 2013

Idetic: A high-level synthesis approach for enabling long computations on transiently-powered ASICs

Azalia Mirhoseini; Ebrahim M. Songhori; Farinaz Koushanfar

We develop Idetic, a set of mechanisms to enable long computations on ultra-low power Application Specific Integrated Circuits (ASICs) with energy harvesting sources. We address the power transiency and unpredictability problem by optimally inserting checkpoints. Idetic targets highlevel synthesis designs and automatically locates and embeds the checkpoints at the register-transfer level. We define an objective function that aims to find the checkpoints which incur minimum overhead and minimize recomputation energy cost. We develop and exploit a dynamic programming technique to solve the optimization problem. For real time operation, Idetic adaptively adjusts the checkpointing rate based on the available energy level in the system. Idetic is deployed and evaluated on cryptographic benchmark circuits. The test platform harvests RF power through an RFID-reader and stores the energy in a 3.3μF capacitor. For storage of checkpointed data, we evaluate and compare the effectiveness of various non-volatile memories including NAND Flash, PCM, and STTM. Extensive evaluations show that Idetic reliably enables execution of long computations under different source power patterns with low overhead. Our benchmark evaluations demonstrate that the area and energy overheads corresponding to the checkpoints are less than 5% and 11% respectively.

field-programmable custom computing machines | 2015

SSketch: An Automated Framework for Streaming Sketch-Based Analysis of Big Data on FPGA

Bita Darvish Rouhani; Ebrahim M. Songhori; Azalia Mirhoseini; Farinaz Koushanfar

This paper proposes SSketch, a novel automated computing framework for FPGA-based online analysis of big data with dense (non-sparse) correlation matrices. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. The stream of input data is used by SSketch for adaptive learning and updating a corresponding ensemble of lower dimensional data structures, a.k.a., A sketch matrix. A new sketching methodology is introduced that tailors the problem of transforming the big data with dense correlations to an ensemble of lower dimensional subspaces such that it is suitable for hardware-based acceleration performed by reconfigurable hardware. The new method is scalable, while it significantly reduces costly memory interactions and enhances matrix computation performance by leveraging coarse-grained parallelism existing in the dataset. To facilitate automation, SSketch takes advantage of a HW/SW co-design approach: It provides an Application Programming Interface (API) that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrates up to 200 folds speedup on our hardware-accelerated realization of SSketch compared to a software-based deployment on a general purpose processor.

international symposium on low power electronics and design | 2013

Automated checkpointing for enabling intensive applications on energy harvesting devices

Azalia Mirhoseini; Ebrahim M. Songhori; Farinaz Koushanfar

We propose a framework that enables intensive computation on ultra-low power devices with discontinuous energy-harvesting supplies. We devise an optimization algorithm that efficiently partitions the applications into smaller computational steps during high-level synthesis. Our system finds low-overhead checkpoints that minimize recomputation cost due to power losses, then inserts the checkpoints at the designs registertransfer level. The checkpointing rate is automatically adapted to the sources realtime behavior. We evaluate our mechanisms on a battery-less RF energy-harvester platform. Extensive experiments targeting applications in medical implant devices demonstrate our approachs ability to successfully execute complex computations for various supply patterns with low time, energy, and area overheads.

design automation conference | 2016

Perform-ML: performance optimized machine learning by platform and content aware customization

Azalia Mirhoseini; Bita Darvish Rouhani; Ebrahim M. Songhori; Farinaz Koushanfar

We propose Perform-ML, the first Machine Learning (ML) framework for analysis of massive and dense data which customizes the algorithm to the underlying platform for the purpose of achieving optimized resource efficiency. Perform-ML creates a performance model quantifying the computational cost of iterative analysis algorithms on a pertinent platform in terms of FLOPs, communication, and memory, which characterize runtime, storage, and energy. The core of Perform-ML is a novel parametric data projection algorithm, called Elastic Dictionary (ExD), that enables versatile and sparse representations of the data which can help in minimizing performance cost. We show that Perform-ML can achieve the optimal performance objective, according to our cost model, by platform aware tuning of the ExD parameters. An accompanying API ensures automated applicability of Perform-ML to various algorithms, datasets, and platforms. Proof-of-concept evaluations of massive and dense data on different platforms demonstrate more than an order of magnitude improvements in performance compared to the state of the art, within guaranteed user-defined error bounds.

design automation conference | 2016

GarbledCPU: a MIPS processor for secure computation in hardware

Ebrahim M. Songhori; Shaza Zeitouni; Ghada Dessouky; Thomas Schneider; Ahmad-Reza Sadeghi; Farinaz Koushanfar

We present GarbledCPU, the first framework that realizes a hardware-based general purpose sequential processor for secure computation. Our MIPS-based implementation enables development of applications (functions) in a high-level language while performing secure function evaluation (SFE) using Yaos garbled circuit protocol in hardware. GarbledCPU provides three degrees of freedom for SFE which allow leveraging the trade-off between privacy and performance: public functions, private functions, and semi-private functions. We synthesize GarbledCPU on a Virtex-7 FPGA as a proof-of-concept implementation and evaluate it on various benchmarks including Hamming distance, private set intersection and AES. Our results indicate that our pipelined hardware framework outperforms the fastest available software implementation.

design automation conference | 2015

Compacting privacy-preserving k-nearest neighbor search using logic synthesis

Ebrahim M. Songhori; Siam U. Hussain; Ahmad-Reza Sadeghi; Farinaz Koushanfar

This paper introduces the first efficient, scalable, and practical method for privacy-preserving k-nearest neighbors (k-NN) search. The approach enables performing the widely used k-NN search in sensitive scenarios where none of the parties reveal their information while they can still cooperatively find the nearest matches. The privacy preservation is based on the Yaos garbled circuit (GC) protocol. In contrast with the existing GC approaches that only accept function descriptions as combinational circuits, we suggest using sequential circuits. This work introduces novel transformations, such that the sequential description can be evaluated by interfacing with the existing GC schemes that only accept combinational circuits. We demonstrate a great efficiency in the memory required for realizing the secure k-NN search. The first-of-a-kind implementation of privacy preserving k-NN, utilizing the Synopsys Design Compiler on a conventional Intel processor demonstrates the applicability, efficiency, and scalability of the suggested methods.

ACM Transactions on Reconfigurable Technology and Systems | 2016

Automated Real-Time Analysis of Streaming Big and Dense Data on Reconfigurable Platforms

Bita Darvish Rouhani; Azalia Mirhoseini; Ebrahim M. Songhori; Farinaz Koushanfar

We propose SSketch, a novel automated framework for efficient analysis of dynamic big data with dense (non-sparse) correlation matrices on reconfigurable platforms. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. Our framework adaptively learns from the stream of input data and updates a corresponding ensemble of lower-dimensional data structures, a.k.a., a sketch matrix. A new sketching methodology is introduced that tailors the problem of transforming the big data with dense correlations to an ensemble of lower-dimensional subspaces such that it is suitable for hardware-based acceleration performed by reconfigurable hardware. The new method is scalable, while it significantly reduces costly memory interactions and enhances matrix computation performance by leveraging coarse-grained parallelism existing in the dataset. SSketch provides an automated optimization methodology for creating the most accurate data sketch for a given set of user-defined constraints, including runtime and power as well as platform constraints such as memory. To facilitate automation, SSketch takes advantage of a Hardware/Software (HW/SW) co-design approach: It provides an Application Programming Interface that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrate up to a 200-fold speedup on our hardware-accelerated realization of SSketch compared to a software-based deployment on a general-purpose processor.

design automation conference | 2017

PriSearch: Efficient Search on Private Data

M. Sadegh Riazi; Ebrahim M. Songhori; Farinaz Koushanfar

We propose PriSearch, a provably secure methodology for two-party string search. The scenario involves two parties, Alice (holding a query string) and Bob (holding a text), who wish to perform a string search while keeping both the query and the text private without relying on any third party. Such privacy-preserving string search avoids any data leakage when handling sensitive information, e.g., genomic data. PriSearch provides an efficient solution where two parties only need to interact for a constant number of rounds independent of the query and text size. Our approach is based on the provably secure Yaos Garbled Circuit (GC) protocol that requires the string search algorithm to be described as a Boolean circuit. We leverage logic synthesis tools to generate an optimized Boolean circuit for PriSearch such that it incurs the minimum communication/computation cost. We achieve approximately 2× and 140× performance improvements compared to the best prior non-GC and GC-based solutions, respectively.

measurement and modeling of computer systems | 2015

Flexible Transformations For Learning Big Data

Azalia Mirhoseini; Ebrahim M. Songhori; Bita Darvish Rouhani; Farinaz Koushanfar

This paper proposes a domain-specific solution for iterative learning of big and dense (non-sparse) datasets. A large host of learning algorithms, including linear and regularized regression techniques, rely on iterative updates on the data connectivity matrix in order to converge to a solution. The performance of such algorithms often severely degrade when it comes to large and dense data. Massive dense datasets not only induce obligatory large number of arithmetics, but they also incur unwanted message passing cost across the processing nodes. Our key observation is that despite the seemingly dense structures, in many applications, data can be transformed into a new space where sparse structures become revealed. We propose a scalable data transformation scheme that enables creating versatile sparse representations of the data. The transformation can be tuned to benefit the underlying platforms cost and constraints. Our evaluations demonstrate significant improvement in energy usage, runtime, and mem

Explore More