Bita Darvish Rouhani
University of California, San Diego
Publication
Featured research published by Bita Darvish Rouhani.
field-programmable custom computing machines | 2015
Bita Darvish Rouhani; Ebrahim M. Songhori; Azalia Mirhoseini; Farinaz Koushanfar
This paper proposes SSketch, a novel automated computing framework for FPGA-based online analysis of big data with dense (non-sparse) correlation matrices. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. The stream of input data is used by SSketch for adaptive learning and updating a corresponding ensemble of lower-dimensional data structures, a.k.a. a sketch matrix. A new sketching methodology is introduced that tailors the problem of transforming the big data with dense correlations to an ensemble of lower-dimensional subspaces such that it is suitable for hardware-based acceleration performed by reconfigurable hardware. The new method is scalable, while it significantly reduces costly memory interactions and enhances matrix computation performance by leveraging coarse-grained parallelism existing in the dataset. To facilitate automation, SSketch takes advantage of a HW/SW co-design approach: it provides an Application Programming Interface (API) that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrate up to a 200-fold speedup on our hardware-accelerated realization of SSketch compared to a software-based deployment on a general-purpose processor.
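The core idea of maintaining a lower-dimensional sketch from a single pass over the data can be illustrated with the classic Frequent Directions algorithm; this is a generic streaming sketch for intuition only, not SSketch's own methodology or its FPGA mapping:

```python
import numpy as np

def frequent_directions(rows, ell):
    """Stream rows of a matrix A one at a time, maintaining an ell x d
    sketch B such that B.T @ B approximates A.T @ A to within
    ||A||_F^2 / ell in spectral norm (Liberty / Ghashami-Phillips)."""
    d = rows[0].shape[0]
    B = np.zeros((ell, d))
    for row in rows:
        B[-1] = row                       # last row is guaranteed zero before insertion
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2                # shrink every direction by the smallest one
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s[:, None] * Vt               # re-zeroes the last row for the next sample
    return B
```

Because the sketch is rebuilt from small SVDs rather than from the full data matrix, memory stays bounded regardless of stream length, which is the property a storage-limited FPGA deployment needs.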
international symposium on low power electronics and design | 2016
Bita Darvish Rouhani; Azalia Mirhoseini; Farinaz Koushanfar
Physical viability, in particular energy efficiency, is a key challenge in realizing the true potential of Deep Neural Networks (DNNs). In this paper, we aim to incorporate the energy dimension as a design parameter in the higher-level hierarchy of DNN training and execution to optimize for the energy resources and constraints. We use energy characterization to bound the network size in accordance with the pertinent physical resources. An automated customization methodology is proposed to adaptively conform the DNN configurations to the underlying hardware characteristics while minimally affecting the inference accuracy. The key to our approach is a new context- and resource-aware projection of data to a lower-dimensional embedding by which learning the correlation between data samples requires a significantly smaller number of neurons. We leverage the performance gain achieved as a result of the data projection to enable the training of different DNN architectures which can be aggregated together to further boost the inference accuracy. Accompanying APIs are provided to facilitate rapid prototyping of an arbitrary DNN application customized to the underlying platform. Proof-of-concept evaluations for deployment of different visual, audio, and smart-sensing benchmarks demonstrate up to 100-fold energy improvement compared to prior-art DL solutions.
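As a rough illustration of why projecting data to a lower-dimensional embedding saves energy: the first-layer multiply count scales with the input dimension, so shrinking the input shrinks the network. The paper's projection is learned and resource-aware; the sketch below substitutes a plain Johnson-Lindenstrauss-style random projection as a hypothetical stand-in:

```python
import numpy as np

def random_projection(X, target_dim, seed=0):
    """Map n x d samples to n x target_dim. Pairwise geometry is
    approximately preserved, so a much smaller first layer can still
    learn correlations between samples."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], target_dim)) / np.sqrt(target_dim)
    return X @ R
```

Going from, say, 512 input features to 64 cuts first-layer multiply-accumulate operations (and hence energy per inference) by 8x, at the cost of a bounded distortion of inter-sample distances.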
design automation conference | 2016
Azalia Mirhoseini; Bita Darvish Rouhani; Ebrahim M. Songhori; Farinaz Koushanfar
We propose Perform-ML, the first Machine Learning (ML) framework for the analysis of massive and dense data that customizes the algorithm to the underlying platform to achieve optimized resource efficiency. Perform-ML creates a performance model quantifying the computational cost of iterative analysis algorithms on a pertinent platform in terms of FLOPs, communication, and memory, which characterize runtime, storage, and energy. The core of Perform-ML is a novel parametric data projection algorithm, called Elastic Dictionary (ExD), that enables versatile and sparse representations of the data which can help in minimizing performance cost. We show that Perform-ML can achieve the optimal performance objective, according to our cost model, by platform-aware tuning of the ExD parameters. An accompanying API ensures automated applicability of Perform-ML to various algorithms, datasets, and platforms. Proof-of-concept evaluations of massive and dense data on different platforms demonstrate more than an order of magnitude improvement in performance compared to the state of the art, within guaranteed user-defined error bounds.
world conference on information systems and technologies | 2014
Babak Darvish Rouhani; Mohd Naz’ri Mahrin; Fatemeh Nikpay; Bita Darvish Rouhani
Enterprise Architecture (EA) has become a strategic plan for enterprises to align their business and Information Technology (IT). EA is developed, managed, and maintained through EA implementation methodology (EAIM) processes. Several problems in current EAIMs lead to ineffective implementation of EA. This paper presents current issues in EAIM. To this end, we set up a framework that organizes EAIM issues around the core EAIM processes: modeling, developing, and maintaining. The results of this research not only add to the knowledge of EAIM, but can also help both researchers and practitioners understand the current state of EAIMs.
design automation conference | 2018
Bita Darvish Rouhani; M. Sadegh Riazi; Farinaz Koushanfar
This paper presents DeepSecure, a scalable and provably secure Deep Learning (DL) framework built upon automated design, efficient logic synthesis, and optimization methodologies. DeepSecure targets scenarios in which none of the involved parties, neither the cloud servers that hold the DL model parameters nor the delegating clients who own the data, is willing to reveal its information. Our framework is the first to enable accurate and scalable DL analysis of data generated by distributed clients without sacrificing security to maintain efficiency. The secure DL computation in DeepSecure is performed using Yao's Garbled Circuit (GC) protocol. We devise GC-optimized realizations of various components used in DL. Our optimized implementation achieves up to 58-fold higher throughput per sample compared with the best prior solution. In addition to the optimized GC realization, we introduce a set of novel low-overhead pre-processing techniques which further reduce the overall GC runtime in the context of DL. Our extensive evaluations demonstrate up to two orders of magnitude of additional runtime improvement achieved as a result of our pre-processing methodology.
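The building block of Yao's protocol can be demonstrated with a single garbled AND gate. This toy sketch uses trial decryption with a redundancy tag rather than the optimized tables a real GC engine (and DeepSecure) would use; the 16-byte labels, `kdf`, and `TAG` are illustrative choices, not the paper's construction:

```python
import hashlib
import secrets

TAG = b'\x00' * 8  # redundancy so the evaluator recognizes a valid decryption

def kdf(a, b):
    """Derive a 24-byte pad from the two input-wire labels."""
    return hashlib.sha256(a + b).digest()[:24]

def xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))

def garble_and_gate():
    """Garbler side: two random labels per wire, four encrypted rows."""
    wires = {w: (secrets.token_bytes(16), secrets.token_bytes(16))
             for w in ('a', 'b', 'out')}
    rows = [xor(kdf(wires['a'][va], wires['b'][vb]),
                wires['out'][va & vb] + TAG)
            for va in (0, 1) for vb in (0, 1)]
    secrets.SystemRandom().shuffle(rows)  # hide which row is which
    return wires, rows

def evaluate(rows, label_a, label_b):
    """Evaluator side: holds one label per input wire, so exactly one
    row decrypts to a label ending in TAG; the truth value stays hidden."""
    for row in rows:
        plain = xor(kdf(label_a, label_b), row)
        if plain.endswith(TAG):
            return plain[:16]
    raise ValueError('no row decrypted')
```

The evaluator learns only the output label for the actual inputs, never the plaintext bits; chaining such gates (with the garbler's label-to-bit mapping revealed only for final outputs) yields oblivious evaluation of a whole circuit, which is what makes secure DL inference possible.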
design automation conference | 2017
Bita Darvish Rouhani; Azalia Mirhoseini; Farinaz Koushanfar
This paper proposes Deep3, an automated platform-aware Deep Learning (DL) framework that brings orders of magnitude performance improvement to DL training and execution. Deep3 is the first to simultaneously leverage three levels of parallelism for performing DL: data, network, and hardware. It uses platform profiling to abstract physical characterizations of the target platform. The core of Deep3 is a new extensible methodology that enables incorporation of platform characteristics into the higher-level data and neural network transformation. We provide accompanying libraries to ensure automated customization and adaptation to different datasets and platforms. Proof-of-concept evaluations demonstrate a 10- to 100-fold physical performance improvement compared to state-of-the-art DL frameworks, e.g., TensorFlow.
ACM Transactions on Reconfigurable Technology and Systems | 2016
Bita Darvish Rouhani; Azalia Mirhoseini; Ebrahim M. Songhori; Farinaz Koushanfar
We propose SSketch, a novel automated framework for efficient analysis of dynamic big data with dense (non-sparse) correlation matrices on reconfigurable platforms. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. Our framework adaptively learns from the stream of input data and updates a corresponding ensemble of lower-dimensional data structures, a.k.a., a sketch matrix. A new sketching methodology is introduced that tailors the problem of transforming the big data with dense correlations to an ensemble of lower-dimensional subspaces such that it is suitable for hardware-based acceleration performed by reconfigurable hardware. The new method is scalable, while it significantly reduces costly memory interactions and enhances matrix computation performance by leveraging coarse-grained parallelism existing in the dataset. SSketch provides an automated optimization methodology for creating the most accurate data sketch for a given set of user-defined constraints, including runtime and power as well as platform constraints such as memory. To facilitate automation, SSketch takes advantage of a Hardware/Software (HW/SW) co-design approach: It provides an Application Programming Interface that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrate up to a 200-fold speedup on our hardware-accelerated realization of SSketch compared to a software-based deployment on a general-purpose processor.
world conference on information systems and technologies | 2015
Babak Darvish Rouhani; Mohd Naz’ri Mahrin; Fatemeh Nikpay; Pourya Nikfard; Bita Darvish Rouhani
Enterprise Architecture (EA) is managed, developed, and maintained through an Enterprise Architecture Implementation Methodology (EAIM). Existing EAIMs are often ineffective due to complexities arising from their processes, models, methods, and strategies. Consequently, EA projects may face a lack of support in the following parts of EA: requirement analysis, governance and evaluation, guidelines for implementation, and continual improvement of EA implementation. This research presents an agent-oriented EAIM. The proposed EAIM was evaluated by means of a case study. The results show that the proposed EAIM can directly improve the effectiveness of EA implementation in the following respects: reducing the mismatch between business and IT, defining reachable goals for the enterprise, employing easy implementation practices and easy learning procedures, using efficient documentation, fostering appropriate communication among project team members, providing an effective environment for the alignment of business and IT, and using an effective plan for governance and migration. This research extends the application of agent technology, opening a new area of research for academics and providing an effective EAIM that can be employed by practitioners in EA projects.
measurement and modeling of computer systems | 2015
Azalia Mirhoseini; Ebrahim M. Songhori; Bita Darvish Rouhani; Farinaz Koushanfar
This paper proposes a domain-specific solution for iterative learning on big and dense (non-sparse) datasets. A large host of learning algorithms, including linear and regularized regression techniques, rely on iterative updates on the data connectivity matrix in order to converge to a solution. The performance of such algorithms often degrades severely on large and dense data. Massive dense datasets not only require a large number of arithmetic operations, but also incur unwanted message-passing cost across the processing nodes. Our key observation is that, despite the seemingly dense structures, in many applications the data can be transformed into a new space where sparse structures are revealed. We propose a scalable data transformation scheme that enables creating versatile sparse representations of the data. The transformation can be tuned to the underlying platform's costs and constraints. Our evaluations demonstrate significant improvement in energy usage, runtime, and memory.
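The sparse-in-a-new-basis observation can be illustrated with Orthogonal Matching Pursuit over a dictionary: data that looks dense in raw coordinates may have only a few nonzero coefficients in a well-chosen basis. This is a generic sparse-coding routine for intuition, not the paper's transformation scheme:

```python
import numpy as np

def omp(D, x, k):
    """Greedy sparse coding: pick the k atoms (columns) of D whose span
    best explains x, refitting coefficients by least squares each step."""
    residual, support = x.astype(float).copy(), []
    coeffs = np.zeros(0)
    for _ in range(k):
        # Atom most correlated with what is left unexplained.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    z = np.zeros(D.shape[1])
    z[support] = coeffs
    return z
```

Once each sample is a k-sparse vector, both the arithmetic per iterative update and the bytes exchanged between processing nodes scale with k rather than with the full dimension.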
ACM Transactions in Embedded Computing Systems | 2017
Bita Darvish Rouhani; Azalia Mirhoseini; Farinaz Koushanfar
This paper proposes RISE, an automated Reconfigurable framework for real-time background subtraction applied to Intelligent video SurveillancE. RISE is devised with a new streaming-based methodology that adaptively learns/updates a corresponding dictionary matrix from background pixels as new video frames are captured over time. This dictionary is used to highlight the foreground information in each video frame. A key characteristic of RISE is that it adjusts to diverse lighting conditions and varying camera distances by continuously updating the corresponding dictionary. We evaluate RISE on natural-scene vehicle images with different backgrounds and ambient illuminations. To facilitate automation, we provide an accompanying API that can be used to deploy RISE on FPGA-based system-on-chip platforms. We prototype RISE for end-to-end deployment of three widely adopted image processing tasks used in intelligent transportation systems: License Plate Recognition (LPR), image denoising/reconstruction, and principal component analysis. Our evaluations demonstrate up to 87-fold higher throughput per energy unit compared to the prior-art software solution executed on an ARM Cortex-A15 embedded platform.
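The dictionary-based background subtraction idea can be sketched as a least-squares projection: pixels well explained by the span of background atoms are background, and large residuals are foreground. RISE instead maintains an adaptively learned, streaming-updated dictionary in hardware; this is only a minimal stand-in with a fixed set of background frames and a hypothetical threshold:

```python
import numpy as np

def foreground_mask(frame, background_frames, thresh=0.5):
    """Project a flattened frame onto the span of stored background
    frames; pixels with a large reconstruction residual are flagged
    as foreground."""
    D = np.stack([f.ravel() for f in background_frames], axis=1)  # pixels x atoms
    x = frame.ravel()
    coeffs, *_ = np.linalg.lstsq(D, x, rcond=None)
    residual = np.abs(x - D @ coeffs)
    return (residual > thresh).reshape(frame.shape)
```

Because the fit is relative to the background subspace rather than to fixed pixel values, a global illumination change (a scaled background) is absorbed by the coefficients instead of being misread as foreground.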