Ekow J. Otoo
University of the Witwatersrand
Publications
Featured research published by Ekow J. Otoo.
Parallel Computing | 2013
Ekow J. Otoo; Gideon Nimako; Daniel Ohene-Kwofie
Several meetings of the Extremely Large Databases Community for large-scale scientific applications advocate the use of multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes, of the order of terabytes and petabytes. The storage of such databases therefore requires efficient dynamic storage schemes in which the array is allowed to arbitrarily extend the bounds of its dimensions. Conventional multidimensional array representations cannot extend or shrink their bounds without relocating elements of the data set; in general, extendibility is limited to one dimension only. This paper presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time for an element. This is done with a computed access mapping function that maps the k-dimensional index onto a linear index of the storage locations. This concept forms the basis for the implementation of an array file of any number of dimensions, where the bounds of the array dimensions can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5); however, extending the bound of a dimension of an HDF5 array file can be unusually expensive in time. In our storage scheme for dense array files, such extensions can be performed while elements of the array are still accessed orders of magnitude faster than in HDF5 or conventional array files. We also present theoretical and experimental analyses of our scheme with respect to access time and storage overhead. The mapping scheme can be readily integrated into existing PGAS models for parallel processing in a cluster computing environment.
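As a concrete illustration of the principle, the following minimal Python sketch realizes an extendible 2-D array: every row or column extension is appended to the end of linear storage, so previously allocated elements never move, and a computed function maps an index (i, j) to its linear address. This only illustrates the general idea, not the paper's k-dimensional chunked mapping function; all names here are ours.

    class ExtendibleArray2D:
        def __init__(self):
            self.store = []          # linear storage; only ever appended to
            self.row_hdr = []        # (base address, ncols when row was added)
            self.col_hdr = []        # (base address, nrows when column was added)

        def add_row(self):
            ncols = len(self.col_hdr)
            self.row_hdr.append((len(self.store), ncols))
            self.store.extend([0] * ncols)

        def add_col(self):
            nrows = len(self.row_hdr)
            self.col_hdr.append((len(self.store), nrows))
            self.store.extend([0] * nrows)

        def addr(self, i, j):
            row_base, ncols_then = self.row_hdr[i]
            if j < ncols_then:       # (i, j) was allocated together with row i
                return row_base + j
            col_base, _ = self.col_hdr[j]
            return col_base + i      # else it came with column j

        def __getitem__(self, ij):
            return self.store[self.addr(*ij)]

        def __setitem__(self, ij, v):
            self.store[self.addr(*ij)] = v

    a = ExtendibleArray2D()
    a.add_row(); a.add_col()         # 1 x 1
    a[0, 0] = 42
    a.add_col()                      # extend to 1 x 2: nothing is relocated
    a[0, 1] = 7
    print(a[0, 0], a[0, 1])          # 42 7

Address computation stays O(1) per element because each row and column header records the bounds at the time it was created; the paper's chunked scheme generalizes this kind of computed mapping to k dimensions.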
Statistical and Scientific Database Management | 2014
Ekow J. Otoo; Hairong Wang; Gideon Nimako
In this paper, we introduce storage schemes for multi-dimensional sparse arrays (MDSAs) that handle the sparsity of the array with two primary goals: reducing the storage overhead and maintaining efficient data element access. Four schemes are proposed: (i) the PATRICIA trie compressed storage method (PTCS), which uses a PATRICIA trie to store the valid non-zero array elements; (ii) the extended compressed row storage (xCRS), which extends the CRS method for sparse matrix storage to sparse arrays of higher dimensions and achieves the best data element access efficiency of all the methods; (iii) the bit-encoded xCRS (BxCRS), which optimizes the storage utilization of xCRS by applying data compression with run-length encoding while maintaining its data access efficiency; and (iv) a hybrid approach that provides a desired balance between storage utilization and data manipulation efficiency by combining xCRS with the Bit Encoded Sparse Storage (BESS). These storage schemes were evaluated and compared on three basic array operations: constructing the storage scheme, accessing a random element, and retrieving a sub-array, using a set of synthetic sparse multi-dimensional arrays.
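For orientation, here is a minimal sketch of classic 2-D CRS, the base matrix layout that xCRS generalizes to higher dimensions (the k-dimensional extension in the paper is more involved and is not reproduced here):

    import bisect

    class CRS:
        def __init__(self, nrows, triples):      # triples: (row, col, value)
            self.row_ptr = [0] * (nrows + 1)
            self.col_idx, self.val = [], []
            for i, j, v in sorted(triples):      # sort by (row, col)
                self.row_ptr[i + 1] += 1
                self.col_idx.append(j)
                self.val.append(v)
            for i in range(nrows):               # prefix sums -> row offsets
                self.row_ptr[i + 1] += self.row_ptr[i]

        def get(self, i, j, default=0.0):
            lo, hi = self.row_ptr[i], self.row_ptr[i + 1]
            p = bisect.bisect_left(self.col_idx, j, lo, hi)
            return self.val[p] if p < hi and self.col_idx[p] == j else default

    m = CRS(3, [(0, 2, 1.5), (2, 0, 7.0), (0, 0, 3.0)])
    print(m.get(0, 2), m.get(1, 1))              # 1.5 0.0

Random access to an element of row i costs one binary search over that row's non-zeros, which is the kind of per-element efficiency the abstract credits to xCRS.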
International Conference on Algorithms and Architectures for Parallel Processing | 2012
Gideon Nimako; Ekow J. Otoo; Daniel Ohene-Kwofie
The current trend of multicore and Symmetric Multi-Processor (SMP) architectures underscores the need for parallelism in most scientific computations. Matrix-matrix multiplication is one of the fundamental computations in many algorithms for scientific and numerical analysis. Although a number of algorithms (such as Cannon, PUMMA and SUMMA) have been proposed for matrix-matrix multiplication on distributed memory architectures, matrix-matrix algorithms for multicore and SMP architectures have not been extensively studied. We present two algorithms, based largely on blocked dense matrices, for parallel matrix-matrix multiplication on shared memory systems. The first algorithm is based on blocked matrices, while the second uses blocked matrices with the MapReduce framework in shared memory. Our experimental results show that our blocked dense matrix approach outperforms known existing implementations by up to 50%, while our MapReduce blocked matrix-matrix algorithm outperforms the existing matrix-matrix multiplication algorithm of the Phoenix shared memory MapReduce framework by about 40%.
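A minimal serial sketch of the blocked (tiled) product that the first algorithm builds on is shown below; the paper's version distributes the outer block loops over threads, which this sketch does not attempt.

    import numpy as np

    def blocked_matmul(A, B, bs=64):
        n, m = A.shape
        m2, p = B.shape
        assert m == m2
        C = np.zeros((n, p))
        for i0 in range(0, n, bs):
            for k0 in range(0, m, bs):
                for j0 in range(0, p, bs):
                    # each block product works on bs x bs tiles that fit in cache
                    C[i0:i0+bs, j0:j0+bs] += A[i0:i0+bs, k0:k0+bs] @ B[k0:k0+bs, j0:j0+bs]
        return C

Blocking improves cache reuse: each tile of A and B is loaded from memory once per block product rather than once per scalar multiply, which is what makes the shared-memory parallel version profitable.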
Computational Science and Engineering | 2012
Daniel Ohene-Kwofie; Ekow J. Otoo; Gideon Nimako
Modern computer architecture with 64-bit addressing has now become commonplace. The consequence is that sufficiently large databases can be maintained as main-memory-resident databases during a usage session. Such memory-resident databases still require an index for fast access to the data items. A common architecture maintains an in-memory index to data records that are grouped into data blocks, with the data items accessed through a large in-memory cache pool. The T-Tree and its variants were developed as the appropriate in-memory index structures, and recently a number of such index-driven databases have emerged under the banner of NoSQL databases. We propose the O2-Tree, an alternative to the T-Tree usable as an index for in-memory databases, that outperforms a number of NoSQL databases. The O2-Tree is essentially a Red-Black binary search tree in which the leaf nodes are index blocks that store multiple records of key-value and record-pointer pairs. The internal nodes contain copies of the keys that split the blocks of the leaf nodes, in a manner similar to the B+-Tree, and are organized into a Red-Black tree structure. The structure has the advantage that it can easily be reconstructed by reading only the lowest key value of each leaf-node block. The main contributions are the development and implementation of the O2-Tree; its performance comparison with the T-Tree, the AVL-Tree, the B+-Tree and a top-down implementation of the Red-Black binary search tree; and comparisons with NoSQL databases such as Kyoto Cabinet, LevelDB and in-memory BerkeleyDB.
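The condensed sketch below shows the O2-Tree's two-level layout: sorted leaf blocks of (key, value) records, routed by the lowest key of each block. A sorted separator list with binary search stands in for the paper's red-black internal tree, so all balancing machinery is omitted; names are ours.

    import bisect

    class O2TreeSketch:
        def __init__(self, block_size=4):
            self.bs = block_size
            self.blocks = [[]]       # leaf blocks: sorted lists of (key, value)
            self.lows = [None]       # lowest key per block; lows[0] acts as -infinity

        def _find_block(self, key):
            # route by separator keys, as the internal red-black tree would
            return bisect.bisect_right(self.lows, key, lo=1) - 1

        def put(self, key, value):
            i = self._find_block(key)
            blk = self.blocks[i]
            p = bisect.bisect_left(blk, (key,))
            if p < len(blk) and blk[p][0] == key:
                blk[p] = (key, value)            # overwrite existing key
                return
            blk.insert(p, (key, value))
            if len(blk) > self.bs:               # overflow: split the leaf block
                half = len(blk) // 2
                self.blocks[i:i+1] = [blk[:half], blk[half:]]
                self.lows.insert(i + 1, blk[half][0])
            self.lows[i] = self.blocks[i][0][0]

        def get(self, key):
            blk = self.blocks[self._find_block(key)]
            p = bisect.bisect_left(blk, (key,))
            return blk[p][1] if p < len(blk) and blk[p][0] == key else None

Note that the routing level holds only the lowest key of each leaf block, which mirrors the reconstruction property mentioned above: the internal level can be rebuilt by scanning just those keys.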
International Conference on Parallel Processing | 2012
Gideon Nimako; Ekow J. Otoo; Daniel Ohene-Kwofie
Several meetings of the Extremely Large Databases Community for large-scale scientific applications advocate the use of multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes, of the order of terabytes and petabytes. The storage of such databases therefore requires efficient dynamic storage schemes in which the array is allowed to arbitrarily extend the bounds of its dimensions. Conventional multidimensional array representations cannot extend or shrink their bounds without relocating elements of the dataset. In general, extendibility of the bounds is limited to one dimension only. This paper presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time for an element. This is done with a computed access mapping function that maps the k-dimensional index onto a linear index of the storage locations. This concept forms the basis for the implementation of an array file of any number of dimensions, where the bounds of the array can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5); however, extending the bound of a dimension of an HDF5 array file can be unusually expensive in time. In our storage scheme for dense array files, such extensions can be performed while elements of the array are still accessed orders of magnitude faster than in HDF5 or conventional array files.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Ekow J. Otoo; Gideon Nimako; Daniel Ohene-Kwofie
The data model found to be most appropriate for scientific databases is the array-oriented data model, which also forms the basis for storing and accessing the database on physical storage. Such storage systems are exemplified by the Hierarchical Data Format (HDF/HDF5), the Network Common Data Format (NetCDF) and, recently, SciDB. Given that the array is mapped onto linear locations in a file, i.e., a representation of an array file, in either row-major or column-major order, a fundamental requirement of the representation is that the array should be allowed to grow to massively large sizes by gradual expansion of its bounds. In both row-major and column-major order of array elements, extendibility is allowed in one dimension only. We present an approach to storing multi-dimensional dense arrays on physical storage devices that allows arbitrary extensions of any of the array bounds without reorganizing previously allocated array elements. For a k-dimensional N-element array, the organization allows an element to be accessed in time O(k + log N) using O(k^2 N^(1/k)) additional space. By chunking the array into N_c chunks, the time and space requirements reduce to O(k + log N_c) and O(k^2 N_c^(1/k)).
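A quick back-of-envelope reading of these bounds (assuming the constants hidden in the O-notation are near 1) shows why the additional space is negligible in practice:

    k, N = 3, 10**9
    overhead = k * k * N ** (1 / k)   # about 9,000 mapping entries
    print(overhead / N)               # about 9e-6 of the array's own size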
International Conference on Cluster Computing | 2013
Gideon Nimako; Ekow J. Otoo; Daniel Ohene-Kwofie
Over the past decade, I/O has been a limiting factor for extreme-scale parallel computing, even as the amount of data produced by parallel scientific applications has grown substantially. The datasets usually grow incrementally to massive sizes, of the order of terabytes and petabytes. As such, the storage of such datasets, typically modelled as multidimensional arrays, requires efficient dynamic storage schemes in which the array is allowed to arbitrarily extend the bounds of its dimensions. This paper introduces PEXTA, a new parallel I/O model for the Global Array Toolkit. PEXTA provides the necessary APIs for explicit transfers between the memory-resident global array and its secondary-storage counterpart, and also allows the persistent array to be extended along any dimension without compromising the access time for an element or sub-array. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5) and parallel HDF5; however, extending the bound of a dimension of an HDF5 array file can be unusually expensive in time. In our storage scheme for parallel dense array files, extensions can be performed while elements of the array are still accessed much faster than with parallel HDF5. We illustrate the PEXTA APIs with three applications: an out-of-core matrix-matrix multiplication, a lattice Boltzmann simulation, and molecular dynamics of a Lennard-Jones system.
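Since PEXTA's concrete interface is not reproduced here, the sketch below only suggests the general shape such an explicit memory-to-disk transfer API might take; every name in it is hypothetical, not PEXTA's actual signatures, and the mock "persists" to a dict so the control flow is runnable.

    class PersistentArray:                       # hypothetical stand-in, not PEXTA
        def __init__(self, path, shape):
            self.path, self.shape, self._disk = path, list(shape), {}

        def extend(self, dim, by):
            # grow one bound; an extendible layout needs no reorganization
            self.shape[dim] += by

        def put(self, index, value):
            self._disk[tuple(index)] = value     # stand-in for a file write

        def get(self, index):
            return self._disk.get(tuple(index), 0.0)

    # e.g. append one more timestep of a simulation without rewriting history
    pa = PersistentArray("lbm.dat", (8, 64, 64))
    pa.extend(dim=0, by=1)
    pa.put((8, 0, 0), 1.25)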
South African Institute of Computer Scientists and Information Technologists | 2012
Gideon Nimako; Ekow J. Otoo; Daniel Ohene-Kwofie
Parallelism in linear algebra libraries is a common approach to accelerating numerical and scientific applications. Matrix-matrix multiplication is one of the most widely used computations in scientific and numerical algorithms. Although a number of matrix multiplication algorithms exist for distributed memory environments (e.g., Cannon, Fox, PUMMA, SUMMA), matrix-matrix multiplication algorithms for shared memory and SMP architectures have not been extensively studied. In this paper, we present a fast matrix-matrix multiplication algorithm for multi-core and SMP architectures using the MapReduce framework. Memory-resident linear algebra algorithms suffer performance losses on modern multi-core architectures because of the increasing performance gap between the CPU and main memory. To allow such compute-intensive algorithms to exploit the full potential of a program's inherent instruction-level parallelism, the adverse effect of the processor-memory performance gap must be minimized. We present a cache-sensitive MapReduce matrix multiplication algorithm that fully exploits memory bandwidth and minimizes cache misses and conflicts. Our experimental results show that our algorithms outperform existing matrix multiplication algorithms for shared-memory architectures, such as those in the Phoenix, PLASMA and LAPACK libraries.
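The sketch below shows, in serial form, the MapReduce decomposition of a blocked matrix product: the map phase emits partial block products keyed by output tile, and the reduce phase sums them. In the paper a shared-memory framework like Phoenix schedules the map tasks across threads; this sketch only fixes the dataflow.

    import numpy as np
    from collections import defaultdict

    def mapreduce_matmul(A, B, bs=64):
        n, m = A.shape
        _, p = B.shape
        partials = defaultdict(list)
        # map phase: one task per (i0, k0, j0) block triple,
        # keyed by the output tile (i0, j0) it contributes to
        for i0 in range(0, n, bs):
            for k0 in range(0, m, bs):
                for j0 in range(0, p, bs):
                    partials[(i0, j0)].append(
                        A[i0:i0+bs, k0:k0+bs] @ B[k0:k0+bs, j0:j0+bs])
        # reduce phase: sum the partial products of each output tile
        C = np.zeros((n, p))
        for (i0, j0), blocks in partials.items():
            C[i0:i0+bs, j0:j0+bs] = sum(blocks)
        return C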
High Performance Computing and Communications | 2016
Ekow J. Otoo; Hairong Wang; Gideon Nimako
A relational table over a set of attributes can be mapped onto a multi-dimensional array and stored as such. This conceptual view of relations lends itself to easy formulation of numerous analytical algorithms, and it is the view taken in the representation of relations in data warehousing to support On-Line Analytical Processing (OLAP). The main drawback of such a storage scheme is that the equivalent array is typically a highly sparse multi-dimensional array dominated by null entries, and it requires a storage scheme with high compression that retains the significant non-null elements. We introduce, analyse and compare the performance of several storage schemes for Multi-Dimensional Sparse Arrays (MDSAs). We first describe a previously known method, Bit Encoded Sparse Storage (BESS), and then introduce four new storage schemes: Patricia trie compressed storage (PTCS), extended compressed row storage (xCRS), bit-encoded compressed row storage (BxCRS) and a hybrid storage scheme (Hybrid) that combines BESS and xCRS. The performance of these storage schemes is compared with respect to compression ratio and computational efficiency in accessing an element, retrieving sub-array elements, and computing aggregate and other analytic functions. We focus primarily on the aggregate function of summation over sub-array elements in this paper. The results show that xCRS, BxCRS, Hybrid and BESS achieve compression ratios below 40% for MDSAs with more than 80% sparsity. The BESS storage scheme gives the best performance in computing multi-dimensional aggregates, for varying sparsity and dimensionality, compared with the other schemes. The key virtue of PTCS is that it is the only scheme that allows insertions and deletions without reorganizing the entire storage previously allocated.
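As a baseline statement of the summation aggregate studied here, the sketch below answers a sub-array sum over a plain coordinate-list (COO) representation; the paper's schemes answer the same query from their compressed layouts instead.

    def subarray_sum(coo, lo, hi):
        """coo maps k-dim index tuples to values; sum over the box [lo, hi)."""
        return sum(v for idx, v in coo.items()
                   if all(l <= x < h for x, l, h in zip(idx, lo, hi)))

    a = {(0, 1, 2): 3.0, (5, 5, 5): 1.0, (0, 2, 2): 2.0}
    print(subarray_sum(a, (0, 0, 0), (1, 3, 3)))   # 5.0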
Journal of Physics: Conference Series | 2015
Daniel Ohene-Kwofie; Ekow J. Otoo
The ATLAS detector, operated at the Large Hadron Collider (LHC) at CERN, records proton-proton collisions every 50 ns, resulting in a sustained data flow up to the PB/s range. The upgraded Tile Calorimeter of the ATLAS experiment will sustain about 5 PB/s of digital throughput. These massive data rates require extremely fast data capture and processing. Although there has been a steady increase in the processing speed of CPUs/GPGPUs assembled for high performance computing, the rate of data input and output, even under parallel I/O, has not kept up with the general increase in computing speeds. The problem, then, is whether one can implement an I/O subsystem infrastructure capable of meeting the computational speeds of advanced computing systems at the petascale and exascale levels. We propose a system architecture that leverages the Partitioned Global Address Space (PGAS) model of computing to maintain an in-memory data store for the Processing Unit (PU) of the upgraded electronics of the Tile Calorimeter, which is proposed for use as a high-throughput general-purpose co-processor to the sROD of the upgraded Tile Calorimeter. The physical memory of the PUs is aggregated into a large global logical address space using RDMA-capable interconnects such as PCI-Express to enhance data processing throughput.
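A toy sketch of the PGAS idea follows: one logical address space block-distributed over the memories of several processing units, with the owner computed from the address. A real deployment would resolve remote blocks with RDMA transfers rather than local byte arrays; all names here are ours.

    class GlobalStore:
        def __init__(self, n_units, unit_capacity):
            self.cap = unit_capacity
            self.units = [bytearray(unit_capacity) for _ in range(n_units)]

        def _locate(self, addr):
            return addr // self.cap, addr % self.cap   # (owner PU, local offset)

        def write(self, addr, data):
            # assumes the write fits within one PU's block; would be an RDMA put
            pu, off = self._locate(addr)
            self.units[pu][off:off + len(data)] = data

        def read(self, addr, n):
            pu, off = self._locate(addr)               # would be an RDMA get
            return bytes(self.units[pu][off:off + n])

    g = GlobalStore(n_units=4, unit_capacity=1 << 20)
    g.write(3 * (1 << 20) + 16, b"hit")                # lands in PU 3 at offset 16
    print(g.read(3 * (1 << 20) + 16, 3))               # b'hit'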