Suejb Memeti | Researchain

Publication

Featured research published by Suejb Memeti.

Computational Science and Engineering | 2014

PaREM: A Novel Approach for Parallel Regular Expression Matching

Suejb Memeti; Sabri Pllana

Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be time-prohibitive for large problem sizes. In this paper, we describe a novel algorithm for parallel regular expression matching via deterministic finite automata. Furthermore, we present our tool PaREM, which accepts regular expressions and finite automata as input and automatically generates the corresponding code for our algorithm, which is amenable to parallel execution on shared-memory systems. We evaluate our parallel algorithm empirically by comparing it with a commonly used algorithm for sequential regular expression matching. Experiments on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 21× for 48 threads.
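The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to parallelize matching with a deterministic finite automaton, not necessarily what PaREM generates. The input is cut into chunks, each chunk is run from every possible start state, and the per-chunk state maps are then chained in order. The automaton (`DELTA`, `START`, `ACCEPT`, accepting strings over {a, b} that end in "ab") and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical DFA over the alphabet {a, b} that accepts strings ending in "ab".
START = 0
ACCEPT = {2}
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}


def run(state, text):
    """Sequential matching: follow the transitions for every character."""
    for ch in text:
        state = DELTA[state][ch]
    return state


def chunk_map(chunk):
    """For one chunk, compute the end state for every possible start state."""
    return {s: run(s, chunk) for s in DELTA}


def parallel_match(text, workers=4):
    """Match text by processing chunks independently, then chaining the maps."""
    size = max(1, -(-len(text) // workers))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        maps = list(pool.map(chunk_map, chunks))
    state = START
    for m in maps:  # cheap sequential step: one lookup per chunk
        state = m[state]
    return state in ACCEPT
```

Running every chunk from all states multiplies the work by the number of states, which is the price paid for making the chunks independent. Python threads are used here only to show the decomposition; a real speed-up would need native threads or processes.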

International Journal of High Performance Computing Applications | 2018

A machine learning approach for accelerating DNA sequence analysis

Suejb Memeti; Sabri Pllana

DNA sequence analysis is a data- and computationally intensive problem and therefore demands suitable parallel computing resources and algorithms. In this paper, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general-purpose host central processing units (CPUs) and one or more Xeon Phi devices. We present a parallel algorithm that shares the work of DNA sequence analysis between the host CPUs and the Xeon Phi device to reduce the overall analysis time. For automatic worksharing we use a supervised machine learning approach, which predicts the performance of DNA sequence analysis on the host and device and accordingly maps fractions of the DNA sequence to the host and device. We evaluate our approach empirically using real-world DNA segments of humans and various animals on a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P device with 61 cores.
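The worksharing idea can be illustrated with a small sketch. It assumes a deliberately simple performance model: an ordinary least-squares line per processor stands in for whatever supervised model the authors trained. The training samples below are synthetic placeholders, not measurements from the paper, and all names are hypothetical.

```python
def fit_line(samples):
    """Least-squares fit of time = slope * size + intercept."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    n = len(samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, size_gb):
    """Predicted run time in seconds; an idle processor costs nothing."""
    slope, intercept = model
    return slope * size_gb + intercept if size_gb > 0 else 0.0


def best_host_fraction(total_gb, host_model, device_model, steps=100):
    """Try host shares 0%, 1%, ..., 100% and keep the fastest predicted split."""
    best = None
    for i in range(steps + 1):
        f = i / steps
        # Host and device work concurrently, so the slower one sets the time.
        t = max(predict(host_model, f * total_gb),
                predict(device_model, (1 - f) * total_gb))
        if best is None or t < best[1]:
            best = (f, t)
    return best


# SYNTHETIC (size in GB, time in s) pairs, made up for illustration only.
HOST_SAMPLES = [(1.0, 0.6), (2.0, 1.1), (4.0, 2.1)]
DEVICE_SAMPLES = [(1.0, 0.65), (2.0, 0.9), (4.0, 1.4)]

host_model = fit_line(HOST_SAMPLES)
device_model = fit_line(DEVICE_SAMPLES)
fraction, predicted_time = best_host_fraction(3.0, host_model, device_model)
```

The point of the sketch is the mapping step: once both predictions exist, the split is chosen so that neither side is left waiting for the other.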

IEEE International Conference on Cloud Computing Technology and Science | 2017

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption

Suejb Memeti; Lu Li; Sabri Pllana; Joanna Kolodziej; Christoph W. Kessler

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or the Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate the programming productivity we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines required to parallelize the code using a specific framework. We use our tools MeterPU and x-MeterPU to evaluate the energy consumption and the performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 processors with a GPU accelerator or an Intel Xeon Phi co-processor.

Computational Science and Engineering | 2015

Analyzing Large-Scale DNA Sequences on Multi-core Architectures

Suejb Memeti; Sabri Pllana

Rapid analysis of DNA sequences is important for preventing the evolution of different viruses and bacteria during an early phase, for early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and for DNA forensics. However, real-world DNA sequences may comprise several gigabytes, and the process of DNA analysis demands adequate computational resources to be completed within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on finite automata and is suitable for analysing very large DNA segments. We evaluate our approach for real-world DNA segments of mouse (2.7 GB), cat (2.4 GB), dog (2.4 GB), chicken (1 GB), human (3.2 GB) and turkey (0.2 GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speedups of up to 17.6×. Our approach is up to 3× faster than a pattern-based parallel approach that uses the RE2 library.
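One practical detail when a large sequence is split across workers is that a pattern may straddle a chunk boundary. The sketch below shows a simple way to handle this: each worker counts the occurrences that start inside its own chunk and is allowed to read a few characters past the chunk end. It uses plain substring search rather than finite automata, so it illustrates only the partitioning and is not the authors' algorithm. Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_starting_in(text, pattern, start, end):
    """Count occurrences (overlaps allowed) whose first character is in [start, end)."""
    # Reading up to len(pattern) - 1 characters past the chunk end lets a match
    # that begins at position end - 1 still be seen in full.
    limit = end + len(pattern) - 1
    count = 0
    pos = text.find(pattern, start, limit)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1, limit)
    return count


def parallel_count(text, pattern, workers=4):
    """Split text into chunks and sum the per-chunk counts."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    size = max(1, -(-len(text) // workers))  # ceiling division
    bounds = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda b: count_starting_in(text, pattern, *b), bounds)
        return sum(counts)
```

Because every occurrence starts in exactly one chunk, no match is counted twice and none is lost at a boundary. As with any thread-based Python example, this shows the decomposition only and is not a performance claim.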

Concurrency and Computation: Practice and Experience | 2017

Combinatorial optimization of DNA sequence analysis on heterogeneous systems

Suejb Memeti; Sabri Pllana

Analysis of DNA sequences is a data- and computationally intensive problem and therefore requires suitable parallel computing resources and algorithms. In this paper, we describe our parallel algorithm for DNA sequence analysis that determines how many times a pattern appears in the DNA sequence. The algorithm is engineered for heterogeneous platforms that comprise a host with multi-core processors and one or more many-core devices. For combinatorial optimization, we use the simulated annealing algorithm. The optimization goal is to determine the number of threads, thread affinities, and DNA sequence fractions for host and device, such that the overall execution time of DNA sequence analysis is minimized. We evaluate our approach experimentally using real-world DNA sequences of various organisms on a heterogeneous platform that comprises two Intel Xeon E5 processors and an Intel Xeon Phi 7120P co-processing device. By running only about 5% of possible experiments, our optimization method finds a near-optimal system configuration for DNA sequence analysis that yields average speedups of 1.6× and 2× compared with host-only and device-only execution, respectively.
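A minimal sketch of simulated annealing over such a configuration space is given below. It makes several assumptions: the cost function `modelled_time` is a made-up stand-in for a real measurement, the candidate values are hypothetical, and thread affinity is left out for brevity. It shows the search procedure only and does not reproduce the paper's setup or results.

```python
import math
import random

# Hypothetical parameter space (not taken from the paper).
HOST_THREADS = [6, 12, 24, 48]
DEVICE_THREADS = [60, 120, 180, 240]
HOST_FRACTIONS = [i / 10 for i in range(11)]  # share of the sequence on the host
SPACE = [HOST_THREADS, DEVICE_THREADS, HOST_FRACTIONS]


def modelled_time(cfg):
    """Made-up cost model standing in for a measured execution time."""
    host_threads, device_threads, host_fraction = cfg
    host_time = host_fraction * 100.0 / host_threads
    device_time = (1.0 - host_fraction) * 100.0 / (0.25 * device_threads)
    return max(host_time, device_time)  # host and device run concurrently


def neighbour(cfg, rng):
    """Move one randomly chosen parameter to an adjacent candidate value."""
    idx = rng.randrange(len(SPACE))
    values = SPACE[idx]
    pos = values.index(cfg[idx]) + rng.choice([-1, 1])
    pos = min(len(values) - 1, max(0, pos))
    new_cfg = list(cfg)
    new_cfg[idx] = values[pos]
    return tuple(new_cfg)


def simulated_annealing(start, steps=500, t0=5.0, cooling=0.99, seed=0):
    rng = random.Random(seed)
    current = best = start
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        delta = modelled_time(candidate) - modelled_time(current)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if modelled_time(current) < modelled_time(best):
            best = current
        temperature *= cooling
    return best


start_cfg = (6, 60, 0.5)
best_cfg = simulated_annealing(start_cfg)
```

In a real setting each call to the cost function would be an experiment (or a model prediction), which is why evaluating only a small part of the space matters.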

Resource Management for Big Data Platforms | 2016

Optimal Worksharing of DNA Sequence Analysis on Accelerated Platforms

Suejb Memeti; Sabri Pllana; Joanna Kolodziej

In this chapter, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two gen ...

International Conference on Parallel Processing | 2016

Combinatorial Optimization of Work Distribution on Heterogeneous Systems

Suejb Memeti; Sabri Pllana

We describe an approach that uses combinatorial optimization and machine learning to share the work between the host and device of heterogeneous computing systems such that the overall application execution time is minimized. We propose to use combinatorial optimization to search for the optimal system configuration in the given parameter space (such as the number of threads, thread affinity, and work distribution for the host and device). For each system configuration that is suggested by combinatorial optimization, we use machine learning to evaluate the system performance. We evaluate our approach experimentally using a heterogeneous platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P co-processor with 61 cores. Using our approach we are able to find a near-optimal system configuration by performing only about 5% of all possible experiments.

International Conference on Conceptual Structures | 2017

Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution

Adrian Calvo Chozas; Suejb Memeti; Sabri Pllana

While modern parallel computing systems provide high performance resources, utilizing them to the highest extent requires advanced programming expertise. Programming for parallel computing systems ...

Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications | 2018

A Review of Machine Learning and Meta-heuristic Methods for Scheduling Parallel Computing Systems

Suejb Memeti; Sabri Pllana; Alécio Pedro Delazari Binotto; Joanna Kolodziej; Ivona Brandic

Optimized software execution on parallel computing systems demands consideration of many parameters at run-time. Determining the optimal set of parameters in a given execution context is a complex task, and to address this issue researchers have proposed different approaches that use heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for scheduling parallel computing systems. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of scheduling parallel computing systems. Furthermore, it may aid in understanding the limitations of existing approaches and in identifying areas for improvement.

Journal of Computational Science | 2018

PAPA: A parallel programming assistant powered by IBM Watson cognitive computing technology

Suejb Memeti; Sabri Pllana

The efficient utilization of the available resources in modern parallel computing systems requires advanced parallel programming expertise. However, parallel programming is more difficult than sequential programming. To alleviate the difficulties of parallel programming, high-level programming frameworks, such as OpenMP, have been proposed. Yet, there is evidence that novice parallel programmers make common mistakes that may lead to performance degradation or unexpected program behavior. In this paper, we present our cognitive parallel programming assistant (PAPA) that aims at educating and assisting novice parallel programmers to avoid common OpenMP mistakes. PAPA combines different IBM Watson services to provide a dialog-based interaction (through text and voice) for programmers. We use the Watson Conversation service to implement the dialog-based interaction, and the Speech-to-Text and Text-to-Speech services to enable the voice interaction. The Watson Natural Language Understanding and WordsAPI Synonyms services are used to train PAPA with OpenMP-related publications. We evaluate our approach using a user experience questionnaire with a number of novice parallel programmers at Linnaeus University.