Mikhail L. Zymbler
South Ural State University
Publication
Featured research published by Mikhail L. Zymbler.
Database and Expert Systems Applications | 2013
Constantin S. Pan; Mikhail L. Zymbler
The paper describes the design and implementation of PargreSQL, a parallel database management system (DBMS) for cluster systems. PargreSQL is based on the open-source PostgreSQL DBMS and exploits partitioned parallelism. The experimental results presented show that this scheme is worthy of further development.
Advanced Industrial Conference on Telecommunications | 2015
Timofey Rechkalov; Mikhail L. Zymbler
Partition Around Medoids (PAM) is a variation of the well-known k-Means clustering algorithm in which the center of each cluster must be one of the objects being clustered. PAM is used in a wide spectrum of applications, e.g. text analysis, bioinformatics, and intelligent transportation systems. There are approaches to speeding up the k-Means and PAM algorithms by means of graphics accelerators, but there are none for accelerators based on the Intel Many Integrated Core architecture. This paper presents a parallel version of PAM for the Intel Xeon Phi many-core coprocessor. Parallelization is based on the OpenMP technology. Loop operations are adapted to enable vectorization. The distance matrix is precomputed and stored in the coprocessor's memory. Experimental results are presented and confirm the efficiency of the algorithm.
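To illustrate the idea of medoid-based clustering over a precomputed distance matrix, here is a minimal serial sketch in Python. It is not the authors' OpenMP implementation: the function names (`distance_matrix`, `total_cost`, `pam`), the naive initialization with the first k objects, and the best-swap loop are assumptions chosen for brevity.

```python
from itertools import product


def distance_matrix(points):
    """Precompute all pairwise Euclidean distances once (hypothetical helper)."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for q in points]
            for p in points]


def total_cost(dist, medoids):
    """Sum, over all objects, of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(points, k):
    """Swap-based medoid search (illustrative sketch, not the paper's algorithm).

    Assumption: medoids are initialized to the first k objects instead of
    using a dedicated BUILD phase.
    """
    dist = distance_matrix(points)
    n = len(points)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        best_swap = None
        # Try replacing each medoid with each non-medoid; keep the best swap.
        for i, h in product(range(k), range(n)):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            cost = total_cost(dist, candidate)
            if cost < best:
                best, best_swap, improved = cost, candidate, True
        if improved:
            medoids = best_swap
    return sorted(medoids), best
```

Every cost evaluation only reads the precomputed matrix, which is why keeping that matrix resident in memory, as the abstract describes, pays off when many swaps are evaluated.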
Similarity Search and Applications | 2015
Aleksander Movchan; Mikhail L. Zymbler
Subsequence similarity search is one of the most important problems of time series data mining. There is now empirical evidence that Dynamic Time Warping (DTW) is the best distance metric for many applications. However, in spite of sophisticated software speedup techniques, DTW is still computationally expensive. There are studies devoted to accelerating the DTW computation by means of parallel hardware, e.g. computer clusters, multi-core processors, FPGAs, and GPUs. In this paper we present an approach to accelerating subsequence similarity search based on the DTW distance using the Intel Many Integrated Core architecture. The experimental evaluation on synthetic and real data sets confirms the efficiency of the approach.
Advances in Databases and Information Systems | 2015
Mikhail L. Zymbler
Subsequence similarity search is one of the basic problems of time series data mining. Dynamic Time Warping (DTW) is currently considered the best similarity measure. However, despite various existing software speedup techniques, DTW is still computationally expensive. There are approaches to speeding up the DTW computation by means of parallel hardware (e.g. GPU and FPGA), but accelerators based on the Intel Many Integrated Core architecture have not received attention. The paper presents a parallel algorithm for best-match time series subsequence search based on the DTW distance for the Intel Xeon Phi coprocessor. The experimental results on synthetic and real data sets confirm the efficiency of the algorithm.
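To make the cost of DTW-based best-match search concrete, the following is a minimal serial sketch in Python. It is an assumption-laden illustration rather than the parallel algorithm from the paper: the names `dtw` and `best_match` are hypothetical, point costs are squared differences, and common refinements such as z-normalization, warping-window constraints, and lower-bound pruning are omitted.

```python
import math


def dtw(a, b):
    """Textbook dynamic-programming DTW with squared point costs.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the table.
    """
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def best_match(series, query):
    """Return (start, distance) of the window of len(query) closest to query.

    Assumption: every window has the same length as the query and is
    compared without normalization.
    """
    w = len(query)
    best_start, best_dist = -1, math.inf
    for start in range(len(series) - w + 1):
        d = dtw(series[start:start + w], query)
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist
```

Each window requires a full quadratic DTW computation, and the windows are independent of one another, which is what makes the search both expensive and a natural candidate for parallel hardware.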
Advanced Industrial Conference on Telecommunications | 2015
Maria Miniakhmetova; Mikhail L. Zymbler
A video summary is a sequence of still or moving pictures that represents the content of a video. A personalized summary provides a person with brief information reflecting the essential message of the video according to his or her interests. Existing methods of discovering a user's personal interests often demand either extra effort or extra equipment from the user, e.g. manually setting up relative preferences or using a camera to capture eye movement. The paper presents an approach to constructing a personalized video summary that utilizes the user's “like/neutral/dislike” ratings of previously watched videos. The summary is built as a sequence of scenes extracted from the video that are the most influential for the user. A most influencing scene contains a set of objects detected in the video that fall within the user's range of interest. Formal definitions of a most influencing scene and of the range of interest are given, and a mathematical model for constructing a personalized video summary is described.
Programming and Computer Software | 2015
Constantin S. Pan; Mikhail L. Zymbler
This paper presents an original approach to parallel processing of very large databases by means of encapsulation of partitioned parallelism into open-source database management systems (DBMSs). The architecture and methods for implementing a parallel DBMS through encapsulation of partitioned parallelism into PostgreSQL DBMS are described. Experimental results that confirm the effectiveness of the proposed approach are presented.
Advances in Databases and Information Systems | 2013
Constantin S. Pan; Mikhail L. Zymbler
The paper introduces an approach to partitioning very large graphs by means of a parallel relational database management system (DBMS) named PargreSQL. A very large graph and its intermediate data that do not fit into main memory are represented as relational tables and processed by the parallel DBMS. Multilevel partitioning is used. The parallel DBMS carries out coarsening to reduce the graph size. Then an initial partitioning is performed by a third-party main-memory tool. After that, the parallel DBMS is used again to perform uncoarsening. The architecture of PargreSQL is described in brief. PargreSQL was developed by the authors by embedding parallelism into the open-source PostgreSQL DBMS. Experimental results are presented and show that the approach achieves very good run time and speedup at an acceptable loss of quality.
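The coarsening step of a multilevel scheme can be pictured with a short in-memory sketch in Python. This is only an illustration under stated assumptions: the abstract does not say which matching rule is used, so heavy-edge matching is assumed here, the function name `coarsen` is hypothetical, and the graph is held in a dictionary rather than in relational tables as in the paper.

```python
def coarsen(edges):
    """One coarsening pass using heavy-edge matching (assumed rule).

    edges: dict mapping (u, v) with u < v to a positive edge weight.
    Returns (mapping, coarse_edges), where mapping sends each matched
    vertex to the vertex that represents its merged pair.
    """
    mapping = {}
    # Visit the heaviest edges first; merge endpoints that are both still free.
    for (u, v), _w in sorted(edges.items(), key=lambda e: (-e[1], e[0])):
        if u not in mapping and v not in mapping:
            mapping[u] = mapping[v] = u  # collapse v into u

    coarse_edges = {}
    for (u, v), w in edges.items():
        cu, cv = mapping.get(u, u), mapping.get(v, v)
        if cu == cv:
            continue  # the edge is now internal to one coarse vertex
        key = (min(cu, cv), max(cu, cv))
        # Parallel edges between the same coarse vertices add their weights.
        coarse_edges[key] = coarse_edges.get(key, 0) + w
    return mapping, coarse_edges
```

Repeating such a pass shrinks the graph until it is small enough for a main-memory partitioner; uncoarsening then projects the partition back through the stored mappings.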
International Conference on Parallel Computational Technologies | 2018
Timofey Rechkalov; Mikhail L. Zymbler
Computation of a Euclidean distance matrix (EDM) is a typical task in a wide spectrum of problems connected with data analysis. Currently, many parallel algorithms for this task have been developed for GPUs. However, these developments cannot be directly applied to the Intel Xeon Phi many-core processor. In this paper, we address the task of accelerating EDM computation on Intel Xeon Phi in the case when the input data fit into the main memory. We present a parallel algorithm based on a novel block-oriented scheme of computations that allows for the efficient utilization of Intel Xeon Phi vectorization abilities. Experimental evaluation of the algorithm on real-world and synthetic datasets shows that it is highly scalable and outruns analogues in the case of rectangular matrices with low-dimensional data points.
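As a rough picture of what a block-oriented computation order looks like, here is a minimal serial sketch in Python. It is not the scheme proposed in the paper: the function name `edm_blocked`, the tile size parameter `block`, and the simple two-level tiling are assumptions, and pure Python does not vectorize, so the sketch only shows the traversal order.

```python
import math


def edm_blocked(a, b, block=2):
    """Euclidean distance matrix between the rows of a and the rows of b.

    The output is filled tile by tile (block x block), so that each tile
    reuses a small set of rows from a and b. Tile size is a hypothetical
    tuning parameter.
    """
    n, m = len(a), len(b)
    out = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile row origin
        for j0 in range(0, m, block):      # tile column origin
            for i in range(i0, min(i0 + block, n)):
                for j in range(j0, min(j0 + block, m)):
                    out[i][j] = math.sqrt(
                        sum((x - y) ** 2 for x, y in zip(a[i], b[j]))
                    )
    return out
```

In a compiled implementation the innermost loops over a tile are the part that a vectorizing compiler or intrinsics would target, while the outer tile loops are the natural unit for distributing work across threads.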
International Convention on Information and Communication Technology, Electronics and Microelectronics | 2017
Mikhail L. Zymbler
The paper presents a parallel implementation of the Dynamic Itemset Counting (DIC) algorithm for many-core systems, where DIC is a variation of the classical Apriori algorithm. We propose a bit-based internal layout for transactions and itemsets, with the assumption that such a representation of the transaction database fits in main memory. This technique reduces the memory space needed for storing the transaction database and also simplifies support counting and candidate itemset generation via logical bitwise operations. The implementation uses OpenMP technology and thread-level parallelism. Experimental evaluation on Intel Xeon CPU and Intel Xeon Phi coprocessor platforms with a large synthetic database showed good performance and scalability of the proposed algorithm.
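The bit-based layout can be sketched in a few lines of Python. This is an illustrative assumption rather than the paper's data structure: the helper names `encode`, `support`, and `join` are hypothetical, each item is assigned one bit position, and Python integers stand in for fixed-width machine words.

```python
def encode(items, index):
    """Pack a set of items into an integer bitmask (one bit per item)."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def support(itemset_mask, transaction_masks):
    """Count transactions that contain every item of the itemset.

    Containment is a single AND plus a comparison per transaction.
    """
    return sum(1 for t in transaction_masks
               if (t & itemset_mask) == itemset_mask)


def join(mask_a, mask_b):
    """Candidate generation: the union of two itemsets is a bitwise OR."""
    return mask_a | mask_b
```

For example, with items `a`, `b`, `c` mapped to bits 0, 1, 2, the transaction `{a, c}` becomes `0b101`, and checking whether it contains `{a}` reduces to `(0b101 & 0b001) == 0b001`.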
International Conference on Data Analytics and Management in Data Intensive Domains | 2017
Timofey Rechkalov; Mikhail L. Zymbler
Relational DBMSs (RDBMSs) remain the most popular tool for processing structured data in data intensive domains. However, most stand-alone data mining packages process flat files outside an RDBMS. In contrast to stand-alone mining packages, in-database data mining avoids the bottleneck of exporting data and importing results, and it keeps all the benefits provided by an RDBMS. The paper presents an approach to data mining inside an RDBMS based on a parallel implementation of user-defined functions (UDFs). The approach is implemented for PostgreSQL and the modern Intel MIC (Many Integrated Core) architecture. A UDF performs a single mining task on data from a specified table and produces a resulting table. The UDF is organized as a wrapper around an appropriate mining algorithm, which is implemented in C and parallelized with OpenMP and thread-level parallelism. The heavy-weight parts of the algorithm are additionally vectorized manually using intrinsic functions for MIC platforms to achieve optimal loop vectorization. The library of such UDFs supports a cache of precomputed mining structures to reduce the cost of further computations. In the experiments, the proposed approach shows good scalability and outperforms the R data mining package.