Mirto Musci | Researchain

Publication

Featured researches published by Mirto Musci.

international conference on image analysis and processing | 2013

CCMS: A Greedy Approach to Motif Extraction

Giacomo Drago; Marco Ferretti; Mirto Musci

Efficient and precise motif extraction is a central problem in the study of protein functions and structures. This paper presents an efficient new geometric approach to the problem, based on the Generalized Hough Transform. The approach is both an extension and a variation of the Secondary Structure Co-Occurrences algorithm by Cantoni et al. [1-2]. The goal is to provide an effective and efficient implementation, suitable for HPC. The most significant contribution of this paper is the introduction of a heuristic greedy variant of the algorithm, which reduces computational time by two orders of magnitude. A secondary effect of the new version is the capability to cope with uncertainty in the geometric description of the secondary structures.

Proceedings of the 20th European MPI Users' Group Meeting on | 2013

Entire motifs search of secondary structures in proteins: a parallelization study

Marco Ferretti; Mirto Musci

The study of proteins is a crucial aspect of the biological field, as they are essential elements of every living cell. Often, the function of a given protein is tightly tied to its geometric structure. Thus, protein structure analysis is an important issue with multiple applications. For example, the ability of a protein to bind other proteins or ligands and the estimation of evolutionary distances between families of proteins are both based on their spatial structure. A key part in the geometric description of a protein is played by the structural motif, a 3D element which appears in a variety of molecules and is usually made of just a few structures. In this paper we consider only motifs of secondary structures (SSs in the following). The goal is to implement a parallel method to detect the presence and location of all motifs of SSs in a given protein or in a set of proteins. The paper discusses all possible forms of available parallelism, both shared memory and message passing. The analysis is based on existing approaches, such as Secondary Structure Co-Occurrences (SSC) [4, 5] and Secondary Structure Triplets (SST) [6], both based on the Generalized Hough Transform (GHT) technique [1]. The key idea is to ignore the biological significance of the motifs as much as possible, and to focus on the geometric description of the structures, which can simply be viewed as vectors in a 3D space. The paper analyzes a parallel implementation based on OpenMP. It also hints at a possible exploitation of a hybrid MPI/OpenMP paradigm, useful in the special case of cross-protein analysis.
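To make the "vectors in a 3D space" view concrete, the following is a minimal sketch, not the authors' implementation. It assumes each secondary structure is approximated by a segment (start and end point), and that a pair of structures is described by the distance between the segment midpoints and the angle between their directions. The function names, the descriptor itself, and the tolerance values are illustrative assumptions.

```python
import math

def direction(segment):
    """Unit vector pointing from the start to the end of a segment."""
    start, end = segment
    length = math.dist(start, end)
    return tuple((e - s) / length for s, e in zip(start, end))

def midpoint(segment):
    """Midpoint of a segment given as (start, end)."""
    start, end = segment
    return tuple((s + e) / 2 for s, e in zip(start, end))

def pair_descriptor(seg_a, seg_b):
    """Return (midpoint distance, angle in radians) for two segments.

    Both quantities are unchanged by rotating or translating the pair,
    which is why they can be compared across different proteins.
    """
    distance = math.dist(midpoint(seg_a), midpoint(seg_b))
    cos_angle = sum(a * b for a, b in zip(direction(seg_a), direction(seg_b)))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return distance, math.acos(cos_angle)

def similar(desc_1, desc_2, dist_tol=0.5, angle_tol=0.1):
    """Compare two descriptors under (assumed) tolerances."""
    return (abs(desc_1[0] - desc_2[0]) <= dist_tol
            and abs(desc_1[1] - desc_2[1]) <= angle_tol)

# Two parallel segments, 5 units apart.
pair_1 = (((0, 0, 0), (4, 0, 0)), ((0, 5, 0), (4, 5, 0)))
# The same arrangement, moved and rotated elsewhere in space.
pair_2 = (((10, 10, 10), (10, 10, 14)), ((10, 15, 10), (10, 15, 14)))
# Same midpoint distance, but the segments are perpendicular.
pair_3 = (((0, 0, 0), (4, 0, 0)), ((2, 5, -2), (2, 5, 2)))

desc_1 = pair_descriptor(*pair_1)
desc_2 = pair_descriptor(*pair_2)
desc_3 = pair_descriptor(*pair_3)
```

Here `similar(desc_1, desc_2)` holds because the two pairs differ only by a rigid motion, while `pair_3` is rejected on the angle. A tolerance on both quantities is one simple way to absorb uncertainty in the geometric description.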

parallel computing | 2015

Geometrical motifs search in proteins

Marco Ferretti; Mirto Musci

We present CMS, an algorithm used to search for geometric motifs in proteins. Complete cross-protein analysis calls for parallel processing. Data-parallel problem decomposition favors a shared-memory implementation. The OpenMP implementation meets expectations, but scales only up to 8 threads. A hybrid OpenMP/MPI approach is required for further analysis.

The analysis of the 3D structures of proteins is a very important problem in life sciences, since the geometric set-up of proteins has a deep relevance in many biological processes. The complexity of the analysis and the continuous increase in the number of proteins whose 3D structure is known call for efficient and quick algorithms. Parallel processing is becoming an enabling tool for such research. A key component in the geometric description of a protein is the structural motif, a 3D element which appears in a variety of molecules and is usually made of just a few simpler structures, the secondary structure elements (SSEs). This paper is an extended version of Ferretti and Musci (2013), and presents the Cross Motif Search (CMS) and the Complete CMS (CCMS) algorithms, two highly optimized and efficient parallel methods to detect the presence and location of all common motifs of secondary structures in a given protein pair (CMS) or across an arbitrarily large dataset of proteins (CCMS). The analysis builds on existing approaches, such as Secondary Structure Co-Occurrences (SSC), based on the Generalized Hough Transform (GHT) technique. The main difference between our proposal and the state of the art is the focus that CMS puts on the geometric description of the structural motifs, which can simply be viewed as vectors in a 3D space, rather than on the topological/biological description employed by competing algorithms, such as ProSMoS, PROMOTIF or MASS. The advantage of a geometrical approach is that it retrieves the exact location of the common substructures in a protein pair. The paper analyzes all possible forms of serial and parallel optimization of the proposed algorithms, both shared memory and message passing. It introduces a complete parallel implementation of CMS, based on OpenMP, and discusses its scalability on shared-memory architectures. Both small-scale and medium-scale testing show that the method produces very interesting results in real applications, and scales nicely up to the eight-processor limit. More in-depth testing also shows that the scalability limit is due to the inner structure of the problem, and that the similarities among proteins and the chosen tolerance for the analysis strongly affect the overall performance.

Concurrency and Computation: Practice and Experience | 2015

MPI-CMS: a hybrid parallel approach to geometrical motif search in proteins

Marco Ferretti; Mirto Musci; Luigi Santangelo

This paper describes the message passing parallel implementation of the Cross Motif Search algorithm (MPI‐CMS). It extends, and specifically improves on, the results obtained in a conference paper presented at PBIO 2014. CMS is a bioinformatics algorithm whose goal is to search for geometrical motifs in proteins. For the purpose of a complete characterization of protein similarities, it would be important to run CMS on the largest possible dataset. Unfortunately, due to its precision, CMS is inherently slow; thus, it was originally implemented using a shared memory parallel paradigm. In the original conference paper, we proved that the OpenMP implementation of Cross Motif Search (MP‐CMS) is extremely inefficient and cannot scale adequately. To solve the problem, we designed a new parallel implementation of CMS (MPI‐CMS) based on a hybrid shared memory and message passing paradigm. This paper reconsiders MPI‐CMS with the goal of porting it to a supercomputing machine. The focus is on how performance in the hybrid approach depends on workload imbalance. Using a simple statistical analysis of the workload, we discuss several strategies through which we can improve the design of MPI‐CMS. We conclude the paper by describing a revised implementation of MPI‐CMS, which takes into account the size of the protein pairs to fine‐tune the parallelization strategy.
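One way to picture how the size of protein pairs can drive the parallelization strategy is a size-aware assignment of pairs to workers. The sketch below is an illustration only, not the strategy of MPI‐CMS: it assumes the cost of a pair is proportional to the product of the two proteins' secondary structure counts, and it uses the classic "largest job first" greedy rule. The names `estimate_cost` and `balance`, and the sizes in the example, are hypothetical.

```python
import heapq
from itertools import combinations

def estimate_cost(size_a, size_b):
    """Assumed cost model: work grows with the product of the sizes."""
    return size_a * size_b

def balance(pairs_with_cost, n_workers):
    """Greedy largest-first assignment of pairs to workers.

    Returns (assignment, loads), where assignment[w] is the list of
    pairs given to worker w and loads[w] is its total estimated cost.
    """
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    # Place the most expensive pairs first, always on the lightest worker.
    for pair, cost in sorted(pairs_with_cost, key=lambda pc: pc[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(pair)
        loads[worker] = load + cost
        heapq.heappush(heap, (loads[worker], worker))
    return assignment, loads

# Hypothetical proteins with their number of secondary structures.
sizes = {"A": 10, "B": 20, "C": 30, "D": 40}
pairs_with_cost = [
    ((p, q), estimate_cost(sizes[p], sizes[q]))
    for p, q in combinations(sorted(sizes), 2)
]
assignment, loads = balance(pairs_with_cost, n_workers=2)
```

A plain round-robin split ignores cost, so a worker that happens to receive the largest pairs finishes last; sorting by estimated cost first keeps the loads closer together.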

international conference on cluster computing | 2014

A hybrid OpenMP and OpenMPI approach to geometrical motif search in proteins.

Marco Ferretti; Mirto Musci; Luigi Santangelo

The retrieval and identification of geometrical motifs is an important open problem in bioinformatics. In previous works we presented Cross Motif Search (CMS), a novel algorithm which is able to search for recurring geometrical patterns in the secondary structure of proteins. A single run of CMS looks for similarities between a pair of proteins, and can be easily extended to compare each pair of proteins in an arbitrarily large dataset. We have implemented a shared memory parallel version of CMS and analyzed its scalability, which is limited to 8 cores. As a result, when the number of proteins in the set increases, the execution time of the algorithm quickly becomes unmanageable and the OpenMP implementation cannot keep up by just increasing the number of cores. In this paper we present a new hybrid parallel implementation of CMS, which combines the previous OpenMP approach with OpenMPI. Experimental runs on the same small-sized server (32 cores) show that the best hybrid OpenMP-OpenMPI configuration outperforms the best OpenMP one by a factor of 13.52. This result is confirmed on a medium-sized cluster with 256 cores, which allows processing a larger data set in reasonable time. We also show that the new design achieves great efficiency and scalability, which allows us to process huge data sets of proteins up to, in theory, the entire Protein Data Bank.
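The reason execution time grows so quickly is that comparing each pair in a set of n proteins requires n(n-1)/2 runs. The sketch below enumerates the pairs and splits them across workers in round-robin fashion. It is a simplified illustration of a data-parallel decomposition, not the paper's hybrid OpenMP/OpenMPI code; the identifiers `P1`..`P4` and the function names are placeholders.

```python
from itertools import combinations

def all_pairs(protein_ids):
    """Every unordered pair of distinct proteins: n * (n - 1) / 2 pairs."""
    return list(combinations(protein_ids, 2))

def split_round_robin(pairs, n_workers):
    """Give pair i to worker i % n_workers."""
    return [pairs[rank::n_workers] for rank in range(n_workers)]

proteins = ["P1", "P2", "P3", "P4"]  # placeholder identifiers
pairs = all_pairs(proteins)
chunks = split_round_robin(pairs, n_workers=3)

# Quadratic growth of the workload with the dataset size.
pair_counts = {n: n * (n - 1) // 2 for n in (10, 100, 1000)}
```

Because each pair is independent of the others, every chunk can be processed without communication, which is what makes the problem a natural fit for message passing across nodes with shared-memory threads inside each node.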

database and expert systems applications | 2014

Protein Motif Retrieval by Secondary Structure Element Geometry and Biological Features Saliency

Virginio Cantoni; Marco Ferretti; Nicola Pellicano; Jennifer Vandoni; Mirto Musci; Nahumi Nugrahaningsih

This paper presents an approach to detect the presence of a given motif in proteins or in the Protein Data Bank (PDB). The approach is based on the geometrical arrangement of secondary structure elements (SSEs) in 3D space. A motif is represented as a set of SSEs in their specific positions relative to a local reference system (LRS). By exploiting the saliency of SSE biological features in the motif LRS construction stage, we propose a planning strategy to speed up the motif retrieval process. The experimentation has been carried out on a set of 20 proteins selected from the PDB. In detail, we tested five different cases: (i) searching for a motif within single proteins, (ii) searching for motifs in a set of proteins belonging to the same biological family, (iii) searching within single symmetric proteins, (iv) searching in a set of symmetric proteins from the same family, and finally (v) a general motif retrieval from the entire protein dataset. The experimental results showed good motif recognition performance in each test category. Compared with a previous approach of SSE block geometrical retrieval based on the Generalized Hough Transform, exploiting the saliency of basic biological features in motif construction yielded a significant decrease in time and space computational complexity. It is worth pointing out that the computation time when the motif is absent is significantly lower than when the motif is present.
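As a rough illustration of what "positions relative to a local reference system" can mean, the sketch below builds an orthonormal frame from an origin and two non-parallel vectors, then expresses a point in that frame. This is a generic Gram-Schmidt construction chosen for the example; how the paper actually builds its LRS (and how saliency selects the reference SSE) is not specified here, so every name and choice in the code is an assumption.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, k):
    return tuple(x * k for x in a)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def local_frame(origin, u, v):
    """Orthonormal frame from two non-parallel vectors (Gram-Schmidt)."""
    e1 = normalize(u)
    e2 = normalize(sub(v, scale(e1, dot(v, e1))))  # part of v orthogonal to e1
    e3 = cross(e1, e2)                             # completes a right-handed frame
    return origin, (e1, e2, e3)

def to_local(point, frame):
    """Coordinates of a point expressed in the local frame."""
    origin, axes = frame
    offset = sub(point, origin)
    return tuple(dot(offset, axis) for axis in axes)

# Hypothetical reference SSE direction u and a second direction v.
frame = local_frame(origin=(1.0, 1.0, 1.0), u=(2.0, 0.0, 0.0), v=(1.0, 3.0, 0.0))
local = to_local((2.0, 3.0, 4.0), frame)
```

Coordinates expressed this way do not change when the whole protein is rotated or translated, so a motif stored in local coordinates can be matched wherever it occurs.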

database and expert systems applications | 2016

MotifVisualizer: An Interdisciplinary GUI for Geometrical Motif Retrieval in Proteins

Teo Argentieri; Virginio Cantoni; Mirto Musci

In previous works, we presented Cross Motif Search (CMS), an algorithm designed to explore new techniques in the field of protein motif retrieval and identification. The novelty of the CMS approach is to look for geometrical similarities in the secondary structures of proteins, instead of homologous topology. In other words, while the connections among different secondary structures are still considered, CMS puts the emphasis on the overall 3D shape of candidate motifs and identifies them using computer vision techniques, such as a modified Generalized Hough Transform (GHT). This approach, even if slower than classic approaches such as PROMOTIF or ProSMoS, and less resistant to deformations, has two distinct advantages. First, it conducts a precise detection, that is, it identifies the exact position of each candidate. Second, and most importantly, it identifies similar geometrical structures even in unfamiliar proteins (i.e. proteins belonging to different families in the SCOP database), as well as similarities which may elude topological algorithms. Unfortunately, while we were able to accumulate a huge database of results, we are still working on classifying the matches according to their significance. Indeed, one aspect which is often overlooked by computer scientists working in this field is cooperation with biologists, who usually do not speak the same language as engineers, and vice versa. In this paper, we describe the MotifVisualizer application and the algorithm design of CMS. MotifVisualizer has been conceived as an easy-to-use graphical user interface to the CMS algorithm, and through it we hope to close the gap between engineers and biologists, which may help us to showcase the significance of the CMS matches.

database and expert systems applications | 2017

Extending Cross Motif Search with Heuristic Data Mining

Teo Argentieri; Virginio Cantoni; Mirto Musci

In previous works we have presented Cross Motif Search (CMS), an MP/MPI parallel tool for geometrical motif extraction in the secondary structure of proteins. We proved that our algorithm is capable of retrieving previously unknown motifs, thanks to its innovative approach based on the Generalized Hough Transform. We have also presented a GUI to CMS, called MotifVisualizer, which was introduced to improve software usability and to encourage collaboration with the biology community. In this paper we address the main shortcoming of CMS: with a simple approach based on heuristic data mining, we show how we can classify the candidate motifs according to their statistical significance in the data set. We also present two extensions to MotifVisualizer, one to include the new data mining functions in the GUI, and a second one to allow for easier retrieval of testing data sets.

database and expert systems applications | 2018

Mining Geometrical Motifs Co-occurrences in the CMS Dataset.

Mirto Musci; Marco Ferretti

Precise and efficient retrieval of structural motifs is a task of great interest in proteomics. Geometrical approaches to motif identification allow the retrieval of unknown motifs in unfamiliar proteins that may be missed by widespread topological algorithms. In particular, the Cross Motif Search (CMS) algorithm analyzes pairs of proteins and retrieves every group of secondary structure elements that is similar between the two proteins. These similarities are candidate structural motifs. When extended to large datasets, the exhaustive approach of CMS generates a huge volume of data. Mining the output of CMS means identifying the most significant candidate motifs proposed by the algorithm, in order to determine their biological significance. In the literature, effective data mining on a CMS dataset is an unsolved problem.

International Journal of High Performance Computing Applications | 2018

Parallelizing a finite element solver in computational hemodynamics: A black box approach

Ferdinando Auricchio; Marco Ferretti; Adrien Lefieux; Mirto Musci; A. Reali; Santi Trimarchi; Alessandro Veneziani

In the last 20 years, a new approach has emerged to investigate the physiopathology of circulation. By merging medical images with validated numerical models, it is possible to support doctors’ decision-making process. The iCardioCloud project aims at establishing a computational framework to perform a complete patient-specific numerical analysis, oriented especially to aortic diseases (like dissections or aneurysms), and to deliver a compelling synthesis. The project can be considered a pioneering example of a Computer Aided Clinical Trial: i.e., a comprehensive analysis of patients where the level of knowledge extracted by traditional measures and statistics is enhanced through the massive use of numerical modeling. From a computer engineering point of view, iCardioCloud faces multiple challenges. First, the number of problems to solve for each patient is very large, as is typical of computational fluid dynamics (CFD), and it requires parallel methods. In addition, working in a clinical environment demands efficiency, as the timeline requires rapid quantitative answers (as may happen in an emergency scenario). It is therefore mandatory to employ high-end parallel systems, such as large clusters or supercomputers. Here we discuss a parallel implementation of an application within the iCardioCloud project, built with a black-box approach, i.e., by assembling and configuring existing packages and libraries, in particular LifeV, a finite element library developed to solve CFD problems. The goal of this paper is to describe the software architecture underlying LifeV and to assess its performance and the most appropriate parallel paradigm. This paper is an extension of a previous work presented at the PBio 2015 Conference. This revision extends the description of the software architecture and discusses several new serial and parallel optimizations to the application. We discuss the introduction of hybrid parallelism in order to mitigate some performance problems previously experienced.
