Publication


Featured research published by Simona E. Rombo.


Bioinformatics | 2014

Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods

Clara Pizzuti; Simona E. Rombo

MOTIVATION: Protein-protein interaction (PPI) networks are powerful models for representing the pairwise protein interactions of an organism. Clustering PPI networks can help isolate groups of interacting proteins that participate in the same biological processes or that jointly perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of as yet uncharacterized proteins. RESULTS: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features, and then focus on one of them, i.e. population-based stochastic search. We provide an experimental evaluation of techniques in this class, which are as yet less explored than the others, based on validation measures widely used in the literature. In particular, we study how the capability of Genetic Algorithms (GAs) to extract clusters in PPI networks varies when different topology-based fitness functions are used, and we compare GAs with the main techniques in the other categories. The experimental campaign shows that predictions returned by GAs are often more accurate than those produced by the competing methods. Interesting issues remain open about possible generalizations of GAs that allow for cluster overlapping. AVAILABILITY AND IMPLEMENTATION: We point out which methods and tools described here are publicly available. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Briefings in Bioinformatics | 2014

Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies

Raffaele Giancarlo; Simona E. Rombo; Filippo Utro

High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.


Flexible Query Answering Systems | 2004

Discovering Representative Models in Large Time Series Databases

Simona E. Rombo; Giorgio Terracina

The discovery of frequently occurring patterns in a time series can be important in several application contexts. As an example, the analysis of frequent patterns in biomedical observations could support diagnosis and/or prognosis. Moreover, the efficient discovery of frequent patterns may play an important role in several data mining tasks such as association rule discovery, clustering and classification. However, in order to identify interesting repetitions, it is necessary to allow errors in the matching patterns; in this context, it is difficult to select one pattern particularly suited to represent the set of similar ones, whereas describing this set with a single model could be more effective. In this paper we present an approach for deriving representative models in a time series. Each model represents a set of similar patterns in the time series. The approach has the following distinctive features: (i) it works on discretized time series, but its complexity does not depend on the cardinality of the alphabet used for the discretization; (ii) derived models can express the distribution of the represented patterns; (iii) all interesting models are derived in a single scan of the time series. The paper reports the results of some experimental tests and compares the proposed approach with related ones.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

A Coclustering Approach for Mining Large Protein-Protein Interaction Networks

Clara Pizzuti; Simona E. Rombo

Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped into two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when poorly characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonoverlapping clusters. The density of the clusters to search for can also be set by the user. We tested our method on the yeast and human networks, and compared it to five other well-known techniques on the same interaction data sets. The results showed that, for all the examples considered, our approach always reaches a good compromise between accuracy and network coverage. Furthermore, the behavior of our algorithm is not influenced by the structure of the input network, unlike all the techniques considered in the comparison, which returned very good results on the yeast network but rather poor ones on the human network.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

Asymmetric Comparison and Querying of Biological Networks

Nicola Ferraro; Luigi Palopoli; Simona Panni; Simona E. Rombo

Comparing and querying the protein-protein interaction (PPI) networks of different organisms is important to infer knowledge about conservation across species. Known methods that perform these tasks operate symmetrically, i.e., they do not assign a distinct role to the input PPI networks. However, in most cases, the input networks are indeed distinguishable on the basis of how well the corresponding organism is biologically characterized. In this paper a new idea is developed, namely to exploit differences in the characterization of the organisms at hand in order to devise methods for comparing their PPI networks. We use the PPI network (called Master) of the best characterized organism as a fingerprint to guide the alignment process to the second input network (called Slave), so that generated results preferably retain the structural characteristics of the Master network. Technically, this is obtained by generating from the Master a finite automaton, called alignment model, which is then fed with (a linearization of) the Slave for the purpose of extracting, via the Viterbi algorithm, matching subgraphs. We propose an approach able to perform global alignment and network querying, and we apply it to PPI networks. We tested our method, showing that the results it returns are biologically relevant.
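The abstract above relies on the Viterbi algorithm to decode the most likely state sequence of a model fed with a symbol sequence. The following is a minimal, generic sketch of Viterbi decoding in log space, not the paper's alignment model: the state names (`M1`, `M2`), the symbols, and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a sequence of observations."""
    if not observations:
        return []

    def log(p):
        # Log-probabilities avoid numeric underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + log(trans_p[prev][s])
                          + log(emit_p[s].get(observations[t], 0.0)))
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two states emitting symbols "a", "b", "c".
STATES = ["M1", "M2"]
START = {"M1": 0.6, "M2": 0.4}
TRANS = {"M1": {"M1": 0.7, "M2": 0.3}, "M2": {"M1": 0.4, "M2": 0.6}}
EMIT = {"M1": {"a": 0.5, "b": 0.4, "c": 0.1},
        "M2": {"a": 0.1, "b": 0.3, "c": 0.6}}

decoded = viterbi(["a", "b", "c"], STATES, START, TRANS, EMIT)
```

In the setting described by the abstract, the states would come from the automaton built on the Master network and the observations from the linearized Slave network; how those are constructed is not specified here and is left out of the sketch.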


Information Fusion | 2009

Improving protein secondary structure predictions by prediction fusion

Luigi Palopoli; Simona E. Rombo; Giorgio Terracina; Giuseppe Tradigo; Pierangelo Veltri

Protein secondary structure prediction is still a challenging problem today. Although a number of prediction methods have been presented in the literature, the various prediction tools available online produce results whose quality is not always fully satisfactory. Therefore, a user has to know which predictor to use for a given protein to be analyzed. In this paper, we propose a server implementing a method to improve the accuracy of protein secondary structure prediction. The method is based on integrating the prediction results computed by some available online prediction tools to obtain a combined prediction of higher quality. Given an input protein p whose secondary structure has to be predicted, and a group of proteins F whose secondary structures are known, the server currently works according to a two-phase approach: (i) it selects a set of predictors good at predicting the secondary structure of proteins in F (and, therefore, supposedly, that of p as well), and (ii) it integrates the prediction results delivered for p by the selected team of prediction tools. Therefore, by exploiting our system, the user is relieved of the burden of selecting the most appropriate predictor for the given input protein while, at the same time, being assured that a prediction result at least as good as the best available one will be delivered. The correctness of the resulting prediction is measured with reference to the EVA accuracy parameters used in several editions of CASP.
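The two-phase scheme described above (select predictors that do well on F, then integrate their outputs for p) can be illustrated with a small sketch. The abstract does not say how predictors are scored or how their outputs are combined, so both choices here are assumptions: predictors are ranked by mean per-residue accuracy on F, and the selected predictions are fused by a per-residue majority vote. All function names and the predictor names `A`, `B`, `C` are hypothetical.

```python
from collections import Counter


def per_residue_accuracy(predicted, actual):
    """Fraction of positions where the predicted state matches the known one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def select_predictors(family_predictions, family_truth, team_size):
    """Phase (i): rank predictors by mean accuracy on the proteins in F.

    family_predictions maps a predictor name to its predictions,
    one string per protein in F, in the same order as family_truth.
    """
    scores = {
        name: sum(per_residue_accuracy(p, t)
                  for p, t in zip(preds, family_truth)) / len(family_truth)
        for name, preds in family_predictions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:team_size]


def fuse(predictions):
    """Phase (ii): per-residue majority vote over equal-length predictions."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*predictions))


# Hypothetical data: H = helix, E = strand, C = coil.
truth_F = ["HHEE"]
preds_F = {"A": ["HHEE"], "B": ["HHEC"], "C": ["CCCC"]}
team = select_predictors(preds_F, truth_F, team_size=2)

consensus = fuse(["HHEC", "HEEC", "HHCC"])
```

A plain majority vote does not by itself guarantee a result at least as good as the best single predictor; the sketch only shows the shape of the select-then-integrate pipeline.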


Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2012

Complex detection in protein-protein interaction networks: a compact overview for researchers and practitioners

Clara Pizzuti; Simona E. Rombo; Elena Marchiori

The availability of large volumes of protein-protein interaction data has allowed the study of biological networks to unveil the complex structure and organization of the cell. Biologists have recognized that proteins interacting with each other often participate in the same biological processes, and that protein modules may often be associated with specific biological functions. Thus the detection of protein complexes is an important research problem in systems biology. In this review, recent graph-based approaches to clustering protein interaction networks are described and classified according to their common characteristics. The goal is to provide a useful guide and reference for both computer scientists and biologists.


Data Mining in Bioinformatics | 2009

A technique to search for functional similarities in protein-protein interaction networks

Valeria Fionda; Luigi Palopoli; Simona Panni; Simona E. Rombo

We describe a method to search for similarities across protein-protein interaction networks of different organisms. The core of the technique consists of computing a maximum weight matching of bipartite graphs resulting from comparing the neighbourhoods of proteins belonging to different networks. Both quantitative and reliability information are exploited. We tested the method on the networks of S. cerevisiae, D. melanogaster and C. elegans. The experiments showed that the technique is able to detect functional orthologs when sequence similarity alone is not sufficient. They also demonstrated the capability of our approach to discover common biological processes involving uncharacterised proteins.
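To make the notion of a maximum weight bipartite matching concrete, here is a minimal sketch that finds one by brute force. It is only an illustration under stated assumptions: the weight matrix is hypothetical (rows stand for neighbours of a protein in one network, columns for neighbours of a protein in the other, entries for a similarity score), weights are non-negative, and every row is matched to a distinct column. Enumerating permutations is exponential, so a polynomial algorithm such as the Hungarian method would be needed on real neighbourhoods.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (total weight, [(row, column), ...]) of a best row-to-column matching.

    weights[r][c] is the similarity between row item r and column item c.
    Brute force: only suitable for very small matrices.
    """
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        raise ValueError("expected at most as many rows as columns")
    best_score, best_pairs = float("-inf"), []
    for assignment in permutations(range(cols), rows):
        score = sum(weights[r][c] for r, c in enumerate(assignment))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(assignment))
    return best_score, best_pairs


# Hypothetical similarity scores between two 3-protein neighbourhoods.
similarity = [
    [0.90, 0.10, 0.30],
    [0.20, 0.80, 0.70],
    [0.40, 0.60, 0.95],
]
score, pairs = max_weight_matching(similarity)
```

The resulting total weight can then serve as a neighbourhood similarity score for the two proteins being compared; how the paper weights edges with quantitative and reliability information is not reproduced here.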


Bioinformatics | 2015

Epigenomic k-mer dictionaries: shedding light on how sequence composition influences in vivo nucleosome positioning

Raffaele Giancarlo; Simona E. Rombo; Filippo Utro

MOTIVATION: Information-theoretic and compositional analysis of biological sequences, in terms of k-mer dictionaries, has a well-established role in genomic and proteomic studies. This is much less the case in epigenomics, although the role of k-mers in chromatin organization and nucleosome positioning is particularly relevant. Fundamental questions concerning the informational content and compositional structure of nucleosome-favouring and nucleosome-disfavouring sequences with respect to their basic building blocks still remain open. RESULTS: We present the first analysis of the role of k-mers in the composition of nucleosome-enriched and nucleosome-depleted genomic regions (NER and NDR for short) that is: (i) exhaustive and within the bounds dictated by the information-theoretic content of the sample sets we use and (ii) informative for comparative epigenomics. We analyze four different organisms and propose a paradigmatic formalization of k-mer dictionaries, providing two different and complementary views of the k-mers involved in NER and NDR. The first extends well-known studies in this area, its comparative nature being its major merit. The second, which is novel, brings to light the rich variety of k-mers involved in influencing nucleosome positioning, for which an initial classification in terms of clusters is also provided. Although such a classification offers many insights, the following deserves to be singled out: short poly(dA:dT) tracts are reported in the literature as fundamental for nucleosome depletion; however, a global quantitative look reveals that their role is much less prominent than one would expect based on previous studies. AVAILABILITY AND IMPLEMENTATION: Dictionaries, clusters and Supplementary Material are available online at http://math.unipa.it/rombo/epigenomics/. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
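As a concrete reading of "k-mer dictionary", the sketch below counts every length-k substring in a set of sequences and converts the counts to relative frequencies, so that two region sets can be compared. This is a minimal illustration, not the paper's formalization: the function names and the two tiny sequence sets standing in for NER and NDR are hypothetical.

```python
from collections import Counter


def kmer_dictionary(sequences, k):
    """Count every length-k substring (k-mer) across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def relative_frequencies(counts):
    """Normalize k-mer counts so that they sum to 1."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical stand-ins for nucleosome-enriched and nucleosome-depleted regions.
ner_sequences = ["ACGTAC", "GCGC"]
ndr_sequences = ["AAAATT", "TTTA"]

ner_dict = kmer_dictionary(ner_sequences, k=2)
ndr_dict = kmer_dictionary(ndr_sequences, k=2)
ner_freq = relative_frequencies(ner_dict)
```

Comparing `ner_freq` with the corresponding frequencies for the depleted set is one simple way to see which k-mers are over-represented in either class.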


Theoretical Computer Science | 2014

Irredundant tandem motifs

Laxmi Parida; Cinzia Pizzi; Simona E. Rombo

Eliminating the possible redundancy from a set of candidate motifs occurring in an input string is fundamental in many applications. The existing techniques proposed to extract irredundant motifs are not suitable when the motifs to search for are structured, i.e., they are made of two (or several) subwords that co-occur in a text string s of length n. The main effort of this work is studying and characterizing a compact class of tandem motifs, that is, pairs of substrings occurring in tandem within a maximum distance of d symbols in s, where d is an integer constant given as input. To this aim, we first introduce the concept of maximality, related to four specific conditions that hold only for this class of motifs. Then, we eliminate the remaining redundancy by defining the notion of irredundancy for tandem motifs. We prove that the number of non-overlapping irredundant tandem motifs is O(d^2 n), which, considering d as a constant, is linear in the length of the input string. This is an order of magnitude less than previously developed compact indexes for tandem extraction. The notions and bounds provided for tandem motifs are generalized to the case r ≥ 2, where r is the number of subwords composing the motifs. Finally, we also provide an algorithm to extract irredundant tandem motifs.
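To illustrate what a tandem occurrence looks like, the sketch below naively enumerates pairs of substrings that co-occur within a bounded distance. It is not the extraction algorithm from the paper and applies no maximality or irredundancy filtering. Two simplifying assumptions are made: both components have the same fixed length k (a hypothetical parameter), and the distance d is measured as the number of symbols between the end of the first component and the start of the second.

```python
from collections import defaultdict


def tandem_pairs(s, k, d):
    """Map each pair (u, v) of length-k substrings of s to the list of
    position pairs (i, j) where u starts at i and v starts at j, with
    at most d symbols between the end of u and the start of v."""
    occurrences = defaultdict(list)
    n = len(s)
    for i in range(n - k + 1):
        u = s[i:i + k]
        for gap in range(d + 1):
            j = i + k + gap  # v never overlaps u
            if j + k > n:
                break
            occurrences[(u, s[j:j + k])].append((i, j))
    return dict(occurrences)


pairs = tandem_pairs("abcab", k=2, d=1)
```

Even this naive listing produces on the order of d candidate pairs per starting position, which hints at why a compact, irredundant class of tandem motifs is needed.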

Collaboration

Top Co-Authors

Clara Pizzuti

National Research Council