Maryam Abbasi
University of Coimbra
Publication
Featured research published by Maryam Abbasi.
international conference on bioinformatics and biomedical engineering | 2015
Maryam Abbasi; Luís Paquete; Francisco Baptista Pereira
Recently, there has been growing interest in the multiobjective formulation of optimization problems that arise in bioinformatics, such as sequence alignment. In this work, we consider multiobjective multiple sequence alignment, with the goal of maximizing the substitution score and minimizing the number of indels. We introduce several local search approaches for this problem. Several neighborhood definitions and perturbations are presented and discussed. The local search algorithms are tested experimentally on a wide range of instances.
Bioinformatics | 2013
Maryam Abbasi; Luís Paquete; Arnaud Liefooghe; Miguel Pinheiro; Pedro Matias
MOTIVATION: In this article, we consider the bicriteria pairwise sequence alignment problem and propose extensions of dynamic programming algorithms for several problem variants with a novel pruning technique that efficiently reduces the number of states to be processed. Moreover, we present a method for the construction of phylogenetic trees based on this bicriteria framework. Two exemplary cases are discussed. RESULTS: Numerical results on a real dataset show that this approach is very fast in practice. The pruning technique saves up to 90% in memory usage and 80% in CPU time. Based on this method, phylogenetic trees are constructed from real-life data. In addition to providing complementary information, some of these trees match those obtained by the Maximum Likelihood method. AVAILABILITY AND IMPLEMENTATION: Source code is freely available for download at URL http://eden.dei.uc.pt/paquete/MOSAL, implemented in C and supported on Linux, MAC OS and MS Windows.
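The bicriteria dynamic programming idea described above can be illustrated with a minimal Python sketch. It is not the MOSAL implementation and omits the pruning technique entirely. The function names, the match/mismatch scores, and the choice of objectives (maximize substitution score, minimize number of indels) are assumptions made for illustration. Each cell of the table stores the set of nondominated (score, indels) pairs instead of a single value.

```python
def pareto_filter(points):
    """Keep only nondominated (score, indels) pairs: higher score, fewer indels."""
    best = []
    # Sort by indels ascending, then score descending; a point survives only
    # if it strictly improves the score over every point with fewer indels.
    for score, indels in sorted(set(points), key=lambda p: (p[1], -p[0])):
        if not best or score > best[-1][0]:
            best.append((score, indels))
    return best


def bicriteria_alignment(a, b, match=1, mismatch=0):
    """Return the Pareto front of (substitution score, indels) for aligning a and b.

    The scoring values are hypothetical defaults, not taken from the article.
    """
    n, m = len(a), len(b)
    front = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    front[0][0] = [(0, 0)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                # Align a[i-1] with b[j-1]: score changes, indel count does not.
                s = match if a[i - 1] == b[j - 1] else mismatch
                candidates += [(sc + s, d) for sc, d in front[i - 1][j - 1]]
            if i > 0:
                # Gap in b: one more indel.
                candidates += [(sc, d + 1) for sc, d in front[i - 1][j]]
            if j > 0:
                # Gap in a: one more indel.
                candidates += [(sc, d + 1) for sc, d in front[i][j - 1]]
            front[i][j] = pareto_filter(candidates)
    return front[n][m]


if __name__ == "__main__":
    # Aligning "AB" with "BA": no indels gives no matches; two indels give one match.
    print(bicriteria_alignment("AB", "BA"))
```

Each pair in the returned list corresponds to at least one alignment; recovering the alignments themselves would need a traceback, which this sketch leaves out.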
Source Code for Biology and Medicine | 2014
Luís Paquete; Pedro Matias; Maryam Abbasi; Miguel Pinheiro
Multiobjective sequence alignment brings the advantage of providing a set of alignments that represent the trade-off between performing insertions/deletions and matching symbols from both sequences. Each of these alignments provides a potential explanation of the relationship between the sequences. We introduce MOSAL, a software tool that provides an open-source implementation and an on-line application for multiobjective pairwise sequence alignment.
business intelligence systems | 2016
Pedro Martins; Maryam Abbasi; Pedro Furtado
In this paper, we investigate the problem of providing scalability (out and in) to the extraction, transformation and load (ETL) and querying (Q) processes (ETL+Q) of data warehouses. In general, data loading, transformation and integration are heavy tasks that are performed only periodically, instead of row by row. Parallel architectures and mechanisms are able to optimise the ETL process by speeding up each part of the pipeline as more performance is needed. We propose parallelisation solutions, called AScale, for each part of the ETL+Q process, that is, an approach that enables automatic scalability and freshness for any data warehouse and ETL+Q process. Our results show that the proposed system algorithms can handle scalability to provide the desired processing speed.
Biomedical Engineering Online | 2016
Maryam Abbasi; Luís Paquete; Francisco Baptista Pereira
Background: Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. Methods: We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments that are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. Results and conclusions: The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of the ratio of correctly aligned pairs and the ratio of correctly aligned columns with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
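For readers unfamiliar with the hypervolume indicator mentioned above, the following Python sketch computes it for two objectives. It assumes both objectives are maximized, so a (substitution score, indels) vector would first be mapped to (score, -indels). The function name and the reference point are illustrative assumptions, not details taken from the paper.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    # Ignore points that do not strictly improve on the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: (-p[0], -p[1]))
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip between the previous best y and this y.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


if __name__ == "__main__":
    # Score vectors (score, indels) mapped to (score, -indels) so both are maximized.
    vectors = [(0, 0), (1, 2)]
    mapped = [(s, -d) for s, d in vectors]
    print(hypervolume_2d(mapped, ref=(-1, -3)))
```

A larger value means the set of score vectors covers more of the objective space relative to the chosen reference point, which makes it possible to compare sets returned by different algorithm configurations.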
international conference: beyond databases, architectures and structures | 2015
Pedro Martins; Maryam Abbasi; Pedro Furtado
In this paper we investigate the problem of providing timely results for the Extraction, Transformation and Load (ETL) process and automatic scalability to the entire pipeline, including the data warehouse. In general, data loading, transformation and integration are heavy tasks that are performed only periodically during specific offline time windows. Parallel architectures and mechanisms are able to optimize the ETL process by speeding up each part of the pipeline as more performance is needed. However, none of them allows the user to specify the ETL time and have the framework scale automatically to ensure it.
advances in databases and information systems | 2015
Pedro Martins; Maryam Abbasi; Pedro Furtado
In this paper we investigate the problem of providing automatic scalability and data freshness to data warehouses, while at the same time dealing efficiently with high-rate data. In general, data freshness is not guaranteed in those contexts, since data loading, transformation and integration are heavy tasks that are performed only periodically, instead of row by row.
Archive | 2019
Maryam Abbasi; Filipe Sá; Daniel Albuquerque; Cristina Wanzeller; Filipe Caldeira; Paulo Tomé; Pedro Furtado; Pedro Martins
The implementation of Smart-Cities is growing all over the world. From big cities to small villages, information that can support better and more efficient urban management is collected from multiple sources (sensors). Such information has to be stored, queried, analyzed and displayed, with the aim of contributing to a better quality of life for citizens and a more sustainable environment. In this context, it is important to choose the right database engine. NoSQL databases are now generally accepted by the database community to support application niches. They are known for their scalability, simplicity, and key-indexed data storage, which allow easy data distribution and balancing over several nodes.
international conference: beyond databases, architectures and structures | 2018
Maryam Abbasi; Pedro Martins; José Cecílio; João Pedro Costa; Pedro Furtado
Over the past decades, several new concepts have emerged to organize and query data in large Data Warehouse (DW) systems, all with the same primary objective: optimizing processing speed. More recently, with the rise of the Big Data concept, storage costs have dropped significantly and performance (random accesses) has increased, particularly with modern SSDs. This paper introduces and tests a storage alternative that goes against current data normalization premises, given that storage space is no longer a concern. By de-normalizing the entire data schema (transparently to the user), we propose a new system concept, called SINGLE, in which query execution time must be entirely predictable, independently of query complexity. The proposed data model also allows easy partitioning and distributed processing to enable execution parallelism, boosting performance, as happens in MapReduce. The TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when compared with approaches based on a normalized relational schema and with MapReduce-oriented approaches.
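The de-normalization idea can be illustrated with a toy Python sketch. It is not the SINGLE system: the tables, field names, and query are hypothetical. The sketch pre-joins a fact table with a dimension table into one wide relation, so that an aggregate query becomes a single pass over the rows with no join at query time.

```python
# Hypothetical normalized data: a dimension table and a fact table.
customers = {
    1: {"name": "Ana", "nation": "PT"},
    2: {"name": "Bo", "nation": "SE"},
}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 50.0},
    {"order_id": 11, "customer_id": 2, "total": 20.0},
    {"order_id": 12, "customer_id": 1, "total": 30.0},
]


def denormalize(orders, customers):
    """Pre-join each order with its customer into one wide row (done once, at load time)."""
    wide = []
    for order in orders:
        customer = customers[order["customer_id"]]
        wide.append({
            **order,
            "customer_name": customer["name"],
            "customer_nation": customer["nation"],
        })
    return wide


def revenue_by_nation(wide_rows):
    """Answer the query with one sequential scan over the wide relation."""
    totals = {}
    for row in wide_rows:
        nation = row["customer_nation"]
        totals[nation] = totals.get(nation, 0.0) + row["total"]
    return totals


if __name__ == "__main__":
    print(revenue_by_nation(denormalize(orders, customers)))
```

Because every query is a scan of the same wide relation, its cost depends mainly on the number of rows rather than on how many tables the original schema would have joined. The price is the extra storage taken by the repeated customer fields.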
international conference: beyond databases, architectures and structures | 2017
Pedro Martins; Maryam Abbasi; José Cecílio; Pedro Furtado
Work in the field of data warehousing (DW) does not address the integration of Stream Processing (SP) to provide result freshness (i.e. results that include information not yet stored in the DW) while at the same time relaxing the DW processing load. Previous research focuses mainly on parallelization, for instance adding more hardware resources or parallelizing operators, queries, and storage. A well-known and well-studied approach is to use Map-Reduce to scale horizontally in order to achieve more storage and processing performance. In many contexts, high-rate data needs to be processed in small time windows without storing results (e.g. for near real-time monitoring); in other cases, the objective is to relax the data warehouse usage (e.g. keeping results updated for web-page reloads). In both cases, stream processing solutions can be set up to work together with the data warehouse (Map-Reduce or not) to keep results available on the fly, avoiding high query execution times and thereby leaving the DW servers more available to process other heavy tasks (e.g. data mining).
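As a rough illustration of processing high-rate data in small time windows without storing results, the Python sketch below maintains a running sum over a sliding time window. The class name, its methods, and the choice of a sum aggregate are assumptions for illustration and do not describe the authors' system.

```python
from collections import deque


class SlidingWindowSum:
    """Keeps the sum of values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, value) pairs in arrival order
        self.total = 0.0

    def _evict(self, now):
        # Drop events that fell out of the window and subtract their values.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def add(self, timestamp, value):
        """Ingest one event; timestamps are assumed to be non-decreasing."""
        self.events.append((timestamp, value))
        self.total += value
        self._evict(timestamp)

    def current(self, now):
        """Return the up-to-date aggregate without querying any stored data."""
        self._evict(now)
        return self.total


if __name__ == "__main__":
    w = SlidingWindowSum(window_seconds=10)
    w.add(0, 5)
    w.add(5, 3)
    print(w.current(9))
    w.add(12, 2)
    print(w.current(12))
```

The aggregate is updated incrementally as events arrive, so a page reload or monitoring dashboard can read `current()` directly instead of issuing a query against the data warehouse.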