Publications


Featured research published by Jorge Bernardino.


International Database Engineering and Applications Symposium | 2008

Real-time data warehouse loading methodology

Ricardo Jorge Santos; Jorge Bernardino

A data warehouse provides information for analytical processing, decision making and data mining tools. As the concept of the real-time enterprise evolves, the synchronism between transactional data and statically implemented data warehouses has been redefined. Traditional data warehouse systems have static schemas and relationships between data, and therefore cannot support any dynamics in their structure and content. Their data is only periodically updated because they are not prepared for continuous data integration. For real-time enterprises with decision support needs, real-time data warehouses seem very promising. In this paper we present a methodology for adapting data warehouse schemas and user-end OLAP queries to efficiently support real-time data integration. To accomplish this, we use techniques such as table structure replication and query predicate restrictions for selecting data, enabling data to be loaded into the data warehouse continuously with minimum impact on query execution time. We demonstrate the efficiency of the method by analyzing its impact on query performance using the TPC-H benchmark, executing query workloads while simultaneously performing continuous data integration at various insertion rates.
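
A minimal sketch of the idea behind table structure replication combined with query predicate restrictions, here using SQLite through Python purely for illustration (the schema and table names are invented; the paper's experiments use TPC-H, not this toy schema):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Static fact table, refreshed only by the periodic batch load.
    cur.execute("CREATE TABLE sales (day TEXT, product TEXT, amount REAL)")
    # Empty structural replica that absorbs the continuous real-time inserts.
    cur.execute("CREATE TABLE sales_rt (day TEXT, product TEXT, amount REAL)")

    cur.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                    [("2008-01-01", "A", 10.0), ("2008-01-01", "B", 7.5)])
    # New transactions land in the small replica; 'sales' stays untouched,
    # so continuous loading does not lock or rebuild the large table.
    cur.execute("INSERT INTO sales_rt VALUES ('2008-01-02', 'A', 3.0)")

    # The OLAP query is rewritten as a UNION ALL over both tables, with the
    # predicate restriction applied to each branch to keep the scans selective.
    for row in cur.execute("""
        SELECT product, SUM(amount) FROM (
            SELECT * FROM sales    WHERE day >= '2008-01-01'
            UNION ALL
            SELECT * FROM sales_rt WHERE day >= '2008-01-01'
        ) GROUP BY product"""):
        print(row)

Periodically, the replica's rows would be merged into the static table and the replica emptied, taking the place of the conventional batch load window.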


International C* Conference on Computer Science & Software Engineering | 2013

NoSQL databases: MongoDB vs Cassandra

Veronika Abramova; Jorge Bernardino

In the past, relational databases were used in a large scope of applications due to their rich set of features, query capabilities and transaction management. However, they are not able to store and process big data effectively and are not very efficient at transactions and join operations. Recently, a new paradigm, NoSQL databases, has emerged to overcome some of these problems; such databases are more suitable for use in web environments. In this paper, we describe NoSQL databases, their characteristics and operational principles. The main focus of this paper is to compare and evaluate two of the most popular NoSQL databases: MongoDB and Cassandra.
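
A rough, hedged sketch of how differently the two engines are driven from client code, using the pymongo and cassandra-driver libraries; it assumes local MongoDB and Cassandra instances on their default ports, and the database, keyspace and collection names are invented:

    # pip install pymongo cassandra-driver
    from pymongo import MongoClient
    from cassandra.cluster import Cluster

    # MongoDB: schemaless JSON-like documents, inserted and queried by field.
    mongo = MongoClient("localhost", 27017)
    mongo.testdb.users.insert_one({"id": 1, "name": "alice", "city": "Coimbra"})
    print(mongo.testdb.users.find_one({"id": 1}))

    # Cassandra: wide-column tables declared up front and queried in CQL by key.
    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS testks WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.execute("CREATE TABLE IF NOT EXISTS testks.users "
                    "(id int PRIMARY KEY, name text, city text)")
    session.execute("INSERT INTO testks.users (id, name, city) "
                    "VALUES (1, 'alice', 'Coimbra')")
    print(session.execute("SELECT * FROM testks.users WHERE id = 1").one())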


Data Warehousing and Knowledge Discovery | 2002

Approximate Query Answering Using Data Warehouse Striping

Jorge Bernardino; Pedro S. Furtado; Henrique Madeira

This paper presents and evaluates a simple but very effective method to implement large data warehouses on an arbitrary number of computers, achieving very high query execution performance and scalability. The data is distributed and processed over a potentially large number of autonomous computers using our technique called data warehouse striping (DWS). The major problem of the DWS technique is that it would require a very expensive cluster of computers with fault-tolerant capabilities to prevent a fault in a single computer from stopping the whole system. In this paper, we propose a radically different approach to deal with the unavailability of one or more computers in the cluster, allowing the use of DWS with a very large number of inexpensive computers. The proposed approach is based on approximate query answering techniques that make it possible to deliver an approximate answer to the user even when one or more computers in the cluster are unavailable. The evaluation presented in the paper shows both analytically and experimentally that the approximate results obtained this way have a very small error that is negligible in most cases.
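
A small simulation of the idea in Python; scaling the partial sum by N/(N-1) is a natural estimator for this kind of aggregate, assumed here for illustration rather than taken from the paper:

    import random

    random.seed(0)
    facts = [random.randint(1, 100) for _ in range(100_000)]

    # Data warehouse striping: fact rows are dealt round-robin over N nodes,
    # so every stripe is a near-identical random sample of the whole table.
    N = 10
    stripes = [facts[i::N] for i in range(N)]

    exact = sum(facts)

    # Node 3 is unavailable: answer from the surviving stripes, scaled up.
    partial = sum(sum(s) for i, s in enumerate(stripes) if i != 3)
    approx = partial * N / (N - 1)

    print(f"exact={exact}  approx={approx:.0f}  "
          f"error={abs(exact - approx) / exact:.4%}")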


Journal of Big Data | 2015

Choosing the right NoSQL database for the job: a quality attribute evaluation

João Ricardo Lourenço; Bruno Cabral; Paulo Carreiro; Marco Vieira; Jorge Bernardino

For over forty years, relational databases have been the leading model for data storage, retrieval and management. However, due to increasing needs for scalability and performance, alternative systems have emerged, namely NoSQL technology. The rising interest in NoSQL technology, as well as the growth in the number of use case scenarios over the last few years, has resulted in an increasing number of evaluations and comparisons among competing NoSQL technologies. While most research work focuses on performance evaluation using standard benchmarks, it is important to note that the architecture of real-world systems is not driven by performance requirements alone, but must comprehensively include many other quality attribute requirements. Software quality attributes form the basis from which software engineers and architects develop software and make design decisions. Yet, there has been no quality-attribute-focused survey or classification of NoSQL databases in which databases are compared with regard to their suitability for the quality attributes common in the design of enterprise systems. To fill this gap, and to aid software engineers and architects, in this article we survey and create a concise and up-to-date comparison of NoSQL engines, identifying their most beneficial use case scenarios from the software engineer's point of view and the quality attributes to which each of them is most suited.
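
One hedged sketch of how an architect might operationalize such a quality-attribute comparison; the engines, attributes, scores and weights below are invented placeholders, not the article's survey results:

    # Hypothetical 1-5 suitability scores per quality attribute.
    scores = {
        "MongoDB":   {"scalability": 4, "consistency": 3, "availability": 4},
        "Cassandra": {"scalability": 5, "consistency": 2, "availability": 5},
        "Redis":     {"scalability": 3, "consistency": 3, "availability": 4},
    }
    # Weights encode one (invented) project's priorities.
    weights = {"scalability": 0.5, "consistency": 0.2, "availability": 0.3}

    for engine, attrs in scores.items():
        total = sum(weights[a] * v for a, v in attrs.items())
        print(f"{engine:10s} weighted score: {total:.2f}")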


International Database Engineering and Applications Symposium | 2009

Optimizing data warehouse loading procedures for enabling useful-time data warehousing

Ricardo Jorge Santos; Jorge Bernardino

The purpose of a data warehouse is to aid decision making. As the real-time enterprise evolves, the synchronism between transactional data and data warehouses is redefined. To cope with real-time requirements, data warehouses must enable continuous data integration, in order to deal with the most recent business data. Traditional data warehouses are unable to support any dynamics in structure and content while they are available for OLAP. Their data is periodically updated because they are unprepared for continuous data integration. For real-time enterprises that need decision support while transactions are occurring, (near) real-time data warehousing seems very promising. In this paper we survey and test today's most used loading techniques, analyze which data loading methods perform best, and present a methodology for efficiently supporting continuous data integration in data warehouses. To accomplish this, we use techniques such as table structure replication with minimum content and query predicate restrictions for selecting data, enabling data to be loaded into the data warehouse continuously with minimum impact on query execution time. We demonstrate the efficiency of the method using the TPC-H benchmark, executing query workloads while simultaneously performing continuous data integration.
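
To see why the loading method itself matters, a generic illustration comparing row-by-row inserts with a batched load in SQLite (the paper's experiments run on TPC-H against full-scale systems; this sketch only mirrors the shape of the comparison):

    import sqlite3
    import time

    rows = [(i, f"item-{i % 50}", i * 0.1) for i in range(200_000)]

    def load(batched: bool) -> float:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE fact (id INTEGER, product TEXT, amount REAL)")
        t0 = time.perf_counter()
        if batched:
            conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", rows)
        else:
            for r in rows:                       # one statement per row
                conn.execute("INSERT INTO fact VALUES (?, ?, ?)", r)
        conn.commit()
        return time.perf_counter() - t0

    print(f"row-by-row: {load(False):.2f}s   batched: {load(True):.2f}s")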


International Database Engineering and Applications Symposium | 2014

An overview of OpenStack architecture

Tiago Rosado; Jorge Bernardino

Cloud computing refers both to the applications delivered as services over the Internet and to the servers and system software in the datacenters that provide those services. These solutions offer pools of virtualized computing resources, paid for on a pay-per-use basis, and drastically reduce initial investment and maintenance costs. Efficient and flexible resource management is the main focus of the cloud solutions on the market, together with scalability and adaptability to new environments. OpenStack has established itself in the market as a scalable, performant and highly adaptive open-source architecture for both public and private cloud solutions, able to leverage hardware resources whether professional or entry-level. This paper gives an overview of the functionality of OpenStack's software components, in order to support the design and implementation of cloud computing solutions that fit enterprise purposes.
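
A hedged sketch of how those components are typically reached from client code, using the openstacksdk library; it assumes a configured clouds.yaml entry named "mycloud", which is an invented placeholder:

    # pip install openstacksdk
    import openstack

    conn = openstack.connect(cloud="mycloud")

    # Compute (Nova), image (Glance) and network (Neutron) services are all
    # reached through one connection, mirroring the component split of the
    # architecture the paper describes.
    for server in conn.compute.servers():
        print("server:", server.name)
    for image in conn.image.images():
        print("image:", image.name)
    for net in conn.network.networks():
        print("network:", net.name)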


Pacific Rim International Symposium on Dependable Computing | 2015

A Survey on Data Quality: Classifying Poor Data

Nuno Laranjeiro; Seyma Nur Soydemir; Jorge Bernardino

Data is part of our everyday life and an essential asset in numerous businesses and organizations. The quality of the data, i.e., the degree to which its characteristics fulfill requirements, can have a tremendous impact on the businesses themselves, on companies, or even on human lives. In fact, research and industry reports show that huge amounts of capital are spent to improve the quality of the data used in many systems, sometimes only to understand the quality of the information in use. Considering the variety of dimensions, characteristics, business views, or simply the specificities of the systems being evaluated, understanding how to measure data quality can be an extremely difficult task. In this paper we survey the state of the art in the classification of poor data, including the definition of dimensions and specific data problems; we identify frequently used dimensions and map data quality problems to the identified dimensions. The huge variety of terms and definitions found suggests that further standardization efforts are required. Also, data quality research on Big Data appears to be in its initial steps, leaving open space for further research.
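
As a tiny illustration of measuring one frequently used dimension, completeness, over invented records (the ratio-of-filled-values formulation is a common one, not necessarily the paper's):

    # Hypothetical records with missing (None) values.
    records = [
        {"name": "Ana", "email": "ana@example.com", "age": 34},
        {"name": "Rui", "email": None,              "age": None},
        {"name": None,  "email": "x@example.com",   "age": 51},
    ]

    fields = ["name", "email", "age"]
    filled = sum(1 for r in records for f in fields if r[f] is not None)
    completeness = filled / (len(records) * len(fields))
    print(f"completeness: {completeness:.0%}")   # 7 of 9 values -> 78%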


International Database Engineering and Applications Symposium | 2001

Experimental evaluation of a new distributed partitioning technique for data warehouses

Jorge Bernardino; Henrique Madeira

Since data warehousing became a major field of research, there has been a lot of interest in reducing the response time of complex queries posed over very large databases. The problem is that data warehouses store large amounts of data for decision support, demanding a high level of query performance and scalability from the database engines. A novel round-robin data partitioning approach, especially designed for relational data warehouse environments, is proposed and experimentally evaluated. This approach is specific to data warehouses implemented over relational repositories using the star schema, as it takes advantage of the specific characteristics of star schemas and typical data warehouse query profiles. The proposed approach guarantees optimal load balancing of query execution and assures high scalability. The experimental evaluation presented in the paper, using a comprehensive set of typical queries from the APB-1 benchmark running over Oracle 8, shows that an optimal speedup can be obtained with this technique. The proposed technique constitutes an effective and practical way of coping with very large data warehouses and can be applied to existing database technology.
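
A minimal sketch of round-robin partitioning with partial aggregates merged afterwards; Python threads stand in for the autonomous computers, an illustrative simplification of the distributed setting the paper evaluates:

    from concurrent.futures import ThreadPoolExecutor

    rows = list(range(1, 1_000_001))
    N = 4
    # Round-robin assignment: row i goes to partition i mod N, giving every
    # partition an almost equal share and hence balanced query work.
    partitions = [rows[i::N] for i in range(N)]

    def node_answer(partition):
        # Each autonomous "node" aggregates over its own slice.
        return sum(partition)

    with ThreadPoolExecutor(max_workers=N) as pool:
        total = sum(pool.map(node_answer, partitions))

    print(total == sum(rows))   # merged partials reproduce the full answer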


International C* Conference on Computer Science & Software Engineering | 2013

Comparison of data mining techniques and tools for data classification

Luís C. Borges; Viriato M. Marques; Jorge Bernardino

Data Mining is a knowledge field that intersects domains from computer science and statistics, attempting to discover knowledge from databases in order to facilitate the decision-making process. Classification is a Data Mining task that learns from a collection of cases in order to accurately predict the target class for new cases. Several machine learning techniques can be used to perform classification. Free and open-source Data Mining software tools available on the Internet offer the capability of performing classification through different techniques. This study compares four free and open-source Data Mining tools: KNIME, Orange, RapidMiner and Weka. Our objective is to reveal the most accurate tool and technique for the classification task, so that analysts may use these results to rapidly achieve good outcomes. Our experimental results show that no single tool or technique always achieves the best result, but some achieve better results more often than others.
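
To make the classification task concrete, a short sketch using scikit-learn as a stand-in (it is not one of the four surveyed tools, and the dataset and classifiers are chosen only for illustration):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    for clf in (DecisionTreeClassifier(random_state=0),
                GaussianNB(),
                KNeighborsClassifier()):
        scores = cross_val_score(clf, X, y, cv=10)   # 10-fold accuracy
        print(f"{type(clf).__name__:24s} mean accuracy: {scores.mean():.3f}")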


Conference on Computer as a Tool | 2011

A survey on data security in data warehousing: Issues, challenges and opportunities

Ricardo Jorge Santos; Jorge Bernardino; Marco Vieira

Data Warehouses (DWs) are among an enterprise's most valuable assets where critical business information is concerned, making them an appealing target for malicious insiders and outside attackers. Given the volume of data and the nature of DW queries, most existing data security solutions for databases are inefficient, consuming too many resources and introducing too much overhead in query response time, or producing too many false positive alarms (i.e., incorrect detections of attacks) to be checked. In this paper, we present a survey of currently available data security techniques, focusing on specific issues and requirements concerning their use in data warehousing environments. We also point out challenges and opportunities for future research work in this field.
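
To make the overhead argument concrete, a small sketch comparing a plaintext scan with a decrypt-then-scan over an encrypted column; it uses the third-party cryptography package, and the column-level scheme is an invented illustration rather than a technique from the survey:

    import time
    from cryptography.fernet import Fernet   # pip install cryptography

    cipher = Fernet(Fernet.generate_key())

    values = [f"customer-{i}".encode() for i in range(20_000)]
    encrypted = [cipher.encrypt(v) for v in values]

    # Scanning plaintext vs. decrypting on the fly: the gap is the kind of
    # query response-time overhead that makes generic solutions costly in DWs.
    t0 = time.perf_counter()
    hits = sum(1 for v in values if v.endswith(b"99"))
    t1 = time.perf_counter()
    hits_enc = sum(1 for c in encrypted if cipher.decrypt(c).endswith(b"99"))
    t2 = time.perf_counter()

    assert hits == hits_enc
    print(f"plaintext scan: {t1 - t0:.3f}s   decrypting scan: {t2 - t1:.3f}s")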

Collaboration


Dive into Jorge Bernardino's collaborations.

Top Co-Authors

Bipin C. Desai

Concordia University


Isabel Pedrosa

Polytechnic Institute of Coimbra


Melyssa Barata

Polytechnic Institute of Coimbra
