Thomas Cerqueus
University College Dublin
Publications
Featured research published by Thomas Cerqueus.
International Conference on Data Engineering | 2015
Stefanie Scherzinger; Thomas Cerqueus; Eduardo Cunha de Almeida
Building scalable web applications on top of NoSQL data stores is becoming common practice. Many of these data stores can easily be accessed programmatically, and do not enforce a schema. Software engineers can design the data model on the go, a flexibility that is crucial in agile software development. The typical tasks of database schema management are now handled within the application code, usually involving object mapper libraries. However, today's Integrated Development Environments (IDEs) lack proper tool support when it comes to managing the combined evolution of the application code and of the schema. Yet simple refactorings, such as renaming an attribute at the source code level, can cause irretrievable data loss or runtime errors once the application is serving in production. In this demo, we present ControVol, a framework for controlled schema evolution in application development against NoSQL data stores. ControVol is integrated into the IDE and statically type checks object mapper class declarations against the schema evolution history, as recorded by the code repository. ControVol is capable of warning of common yet risky cases of mismatched data and schema. ControVol is further able to suggest quick fixes by which developers can have these issues automatically resolved.
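To illustrate the kind of schema mismatch described above, the following sketch (a minimal, hypothetical Python example rather than a real object mapper library; the Player class and field names are invented) shows how renaming a persisted attribute can silently lose data stored under the old name:

```python
# Hypothetical illustration of the schema-evolution hazard described above.
# Version 1 of the application persisted Player objects with a "login" field;
# version 2 renames the attribute to "username" at the source-code level.
# Legacy documents in the NoSQL store still carry the old field name, so a
# naive mapper silently drops the value -- the kind of mismatch a tool like
# ControVol is described as warning about at development time.

class PlayerV2:
    def __init__(self, username: str = "", score: int = 0):
        self.username = username
        self.score = score

def load_player(document: dict) -> PlayerV2:
    """Naive object mapping: unknown fields (such as 'login') are ignored."""
    return PlayerV2(
        username=document.get("username", ""),   # legacy docs only have "login"
        score=document.get("score", 0),
    )

legacy_doc = {"login": "alice", "score": 42}     # written by version 1
player = load_player(legacy_doc)
print(player.username)  # prints "" -- the original login value is effectively lost
```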
International Conference on Software Testing, Verification and Validation Workshops | 2014
Adrien Thiery; Thomas Cerqueus; Christina Thorpe; Gerson Sunyé; John Murphy
Cloud computing is becoming increasingly prevalent: more and more software providers are offering their applications as Software-as-a-Service solutions rather than traditional on-premises installations. In order to ensure the efficacy of the testing phase, it is critical to create a test environment that sufficiently emulates the production environment. Thus, Cloud applications should be tested in the Cloud. Cloud providers offer command-line tools for interacting with their platforms. However, custom low-level scripts written with a provider's tools can become very complex to maintain and manage when variability (in terms of providers and platforms) is introduced. The contributions of this paper include: the development of a high-level Domain Specific Language for the abstract definition of the application deployment process and resource requirements, and a generation process that transforms these definitions to automatically produce deployment and instantiation scripts for a variety of providers and platforms. These contributions significantly simplify and accelerate the testing process for Cloud applications.
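As a rough illustration of the idea (not the paper's DSL; the deployment fields, provider names, and command templates below are placeholders, not real CLI syntax), a single abstract definition can be rendered into provider-specific commands:

```python
# A minimal sketch of generating deployment scripts from one abstract,
# provider-independent definition. Providers and command templates are fictional.

deployment = {
    "app": "shop-frontend",
    "instances": 3,
    "memory_mb": 2048,
}

TEMPLATES = {
    "provider_a": "providerA-cli create --name {app} --count {instances} --mem {memory_mb}",
    "provider_b": "providerB deploy {app} --replicas={instances} --memory={memory_mb}MB",
}

def generate_script(definition: dict, provider: str) -> str:
    """Transform the abstract definition into a provider-specific command."""
    return TEMPLATES[provider].format(**definition)

for provider in TEMPLATES:
    print(generate_script(deployment, provider))
```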
Information Reuse and Integration | 2013
Teodora Sandra Buda; Thomas Cerqueus; Morten Kristiansen; John Murphy
In a wide range of application areas (e.g. data mining, approximate query evaluation, histogram construction), database sampling has proved to be a powerful technique. It is generally used when the computational cost of processing large amounts of information is extremely high, and a faster response with a lower level of accuracy for the results is preferred. Previous sampling techniques achieve this balance; however, the cost of the database sampling process itself should also be evaluated. We argue that the performance of current relational database sampling techniques that maintain the data integrity of the sample database is low, and that a faster strategy needs to be devised. In this paper we propose a very fast sampling method that keeps the referential integrity of the sample database intact. The sampling method targets the production environment of a system under development, which generally consists of large amounts of data that are computationally costly to analyze. We evaluate our method in comparison with previous database sampling approaches and show that our method produces a sample database at least 300 times faster, with a maximum trade-off of 0.5% in terms of sample size error.
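A minimal sketch of the general principle (not the paper's algorithm; the tables and keys are invented) shows how a sample can be closed over foreign-key references so that referential integrity is preserved:

```python
import random

# Generic illustration of sampling while keeping referential integrity:
# every sampled "order" row drags in the "customer" row it references,
# so no foreign key in the sample database dangles.

customers = {cid: {"id": cid, "name": f"customer-{cid}"} for cid in range(100)}
orders = [{"id": oid, "customer_id": random.randrange(100)} for oid in range(1000)]

def sample_with_integrity(orders, customers, rate):
    sampled_orders = [o for o in orders if random.random() < rate]
    needed_customers = {o["customer_id"] for o in sampled_orders}
    sampled_customers = {cid: customers[cid] for cid in needed_customers}
    return sampled_orders, sampled_customers

s_orders, s_customers = sample_with_integrity(orders, customers, rate=0.01)
assert all(o["customer_id"] in s_customers for o in s_orders)  # integrity holds
```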
Database and Expert Systems Applications | 2013
Teodora Sandra Buda; Thomas Cerqueus; John Murphy; Morten Kristiansen
Database sampling has become a popular approach to handle large amounts of data in a wide range of application areas such as data mining or approximate query evaluation. Using database samples is a practical solution when using the entire database is not cost-effective, and a balance between the accuracy of the results and the computational cost of the process applied to the large data set is preferred. Existing sampling approaches are either limited to specific application areas, to single-table databases, or to random sampling. In this paper, we propose CoDS: a novel sampling approach targeting relational databases that ensures that the sample database follows the same distribution as the original database for specific fields. In particular, it aims to maintain the distribution between tables. We evaluate the performance of our algorithm by measuring the representativeness of the sample with respect to the original database. We compare our approach with two existing solutions, and we show that our method performs faster and produces better results in terms of representativeness.
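The following sketch illustrates the general idea of distribution-preserving sampling on a single field using simple stratification; it is only an illustration over assumed data, not the CoDS algorithm itself:

```python
import random
from collections import Counter

# Illustrative stratified sampling on one field: rows are grouped by "country"
# and each group is sampled at the same rate, so the field's distribution in
# the sample tracks the distribution in the original data.

rows = [{"id": i, "country": random.choice(["IE", "FR", "BR"])} for i in range(10_000)]

def stratified_sample(rows, field, rate):
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row)
    sample = []
    for value, group in groups.items():
        k = max(1, round(len(group) * rate))
        sample.extend(random.sample(group, k))
    return sample

sample = stratified_sample(rows, "country", rate=0.05)
print(Counter(r["country"] for r in rows))
print(Counter(r["country"] for r in sample))  # roughly the same proportions
```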
Proceedings of the 2013 International Workshop on Testing the Cloud | 2013
Michael Lynch; Thomas Cerqueus; Christina Thorpe
IBM SmartCloud is a branded collection of Cloud products and solutions from IBM. It includes Infrastructure as a Service (IaaS), Software as a Service (SaaS), and Platform as a Service (PaaS) offered through public, private and hybrid cloud delivery models. This paper focuses on the software testing process employed for the SmartCloud iNotes SaaS application, providing details of the methodologies and tools developed to streamline testing. The new tools have enabled the testing team to meet the pace of the highly agile development team, enabling a more efficient software development lifecycle. Results indicate that the methodologies and tools used have increased the performance of the testing team: there was a decrease in the number of bugs present in the code (prior to release), and an overall increase in customer satisfaction.
Conference on Information and Knowledge Management | 2014
Teodora Sandra Buda; Thomas Cerqueus; John Murphy; Morten Kristiansen
Large amounts of data often require expensive and time-consuming analysis. Therefore, highly scalable and efficient techniques are necessary to process, analyze and discover useful information. Database sampling has proven to be a powerful method to surpass these limitations. Using only a sample of the original large database brings the benefit of obtaining useful information faster, at the potential expense of lower accuracy. In this paper, we demonstrate VFDS, a novel fast database sampling system that maintains the referential integrity of the data. The system is developed on top of the open-source database management system MySQL. We present various scenarios to demonstrate the effectiveness of VFDS in approximate query answering, sample size, and execution time, on both real and synthetic databases.
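As a simple illustration of approximate query answering over a sample (assumed data; not the VFDS system itself), an aggregate computed on a small sample can be scaled back up to estimate the answer over the full table:

```python
import random

# An aggregate computed on a 1% sample is scaled up by the inverse of the
# sampling rate to approximate the exact answer over the full table.

orders = [{"amount": random.uniform(5, 500)} for _ in range(100_000)]
rate = 0.01
sample = [o for o in orders if random.random() < rate]

exact = sum(o["amount"] for o in orders)
approx = sum(o["amount"] for o in sample) / rate   # scale the sample aggregate

print(f"exact={exact:.0f}  approx={approx:.0f}  error={abs(exact - approx) / exact:.2%}")
```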
Information Systems | 2017
Teodora Sandra Buda; Thomas Cerqueus; C. Grava; John Murphy
Generating synthetic data is useful in multiple application areas (e.g., database testing, software testing). Nevertheless, existing synthetic data generators are either limited to generating data that only respects the database schema constraints, or they are not accurate in terms of representativeness unless a complex set of inputs is provided by the user (such as the data characteristics of the desired generated data). In this paper, we present an extension of a prior representative extrapolation technique, ReX [20], which was limited to natural scaling rates. The objective is to produce, in an automated and efficient way, a representative extrapolated database, given an original database O and a rational scaling rate s ∈ ℚ. In the extended version, the ReX system can handle rational scaling rates by combining existing efficient sampling and extrapolation techniques. Furthermore, we propose a novel sampling technique, RVFDS, for handling positive rational values for the desired size of the generated database. We evaluate ReX in comparison with a realistic scaling method, UpSizeR [43], on both real and synthetic databases. We show that our solution outperforms the compared method with statistical significance for rational scaling rates in terms of representativeness.
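One way to picture how a rational scaling rate can combine extrapolation and sampling (an illustrative decomposition, not necessarily the published ReX/RVFDS procedure) is to split s into an integer number of replicas plus a fractional sample of one more replica:

```python
import math
import random

# Illustrative decomposition of a rational scaling rate s: e.g. s = 2.5 means
# replicate the table 2 times, then sample roughly half of a third replica.

def scale_table(rows, s):
    whole = math.floor(s)          # number of full replicas
    frac = s - whole               # fraction of one extra replica
    scaled = []
    for copy in range(whole):
        scaled.extend(dict(r, copy=copy) for r in rows)
    extra = [dict(r, copy=whole) for r in rows if random.random() < frac]
    scaled.extend(extra)
    return scaled

rows = [{"id": i} for i in range(1000)]
print(len(scale_table(rows, 2.5)))  # roughly 2500 rows
```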
Information Reuse and Integration | 2016
Vanessa Ayala-Rivera; Thomas Cerqueus; Liam Murphy; Christina Thorpe
The dissemination of textual personal information has become a key driver for innovation and value creation. However, due to the possible presence of sensitive information, this data must be anonymized, which can reduce its usefulness for secondary uses. One of the most widely used techniques to anonymize data is generalization. However, its effectiveness can be hampered by the Value Generalization Hierarchies (VGHs) used to dictate the anonymization of data, as poorly specified VGHs can reduce the usefulness of the resulting data. To tackle this problem, we propose a metric for evaluating the quality of textual VGHs used in anonymization. Our evaluation approach considers the semantic properties of VGHs and exploits information from the input datasets to predict with higher accuracy (compared to existing approaches) the potential effectiveness of VGHs for anonymizing data. As a consequence, the utility of the resulting datasets is improved without sacrificing the privacy goal. We also introduce a novel rating scale to classify the quality of VGHs into categories, to facilitate the interpretation of our quality metric for practitioners.
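A toy example (the hierarchy and the crude level-based loss score below are illustrative only, not the paper's semantic metric) shows how a VGH drives generalization and how climbing it trades utility for privacy:

```python
# Toy Value Generalization Hierarchy (VGH) for a "city" attribute.
# Each value lists its ancestors from the original value up to the root "*".

VGH = {
    "Dublin":    ["Dublin", "Ireland", "Europe", "*"],
    "Paris":     ["Paris", "France", "Europe", "*"],
    "Sao Paulo": ["Sao Paulo", "Brazil", "South America", "*"],
}

def generalize(value: str, level: int) -> str:
    """Replace a value by its ancestor at the given VGH level (0 = original)."""
    return VGH[value][level]

def information_loss(level: int) -> float:
    """Crude proxy: fraction of hierarchy levels climbed (0 = none, 1 = full)."""
    height = len(next(iter(VGH.values()))) - 1
    return level / height

print(generalize("Dublin", 1), information_loss(1))   # Ireland, ~0.33
print(generalize("Dublin", 2), information_loss(2))   # Europe, ~0.67
```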
Self-Adaptive and Self-Organizing Systems | 2014
Arnaud Cordier; Remi Domingues; Anthony Labaere; Nicolas Noel; Adrien Thiery; Thomas Cerqueus; Siobhán Clarke; Pawel M. Idziak; Hui Song; Philip Perry; Anthony Ventresque
This paper demonstrates how we applied a constraint-based dynamic adaptation approach to CarDemo, a traffic management system. The approach allows domain experts to describe adaptation goals as declarative constraints, and to automatically plan adaptation decisions that satisfy these constraints. We demonstrate how this approach can be used to realise the dynamic switching of routing services in the traffic management system, in response to changes in global system state and user requests.
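A minimal sketch of the idea (not the CarDemo implementation; service names, properties, and constraints are invented) expresses adaptation goals as declarative predicates and selects a routing service that satisfies them:

```python
# Adaptation goals as declarative predicates over the global system state;
# the planner picks the first routing service whose properties satisfy them all.

SERVICES = {
    "fast_router":  {"latency_ms": 20, "cost": 5},
    "cheap_router": {"latency_ms": 80, "cost": 1},
}

CONSTRAINTS = [
    lambda state, svc: svc["latency_ms"] <= state["max_latency_ms"],
    lambda state, svc: svc["cost"] <= state["budget"],
]

def plan(state):
    """Return a routing service satisfying every constraint, or None."""
    for name, props in SERVICES.items():
        if all(check(state, props) for check in CONSTRAINTS):
            return name
    return None

print(plan({"max_latency_ms": 100, "budget": 2}))  # cheap_router
print(plan({"max_latency_ms": 30, "budget": 10}))  # fast_router
```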
Transactions on Data Privacy | 2014
Vanessa Ayala-Rivera; Patrick McDonagh; Thomas Cerqueus; Liam Murphy