Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Vanessa Ayala-Rivera is active.

Publication


Featured research published by Vanessa Ayala-Rivera.


Privacy in Statistical Databases | 2016

COCOA: A Synthetic Data Generator for Testing Anonymization Techniques

Vanessa Ayala-Rivera; A. Omar Portillo-Dominguez; Liam Murphy; Christina Thorpe

Extensive testing of anonymization techniques is critical to assess their robustness and to identify the scenarios where they are most suitable. However, access to real microdata is highly restricted, and the microdata that is publicly available is usually anonymized or aggregated, which reduces its value for testing purposes. In this paper, we present COCOA, a framework for generating realistic synthetic microdata that allows users to define multi-attribute relationships in order to preserve the functional dependencies of the data. We show how COCOA strengthens the testing of anonymization techniques by broadening the number and diversity of test scenarios. Results also show that COCOA is practical for generating large datasets.
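The core idea, preserving functional dependencies in generated microdata, can be sketched as follows. This is an illustrative sketch, not COCOA's implementation: the attribute names and value pools are made up for the example.

```python
import random

# Generate synthetic microdata while preserving a multi-attribute
# relationship (here, city -> country), so functional dependencies of
# real data survive in the synthetic version.

CITY_COUNTRY = {
    "Dublin": "Ireland", "Cork": "Ireland",
    "Madrid": "Spain", "Barcelona": "Spain",
}

def generate_record(rng):
    city = rng.choice(sorted(CITY_COUNTRY))
    return {
        "age": rng.randint(18, 90),
        "city": city,
        # Derived from the city, never sampled independently, so the
        # dependency city -> country holds in every generated record.
        "country": CITY_COUNTRY[city],
    }

def generate_dataset(n, seed=0):
    rng = random.Random(seed)  # seeded for reproducible test datasets
    return [generate_record(rng) for _ in range(n)]

data = generate_dataset(1000)
assert all(r["country"] == CITY_COUNTRY[r["city"]] for r in data)
```

Sampling only the determining attribute and deriving the dependent one is what keeps the generated dataset realistic enough to stress-test anonymization algorithms.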


Knowledge Science, Engineering and Management | 2016

Automatic Construction of Generalization Hierarchies for Publishing Anonymized Data

Vanessa Ayala-Rivera; Liam Murphy; Christina Thorpe

Concept hierarchies are widely used in multiple fields to carry out data analysis. In data privacy, they are known as Value Generalization Hierarchies (VGHs) and are used by generalization algorithms to dictate how data is anonymized. Their proper specification is therefore critical to obtaining anonymized data of good quality. Creating and evaluating VGHs requires expert knowledge and a significant amount of manual effort, making these tasks highly error-prone and time-consuming. In this paper, we present AIKA, a knowledge-based framework that automatically constructs and evaluates VGHs for the anonymization of categorical data. AIKA integrates ontologies to objectively create and evaluate VGHs, and implements a multi-dimensional reward function to tailor the VGH evaluation to different use cases. Our experiments show that AIKA improves the creation of VGHs, producing hierarchies of good quality in less time than manual construction. Results also show that the reward function properly captures the desired VGH properties.
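To make the role of a VGH concrete, here is a minimal sketch of one and of the generalization step an anonymization algorithm performs with it. The hierarchy below is a hand-made example; AIKA's contribution is constructing and evaluating such hierarchies automatically from ontologies.

```python
VGH = {
    # child -> parent; the root generalizes to the suppression symbol "*"
    "Dublin": "Ireland", "Cork": "Ireland",
    "Madrid": "Spain", "Barcelona": "Spain",
    "Ireland": "Europe", "Spain": "Europe",
    "Europe": "*",
}

def generalize(value, levels):
    """Replace a value with its ancestor `levels` steps up the hierarchy."""
    for _ in range(levels):
        value = VGH.get(value, "*")
    return value

print(generalize("Dublin", 1))  # Ireland
print(generalize("Dublin", 2))  # Europe
```

A poorly specified tree (e.g. grouping semantically unrelated cities) would force the algorithm to destroy more information than necessary, which is why the quality of the VGH matters.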


International Conference on Performance Engineering | 2018

One Size Does Not Fit All: In-Test Workload Adaptation for Performance Testing of Enterprise Applications

Vanessa Ayala-Rivera; Maciej Kaczmarski; John Murphy; Amarendra Darisa; A. Omar Portillo-Dominguez

Identifying workload-dependent performance issues, as well as their root causes, is a time-consuming and complex process that typically requires several iterations of tests (as these issues can depend on the input workloads) and relies heavily on human expert knowledge. To improve this process, this paper presents an automated approach that dynamically adapts the workload used by a performance testing tool during the test runs. As a result, the performance issues of the tested application can be revealed more quickly, and identified with less effort and expertise. Our experimental evaluation assessed the accuracy of the proposed approach and the time savings it brings to testers. The results demonstrate the benefits of the approach: a significant decrease in the time invested in performance testing, without compromising the accuracy of the test results, while introducing low overhead in the testing environment.
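The adaptation idea can be sketched as a control loop that raises the injected load while the measured response time stays under a target, converging on the saturation point within a single test run instead of across several fixed-workload runs. The simulated system model and all numbers below are illustrative, not from the paper.

```python
def measure_response_ms(users):
    # Stand-in for a real measurement against the system under test:
    # latency is flat until ~300 concurrent users, then grows sharply.
    return 50.0 + max(0, users - 300) ** 1.5

def find_saturation(target_ms=500.0, start=50, step=50, max_users=2000):
    """Return the highest tested load level that keeps response time under target."""
    users = start
    while users <= max_users:
        if measure_response_ms(users) > target_ms:
            return users - step  # previous level was the last to meet the target
        users += step
    return max_users  # target never exceeded within the tested range

print(find_saturation())  # 350 with the illustrative model above
```

A real implementation would drive a load generator and adapt on live metrics, but the feedback structure, measure, compare against a target, adjust the workload, is the same.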


International Conference on Computer Science and Education | 2017

A unified approach to automate the usage of plagiarism detection tools in programming courses

A. Omar Portillo-Dominguez; Vanessa Ayala-Rivera; Evin Murphy; John Murphy

Plagiarism in programming assignments is an extremely common problem in universities. While many tools automate the detection of plagiarism in source code, users still need to inspect the results and decide whether plagiarism occurred. Moreover, users often rely on a single tool (treating it as a “gold standard” for all cases), which can be ineffective and risky; it is therefore desirable to use several tools so that their results complement each other. However, these tools have various limitations that make their usage very time-consuming, such as the need to manually analyze and correlate their multiple outputs. In this paper, we propose an automated system that addresses the common usage limitations of plagiarism detection tools. The system automatically manages the execution of different plagiarism tools and generates a consolidated, comparative visualization of their results, allowing the user to make better-informed decisions about potential plagiarism. Our experimental results show that the effort and expertise required to use plagiarism detection tools is significantly reduced, while the probability of detecting plagiarism is increased. Results also show that our system is lightweight in terms of computational resources, making it practical for real-world usage.
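The consolidation step can be sketched as merging the per-pair similarity scores reported by several detection tools into one ranked, comparative view. The tool names and scores below are illustrative stand-ins; the real system would parse each tool's native report format.

```python
def consolidate(reports):
    """Merge tool reports into rows of (pair, mean_score, max_score, per_tool).

    `reports` maps tool name -> {(file_a, file_b): similarity in [0, 1]}.
    """
    merged = {}
    for tool, scores in reports.items():
        for pair, score in scores.items():
            merged.setdefault(pair, {})[tool] = score
    rows = [(pair, sum(s.values()) / len(s), max(s.values()), s)
            for pair, s in merged.items()]
    # Rank by mean similarity so the most suspicious pairs surface first.
    return sorted(rows, key=lambda row: row[1], reverse=True)

reports = {
    "tool_a": {("s1.py", "s2.py"): 0.91, ("s1.py", "s3.py"): 0.20},
    "tool_b": {("s1.py", "s2.py"): 0.85},
}
ranking = consolidate(reports)
# The pair flagged by both tools ranks first, with its scores side by side.
```

Seeing each tool's score next to the others is what lets the user judge a pair without trusting any single tool as the gold standard.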


Information Reuse and Integration | 2016

Improving the Utility of Anonymized Datasets through Dynamic Evaluation of Generalization Hierarchies

Vanessa Ayala-Rivera; Thomas Cerqueus; Liam Murphy; Christina Thorpe

The dissemination of textual personal information has become a key driver of innovation and value creation. However, because this data may contain sensitive information, it must be anonymized, which can reduce its usefulness for secondary uses. One of the most widely used anonymization techniques is generalization, but its effectiveness can be hampered by the Value Generalization Hierarchies (VGHs) used to dictate the anonymization, as poorly specified VGHs can reduce the usefulness of the resulting data. To tackle this problem, we propose a metric for evaluating the quality of textual VGHs used in anonymization. Our evaluation approach considers the semantic properties of VGHs and exploits information from the input datasets to predict, with higher accuracy than existing approaches, the potential effectiveness of VGHs for anonymizing data. As a consequence, the utility of the resulting datasets is improved without sacrificing the privacy goal. We also introduce a novel rating scale that classifies VGH quality into categories, making our quality metric easier for practitioners to interpret.
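An illustrative sketch (not the paper's metric) of why exploiting the input dataset matters when evaluating a VGH: the same generalization costs more utility when it affects frequent values, so the score below weights precision loss by value frequency. The hierarchy depths and the sample dataset are made-up examples.

```python
from collections import Counter

# Distance from each leaf value to the root of an assumed two-level hierarchy.
LEAF_DEPTH = {"Dublin": 2, "Cork": 2, "Madrid": 2, "Barcelona": 2}

def precision_loss(dataset, levels):
    """Frequency-weighted fraction of hierarchy precision lost (0 none, 1 all)."""
    counts = Counter(dataset)
    total = sum(counts.values())
    return sum(n * min(levels, LEAF_DEPTH[v]) / LEAF_DEPTH[v]
               for v, n in counts.items()) / total

sample = ["Dublin"] * 8 + ["Madrid"] * 2
print(precision_loss(sample, 1))  # 0.5: one of two hierarchy levels climbed
```

Weighting by the dataset's value frequencies is the dynamic part: the same VGH can score differently on different input datasets, which a purely structural evaluation would miss.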


Transactions on Data Privacy | 2014

A Systematic Comparison and Evaluation of k-Anonymization Algorithms for Practitioners

Vanessa Ayala-Rivera; Patrick McDonagh; Thomas Cerqueus; Liam Murphy


arXiv: Databases | 2013

Synthetic Data Generation using Benerator Tool

Vanessa Ayala-Rivera; Patrick McDonagh; Thomas Cerqueus; Liam Murphy


Privacy in Statistical Databases | 2014

Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization

Vanessa Ayala-Rivera; Patrick McDonagh; Thomas Cerqueus; Liam Murphy


Transactions on Data Privacy | 2017

Enhancing the Utility of Anonymized Data by Improving the Quality of Generalization Hierarchies

Vanessa Ayala-Rivera; Patrick McDonagh; Thomas Cerqueus; Liam Murphy; Christina Thorpe


12th Information Technology & Telecommunications (IT&T) Conference, Athlone, Ireland, March 2013 | 2013

Protecting organizational data confidentiality in the cloud using a high-performance anonymization engine

Vanessa Ayala-Rivera; Dawid Nowak; Patrick McDonagh

Collaboration


Dive into Vanessa Ayala-Rivera's collaborations.

Top Co-Authors

Liam Murphy

University College Dublin

Thomas Cerqueus

University College Dublin

John Murphy

University College Dublin

Evin Murphy

University College Dublin
