Helena Galhardas
University of Lisbon
Publication
Featured research published by Helena Galhardas.
international database engineering and applications symposium | 2014
Valéria Magalhães Pequeno; Vânia Maria Ponte Vidal; Marco A. Casanova; Luís Eufrasio T. Neto; Helena Galhardas
The W3C RDB2RDF Working Group proposed a standard language to map relational data into RDF triples, called R2RML. However, creating R2RML mappings may sometimes be a difficult task because it involves the creation of views (within the mappings or not) and referring to them in the R2RML mapping. To overcome such difficulty, this paper first proposes algebraic correspondence assertions, which simplify the definition of relational-to-RDF mappings and yet are expressive enough to cover a wide range of mappings. Algebraic correspondence assertions include data-metadata mappings (where data elements in one schema serve as metadata components in the other), mappings containing custom value functions (e.g., data format transformation functions) and union, intersection and difference between tables. Then, the paper shows how to automatically compile algebraic correspondence assertions into R2RML mappings.
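As a rough illustration of what a relational-to-RDF mapping with a data-metadata correspondence and a custom value function does, the following minimal Python sketch turns table rows into triples. It is neither R2RML syntax nor the paper's assertion notation; the namespace `BASE`, the function `rows_to_triples`, and the table and column names are all hypothetical, introduced only for this example.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]

BASE = "http://example.org/"  # hypothetical namespace, not taken from the paper


def rows_to_triples(
    rows: List[Row],
    key: str,
    predicate_from: str,
    object_from: str,
    value_fn: Callable[[str], str],
) -> List[Triple]:
    """Map each relational row to one (subject, predicate, object) triple.

    The value stored in column `predicate_from` becomes the predicate name,
    which is a data-metadata mapping: data in the relational schema serves
    as metadata (a property name) in the RDF output.
    """
    triples = []
    for row in rows:
        subject = f"{BASE}person/{row[key]}"
        predicate = f"{BASE}{row[predicate_from]}"  # data value used as metadata
        obj = value_fn(row[object_from])            # custom value function
        triples.append((subject, predicate, obj))
    return triples


# Hypothetical "contact" table with one row per contact detail.
rows = [
    {"person_id": "7", "kind": "email", "value": " Ana@Example.org "},
    {"person_id": "7", "kind": "phone", "value": "555-0100"},
]

# The value function here trims whitespace and lowercases the value.
triples = rows_to_triples(
    rows, "person_id", "kind", "value", lambda v: v.strip().lower()
)
for t in triples:
    print(t)
```

In this sketch the `kind` column supplies the property name, so adding a new kind of contact detail to the table yields a new predicate without changing the mapping code.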
acm/ieee joint conference on digital libraries | 2014
Pablo Barrio; Gonçalo Simões; Helena Galhardas; Luis Gravano
We introduce the REEL (RElation Extraction Learning) framework, an open source framework that facilitates the development and evaluation of relation extraction systems over text collections. To define a relation extraction system for a new relation and text collection, users only need to specify the parsers to load the collection, the relation and its constraints, and the learning and extraction techniques to be used. This makes REEL a powerful framework to enable the deployment and evaluation of relation extraction systems for both application building and research.
very large data bases | 2017
Pradeeban Kathiravelu; Yiru Chen; Ashish Sharma; Helena Galhardas; Peter Van Roy; Luís Veiga
Biomedical research requires distributed access, analysis, and sharing of data from various dispersed sources at Internet scale. Due to the volume and variety of big data, materialized data integration is often infeasible or too expensive, given the costs of bandwidth, storage, maintenance, and management. Obidos (On-demand Big Data Integration, Distribution, and Orchestration System) provides a novel on-demand integration approach for heterogeneous distributed data. Instead of integrating data from the data sources to build a complete data warehouse as the initial step, Obidos employs a hybrid approach of virtual and materialized data integration. By allocating unique identifiers as pointers to virtually integrated data sets, Obidos supports efficient data sharing among data consumers. We design Obidos as a generic service-based data integration system, and implement and evaluate a prototype for multimodal medical data.
Distributed and Parallel Databases | 2018
Pradeeban Kathiravelu; Ashish Sharma; Helena Galhardas; Peter Van Roy; Luís Veiga
Scientific research requires access, analysis, and sharing of data that is distributed across various heterogeneous data sources at the scale of the Internet. An eager extract, transform, and load (ETL) process constructs an integrated data repository as its first step, integrating and loading data in its entirety from the data sources. The bootstrapping of this process is not efficient for scientific research that requires access to data from very large and typically numerous distributed data sources. A lazy ETL process loads only the metadata, but still eagerly. Lazy ETL is faster in bootstrapping. However, queries on the integrated data repository of eager ETL perform faster, because the entire data set is available beforehand. In this paper, we propose a novel ETL approach for scientific data integration, as a hybrid of the eager and lazy ETL approaches, applied to both data and metadata. This way, hybrid ETL supports incremental integration and loading of metadata and data from the data sources. We incorporate a human-in-the-loop approach to enhance the hybrid ETL, with selective data integration driven by user queries and sharing of integrated data between users. We implement our hybrid ETL approach in a prototype platform, Óbidos, and evaluate it in the context of data sharing for medical research. Óbidos outperforms both the eager ETL and lazy ETL approaches for scientific research data integration and sharing, through its selective loading of data and metadata, while storing the integrated data in a scalable integrated data repository.
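The difference between eager loading and query-driven, incremental loading can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Óbidos implementation: the classes `Source` and `HybridRepository`, their methods, and the sample records are hypothetical. Metadata and data are fetched from a source only when a query first touches it, and later queries reuse what has already been loaded.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class Source:
    """Hypothetical remote data source that counts how often it is contacted."""

    def __init__(self, records: List[Record]):
        self._records = records
        self.metadata_fetches = 0
        self.data_fetches = 0

    def fetch_metadata(self) -> Dict[str, object]:
        self.metadata_fetches += 1
        fields = sorted({key for record in self._records for key in record})
        return {"fields": fields, "size": len(self._records)}

    def fetch_data(self) -> List[Record]:
        self.data_fetches += 1
        return list(self._records)


class HybridRepository:
    """Loads metadata and data per source, on demand, and keeps them for reuse."""

    def __init__(self, sources: Dict[str, Source]):
        self._sources = sources
        self._metadata: Dict[str, Dict[str, object]] = {}
        self._data: Dict[str, List[Record]] = {}

    def metadata(self, name: str) -> Dict[str, object]:
        # Metadata is loaded only for sources that a query actually touches.
        if name not in self._metadata:
            self._metadata[name] = self._sources[name].fetch_metadata()
        return self._metadata[name]

    def query(self, name: str, predicate: Callable[[Record], bool]) -> List[Record]:
        self.metadata(name)
        # Data is loaded on first use; later queries share the loaded copy.
        if name not in self._data:
            self._data[name] = self._sources[name].fetch_data()
        return [record for record in self._data[name] if predicate(record)]


imaging = Source([
    {"patient": "p1", "modality": "MRI"},
    {"patient": "p2", "modality": "CT"},
])
clinical = Source([{"patient": "p1", "age": 54}])
repo = HybridRepository({"imaging": imaging, "clinical": clinical})

mri = repo.query("imaging", lambda r: r["modality"] == "MRI")
ct = repo.query("imaging", lambda r: r["modality"] == "CT")
print(mri, ct, imaging.data_fetches, clinical.data_fetches)
```

An eager variant would call `fetch_metadata` and `fetch_data` on every source in the constructor, trading a slower start for faster first queries. In the sketch above, the `clinical` source is never contacted because no query refers to it.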
very large data bases | 2017
Nuno Lages; Bernardo Caetano; Manuel J. Fonseca; João Pereira; Helena Galhardas; Rui Farinha
Recording patient clinical data in a comprehensive and easy way is very important for health care providers. However, although there are information systems that facilitate the storage of and access to patient data, many records are still on paper. Even when data is stored electronically, systems are often complex to use and do not provide means to gather statistical information about a population of patients, thus limiting the usefulness of the data. Physicians often give up searching for relevant information to support their medical decisions because the task is too time-consuming. This paper proposes Umedicine, a web-based software application in Portuguese that addresses current limitations of clinical information systems. Umedicine is an application for physicians, patients and administrative staff that keeps clinical data (e.g., symptoms, clinical examination results, and treatments prescribed) up to date in a database in a structured way. It also provides easy and quick access to a large amount of clinical data collected over time. Furthermore, Umedicine supports the application of a particular clustering algorithm and a visualization module for analyzing patient time-series data, to identify evolution patterns. Preliminary user tests revealed promising results, showing that users were able to identify the evolution of groups of patients over time and their common characteristics.
international conference on enterprise information systems | 2014
Valéria Magalhães Pequeno; Helena Galhardas; Vânia Maria Ponte Vidal
Data Integration (DI) is the problem of combining a set of heterogeneous, autonomous data sources and providing the user with a unified view of these data. Integrating data raises several challenges, since the designer usually encounters incompatible data models characterized by differences in structure and semantics. One of the hardest challenges is to define correspondences between schema elements (e.g., attributes) to determine how they relate to each other. Since most business data is currently stored in relational databases, we present here a declarative and formal approach to specify 1-to-1, 1-to-m, and m-to-n correspondences between relational schema components. Unlike usual approaches, our correspondence assertions (CAs) have semantics and can deal with outer-joins and data-metadata relationships. Finally, we demonstrate how to use the CAs to generate mapping expressions in the form of SQL queries, and we present some preliminary tests to verify the performance of the generated queries.
brazilian symposium on databases | 2005
Paulo Carreira; Helena Galhardas; Antónia Lopes; João Madeiras Pereira
extending database technology | 2015
Pablo Barrio; Gonçalo Simões; Helena Galhardas; Luis Gravano
SBBD (Short Papers) | 2015
Valéria Magalhães Pequeno; Vânia Maria Ponte Vidal; Tiago Vinuto; Helena Galhardas
OTM Conferences | 2015
Pradeeban Kathiravelu; Helena Galhardas; Luís Veiga