Publication


Featured research published by Aida Valls.


Expert Systems With Applications | 2012

Ontology-based semantic similarity: A new feature-based approach

David Sánchez; Montserrat Batet; David Isern; Aida Valls

Estimation of the semantic likeness between words is of great importance in many applications dealing with textual data, such as natural language processing, knowledge acquisition and information retrieval. Semantic similarity measures exploit knowledge sources as the base to perform the estimations. In recent years, ontologies have grown in interest thanks to global initiatives such as the Semantic Web, offering a structured knowledge representation. Thanks to the possibilities that ontologies enable regarding the semantic interpretation of terms, many ontology-based similarity measures have been developed. Several families of measures can be identified, according to the principle on which they base the similarity assessment and the way in which ontologies are exploited or complemented with other sources. In this paper, we survey and classify most of the ontology-based approaches developed in order to evaluate their advantages and limitations and compare their expected performance from both theoretical and practical points of view. We also present a new ontology-based measure relying on the exploitation of taxonomical features. The evaluation and comparison of our approach's results against those reported by related works under a common framework suggest that our measure provides high accuracy without some of the limitations observed in other works.


Expert Systems With Applications | 2014

Review: Intelligent tourism recommender systems: A survey

Joan Borràs; Antonio Moreno; Aida Valls

Recommender systems are currently being applied in many different domains. This paper focuses on their application in tourism. A comprehensive and thorough search of the smart e-Tourism recommenders reported in the Artificial Intelligence journals and conferences since 2008 has been made. The paper provides a detailed and up-to-date survey of the field, considering the different kinds of interfaces, the diversity of recommendation algorithms, the functionalities offered by these systems and their use of Artificial Intelligence techniques. The survey also provides some guidelines for the construction of tourism recommenders and outlines the most promising areas of work in the field for the next years.


Engineering Applications of Artificial Intelligence | 2013

SigTur/E-Destination: Ontology-based personalized recommendation of Tourism and Leisure Activities

Antonio Moreno; Aida Valls; David Isern; Lucas Marin; Joan Borràs

SigTur/E-Destination is a Web-based system that provides personalized recommendations of touristic activities in the region of Tarragona. The activities are classified and labeled according to a specific ontology, which guides the reasoning process. The recommender takes into account many different kinds of data: demographic information, travel motivations, the actions of the user on the system, the ratings provided by the user, the opinions of users with similar demographic characteristics or similar tastes, etc. The system has been fully designed and implemented in the Science and Technology Park of Tourism and Leisure. The paper presents a numerical evaluation of the correlation between the recommendations and the users' motivations, and a qualitative evaluation performed by end users.


International Journal of Medical Informatics | 2010

Using ontologies for structuring organizational knowledge in Home Care assistance

Aida Valls; Karina Gibert; David Sánchez; Montserrat Batet

PURPOSE: Information Technologies and Knowledge-based Systems can significantly improve the management of complex distributed health systems, where supporting multidisciplinarity is crucial and communication and synchronization between the different professionals and tasks become essential. This work proposes the use of the ontological paradigm to describe the organizational knowledge of such complex healthcare institutions as a basis to support their management. The ontology engineering process is detailed, as well as the way to keep the ontology up to date in the face of changes. The paper also analyzes how such an ontology can be exploited in a real healthcare application and the role of the ontology in the customization of the system. The particular case of senior Home Care assistance is addressed, as this is a highly distributed field as well as a strategic goal in an ageing Europe.

MATERIALS AND METHODS: The proposed ontology design is based on a Home Care medical model defined by a European consortium of Home Care professionals, framed in the scope of the K4Care European project (FP6). Due to the complexity of the model and the knowledge gap between the textual medical model and the strict formalization of an ontology, an ontology engineering methodology (On-To-Knowledge) has been followed.

RESULTS: After applying the On-To-Knowledge steps, the following results were obtained: the feasibility study concluded that the ontological paradigm and the expressiveness of modern ontology languages were enough to describe the required medical knowledge; after the kick-off and refinement stages, a complete and non-ambiguous definition of the Home Care model, including its main components and interrelations, was obtained; the formalization stage expressed HC medical entities in the form of ontological classes, which are interrelated by means of hierarchies, properties and semantically rich class restrictions; the evaluation, carried out by exploiting the ontology in a knowledge-driven e-health application running on a real scenario, showed that the ontology design and its exploitation brought several benefits with regard to flexibility, adaptability and work efficiency from the end-user point of view; for the maintenance stage, two software tools are presented, aimed at the incorporation and modification of healthcare units and the personalization of ontological profiles.

CONCLUSIONS: The paper shows that the ontological paradigm and the expressiveness of modern ontology languages can be exploited not only to represent terminology in a non-ambiguous way, but also to formalize the interrelations and organizational structures involved in a real and distributed healthcare environment. This kind of ontology facilitates adaptation in the face of changes in the healthcare organization or Care Units, supports the creation of profile-based interaction models in a transparent and seamless way, and increases the reusability and generality of the developed software components. As a conclusion of the exploitation of the developed ontology in a real medical scenario, we can say that an ontology formalizing organizational interrelations is a key component for building effective distributed knowledge-driven e-health systems.


Journal of Intelligent Information Systems | 2010

Ontology-driven web-based semantic similarity

David Sánchez; Montserrat Batet; Aida Valls; Karina Gibert

Estimation of the degree of semantic similarity/distance between concepts is a very common problem in research areas such as natural language processing, knowledge acquisition, information retrieval or data mining. In the past, many similarity measures have been proposed, exploiting explicit knowledge (such as the structure of a taxonomy) or implicit knowledge (such as information distribution). In the former case, taxonomies and/or ontologies are used to introduce additional semantics; in the latter case, frequencies of term appearances in a corpus are considered. Classical measures based on those premises suffer from some problems: in the first case, their excessive dependency on the taxonomical/ontological structure; in the second case, the lack of semantics of a pure statistical analysis of occurrences and/or the ambiguity of estimating concept statistical distribution from term appearances. Measures based on Information Content (IC) of taxonomical concepts combine both approaches. However, they heavily depend on a corpus that is properly pre-tagged and disambiguated according to the ontological entities in order to compute accurate concept appearance probabilities. This limits the applicability of those measures to other ontologies (like specific domain ontologies) and massive corpora (like the Web). In this paper, several of the presented issues are analyzed. Modifications of classical similarity measures are also proposed. They are based on a contextualized and scalable version of IC computation in the Web by exploiting taxonomical knowledge. The goal is to avoid the measures’ dependency on the corpus pre-processing to achieve reliable results and minimize language ambiguity. Our proposals are able to outperform classical approaches when using the Web for estimating concept probabilities.
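
As a rough illustration of the classical IC-based measures this abstract refers to, the sketch below computes IC(c) = -log p(c) over a toy taxonomy and scores two concepts by the IC of their closest shared ancestor (Resnik's classical measure). The taxonomy and the term counts are hypothetical, and this is not the Web-based IC computation the paper proposes.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and made-up corpus counts.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "fruit": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
}
TERM_COUNT = {"entity": 0, "vehicle": 10, "fruit": 10, "car": 60, "bicycle": 20}


def ancestors(concept):
    """Return the path from the concept up to the root, concept included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def concept_count(concept):
    """Occurrences of the concept plus those of all its descendants."""
    return sum(n for term, n in TERM_COUNT.items() if concept in ancestors(term))


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from the toy counts."""
    total = sum(TERM_COUNT.values())
    return -math.log(concept_count(concept) / total)


def resnik_similarity(a, b):
    """IC of the closest ancestor shared by a and b."""
    up_b = set(ancestors(b))
    shared = next(node for node in ancestors(a) if node in up_b)
    return information_content(shared)
```

With these counts, two concepts that only share the root get a similarity of zero, which shows why the quality of the probability estimates matters so much for this family of measures.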


Journal of Biomedical Informatics | 2013

A semantic framework to protect the privacy of electronic health records with non-numerical attributes

Sergio Martínez; David Sánchez; Aida Valls

Structured patient data like Electronic Health Records (EHRs) are a valuable source for clinical research. However, the sensitive nature of such information requires some anonymisation procedure to be applied before releasing the data to third parties. Several studies have shown that the removal of identifying attributes, like the Social Security Number, is not enough to obtain an anonymous data file, since unique combinations of other attributes, such as rare diagnoses and personalised treatments, may lead to the disclosure of patients' identities. To tackle this problem, Statistical Disclosure Control (SDC) methods have been proposed to mask sensitive attributes while preserving, up to a certain degree, the utility of anonymised data. Most of these methods focus on continuous-scale numerical data. Considering that part of the clinical data found in EHRs is expressed with non-numerical attributes, such as diagnoses, symptoms and procedures, their application to EHRs produces far from optimal results. In this paper, we propose a general framework to enable the accurate application of SDC methods to non-numerical clinical data, with a focus on the preservation of semantics. To do so, we exploit structured medical knowledge bases like SNOMED CT to propose semantically-grounded operators to compare, aggregate and sort non-numerical terms. Our framework has been applied to several well-known SDC methods and evaluated using a real clinical dataset with non-numerical attributes. Results show that the exploitation of medical semantics produces anonymised datasets that better preserve the utility of EHRs.


Computers & Security | 2012

Semantic adaptive microaggregation of categorical microdata

Sergio Martínez; David Sánchez; Aida Valls

In the context of Statistical Disclosure Control, microaggregation is a privacy-preserving method aimed at masking sensitive microdata prior to publication. It iteratively creates clusters of at least k elements and replaces them by their prototype so that they become k-indistinguishable (anonymous). This data transformation produces a loss of information with regard to the original dataset, which affects the utility of masked data. The aim of microaggregation algorithms is therefore to find the partition that minimises the information loss while ensuring a certain level of privacy. Most microaggregation methods, such as the MDAV algorithm, which is the focus of this paper, have been designed for numerical data. Extending them to support non-numerical (categorical) attributes is not straightforward because of the limitations on defining appropriate aggregation operators. Concretely, related works focused on the MDAV algorithm propose grouping data into groups with constrained (or even fixed) size and/or incorporate a basic categorical treatment of non-numerical data. This approach negatively affects the utility of the protected dataset because neither the distributional characteristics of data nor their underlying semantics are properly considered. In this paper, we propose a set of modifications to the MDAV algorithm focused on categorical microdata. Our approach has been evaluated and compared with related works when protecting real datasets with textual attribute values. Results show that our method produces masked datasets that better minimise the information loss resulting from the data transformation.
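
The grouping-and-replacement idea described in this abstract can be sketched for the simplest case, a single numerical attribute. The code below is a minimal univariate illustration, not the MDAV algorithm and not the paper's categorical extension: it sorts the values, cuts them into consecutive groups of at least k, and uses the group mean as the prototype. The function name is an assumption made for the example.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value by the mean of its group; every group has >= k members."""
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values")
    # Indices of the records, ordered by value, so neighbours are grouped together.
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [None] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        # Fold a too-small tail into the current group so no group falls below k.
        if len(order) - end < k:
            end = len(order)
        group = order[start:end]
        prototype = mean(values[i] for i in group)
        for i in group:
            masked[i] = prototype
        start = end
    return masked
```

For example, `microaggregate([1, 2, 10, 11, 12], 2)` yields one group for the two small values and one for the three large ones, so each masked value is shared by at least two records.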


Information Fusion | 2012

Privacy protection of textual attributes through a semantic-based masking method

Sergio Martínez; David Sánchez; Aida Valls; Montserrat Batet

Using microdata provided by statistical agencies has many benefits from the data mining point of view. However, such data often involve sensitive information that can be directly or indirectly related to individuals. An appropriate anonymisation process is needed to minimise the risk of disclosure. Several masking methods have been developed to deal with continuous-scale numerical data or bounded textual values but approaches to tackling the anonymisation of textual values are scarce and shallow. Because of the importance of textual data in the Information Society, in this paper we present a new masking method for anonymising unbounded textual values based on the fusion of records with similar values to form groups of indistinguishable individuals. Since, from the data exploitation point of view, the utility of textual information is closely related to the preservation of its meaning, our method relies on the structured knowledge representation given by ontologies. This domain knowledge is used to guide the masking process towards the merging that best preserves the semantics of the original data. Because textual data typically consist of large and heterogeneous value sets, our method provides a computationally efficient algorithm by relying on several heuristics rather than exhaustive searches. The method is evaluated with real data in a concrete data mining application that involves solving a clustering problem. We also compare the method with more classical approaches that focus on optimising the value distribution of the dataset. Results show that a semantically grounded anonymisation best preserves the utility of data in both the theoretical and the practical setting, and reduces the probability of record linkage. At the same time, it achieves good scalability with regard to the size of input data.


Applied Intelligence | 2013

Semantic similarity estimation from multiple ontologies

Montserrat Batet; David Sánchez; Aida Valls; Karina Gibert

The estimation of semantic similarity between words is an important task in many language related applications. In the past, several approaches to assess similarity by evaluating the knowledge modelled in an ontology have been proposed. However, in many domains, knowledge is dispersed through several partial and/or overlapping ontologies. Because most previous works on semantic similarity only support a unique input ontology, we propose a method to enable similarity estimation across multiple ontologies. Our method identifies different cases according to which ontology/ies input terms belong. We propose several heuristics to deal with each case, aiming to solve missing values, when partial knowledge is available, and to capture the strongest semantic evidence that results in the most accurate similarity assessment, when dealing with overlapping knowledge. We evaluate and compare our method using several general purpose and biomedical benchmarks of word pairs whose similarity has been assessed by human experts, and several general purpose (WordNet) and biomedical ontologies (SNOMED CT and MeSH). Results show that our method is able to improve the accuracy of similarity estimation in comparison to single ontology approaches and against state of the art related works in multi-ontology similarity assessment.


Knowledge-Based Systems | 2012

Semantically-grounded construction of centroids for datasets with textual attributes

Sergio Martínez; Aida Valls; David Sánchez

Centroids are key components in many data analysis algorithms such as clustering or microaggregation. They are considered as the central value that minimises the distance to all the objects in a dataset or cluster. Methods for centroid construction are mainly devoted to datasets with numerical and categorical attributes, focusing on the numerical and distributional properties of data. Textual attributes, on the contrary, consist of term lists referring to concepts with a specific semantic content (i.e., meaning), which cannot be evaluated by means of classical numerical operators. Hence, the centroid of a dataset with textual attributes should be the term that minimises the semantic distance to the members of the set. Semantically-grounded methods aiming to construct centroids for datasets with textual attributes are scarce and, as will be discussed in this paper, they are hampered by their limited semantic analysis of data. In this paper, we propose a method that, exploiting the knowledge provided by background ontologies (like WordNet), is able to construct the centroid of multivariate datasets described by means of textual attributes. Special effort has been put into the minimisation of the semantic distance between the centroid and the input data. As a result, our method is able to provide optimal centroids (i.e., those that minimise the distance to all the objects in the dataset) according to the exploited background ontology and a semantic similarity measure. Our proposal has been evaluated by means of a real dataset consisting of short textual answers provided by visitors to a natural park. Results show that our centroids retain the semantic content of the input data better than related works.
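
The definition given in this abstract, the centroid as the term that minimises the semantic distance to the members of the set, can be sketched as follows. The taxonomy is a hypothetical toy one (not WordNet), the distance is a plain edge count through the closest shared ancestor, and the candidates are simply every concept in the taxonomy; the paper's actual measure and candidate selection may differ.

```python
# Hypothetical toy taxonomy (child -> parent); not WordNet.
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "eagle": "bird",
}


def ancestors(term):
    """Return the path from the term up to the root, term included."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def path_distance(a, b):
    """Number of taxonomy edges between a and b via their closest shared ancestor."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("terms share no ancestor")


def semantic_centroid(terms, candidates=None):
    """Pick the candidate concept with the smallest summed distance to all terms."""
    if candidates is None:
        candidates = sorted(PARENT)  # assumption: any taxonomy concept may be chosen
    return min(candidates, key=lambda c: sum(path_distance(c, t) for t in terms))
```

Note that the chosen centroid need not be one of the input terms: for the values "dog", "cat" and "eagle", the concept "mammal" has the smallest summed distance in this toy taxonomy.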

Collaboration

Top Co-Authors

Antonio Moreno
Autonomous University of Madrid

David Sánchez
Instituto de Salud Carlos III

Montserrat Batet
Open University of Catalonia

Karina Gibert
Polytechnic University of Catalonia

Domenec Puig
Rovira i Virgili University

Lucas Marin
Rovira i Virgili University