Victor Sosa-Sosa
CINVESTAV
Publications
Featured research published by Victor Sosa-Sosa.
Expert Systems With Applications | 2013
Ana B. Rios-Alvarado; Ivan Lopez-Arevalo; Victor Sosa-Sosa
Ontologies play an important role in knowledge management and the Semantic Web, and they are exploited in many current applications. Ontologies are especially useful because they support the exchange and sharing of information. Ontology learning from text is the process of deriving high-level concepts and their relations. An important task in ontology learning from text is to obtain a set of representative concepts to model a domain and to organize them into a hierarchical structure (taxonomy) from unstructured information. In the process of building a taxonomy, identifying hypernym/hyponym relations between terms is essential. How to automatically build an appropriate structure to represent the information contained in unstructured texts is a challenging task. This paper presents a novel method to obtain, from unstructured texts, representative concepts and their taxonomic relationships in a specific knowledge domain. The approach builds a concept hierarchy from a domain-specific corpus by using a clustering algorithm, a set of linguistic patterns, and additional contextual information extracted from the Web that improves the discovery of the most representative hypernym/hyponym relationships. A set of experiments was carried out using four different corpora. We evaluated the quality of the constructed taxonomies against gold-standard ontologies, and the experiments show promising results.
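The pattern-matching step can be illustrated with a small sketch: a couple of Hearst-style lexico-syntactic patterns applied to raw text to propose hypernym/hyponym candidates. The patterns and toy corpus below are illustrative only, not the paper's actual rule set.

```python
import re
from collections import defaultdict

# Hearst-style patterns of the kind used to spot hypernym/hyponym pairs
# in raw text. The paper's pattern set is richer; these two are
# illustrative, and hypernyms are limited to single words for simplicity.
PATTERNS = [
    re.compile(r"(\w+) such as ((?:\w+(?:, )?)+)"),
    re.compile(r"(\w+), including ((?:\w+(?:, )?)+)"),
]

def extract_hypernyms(text):
    """Return a mapping hypernym -> set of hyponyms found in `text`."""
    taxonomy = defaultdict(set)
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            hypernym = match.group(1)
            for hyponym in match.group(2).split(","):
                if hyponym.strip():
                    taxonomy[hypernym].add(hyponym.strip())
    return taxonomy

corpus = ("Clinics treat diseases such as diabetes, asthma. "
          "The corpus mentions animals, including dogs, cats.")
print(dict(extract_hypernyms(corpus)))
# {'diseases': {'diabetes', 'asthma'}, 'animals': {'dogs', 'cats'}}
```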
Journal of Systems and Software | 2013
José Luis González; Jesús Carretero Pérez; Victor Sosa-Sosa; Juan F. Rodriguez Cardoso; Ricardo Marcelín-Jiménez
Organizations are gradually outsourcing storage services such as online file hosting, backup, and archival to public providers. However, this process raises concerns: organizations cannot access their files when the service provider is unavailable, and they have neither control over nor assurance about the provider's data-management procedures. As a result, organizations are exploring alternatives for building their own multi-tenant storage capacities. This paper presents the design, implementation, and performance evaluation of an approach for constructing private online storage services. A hierarchical multi-tier architecture is proposed to concentrate these services in a unified storage system, which applies fault-tolerance and availability strategies to the files by passing redundant information among the services or tiers. Our approach automates the construction of such a unified system, the data-allocation procedure, and the recovery process that overcomes site failures. The parameters involved in the performance of the storage services are condensed into intuitive metrics based on utilization percentage, which simplifies the administration of the storage system. We present our performance assessments and the lessons learned from a case study in which a federated storage network was built from four trusted organizations spanning two continents.
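As a rough illustration of administering placement through a single utilization-percentage metric, the following sketch allocates each new file to the least-utilized reachable site. The class names and policy are our own simplification, not the paper's architecture.

```python
# Toy utilization-based allocation across storage sites: the admin only
# has to reason about one number per site, its utilization percentage.
class StorageSite:
    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.used_gb = 0.0
        self.available = True  # flips to False on a site failure

    def utilization(self):
        """Utilization percentage: the single metric the admin watches."""
        return 100.0 * self.used_gb / self.capacity_gb

def allocate(sites, size_gb):
    """Place a file on the least-utilized reachable site (simple policy)."""
    candidates = [s for s in sites if s.available
                  and s.used_gb + size_gb <= s.capacity_gb]
    if not candidates:
        raise RuntimeError("no site can hold the file")
    target = min(candidates, key=StorageSite.utilization)
    target.used_gb += size_gb
    return target.name

sites = [StorageSite("site-mx", 500), StorageSite("site-es", 500)]
print(allocate(sites, 40))   # site-mx (both empty, first wins)
print(allocate(sites, 10))   # site-es (now the less utilized one)
```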
Simulation Modelling Practice and Theory | 2015
José Luis González; Jesús Carretero Pérez; Victor Sosa-Sosa; Luis Miguel Sanchez; Borja Bergua
Cloud-based storage is a popular outsourcing solution for organizations that deliver content to end-users. However, contingency plans are needed to ensure service provision when the provider suffers outages or goes out of business. This paper presents SkyCDS: a resilient content delivery service based on a publish/subscribe overlay over diversified cloud storage. SkyCDS splits content delivery into a metadata flow layer and a content storage layer. The metadata flow layer is based on publish/subscribe patterns that insource metadata control back to the content owner. The storage layer is based on dispersing information over multiple cloud locations, with which organizations outsource content storage in a controlled manner. In SkyCDS, content dispersion is performed on the publisher side and content retrieval on the end-user side (the subscriber), which reduces the load on the organization to metadata management only. SkyCDS also lowers the overhead of the dispersion and retrieval processes by taking advantage of multi-core technology. A new allocation strategy based on cloud storage diversification, together with failure-masking mechanisms, minimizes the side effects of temporary and permanent cloud-service outages and of vendor lock-in. We developed a SkyCDS prototype that was evaluated using synthetic workloads and a case study with real traces. Publish/subscribe queuing patterns were evaluated using a simulation tool driven by metrics characterized from the experimental evaluation. The evaluation revealed the feasibility of SkyCDS in terms of performance, reliability, and storage-space profitability. It also showed a novel way to compare storage/delivery options through risk assessment.
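The dispersal idea can be sketched with the simplest possible instance: k data fragments plus one XOR parity fragment, so retrieval survives the outage of any single cloud location (m = 1). SkyCDS uses a more general dispersal scheme; this toy version only illustrates the failure-masking principle.

```python
def disperse(data: bytes, k: int):
    """Split `data` into k data fragments plus one XOR parity fragment."""
    chunk = (len(data) + k - 1) // k
    frags = [bytearray(data[i * chunk:(i + 1) * chunk].ljust(chunk, b"\0"))
             for i in range(k)]
    parity = bytearray(chunk)
    for frag in frags:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return frags + [parity]  # k + 1 fragments, one per cloud location

def recover(frags, lost, length):
    """Rebuild the original data; frags[lost] is None (a cloud outage)."""
    chunk = len(next(f for f in frags if f is not None))
    rebuilt = bytearray(chunk)
    for f in frags:
        if f is not None:
            for i, byte in enumerate(f):
                rebuilt[i] ^= byte  # XOR of survivors reproduces the loss
    frags = list(frags)
    frags[lost] = rebuilt
    return b"".join(frags[:-1])[:length]

data = b"satellite image payload"
frags = disperse(data, k=3)
frags[1] = None                    # one provider suffers an outage
assert recover(frags, 1, len(data)) == data
```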
Intelligent Information Systems | 2013
Heidy M. Marin-Castro; Victor Sosa-Sosa; Jose F. Martinez-Trinidad; Ivan Lopez-Arevalo
The amount of information contained in databases available on the Web has grown explosively in recent years. This information, known as the Deep Web, is heterogeneous and generated dynamically by querying back-end (relational) databases through Web Query Interfaces (WQIs), a special type of HTML form. Accessing Deep Web information is a great challenge because it is usually not indexed by general-purpose search engines. It is therefore necessary to create efficient mechanisms to access, extract, and integrate the information contained in the Deep Web. Since WQIs are the only means of access to the Deep Web, their automatic identification plays an important role: it enables traditional search engines to increase their coverage and to reach interesting information not available on the indexable Web. Accurate identification of Deep Web data sources is thus a key issue in the information retrieval process. In this paper we propose a new strategy for the automatic discovery of WQIs. The proposal makes an adequate selection of HTML elements extracted from HTML forms, which are used in a set of heuristic rules that help to identify WQIs. The strategy uses machine learning algorithms to classify searchable (WQI) and non-searchable (non-WQI) HTML forms, applying a prototype-selection algorithm that removes irrelevant or redundant data from the training set. The internal content of Web Query Interfaces was analyzed with the objective of identifying only those HTML elements that appear frequently and provide relevant information for WQI identification. For testing, we used three groups of datasets: two available in the UIUC repository and a new dataset, covering both advanced and simple query interfaces, that we created using a generic crawler supported by human experts. The experimental results show that the proposed strategy outperforms previously reported approaches.
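A minimal sketch of the classification step might look as follows: extract form-level features with BeautifulSoup and train a decision tree to separate searchable from non-searchable forms. The feature set and the tiny training sample are illustrative, not the paper's heuristic rules or corpus.

```python
from bs4 import BeautifulSoup
from sklearn.tree import DecisionTreeClassifier

def form_features(html):
    """Turn one HTML form into a small numeric feature vector."""
    form = BeautifulSoup(html, "html.parser").find("form")
    inputs = form.find_all("input")
    return [
        len(inputs),                                           # total inputs
        sum(1 for i in inputs if i.get("type") == "text"),     # text boxes
        sum(1 for i in inputs if i.get("type") == "password"), # login hint
        len(form.find_all("select")),                          # dropdowns
        1 if "search" in form.get_text().lower() else 0,       # keyword cue
    ]

train_html = [
    '<form>Search <input type="text"><input type="submit"></form>',
    '<form><input type="text"><select></select><input type="submit"></form>',
    '<form><input type="text"><input type="password"></form>',
    '<form>Sign up <input type="email"><input type="password"></form>',
]
labels = [1, 1, 0, 0]  # 1 = WQI (searchable), 0 = non-WQI

clf = DecisionTreeClassifier(random_state=0)
clf.fit([form_features(h) for h in train_html], labels)
print(clf.predict([form_features(
    '<form>Search flights <input type="text"></form>')]))  # [1]
```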
Proceedings of the Third International Workshop on Keyword Search on Structured Data | 2012
Jaime I. Lopez-Veyna; Victor Sosa-Sosa; Ivan Lopez-Arevalo
Most of the information on the Web can currently be classified, according to its structure, into three forms: unstructured (plain text), semi-structured (XML files), and structured (tables in a relational database). Web search is currently the primary way to access this massive amount of information. Keyword search has also become an alternative for querying relational databases and XML documents, and it is simple for people who are familiar with Web search engines. There are several approaches to keyword search over relational databases, such as Steiner trees, candidate networks, and tuple units. However, these methods have constraints. Finding Steiner trees is an NP-hard problem; moreover, a real database can produce a large number of Steiner trees, which are difficult to identify and index. The candidate-network approach first needs to generate the candidate networks and then evaluate them to find the best answer. The problem is that, for a keyword query, the number of candidate networks can be very large, and finding a common join expression to evaluate all of them can require considerable computational effort. Finally, tuple units, as generally conceived, produce very large structures that often store redundant information. To address these problems we propose a novel approach for keyword search over structured data (KESOSD). KESOSD models the structured information as graphs and uses a keyword-structure-aware index, called KSAI, that captures the implicit structural relationships of the information, producing fast and accurate search responses. We have conducted experiments, and the results show that KESOSD achieves high search efficiency and high accuracy for keyword search over structured data.
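The role of a keyword-structure-aware index can be sketched in a few lines: map each keyword directly to the graph nodes containing it, and accept answers whose structural neighborhood covers all query keywords. The names and data below are invented; KSAI itself is considerably more elaborate.

```python
from collections import defaultdict

edges = defaultdict(set)   # node -> linked nodes (foreign-key edges)
index = defaultdict(set)   # keyword -> nodes containing it

def add_tuple(node, text, links=()):
    """Index a tuple's text and record its structural links."""
    for kw in text.lower().split():
        index[kw].add(node)
    for other in links:
        edges[node].add(other)
        edges[other].add(node)

def search(query):
    """Return nodes that match, or directly connect, all query keywords."""
    keyword_hits = [index[kw.lower()] for kw in query.split()]
    results = set.intersection(*keyword_hits)   # single-node answers
    # Also accept a node whose neighborhood covers every keyword,
    # i.e. an implicit join captured by the structural edges.
    for node in set.union(*keyword_hits):
        reach = {node} | edges[node]
        if all(reach & hits for hits in keyword_hits):
            results.add(node)
    return results

add_tuple("paper:1", "keyword search databases", links=["author:7"])
add_tuple("author:7", "Lopez-Veyna")
print(search("search Lopez-Veyna"))  # both nodes, joined by the edge
```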
International Conference on Electrical Engineering, Computing Science and Automatic Control | 2008
Dulce Aguilar-Lopez; Ivan Lopez-Arevalo; Victor Sosa-Sosa
This work describes a Web search approach that takes into account the semantic content of Web pages. By eliminating irrelevant Web pages, it reduces the time-consuming task of reviewing the results returned by current search engines. The proposed approach focuses on Web pages that are not annotated with Semantic Web structure (the format of most current Web pages). The challenge is to extract the semantic content from heterogeneous, human-oriented Web pages. The approach integrates ontology structures, WordNet, and a hierarchical similarity measure to determine the relevance of a Web page.
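As a hedged sketch of the hierarchical similarity component, the snippet below scores a page's salient terms against a query concept using WordNet's hypernym hierarchy via NLTK. The terms and the averaging policy are illustrative assumptions, not the paper's exact measure.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def concept_similarity(term_a, term_b):
    """Best path similarity between any senses of the two terms."""
    scores = [s1.path_similarity(s2)
              for s1 in wn.synsets(term_a)
              for s2 in wn.synsets(term_b)]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

def page_relevance(page_terms, query_term):
    """Average similarity of the page's terms to the query concept."""
    sims = [concept_similarity(t, query_term) for t in page_terms]
    return sum(sims) / len(sims) if sims else 0.0

print(page_relevance(["dog", "leash", "kennel"], "animal"))   # high
print(page_relevance(["bond", "broker", "stock"], "animal"))  # low
```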
International Journal of Digital Earth | 2018
J.L. Gonzalez-Compean; Victor Sosa-Sosa; Arturo Diaz-Perez; Jesús Carretero; Ricardo Marcelín-Jiménez
Earth observation satellites produce large amounts of images/data that must not only be processed and preserved in reliable geospatial platforms but also efficiently disseminated among partners/researchers for creating derivative products through collaborative workflows. Organizations can address this challenge in a cost-effective manner by using cloud services. However, outages and violations of integrity/confidentiality associated with this technology can arise. This article presents FedIDS, a suite of cloud-based components for building dependable geospatial platforms. The Fed component enables organizations to build a shared geospatial data infrastructure through the federation of independent cloud resources to withstand outages, whereas IDS prevents violations of the integrity/confidentiality of images/data in information-sharing and collaboration workflows. A FedIDS prototype, deployed in Spain and Mexico, was evaluated through a case study based on satellite imagery captured by a Mexican antenna and another based on satellite imagery from a European observation mission. The acquisition, storage, and sharing of images among users of the federation, the exchange of images between the Mexican and Spanish sites, and outage scenarios were evaluated. The evaluation revealed the feasibility, reliability, and efficiency of FedIDS, in comparison with available solutions, in terms of performance, storage consumption, and integrity/confidentiality when sharing images/data in collaborative scenarios.
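A naive sketch of the client-side integrity/confidentiality step implied by the IDS component: encrypt an image before it leaves the organization and keep a digest to detect tampering on retrieval. FedIDS's actual mechanisms and key management are richer; the Fernet-based version below is only a stand-in.

```python
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()             # naively shared in the federation
cipher = Fernet(key)

image = b"...geotiff bytes..."
digest = hashlib.sha256(image).hexdigest()  # kept with the metadata
blob = cipher.encrypt(image)                # what the cloud actually sees

# On retrieval: decrypt, then verify integrity before using the image.
restored = cipher.decrypt(blob)
assert hashlib.sha256(restored).hexdigest() == digest, "integrity violated"
```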
International Conference on Cyber Security and Cloud Computing | 2015
Jedidiah Yanez-Sierra; Arturo Diaz-Perez; Victor Sosa-Sosa; José Luis González
A major concern of users of cloud storage services is the loss of control over the security, availability, and privacy of their files. This is partially addressed by end-to-end encryption techniques. However, most currently available solutions offer rigid functionality that cannot be rapidly integrated into customized tools to meet user requirements such as sharing files with other users. This paper presents an end-to-end architecture that enables users to build secure and resilient workflows for storing and sharing files in the cloud. The workflows are configurable structures, executed on the user side, that process files through chained stages such as data compression to reduce capacity overhead, file assurance to ensure confidentiality when sharing files, and information dispersion to store files in n cloud locations and retrieve them even during outages of m cloud storage providers. Users can set up different workflows depending on their requirements, organizing the processing units of each stage either in a pipeline to improve performance or in a stack to extend functionality. The stages and their processing units are connected through I/O communication interfaces that ensure a continuous data flow from the user/organization computers to multiple cloud locations. Based on this architecture, we developed a prototype for a private cloud infrastructure. The experimental evaluation showed the feasibility, in terms of performance, of flexible user-defined workflows for file sharing and storage.
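The chained-stage design can be sketched as three small functions composed into a workflow: compression, assurance (encryption), and dispersion. The stage bodies below are minimal stand-ins for illustration; in particular, the plain split carries no redundancy, unlike the paper's information dispersion.

```python
import zlib
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()

def compress(data):        # capacity-overhead reduction stage
    return zlib.compress(data)

def encrypt(data):         # file-assurance (confidentiality) stage
    return Fernet(key).encrypt(data)

def disperse(data, n=3):   # spread the blob over n cloud locations
    chunk = (len(data) + n - 1) // n
    return [data[i * chunk:(i + 1) * chunk] for i in range(n)]

def run_workflow(data, stages):
    """Pipeline organization: each stage feeds the next."""
    for stage in stages:
        data = stage(data)
    return data

fragments = run_workflow(b"quarterly-report.pdf bytes",
                         [compress, encrypt, disperse])
print(len(fragments), "fragments ready for upload")
```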
Journal of Medical Systems | 2015
Ana B. Rios-Alvarado; Ivan Lopez-Arevalo; Edgar Tello-Leal; Victor Sosa-Sosa
Access to medical information (journals, blogs, web pages, dictionaries, and texts) has increased thanks to the availability of many digital media. In particular, finding an appropriate structure to represent the information contained in texts is not a trivial task. Ontologies are one of the structures used to model such knowledge. An ontology is a conceptualization of a specific domain of knowledge. Ontologies are especially useful because they support the exchange and sharing of information as well as reasoning tasks. The use of ontologies in medicine is mainly focused on the representation and organization of medical terminologies. Ontology learning has emerged as a set of techniques for obtaining ontologies from unstructured information. This paper describes a new ontology learning approach consisting of a method for the acquisition of concepts and their corresponding taxonomic relations, in which disjointWith and equivalentClass axioms are also learned from text without human intervention. The source of knowledge is a set of documents from the medical domain. Our approach is divided into two stages: the first discovers hierarchical relations, and the second extracts axioms. Our automatic ontology learning approach shows better results than previous work, giving rise to more expressive ontologies.
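To make the axiom-extraction output concrete, here is a sketch that serializes the two learned axiom types as OWL using rdflib. The classes and the learned pairs are invented examples, not the paper's output.

```python
from rdflib import Graph, Namespace, RDF, OWL

EX = Namespace("http://example.org/med#")
g = Graph()
g.bind("ex", EX)

# Concepts acquired during the first (taxonomy) stage.
for cls in ("Disease", "Treatment", "Illness"):
    g.add((EX[cls], RDF.type, OWL.Class))

# Axioms learned in the second stage: diseases are never treatments...
g.add((EX.Disease, OWL.disjointWith, EX.Treatment))
# ...and "illness" is used interchangeably with "disease" in the texts.
g.add((EX.Illness, OWL.equivalentClass, EX.Disease))

print(g.serialize(format="turtle"))
```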
Information Sciences | 2014
Jaime I. Lopez-Veyna; Victor Sosa-Sosa; Ivan Lopez-Arevalo
A new keyword-search technique is described, which solves the problem of duplicate data in a virtual-document approach. A complete keyword-based search-engine architecture is presented. A reduction in indexing time and index size when applying the virtual-document approach to large datasets is shown. Keyword search has been recognised as a viable alternative for information search in semi-structured and structured data sources. Current state-of-the-art keyword-search techniques over relational databases do not take advantage of the correlative meta-information included in structured and semi-structured data sources, leaving relevant answers out. These techniques also suffer from scalability, performance, and precision issues that become evident when they are implemented on large datasets. Based on an in-depth analysis of the issues related to indexing and ranking semi-structured and structured information, we propose a new keyword-search algorithm that takes into account the semantic information extracted from the schemas of structured and semi-structured data sources and combines it with the textual relevance obtained by a common text-retrieval approach. The algorithm is implemented in a keyword-based search engine called KESOSASD (Keyword Search Over Semi-structured and Structured Data), improving its precision and response time. Our approach models the semi-structured and structured information as graphs and makes use of a Virtual Document Structure-Aware Inverted Index (VDSAII). This index is created from a set of logical structures called virtual documents, which capture and exploit the implicit structural relationships (semantics) depicted in the schemas of the structured and semi-structured data sources. Extensive experiments demonstrate that KESOSASD outperforms existing approaches in terms of search efficiency and accuracy. Moreover, KESOSASD is prepared to scale out and manage large databases without degrading its effectiveness.
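The virtual-document idea can be sketched briefly: each central tuple is merged once with the text of its schema-linked tuples, so the inverted index captures join context without duplicating shared tuples per answer. The schema and rows below are invented for illustration.

```python
from collections import defaultdict

# Tiny stand-in for two linked relational tables.
papers = {1: ("Keyword search on graphs", "author_7")}
authors = {"author_7": "Lopez-Veyna"}

def build_virtual_documents():
    """One virtual document per paper: its text plus joined author text."""
    vdocs = {}
    for pid, (title, author_fk) in papers.items():
        vdocs[f"paper:{pid}"] = f"{title} {authors[author_fk]}"
    return vdocs

def build_index(vdocs):
    """Inverted index over virtual documents, not raw tuples."""
    inverted = defaultdict(set)
    for doc_id, text in vdocs.items():
        for token in text.lower().split():
            inverted[token].add(doc_id)
    return inverted

index = build_index(build_virtual_documents())
# The join is precomputed, so a multi-keyword query is one set operation.
print(index["lopez-veyna"] & index["keyword"])  # {'paper:1'}
```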