Rafael Corchuelo | Researchain

Publication

Featured research published by Rafael Corchuelo.

IEEE Transactions on Knowledge and Data Engineering | 2013

A Survey on Region Extractors from Web Documents

Hassan A. Sleiman; Rafael Corchuelo

Extracting information from web documents has become a research area in which new proposals appear year after year. This has motivated several researchers to work on surveys that attempt to provide an overall picture of the many existing proposals. Unfortunately, none of these surveys provides a complete picture, because they do not take region extractors into account. These tools act as preprocessors, because they help information extractors focus on the regions of a web document that contain relevant information. With the increasing complexity of web documents, region extractors are becoming essential for extracting information from many websites. Beyond information extraction, region extractors have also found their way into information retrieval, focused web crawling, topic distillation, adaptive content delivery, mashups, and metasearch engines. In this paper, we survey the existing proposals regarding region extractors and compare them side by side.

Concurrency and Computation: Practice and Experience | 2004

An order‐based algorithm for implementing multiparty synchronization

José Antonio Pérez; Rafael Corchuelo; Miguel Toro

Multiparty interactions are a powerful mechanism for coordinating several entities that need to cooperate in order to achieve a common goal. In this paper, we present an algorithm for implementing them that improves on previous results in that it does not require the whole set of entities or interactions to be known at compile‐ or run‐time, and it can deal with both terminating and non‐terminating systems. We also present a comprehensive simulation analysis that shows how sensitive to changes our algorithm is, and compare the results with well‐known proposals by other authors. This study proves that our algorithm still performs comparably to other proposals in which the set of entities and interactions is known beforehand, but outperforms them in some situations that are clearly identified. In addition, these results prove that our algorithm can be combined with a technique called synchrony loosening without having an effect on efficiency.

IEEE Transactions on Knowledge and Data Engineering | 2014

Trinity: On Using Trinary Trees for Unsupervised Web Data Extraction

Hassan A. Sleiman; Rafael Corchuelo

Web data extractors are used to extract data from web documents in order to feed automated processes. In this article, we propose a technique that works on two or more web documents generated by the same server-side template and learns a regular expression that models it and can later be used to extract data from similar documents. The technique builds on the hypothesis that the template introduces some shared patterns that do not provide any relevant data and can thus be ignored. We have evaluated and compared our technique to others in the literature on a large collection of web documents; our results demonstrate that our proposal performs better than the others and that input errors do not have a negative impact on its effectiveness; furthermore, its efficiency can be easily boosted by means of a couple of parameters, without sacrificing its effectiveness.
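The notion of learning a regular expression from documents that share a template can be illustrated with a small sketch. It is not the Trinity algorithm: instead of building trinary trees, it assumes exactly two whitespace-tokenised documents and uses Python's `difflib.SequenceMatcher` to find the tokens they share; the name `learn_regex` and the sample pages are hypothetical.

```python
import re
from difflib import SequenceMatcher

SLOT = "(.+?)"  # capture group for a field that varies between documents

def learn_regex(doc_a, doc_b):
    """Build a regex whose literals are the tokens shared by both documents.

    Assumption: every variable field is non-empty in every document.
    """
    tokens_a, tokens_b = doc_a.split(), doc_b.split()
    matcher = SequenceMatcher(None, tokens_a, tokens_b, autojunk=False)
    parts, pos_a, pos_b = [], 0, 0
    for block in matcher.get_matching_blocks():
        # A gap before this shared block means the documents differ here.
        if block.a > pos_a or block.b > pos_b:
            parts.append(SLOT)
        shared = tokens_a[block.a:block.a + block.size]
        parts.extend(re.escape(token) for token in shared)
        pos_a, pos_b = block.a + block.size, block.b + block.size
    return re.compile(r"\s+".join(parts), re.DOTALL)

template = learn_regex(
    "<h1> Dune </h1> <p> Author: Frank Herbert </p>",
    "<h1> Emma </h1> <p> Author: Jane Austen </p>",
)
match = template.fullmatch(
    "<h1> Brave New World </h1> <p> Author: Aldous Huxley </p>"
)
print(match.groups())  # ('Brave New World', 'Aldous Huxley')
```

The shared tokens play the role of the template and are matched literally, while each run of differing tokens becomes one capture group that can be applied to a third document.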

Knowledge Based Systems | 2013

TEX: An efficient and effective unsupervised Web information extractor

Hassan A. Sleiman; Rafael Corchuelo

The World Wide Web is an immense information resource. Web information extraction is the task that transforms human-friendly Web information into structured information that can be consumed by automated business processes. In this article, we propose an unsupervised information extractor that works on two or more web documents generated by the same server-side template. It finds and removes shared token sequences amongst these web documents until finding the relevant information that should be extracted from them. The technique is completely unsupervised and does not require maintenance; it can work on malformed web documents and does not require the relevant information to be formatted using repetitive patterns. Our complexity analysis reveals that our proposal is computationally tractable, and our empirical study on real-world web documents demonstrates that it runs very fast and has very high precision and recall.
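The step of finding and removing shared token sequences can be sketched as follows. This is a simplified illustration rather than the TEX algorithm: it assumes exactly two documents that are already split into tokens, and it uses Python's `difflib.SequenceMatcher` to locate the longest shared run; the function name `unshared_fragments` and the sample pages are hypothetical.

```python
from difflib import SequenceMatcher

def unshared_fragments(doc_a, doc_b):
    """Recursively strip the longest token run shared by both documents.

    Returns the pairs of fragments that are left over, which are the
    candidates for the information to extract.
    """
    if not doc_a and not doc_b:
        return []
    matcher = SequenceMatcher(None, doc_a, doc_b, autojunk=False)
    m = matcher.find_longest_match(0, len(doc_a), 0, len(doc_b))
    if m.size == 0:
        # Nothing shared any more: this pair is unshared content.
        return [(doc_a, doc_b)]
    before = unshared_fragments(doc_a[:m.a], doc_b[:m.b])
    after = unshared_fragments(doc_a[m.a + m.size:], doc_b[m.b + m.size:])
    return before + after

page_a = "<html> <b> Title: </b> Dune <br> Price: 10 </html>".split()
page_b = "<html> <b> Title: </b> Emma <br> Price: 7 </html>".split()
fragments = unshared_fragments(page_a, page_b)
print(fragments)  # [(['Dune'], ['Emma']), (['10'], ['7'])]
```

Each recursion removes one shared sequence and continues on what precedes and follows it, so only the tokens that differ between the two pages survive.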

International Conference on Requirements Engineering | 2002

Supporting requirements verification using XSLT

Amador Durán; Antonio Ruiz-Cortés; Rafael Corchuelo; Miguel Toro

We present a light-weight approach for the automatic verification of requirements. This approach is not based on natural language parsing techniques but on the representation of requirements in XML. In our approach, XSLT stylesheets are used not only to automatically generate requirements documents, but also to provide verification-oriented heuristics as well as to measure the quality of requirements using some verification-oriented metrics. These ideas have been implemented in REM, an experimental XML-based requirements management tool also described.

International Conference on Conceptual Modeling | 2011

Generating SPARQL executable mappings to integrate ontologies

Carlos R. Rivero; Inma Hernández; David Ruiz; Rafael Corchuelo

Data translation is an integration task that aims at populating a target model with data of a source model by means of mappings. Generating them automatically is appealing insofar as it may reduce integration costs. Matching techniques automatically generate uninterpreted mappings, a.k.a. correspondences, that must be interpreted to perform the data translation task. Other techniques automatically generate executable mappings, which encode an interpretation of these correspondences in a given query language. Unfortunately, current techniques to automatically generate executable mappings are based on instance examples of the target model, which usually contains no data, or based on nested relational models, which cannot be straightforwardly applied to semantic-web ontologies. In this paper, we present a technique to automatically generate SPARQL executable mappings between OWL ontologies. The original contributions of our technique are as follows: 1) it is not based on instance examples but on restrictions and correspondences, 2) we have devised an algorithm to make restrictions and correspondences explicit over a number of language-independent executable mappings, and 3) we have devised an algorithm to transform language-independent into SPARQL executable mappings. Finally, we evaluate our technique over ten scenarios and check that the interpretation of correspondences that it assumes is coherent with the expected results.
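To make the notion of an executable mapping concrete, the sketch below turns a set of correspondences into a SPARQL CONSTRUCT query. It is a deliberately naive illustration and not the technique of the paper: it ignores restrictions altogether, and the function name `sparql_mapping` and the example IRIs are hypothetical.

```python
def sparql_mapping(source_class, target_class, property_pairs):
    """Render class and property correspondences as a CONSTRUCT query.

    property_pairs is a list of (source_property, target_property) IRIs.
    Assumption: every source instance has a value for every property.
    """
    construct = [f"?s a <{target_class}> ."]
    where = [f"?s a <{source_class}> ."]
    for i, (src_prop, tgt_prop) in enumerate(property_pairs):
        construct.append(f"?s <{tgt_prop}> ?v{i} .")
        where.append(f"?s <{src_prop}> ?v{i} .")
    return (
        "CONSTRUCT {\n  " + "\n  ".join(construct) + "\n}\n"
        "WHERE {\n  " + "\n  ".join(where) + "\n}"
    )

query = sparql_mapping(
    "http://example.org/src#Author",
    "http://example.org/tgt#Person",
    [("http://example.org/src#fullName", "http://example.org/tgt#name")],
)
print(query)
```

The WHERE clause reads instances of the source ontology and the CONSTRUCT template emits the corresponding triples of the target ontology, which is what distinguishes an executable mapping from an uninterpreted correspondence.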

International Journal of Cooperative Information Systems | 2011

A Domain-Specific Language to Design Enterprise Application Integration Solutions

Rafael Z. Frantz; Antonia M. Reina Quintero; Rafael Corchuelo

Enterprise Application Integration (EAI) solutions cope with two kinds of problems within software ecosystems, namely: keeping the data of a number of applications in synchrony or creating new functionality on top of them. Enterprise Service Buses (ESBs) provide the technology required to implement a variety of EAI solutions at sensible costs, but these costs are still far from negligible. It is not surprising then that many authors are working on proposals to endow them with domain-specific tools to help software engineers reduce integration costs. In this article, we introduce a proposal called Guarana. Its key features are as follows: it provides explicit support to devise EAI solutions using enterprise integration patterns by means of a graphical model; its DSL enables software engineers to have not only the view of a process, but also a view of the whole set of processes of which an EAI solution is composed; both processes and tasks can have multiple inputs and multiple outputs; and, finally, its runtime system provides a task-based execution model that is usually more efficient than the process-based execution models in current use. We have also implemented a graphical editor for our DSL and a set of scripts to transform our models into Java code ready to be compiled and executed. To set up a solution from this code, a software engineer only needs to configure a number of adapters to communicate with the applications being integrated.

Knowledge Based Systems | 2014

CALA: An unsupervised URL-based web page classification system

Inma Hernández; Carlos R. Rivero; David Ruiz; Rafael Corchuelo

Unsupervised web page classification refers to the problem of clustering the pages in a web site so that each cluster includes a set of web pages that can be classified using a unique class. The existing proposals to perform web page classification do not fulfill a number of requirements that would make them suitable for enterprise web information integration, namely: to be based on lightweight crawling, so as to avoid interfering with the normal operation of the web site, to be unsupervised, which avoids the need for a training set of pre-classified pages, or to use features from outside the page to be classified, which avoids having to download it. In this article, we propose CALA, a new automated proposal to generate URL-based web page classifiers. Our proposal builds a number of URL patterns that represent the different classes of pages in a web site, so further pages can be classified by matching their URLs to the patterns. Its salient features are that it fulfills all of the previous requirements, and it has been validated by a number of experiments using real-world, top-visited web sites. Our validation proves that CALA is very effective and efficient in practice.
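Classifying a page by matching its URL against a set of patterns, without downloading it, can be sketched as follows. This is an illustration under strong assumptions and not CALA itself: the groups of example URLs are supplied by hand, whereas CALA derives its classes without supervision, and the names `build_pattern` and `classify` as well as the URLs are hypothetical.

```python
from urllib.parse import urlparse

WILDCARD = "*"

def tokenize(url):
    """Split the path of a URL into its non-empty segments."""
    return [segment for segment in urlparse(url).path.split("/") if segment]

def build_pattern(urls):
    """Generalise URLs of equal depth: segments that vary become wildcards."""
    token_lists = [tokenize(url) for url in urls]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("example URLs must have the same number of segments")
    return [
        column[0] if len(set(column)) == 1 else WILDCARD
        for column in zip(*token_lists)
    ]

def matches(pattern, url):
    tokens = tokenize(url)
    return len(tokens) == len(pattern) and all(
        p == WILDCARD or p == t for p, t in zip(pattern, tokens)
    )

def classify(patterns, url):
    """Return the first class whose pattern matches the URL, else None."""
    for label, pattern in patterns.items():
        if matches(pattern, url):
            return label
    return None

patterns = {
    "product": build_pattern([
        "http://example.com/products/123",
        "http://example.com/products/456",
    ]),
    "author": build_pattern([
        "http://example.com/authors/smith/bio",
        "http://example.com/authors/jones/bio",
    ]),
}
print(classify(patterns, "http://example.com/products/789"))  # product
```

Because only the URL string is inspected, a crawler could decide whether a link is worth following before issuing any request to the site.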

Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium | 1999

Implementing multiparty interactions on a network computer

Rafael Corchuelo; David Ruiz; Miguel Toro; Antonio Ruiz

Classical client/server interaction primitives such as remote procedure call or rendez-vous are not adequate when we need to describe the behaviour of three or more processes that need to collaborate simultaneously in order to solve a problem. Multiparty interactions are the key to describe these problems, and there are several languages that use them for the description of reactive systems. In this paper, we show and compare two different fair implementations of this mechanism and also outline the research we are carrying out in an effort to improve them.
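The core idea of a multiparty interaction, in which three or more processes synchronise and act jointly, can be illustrated with a barrier. The sketch below is not one of the implementations compared in the paper and does not address fairness or the choice among conflicting interactions; it only shows the synchronisation itself, using Python's `threading.Barrier`, and the name `run_interaction` is hypothetical.

```python
import threading

def run_interaction(offers):
    """Run one interaction among len(offers) participants.

    Every participant publishes its value and blocks until all the others
    have arrived; the joint action then runs exactly once.
    """
    shared = {}
    result = {}

    def joint_action():
        # Executed by a single thread once all parties are at the barrier.
        result["total"] = sum(shared.values())

    barrier = threading.Barrier(len(offers), action=joint_action)

    def participant(name, value):
        shared[name] = value
        barrier.wait()

    threads = [
        threading.Thread(target=participant, args=item)
        for item in offers.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result["total"]

print(run_interaction({"p1": 1, "p2": 2, "p3": 3}))  # 6
```

No participant proceeds past `barrier.wait()` until all three have arrived, which is the simultaneity that pairwise primitives such as remote procedure call do not express directly.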

Science of Computer Programming | 2012

A bargaining-specific architecture for supporting automated service agreement negotiation systems

Manuel Resinas; Pablo Fernandez; Rafael Corchuelo

The provision of services is often regulated by means of agreements that must be negotiated beforehand. Automating such negotiations is appealing insofar as it overcomes one of the most often cited shortcomings of human negotiation: slowness. Our analysis of the requirements of automated negotiation systems in open environments suggests that some of them cannot be tackled in a protocol-independent manner, which motivates the need for a protocol-specific architecture. However, current state-of-the-art bargaining architectures fail to address all of these requirements together. Our key contribution is a bargaining architecture that addresses all of the requirements we have identified. The definition of the architecture includes a logical view that identifies the key architectural elements and their interactions, a process view that identifies how the architectural elements can be grouped together into processes, a development view that includes a software framework that provides a reference implementation developers can use to build their own negotiation systems, and a scenarios view by means of which the architecture is illustrated and validated.