Patricia Jiménez | Researchain


Publication

Featured research published by Patricia Jiménez.

Practical Applications of Agents and Multi-Agent Systems | 2010

Integrating Deep-Web Information Sources

Iñaki Fernández de Viana; Inma Hernández; Patricia Jiménez; Carlos R. Rivero; Hassan A. Sleiman

Deep-web information sources are difficult to integrate into automated business processes if they only provide a search form. A wrapping agent is a piece of software that allows a developer to query such information sources without worrying about the details of interacting with such forms. Our goal is to help software engineers construct wrapping agents that interpret queries written in high-level structured languages. We think that this will help reduce integration costs because it relieves developers of the burden of transforming their queries into low-level interactions in an ad-hoc manner. In this paper, we report on our reference framework, delve into the related work, and highlight current research challenges. This is intended to help guide future research efforts in this area.

Knowledge and Information Systems | 2016

Roller: a novel approach to Web information extraction

Patricia Jiménez; Rafael Corchuelo

The research regarding Web information extraction focuses on learning rules to extract some selected information from Web documents. Many proposals are ad hoc and cannot benefit from the advances in machine learning; furthermore, they are likely to fade away as the Web evolves, and their intrinsic assumptions are not satisfied. Some authors have explored transforming Web documents into relational data and then using techniques that got inspiration from inductive logic programming. In theory, such proposals should be easier to adapt as the Web evolves because they build on catalogues of features that can be adapted without changing the proposals themselves. Unfortunately, they are difficult to scale as the number of documents or features increases. In the general field of machine learning, there are propositio-relational proposals that attempt to provide effective and efficient means to learn from relational data using propositional techniques, but they have seldom been explored regarding Web information extraction. In this article, we present a new proposal called Roller: it relies on a search procedure that uses a dynamic flattening technique to explore the context of the nodes that provide the information to be extracted; it is configured with an open catalogue of features, so that it can adapt to the evolution of the Web; it also requires a base learner and a rule scorer, which helps it benefit from the continuous advances in machine learning. Our experiments confirm that it outperforms other state-of-the-art proposals in terms of effectiveness and that it is very competitive in terms of efficiency; we have also confirmed that our conclusions are solid from a statistical point of view.

Information Systems | 2016

On learning web information extraction rules with TANGO

Patricia Jiménez; Rafael Corchuelo

The research on Enterprise Systems Integration focuses on proposals to support business processes by re-using existing systems. Wrappers help re-use web applications that provide a user interface only. They emulate a human user who interacts with them and extracts the information of interest in a structured format. In this article, we present TANGO, which is our proposal to learn rules to extract information from semi-structured web documents with high precision and recall, which is a must in the context of Enterprise Systems Integration. It relies on an open catalogue of features that helps map the input documents into a knowledge base in which every DOM node is represented by means of HTML, DOM, CSS, relational, and user-defined features. Then a procedure with many variation points is used to learn extraction rules from that knowledge base; the variation points include heuristics that range from how to select a condition to how to simplify the resulting rules. We also provide a systematic method to help re-configure our proposal. Our exhaustive experimentation proves that it beats others regarding effectiveness and is efficient enough for practical purposes. Our proposal was devised to be as configurable as possible, which helps adapt it to particular web sites and evolve it when necessary.

Highlights: TANGO can be adapted to particular websites or to keep up with the evolution of HTML. It relies on an open catalogue of features and a highly configurable learning process. We provide a method to help re-configure our proposal to improve the effectiveness. It beats other state-of-the-art proposals regarding effectiveness.
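The idea of an open catalogue of features that maps every DOM node into a knowledge base can be illustrated with a minimal sketch. This is not TANGO's implementation: the `Node` and `TreeBuilder` classes, the `CATALOGUE` dictionary, and the feature names are all hypothetical, and the parser assumes well-formed markup in which every opened tag is explicitly closed.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical minimal DOM node: tag, attributes, parent, children, text."""

    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a DOM-like tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [], None)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def depth(node):
    """Number of ancestors between the node and the artificial root."""
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d


# Hypothetical open catalogue: feature name -> function over a node.
# New features are added by registering another entry; the learner is untouched.
CATALOGUE = {
    "tag": lambda n: n.tag,                   # HTML feature
    "css_class": lambda n: n.attrs.get("class", ""),  # attribute-based feature
    "depth": depth,                           # DOM feature
    "parent_tag": lambda n: n.parent.tag,     # relational feature
    "text_length": lambda n: len(n.text),     # user-defined feature
}


def walk(node):
    """Pre-order traversal of the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def build_knowledge_base(html, catalogue=CATALOGUE):
    """Return one feature dictionary per DOM node (the artificial root is skipped)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [
        {name: feature(node) for name, feature in catalogue.items()}
        for node in walk(builder.root)
        if node is not builder.root
    ]


kb = build_knowledge_base(
    '<ul><li class="price">42</li><li class="title">Book</li></ul>'
)
```

Under these assumptions, adapting to a new site or a new HTML construct amounts to adding an entry to `CATALOGUE`; a rule learner that consumes `kb` would not need to change.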

Business Information Systems | 2015

A Novel Approach to Web Information Extraction

Antonia M. Reina Quintero; Patricia Jiménez; Rafael Corchuelo

Business Intelligence requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers. The Web is the largest source of information nowadays. Unfortunately, the information it provides is available in semi-structured human-friendly formats, which makes it difficult for automated business processes to use. Classical propositional and ILP machine-learning techniques have been applied for this purpose. However, the former lack sufficient expressive power, whereas the latter are more expressive but intractable with large datasets. Propositionalisation was devised as a means to provide propositional techniques with more expressive power, enabling them to exploit structural information in a propositional way that allows them to be efficient. In this paper, we present a proposal to extract information from semi-structured web documents that uses this approach. It leverages a classical propositional machine learning technique and enhances it with the ability to learn from an unbounded context, which helps increase its precision and recall. Our experiments prove that our proposal outperforms other state-of-the-art techniques in the literature.

Knowledge Based Systems | 2016

ARIEX: Automated ranking of information extractors

Patricia Jiménez; Rafael Corchuelo; Hassan A. Sleiman

Information extractors are used to transform the user-friendly information in a web document into structured information that can be used to feed a knowledge-based system. Researchers are interested in ranking them to find out which one performs the best. Unfortunately, many rankings in the literature are deficient. There are a number of formal methods to rank information extractors, but they also have many problems and have not reached widespread popularity. In this article, we present ARIEX, which is an automated method to rank web information extraction proposals. It does not have any of the problems that we have identified in the literature. Our proposal should help authors make sure that they have advanced the state of the art not only conceptually, but also from an empirical point of view; it should also help practitioners make informed decisions on which proposal is the most adequate for a particular problem.

Practical Applications of Agents and Multi-Agent Systems | 2015

Feeding Software Agents with Web Information

Patricia Jiménez; Hassan A. Sleiman; Rafael Corchuelo

Many software agents require information that is available in web documents. Unfortunately, the existing proposals to learn extraction rules are tightly coupled with the learning component and do not result in resilient rules. We present a novel approach that leverages neural networks and has proven to be very resilient.

Proceedings of the fifth ACM/IEEE Workshop on Hot Topics in Web Systems and Technologies | 2017

Extracting web information using representation patterns

Juan C. Roldán; Patricia Jiménez; Rafael Corchuelo

Feeding decision support systems with Web information typically requires sifting through an unwieldy amount of information that is available in human-friendly formats only. Our focus is on a proposal to extract information from semi-structured documents in a structured format, with an emphasis on it being scalable and open. By semi-structured, we mean that it must focus on information that is rendered using regular formats, not free text; by scalable, we mean that the system must require a minimum amount of human intervention and it must not be targeted to extracting information from a particular domain or web site; by open, we mean that it must extract as much useful information as possible and not be subject to any pre-defined data model. In the literature, there is only one open proposal, and it is not scalable, since it requires human supervision on a per-domain basis. In this paper, we present a new proposal that relies on a number of heuristics to identify patterns that are typically used to represent the information in a web document. Our experimental results confirm that our proposal is very competitive in terms of effectiveness and efficiency.

International Work-Conference on Artificial and Natural Neural Networks | 2015

On Member Labelling in Social Networks

Rafael Corchuelo; Antonia M. Reina Quintero; Patricia Jiménez

Software agents are increasingly used to search for experts, recommend resources, assess opinions, and perform other similar tasks in the context of social networks, which requires accurate information that describes the features of the members of the network. Unfortunately, many member profiles are incomplete, which has motivated many authors to work on automatic member labelling, that is, on techniques that can infer the null features of a member from his or her neighbourhood. Current proposals are based on local or global approaches; the former compute predictors from local neighbourhoods, whereas the latter analyse social networks as a whole. Their main problem is that they tend to be inefficient and their effectiveness degrades significantly as the percentage of null labels increases. In this paper, we present Katz, which is a novel hybrid proposal to solve the member labelling problem using neural networks. Our experiments prove that it outperforms other proposals in the literature in terms of both effectiveness and efficiency.

Business Information Systems | 2015

On Extracting Information from Semi-structured Deep Web Documents

Patricia Jiménez; Rafael Corchuelo

Some software agents need information that is provided by web sites, which is difficult to obtain if the sites lack a query API. Information extractors are intended to extract the information of interest automatically and offer it in a structured format. Unfortunately, most of them rely on ad-hoc techniques, which make them fade away as the Web evolves. In this paper, we present a proposal that relies on an open catalogue of features that allows it to be adapted easily; we have also devised an optimisation that allows it to be very efficient. Our experimental results prove that our proposal outperforms other state-of-the-art proposals.

Practical Applications of Agents and Multi-Agent Systems | 2012

On Relational Learning for Information Extraction

Patricia Jiménez; José Luis Arjona; José Luis Álvarez

The extraction and integration of data from multiple sources is required in companies that manage their business processes by means of heterogeneous collaborating applications. However, integrating web applications is an arduous task because they are intended for human consumption and do not provide APIs to access their data automatically. Web information extractors are used for this purpose, but they mostly provide ad-hoc, highly domain-dependent solutions. In this paper, we aim at devising information extractors with a FOIL-based core algorithm. FOIL is a widely used first-order rule learning algorithm whose rules are substantially more expressive and allow learning complex concepts that cannot be represented in the attribute-value format. Furthermore, we focus on integrating other scoring functions to check whether they can better guide the rule search and speed up the learning process, in order to make FOIL tractable in real-world domains such as Web sources.
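To make the role of a scoring function in a FOIL-style rule search concrete, the following minimal sketch computes the standard FOIL information gain and shows how an alternative scorer could be swapped in. It is an illustration only, not the authors' code: the function names, the candidate literals, and the counts are hypothetical, and the gain uses the simplified case in which a new literal introduces no new variables, so the number of positive bindings that remain covered equals `p1`.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for adding one literal to a rule.

    p0, n0: positive/negative examples covered before adding the literal.
    p1, n1: positive/negative examples covered after adding it.
    Simplifying assumption: the literal introduces no new variables.
    """
    if p0 == 0 or p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


def laplace_score(p0, n0, p1, n1):
    """Alternative scorer: Laplace-corrected precision of the refined rule."""
    return (p1 + 1) / (p1 + n1 + 2)


def best_literal(candidates, p0, n0, scorer=foil_gain):
    """Pick the candidate literal with the highest score under `scorer`."""
    return max(candidates, key=lambda name: scorer(p0, n0, *candidates[name]))


# Hypothetical candidate literals with (positives, negatives) covered after adding each.
candidates = {
    "has_class_price": (8, 2),
    "is_bold": (5, 5),
    "in_table": (10, 9),
}

chosen = best_literal(candidates, p0=10, n0=10)
chosen_laplace = best_literal(candidates, p0=10, n0=10, scorer=laplace_score)
```

Passing the scorer as a parameter keeps the search loop unchanged while different heuristics are compared, which is the kind of experiment the abstract describes.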
