
Eli Cortez

Federal University of Amazonas



Publication

Featured research published by Eli Cortez.

acm/ieee joint conference on digital libraries | 2007

FLUX-CIM: flexible unsupervised extraction of citation metadata

Eli Cortez; Altigran Soares da Silva; Marcos André Gonçalves; Filipe de Sá Mesquita; Edleno Silva de Moura

In this paper we propose a knowledge-based approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) to recognize the components of a citation, in our case such a KB is automatically constructed from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding the specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of the proposed approach, we ran experiments in which we applied it to extract information from citations in papers from two different domains. The results indicate precision and recall levels above 94% and perfect extraction for the large majority of the citations tested.
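To make the knowledge-base idea concrete, here is a minimal Python sketch; it is not FLUX-CIM's actual algorithm. It assumes a KB that simply counts, per attribute, the terms seen in sample metadata records. Each citation token gets the attribute whose vocabulary contains it most often, unknown tokens inherit the previous label, and runs of equally labelled tokens become extracted fields. The helper names (`build_kb`, `label_token`, `extract`) and the toy records are hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())

def build_kb(records):
    """Hypothetical KB: attribute -> term frequencies over sample records."""
    kb = defaultdict(Counter)
    for record in records:
        for attribute, value in record.items():
            kb[attribute].update(tokenize(value))
    return kb

def label_token(token, kb):
    """Attribute whose vocabulary contains the token most often, or None."""
    best, best_count = None, 0
    for attribute, vocabulary in kb.items():
        if vocabulary[token] > best_count:
            best, best_count = attribute, vocabulary[token]
    return best

def extract(citation, kb):
    """Label tokens, let unknown tokens inherit the previous label, and merge
    runs of equally labelled tokens into (attribute, value) pairs."""
    segments, previous = [], None
    for token in tokenize(citation):
        label = label_token(token, kb) or previous or "unknown"
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1] + " " + token)
        else:
            segments.append((label, token))
        previous = label
    return segments

# Toy sample records standing in for an existing metadata collection.
SAMPLE_RECORDS = [
    {"author": "Eli Cortez",
     "title": "Unsupervised extraction of citation metadata",
     "venue": "Joint Conference on Digital Libraries"},
    {"author": "Altigran Soares da Silva",
     "title": "Information extraction by text segmentation",
     "venue": "International Conference on Management of Data"},
]

kb = build_kb(SAMPLE_RECORDS)
print(extract("Cortez E. Unsupervised citation extraction. Digital Libraries Conference, 2007", kb))
# [('author', 'cortez e'), ('title', 'unsupervised citation extraction'),
#  ('venue', 'digital libraries conference 2007')]
```

Because labels come from vocabulary overlap rather than punctuation, the sketch does not depend on a citation style's delimiters; terms shared by several attributes are resolved here only by raw frequency.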

international conference on management of data | 2010

ONDUX: on-demand unsupervised learning for information extraction

Eli Cortez; Altigran Soares da Silva; Marcos André Gonçalves; Edleno Silva de Moura

Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g., postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. In this paper we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data to associate segments of the input string with attributes of a given domain. Unlike other approaches, we rely on very effective matching strategies instead of explicit learning strategies. The effectiveness of this matching strategy is also exploited to disambiguate the extraction of certain attributes through a reinforcement step that exploits the sequencing and positioning of attribute values learned directly on demand from test data, with no previous human-driven training, a feature unique to ONDUX. This gives ONDUX a high degree of flexibility and results in superior effectiveness, as demonstrated by the experimental evaluation we report with textual sources from different domains, in which ONDUX is compared with a state-of-the-art IETS approach.
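The reinforcement step can be illustrated with a simplified Python sketch. It assumes the matching step has already split each input into blocks and labelled some of them (`None` marks a block that matching could not label). Positioning and sequencing statistics are then counted from those same test inputs and used to label the remaining blocks. Scoring by a plain sum of counts, and the names `learn_on_demand` and `reinforce`, are assumptions of this sketch; ONDUX's own probabilistic model is not reproduced here.

```python
from collections import Counter, defaultdict

def learn_on_demand(records):
    """Count positioning (label per block index) and sequencing (label
    transitions) from the labels matching assigned in the test inputs."""
    position = defaultdict(Counter)
    sequence = Counter()
    for blocks in records:
        labels = [label for _, label in blocks]
        for i, label in enumerate(labels):
            if label is None:
                continue
            position[i][label] += 1
            if i > 0 and labels[i - 1] is not None:
                sequence[(labels[i - 1], label)] += 1
    return position, sequence

def reinforce(records):
    """Label unmatched blocks (None) with the candidate that best agrees with
    the learned positioning and sequencing; matched labels are kept as-is."""
    position, sequence = learn_on_demand(records)
    candidates = sorted({label for blocks in records for _, label in blocks if label})
    result = []
    for blocks in records:
        labelled = []
        for i, (text, label) in enumerate(blocks):
            if label is None:
                previous = labelled[-1][1] if labelled else None
                label = max(candidates,
                            key=lambda c: position[i][c] + sequence[(previous, c)])
            labelled.append((text, label))
        result.append(labelled)
    return result

# Toy classified ads: matching failed to label "Pine Rd" in the third ad.
ads = [
    [("2 bedroom apt", "type"), ("Main St", "street"), ("$900", "price")],
    [("studio", "type"), ("Oak Ave", "street"), ("$700", "price")],
    [("3 bedroom house", "type"), ("Pine Rd", None), ("$1200", "price")],
]
print(reinforce(ads)[2])
# [('3 bedroom house', 'type'), ('Pine Rd', 'street'), ('$1200', 'price')]
```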

very large data bases | 2010

A probabilistic approach for automatically filling form-based web interfaces

Guilherme A. Toda; Eli Cortez; Altigran Soares da Silva; Edleno Silva de Moura

In this paper we propose, implement, and evaluate a novel method for automatically filling form-based input interfaces using data-rich text. Our solution takes a text as input, extracts implicit data values from it, and fills the appropriate fields. For this task, we rely on knowledge obtained from values of previous submissions for each field, which are freely obtained from the usage of the interfaces. Our approach, called iForm, exploits features related to the content and the style of these values, which are combined through a Bayesian framework. Through extensive experimentation, we show that our approach is feasible and effective, and that it works well even when only a few previous submissions to the input interface are available.
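A rough illustration of combining content and style evidence in a naive-Bayes fashion is sketched below in Python. It assumes the text has already been split into candidate segments and that each field's previous submissions are available as a list of strings. The character-shape style feature, Laplace smoothing, the uniform prior over fields, and the names `FieldModel` and `fill` are hypothetical choices, not iForm's published features.

```python
import math
import re
from collections import Counter

def shape(value):
    """Style feature: collapse character classes, e.g. 'Civic 2009' -> 'Aa 9'."""
    value = re.sub(r"[A-Z]+", "A", value)
    value = re.sub(r"[a-z]+", "a", value)
    return re.sub(r"[0-9]+", "9", value)

class FieldModel:
    """Content and style statistics of one field, from previous submissions."""

    def __init__(self, previous_values, alpha=1.0):
        self.terms = Counter(t for v in previous_values for t in v.lower().split())
        self.shapes = Counter(shape(v) for v in previous_values)
        self.n_values = len(previous_values)
        self.alpha = alpha  # Laplace smoothing constant (assumption)

    def log_likelihood(self, segment):
        a = self.alpha
        total, vocab = sum(self.terms.values()), len(self.terms) + 1
        content = sum(math.log((self.terms[t] + a) / (total + a * vocab))
                      for t in segment.lower().split())
        style = math.log((self.shapes[shape(segment)] + a)
                         / (self.n_values + a * (len(self.shapes) + 1)))
        # Naive-Bayes independence: add the log-probabilities of both groups.
        return content + style

def fill(segments, fields):
    """Assign each segment to the field with the highest likelihood."""
    return {s: max(fields, key=lambda f: fields[f].log_likelihood(s)) for s in segments}

# Hypothetical previous submissions for a used-car search form.
fields = {
    "make": FieldModel(["Honda", "Toyota", "Ford"]),
    "model": FieldModel(["Civic", "Corolla", "Focus"]),
    "year": FieldModel(["2008", "2010", "2011"]),
}
print(fill(["Honda", "2009", "Civic"], fields))
# {'Honda': 'make', '2009': 'year', 'Civic': 'model'}
```

The unseen value "2009" lands in the year field purely through the style evidence, which is the kind of case where combining content and style features helps.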

international conference on management of data | 2011

Joint unsupervised structure discovery and information extraction

Eli Cortez; Daniel Oliveira; Altigran Soares da Silva; Edleno Silva de Moura; Alberto H. F. Laender

In this paper we present JUDIE (Joint Unsupervised Structure Discovery and Information Extraction), a new method for automatically extracting semi-structured data records in the form of continuous text (e.g., bibliographic citations, postal addresses, classified ads) that have no explicit delimiters between them. While in state-of-the-art information extraction methods the structure of the data records is manually supplied by the user as a training step, JUDIE is capable of detecting the structure of each individual record being extracted without any user assistance. This is accomplished by a novel Structure Discovery algorithm that, given a sequence of labels representing attributes assigned to potential values, groups these labels into individual records by looking for frequent patterns of label repetitions in the given sequence. We also show how to integrate this algorithm into the information extraction process by means of successive refinement steps that alternate information extraction and structure discovery. Through an extensive experimental evaluation with different datasets in distinct domains, we compare JUDIE with state-of-the-art information extraction methods and conclude that, even without any user intervention, it is able to achieve high-quality results on the tasks of discovering the structure of the records and extracting information from them.
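The kind of input and output the structure discovery step works with can be shown with a deliberately naive Python sketch. Given a sequence of attribute labels, it starts a new record whenever a label already present in the current record reappears (immediate repeats, such as consecutive authors, are kept as one multi-valued attribute), then counts how often each resulting record structure occurs. This repetition heuristic and the names `split_records` and `frequent_structures` are assumptions for illustration; it is much simpler than JUDIE's Structure Discovery algorithm.

```python
from collections import Counter

def split_records(labels):
    """Start a new record when a label already in the current record recurs,
    unless it repeats immediately (treated as a multi-valued attribute)."""
    records, current = [], []
    for label in labels:
        if label in current and label != current[-1]:
            records.append(current)
            current = []
        current.append(label)
    if current:
        records.append(current)
    return records

def frequent_structures(labels):
    """Record structures (label tuples) ordered by how often they occur."""
    return Counter(tuple(r) for r in split_records(labels)).most_common()

# Labels assigned to a stream of three citations with no delimiters between them.
labels = ["author", "title", "venue", "year",
          "author", "author", "title", "year",
          "author", "title", "venue", "year"]
for structure, count in frequent_structures(labels):
    print(count, structure)
# 2 ('author', 'title', 'venue', 'year')
# 1 ('author', 'author', 'title', 'year')
```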

acm/ieee joint conference on digital libraries | 2011

Building a research social network from an individual perspective

Alberto H. F. Laender; Mirella M. Moro; Marcos André Gonçalves; Clodoveu A. Davis; Altigran Soares da Silva; Allan J. C. Silva; Carolina A. S. Bigonha; Daniel Hasan Dalip; Eduardo M. Barbosa; Eli Cortez; Peterson S. Procópio; Rafael Odon de Alencar; Thiago N. C. Cardoso; Thiago Salles

In this poster paper, we present an overview of CiênciaBrasil, a research social network involving researchers within the Brazilian INCT program. We describe its architecture and the solutions adopted for data collection, extraction, and deduplication, and for materializing and visualizing the network.

international world wide web conferences | 2009

Automatically filling form-based web interfaces with free text inputs

Guilherme A. Toda; Eli Cortez; Filipe de Sá Mesquita; Altigran Soares da Silva; Edleno Silva de Moura; Marden S. Neubert

On today's web, the most prevalent way for users to interact with data-intensive applications is through form-based interfaces composed of several data input fields, such as text boxes, radio buttons, pull-down lists, and check boxes. Although these interfaces are popular and effective, in many cases free-text interfaces are preferred over form-based ones. In this paper we discuss the proposal and implementation of a novel IR-based method for using data-rich free text to interact with form-based interfaces. Our solution takes a free text as input, extracts implicit data values from it, and fills the appropriate fields with them. For this task, we rely on values of previous submissions for each field, which are freely obtained from the usage of form-based interfaces.

Journal of the Association for Information Science and Technology | 2011

Lightweight methods for large-scale product categorization

Eli Cortez; Mauro Rojas Herrera; Altigran Soares da Silva; Edleno Silva de Moura; Marden S. Neubert

In this article, we present a study of classification methods for the large-scale categorization of product offers on e-shopping web sites. We evaluate the performance of previously proposed approaches and deploy a probabilistic approach to model the classification problem. We also study an alternative way of modeling information about the description of product offers and investigate the use of the price and store of product offers as features in the classification process. Our experiments used two collections of over a million product offers previously categorized by human editors, and taxonomies of hundreds of categories from a real e-shopping web site. In these experiments, our method achieved an improvement of up to 9% in categorization quality compared with the best baseline we found.
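One common way to set up a probabilistic product classifier is multinomial naive Bayes; the Python sketch below uses it as a stand-in, since the abstract does not specify the model. Store and a coarse order-of-magnitude price bucket are encoded as extra pseudo-terms so they enter the same likelihood as description words. The feature encoding, the `NaiveBayes` class, and the toy offers are hypothetical.

```python
import math
from collections import Counter, defaultdict

def features(offer):
    """Description terms plus store and an order-of-magnitude price bucket,
    encoded as pseudo-terms (an assumption of this sketch)."""
    tokens = offer["description"].lower().split()
    tokens.append("store=" + offer["store"].lower())
    tokens.append("price~1e%d" % int(math.log10(max(offer["price"], 1))))
    return tokens

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, offers, categories):
        self.prior = Counter(categories)
        self.counts = defaultdict(Counter)
        for offer, category in zip(offers, categories):
            self.counts[category].update(features(offer))
        self.vocab = {t for c in self.counts.values() for t in c}
        return self

    def log_posterior(self, offer, category):
        n = sum(self.prior.values())
        total = sum(self.counts[category].values())
        score = math.log(self.prior[category] / n)
        for t in features(offer):
            score += math.log((self.counts[category][t] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, offer):
        return max(self.prior, key=lambda c: self.log_posterior(offer, c))

# Toy offers already categorized by (hypothetical) human editors.
offers = [
    {"description": "samsung galaxy smartphone 64gb", "store": "TechShop", "price": 1200},
    {"description": "apple iphone smartphone", "store": "TechShop", "price": 3500},
    {"description": "harry potter paperback book", "store": "BookWorld", "price": 40},
    {"description": "cookbook hardcover book", "store": "BookWorld", "price": 60},
]
model = NaiveBayes().fit(offers, ["phones", "phones", "books", "books"])
print(model.predict({"description": "motorola smartphone", "store": "TechShop", "price": 900}))
# phones
```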

international conference on management of data | 2010

Unsupervised strategies for information extraction by text segmentation

Eli Cortez; Altigran Soares da Silva

Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g., postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. We report here partial results from PhD thesis work in which we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data to associate segments of the input string with attributes of a given domain. Unlike other approaches, we rely on very effective matching strategies instead of explicit learning strategies. The effectiveness of this matching strategy is also exploited to disambiguate the extraction of certain attributes through a reinforcement step that exploits the sequencing and positioning of attribute values learned directly on demand from test data, with no previous human-driven training, a feature unique to ONDUX. This gives ONDUX a high degree of flexibility and results in superior effectiveness, as demonstrated by the experimental evaluation we have carried out with textual sources from different domains, in which ONDUX is compared with a state-of-the-art IETS approach.

web information and data management | 2007

FleDEx: flexible data exchange

Filipe de Sá Mesquita; Denilson Barbosa; Eli Cortez; Altigran Soares da Silva

We propose a lightweight framework for data exchange that is suitable for non-expert and casual users sharing data on the Web or through peer-to-peer systems. Unlike previous work, we consider a simple data model and schema formalism that are suitable for describing typical online data, and propose algorithms for mapping such schemas as well as for translating the corresponding instances. Our solution requires minimal overhead and setup costs compared to existing data exchange systems, making it very attractive in the Web data exchange setting. We report experimental results indicating that our method works well with real Web data from various domains.
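A minimal Python sketch of the two steps the abstract names, mapping a flat source schema onto a flat target schema and then translating instances, is shown below. Attribute names are compared with `difflib.SequenceMatcher` and paired greedily above a threshold; this name-similarity matcher, the 0.6 threshold, and the helper names are assumptions, not FleDEx's mapping algorithm.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive name similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def map_schema(source_attrs, target_attrs, threshold=0.6):
    """Greedily pair source and target attributes, most similar first,
    using each attribute at most once and ignoring pairs below threshold."""
    pairs = sorted(((similarity(s, t), s, t) for s in source_attrs for t in target_attrs),
                   reverse=True)
    mapping, used_targets = {}, set()
    for score, s, t in pairs:
        if score >= threshold and s not in mapping and t not in used_targets:
            mapping[s] = t
            used_targets.add(t)
    return mapping

def translate(instance, mapping):
    """Rewrite a source record into the target schema; unmapped fields are dropped."""
    return {mapping[k]: v for k, v in instance.items() if k in mapping}

mapping = map_schema(["title", "author_name", "year"], ["Title", "Author", "PubYear"])
print(mapping)
print(translate({"title": "FleDEx", "author_name": "Mesquita", "year": 2007, "pages": "1-6"},
                mapping))
```

Greedy pairing keeps the sketch short; an optimal one-to-one assignment would need something like the Hungarian algorithm instead.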

Archive | 2013

Exploiting Pre-Existing Datasets to Support IETS

Eli Cortez; Altigran Soares da Silva

This chapter describes in detail a new approach for exploiting pre-existing datasets to support Information Extraction by Text Segmentation (IETS) methods. First, it presents a brief overview of the approach and introduces the concept of a knowledge base. Next, it discusses all the steps involved in the unsupervised approach, including how to learn content-based features from knowledge bases, how to automatically induce structure-based features with no previous human-driven training (a feature that is unique to this approach), and how to effectively combine these features to label segments of a text input.
