Altigran Soares da Silva
Universidade Federal de Minas Gerais
Publication
Featured research published by Altigran Soares da Silva.
international conference on management of data | 2002
Alberto H. F. Laender; Berthier A. Ribeiro-Neto; Altigran Soares da Silva; Juliana Silveira Teixeira
In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities, which makes a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.
conference on information and knowledge management | 1999
Berthier A. Ribeiro-Neto; Alberto H. F. Laender; Altigran Soares da Silva
In this paper, we describe an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. To perform the extraction of new objects, we introduce a bottom-up extraction strategy and, through experimentation, demonstrate that it works quite effectively with distinct Web sources, even if only a few examples are provided by the user.
conceptual modeling approaches for e business | 2000
Paulo Braz Golgher; Alberto H. F. Laender; Altigran Soares da Silva; Berthier A. Ribeiro-Neto
In the so-called Web information systems, the role of extracting data of interest from Web sites is played by software components generically known as wrappers. As a result, the existence of flexible tools for designing, developing and maintaining wrappers is crucial. In this paper, we present WByE (Wrapping By Example), a user-oriented set of tools for helping the user to build wrappers. WByE is based on information implicitly provided by the user by means of suitable and intuitive interfaces. It includes two components: the ASByE tool, used for generating specifications on how to fetch desired pages (be they static or dynamic), and the DEByE tool, used for the extraction of data implicitly present in the fetched pages.
conference on information and knowledge management | 2002
Pável Calado; Altigran Soares da Silva; Rodrigo C. Vieira; Alberto H. F. Laender; Berthier A. Ribeiro-Neto
On-line information services have become widespread on the Web. However, Web users are non-specialized and have a great variety of interests. Thus, interfaces for Web databases must be simple and uniform. In this paper, we present an approach, based on Bayesian networks, for querying Web databases using keywords only. According to this approach, the user inputs a query through a simple search-box interface. From the input query, one or more plausible structured queries are derived and submitted to Web databases. The results are then retrieved and presented to the user as ranked answers. Our approach reduces the complexity of existing on-line interfaces and offers a solution to the problem of querying several distinct Web databases with a single interface. The applicability of the proposed approach was demonstrated by experimental results with 3 databases, obtained with a prototype search system that implements it. We have found that from 77% to 95% of the time, one of the top three resulting structured queries is the proper one. Further, when the user selects one of these three top queries for processing, the ranked answers present average precision figures from 60% to about 100%.
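The step from a keyword query to ranked structured queries can be illustrated with a minimal sketch. It does not implement the Bayesian network model of the paper: it swaps in a plain match count as the score, and the attribute names, the sample values in DB_VALUES, and the function candidate_queries are all hypothetical.

```python
from itertools import product

# Hypothetical sample of known values per attribute of a Web database.
DB_VALUES = {
    "author": {"tolkien", "austen"},
    "title": {"emma", "hobbit"},
    "publisher": {"penguin"},
}


def candidate_queries(keywords):
    """Enumerate every keyword-to-attribute assignment and rank the
    resulting structured queries by how many keywords match a known
    value of the attribute they were assigned to (a simple stand-in
    score, not a Bayesian network)."""
    attrs = list(DB_VALUES)
    ranked = []
    for assignment in product(attrs, repeat=len(keywords)):
        score = sum(kw in DB_VALUES[a] for kw, a in zip(keywords, assignment))
        ranked.append((score, dict(zip(keywords, assignment))))
    # Highest score first; the sort is stable, so ties keep enumeration order.
    ranked.sort(key=lambda pair: -pair[0])
    return ranked


best_score, best_query = candidate_queries(["austen", "emma"])[0]
print(best_score, best_query)
```

Under these assumptions, the top-ranked structured query treats "austen" as an author and "emma" as a title, which mirrors the idea of presenting the most plausible interpretations of a keyword query first.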
conference on information and knowledge management | 2001
Paulo Braz Golgher; Altigran Soares da Silva; Alberto H. F. Laender; Berthier A. Ribeiro-Neto
The effortless generation of wrappers for Web data sources is a crucial task if proper access to the huge amount of semi-structured data on the Web is to be granted. In particular, the development of strategies for wrapper generation based on user-given examples is currently one of the most promising research directions in Web data extraction. In this paper, we show how to use a pre-existing data repository to automatically generate examples and enable fully automated example-based data extraction. To demonstrate the feasibility of our approach, we provide a number of results obtained from experiments we carried out and discuss how our ideas can be used to improve extraction rates and to provide resilience and adaptiveness for wrappers generated from examples.
international conference on conceptual modeling | 2002
Altigran Soares da Silva; Irna M. R. Evangelista Filha; Alberto H. F. Laender; David W. Embley
This paper proposes an approach to representing and querying semistructured Web data. The proposed approach is based on nested tables, which may have internal nested structural variations to accommodate semistructured data. Our motivation is to reduce the complexity found in typical query languages for semistructured data and to provide users with an alternative for quickly querying data obtained from multiple-record Web pages. We show the feasibility of our proposal by developing a prototype for a graphical query interface called QSByE (Querying Semistructured data By Example). For QSByE, we define a particular variation of nested tables and propose a set of QBE-like operations that extends typical nested-relational-algebra operations to handle semistructured data. We show examples of how users can pose interesting queries using QSByE.
electronic commerce and web technologies | 2000
Alberto H. F. Laender; Berthier A. Ribeiro-Neto; Altigran Soares da Silva; Elaine E. Silva
The popularization of the Web has made a huge volume of data available for a large audience. In a large number of Web sites, such as bookstores, electronic catalogs, travel agencies, etc., the pages constitute documents which are composed of pieces of data whose overall structure can be easily recognized. Such pages are called data-rich and can be seen as collections of complex objects. In this paper, we show how such objects can be represented by nested tables, which are simple, intuitive, and quite convenient for expressing their implicit structure. The assumption is that, for most sites of interest, only a few examples are required to reveal the structure of the objects. To corroborate our assumption, we describe a data extraction tool that adopts this approach and present results of some experiments carried out with this tool.
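As a rough illustration of a complex object represented as a nested table, the sketch below models an author whose books form an inner table, and flattens one level of nesting. The sample data and the helper unnest are hypothetical and are not taken from the tool described in the abstract.

```python
from typing import Any

# Hypothetical complex object from a data-rich page: an author row
# whose "books" column is itself a table (a list of rows).
author = {
    "name": "J. Doe",
    "books": [
        {"title": "Book A", "price": 10.0},
        {"title": "Book B", "price": 12.5},
    ],
}


def unnest(obj: dict[str, Any], nested_key: str) -> list[dict[str, Any]]:
    """Flatten one level of nesting: emit one flat row per inner row,
    repeating the outer attributes in each of them."""
    outer = {k: v for k, v in obj.items() if k != nested_key}
    return [{**outer, **inner} for inner in obj[nested_key]]


rows = unnest(author, "books")
for row in rows:
    print(row)
```

The nested form keeps the implicit structure of the page (one author, many books), while the flat rows show what is lost when that structure is forced into a single relational table.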
conference on advanced information systems engineering | 2002
Irna M. R. Evangelista Filha; Altigran Soares da Silva; Alberto H. F. Laender; David W. Embley
This paper proposes an approach for representing and querying semistructured Web data, which is based on nested tables allowing internal nested structural variations. Our motivation is to reduce the complexity found in typical query languages for semistructured data and to provide users with an alternative for quickly querying data obtained from multiple-record Web pages. We show the feasibility of our proposal by developing a prototype for a graphical query interface called QSByE (Querying Semistructured data By Example), which implements a set of QBE-like operations that extends typical nested-relational-algebra operations to handle semistructured data.
Information Systems | 2000
Altigran Soares da Silva; Alberto H. F. Laender; Marco A. Casanova
The mapping of entity-relationship schemas (ER schemas) that contain complex specialization structures into the relational model requires the use of specific strategies to avoid inconsistent states in the final relational database. In this paper, we show that for this mapping to be correct it is required to enforce a special kind of integrity constraint, the key pairing constraint (KPC). We present a mapping strategy that uses simple inclusion dependencies to enforce KPC and show that this strategy can be used to correctly map specialization structures that are more general than the simple specialization structures considered by previous strategies.
string processing and information retrieval | 2002
Davi de Castro Reis; Robson Braga Araújo; Altigran Soares da Silva; Berthier A. Ribeiro-Neto
To cope with the irregularities of typical semistructured Web data, extraction tools usually break the extraction task into two phases: an extraction phase, in which atomic attribute values are extracted from Web pages, and an assembling phase, in which these atomic values are grouped to form complex objects. As a consequence, the whole process is highly dependent on the attribute values collected in the first phase. All attribute values of interest should be properly recognized and spurious values should be discarded. Thus, attribute value extraction is an important problem. In this paper, we propose a new framework for generating attribute value extractors. The main appeal of this framework is that it can be adapted for dealing with specific types of data sources and to incorporate distinct types of heuristics for achieving good extraction performance. To demonstrate the feasibility of this proposal, we present an implementation of this framework for data-rich Web pages and show how a number of simple heuristics, some of them presented in the recent literature, can be incorporated into this framework. We also show experimental results and, in most cases, our results are at least as good as results previously presented in the literature.
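The two-phase split described above can be sketched as follows. This is only an illustration under strong assumptions: the page text, the regular-expression extractors, and the grouping heuristic (start a new object whenever an attribute repeats) are hypothetical and are not the framework proposed in the paper.

```python
import re

# Hypothetical text of a data-rich page after markup removal.
PAGE = "Title: Dune Price: $9.99 Title: Emma Price: $7.50"

# Hypothetical attribute value extractors, one pattern per attribute.
EXTRACTORS = {
    "title": re.compile(r"Title:\s*(\w+)"),
    "price": re.compile(r"Price:\s*\$(\d+\.\d{2})"),
}


def extract_values(text):
    """Extraction phase: collect (position, attribute, value) triples
    for every atomic value found, ordered by position in the text."""
    found = []
    for attr, pattern in EXTRACTORS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), attr, match.group(1)))
    return sorted(found)


def assemble(values):
    """Assembling phase: group consecutive values into objects,
    starting a new object whenever an attribute would repeat."""
    objects, current = [], {}
    for _, attr, value in values:
        if attr in current:
            objects.append(current)
            current = {}
        current[attr] = value
    if current:
        objects.append(current)
    return objects


objects = assemble(extract_values(PAGE))
print(objects)
```

A value missed or wrongly matched in extract_values propagates directly into the objects built by assemble, which is the dependency on the first phase that the abstract points out.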