Alberto H. F. Laender
Universidade Federal de Minas Gerais
Publications
Featured research published by Alberto H. F. Laender.
International Conference on Management of Data | 2002
Alberto H. F. Laender; Berthier A. Ribeiro-Neto; Altigran Soares da Silva; Juliana Silveira Teixeira
In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature for Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities, which make a direct comparison among them difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey the major Web data extraction tools described in the literature, and provide a qualitative analysis of them. We hope this work will stimulate further studies aimed at a more comprehensive analysis of data extraction approaches and tools for the Web.
International World Wide Web Conference | 2004
Davi de Castro Reis; Paulo Braz Golgher; Altigran Soares da Silva; Alberto H. F. Laender
The Web is the largest data repository ever made available in the history of humankind, and major efforts have been made to provide efficient access to relevant information within it. Although several techniques have been developed to address the problem of Web data extraction, their use is still not widespread, mostly because of the need for heavy human intervention and the low quality of the extraction results. In this paper, we present a domain-oriented approach to Web data extraction and discuss its application to automatically extracting news from Web sites. Our approach is based on a highly efficient tree structure analysis that produces very effective results. We have tested our approach with several important Brazilian online news sites and achieved very precise results, correctly extracting 87.71% of the news in a set of 4088 pages distributed among 35 different sites.
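The tree-analysis idea can be illustrated with a toy heuristic: score every subtree of a page by its text density, penalizing link-heavy regions such as navigation menus. This is only a sketch under simplified assumptions (a hand-built tree and a made-up scoring function), not the algorithm used in the paper.

```python
# Toy content extraction by tree analysis: choose the subtree with the
# highest text-to-link ratio, a crude proxy for the main news body.
# The Node class and the scoring function are illustrative assumptions.

class Node:
    def __init__(self, tag, text="", children=None):
        self.tag = tag
        self.text = text
        self.children = children or []

    def text_len(self):
        return len(self.text) + sum(c.text_len() for c in self.children)

    def link_count(self):
        return (self.tag == "a") + sum(c.link_count() for c in self.children)

    def subtrees(self):
        yield self
        for c in self.children:
            yield from c.subtrees()

def extract_main_content(root):
    # Score: text length divided by (1 + number of links) in the subtree.
    return max(root.subtrees(), key=lambda n: n.text_len() / (1 + n.link_count()))

# A link-heavy navigation menu versus a text-heavy article block.
page = Node("body", children=[
    Node("div", children=[Node("a", "Home"), Node("a", "Sports"), Node("a", "Politics")]),
    Node("div", children=[Node("p", "Long article text " * 5),
                          Node("p", "Long article text " * 5)]),
])
main = extract_main_content(page)
```

On this tiny example the article `div` wins because its subtree has plenty of text and no links, while the menu's score is dragged down by its anchors.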
International Conference on Management of Data | 2012
Anderson A. Ferreira; Marcos André Gonçalves; Alberto H. F. Laender
Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. The challenges of dealing with author name ambiguity have led to a myriad of disambiguation methods. Generally speaking, the proposed methods usually attempt to group citation records of the same author by finding some similarity among them, or try to directly assign them to their respective authors. Both approaches may exploit either supervised or unsupervised techniques. In this article, we propose a taxonomy for characterizing the current author name disambiguation methods described in the literature, present a brief survey of the most representative ones, and discuss several open challenges.
GeoInformatica | 2001
Karla A. V. Borges; Clodoveu A. Davis; Alberto H. F. Laender
Semantic and object-oriented data models, such as ER, OMT, IFO, and others, have been extensively used for modeling geographic applications. Despite their semantic expressiveness, such models have limitations for adequately modeling those applications, since they do not provide appropriate primitives for representing spatial data. This paper presents OMT-G, an object-oriented data model for geographic applications. OMT-G provides primitives for modeling the geometry and topology of spatial data, supporting different topological structures, multiple views of objects, and spatial relationships. OMT-G also includes tools to specify transformation processes and presentation alternatives, which allow, among many other possibilities, modeling for multiple representations and multiple presentations. In this way, it overcomes the main limitations of the existing models, thus providing more adequate tools for modeling geographic applications. A comparison with other data models is also presented in order to stress the main advantages of OMT-G.
Archive | 2000
Alberto H. F. Laender; Stephen W. Liddle; Veda C. Storey
Model management is a framework for supporting meta-data related applications where models and mappings are manipulated as first class objects using operations such as Match, Merge, ApplyFunction, and Compose. To demonstrate the approach, we show how to use model management in two scenarios related to loading data warehouses. The case study illustrates the value of model management as a methodology for approaching meta-data related problems. It also helps clarify the required semantics of key operations. These detailed scenarios provide evidence that generic model management is useful and, very likely, implementable.
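To make the operator algebra concrete, here is a deliberately tiny rendering in which models are sets of element names and mappings are plain dicts. This is an illustrative assumption for exposition only; the real operators work on full schemas and morphisms, and the element names below are made up.

```python
# Toy model-management operators over simplified representations:
# a "model" is a set of element names, a "mapping" is a dict.
# All names here are hypothetical examples, not from the paper.

def match(m1, m2):
    """Return a mapping linking identically named elements of two models."""
    return {e: e for e in m1 & m2}

def merge(m1, m2, mapping):
    """Union of two models, unifying elements related by the mapping."""
    return m1 | {e for e in m2 if e not in mapping.values()}

def compose(map1, map2):
    """Compose mappings A->B and B->C into A->C where defined."""
    return {a: map2[b] for a, b in map1.items() if b in map2}

# Loading a warehouse: match a source schema against it, then merge.
warehouse = {"customer", "order", "date"}
source = {"customer", "order", "ship_date"}
m = match(warehouse, source)
merged = merge(warehouse, source, m)
```

Even at this level of simplification, the value of treating models and mappings as first-class objects shows: the same three operators can be chained to script an entire warehouse-loading scenario instead of hand-coding each integration step.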
Archive | 2009
Alberto H. F. Laender; Silvana Castano; Umeshwar Dayal; Fabio Casati; José Palazzo Moreira de Oliveira
ER 30th Anniversary Paper
- Thirty Years of ER Conferences: Milestones, Achievements, and Future Directions
Keynotes
- A Frame Manipulation Algebra for ER Logical Stage Modelling
- Conceptual Modeling in the Time of the Revolution: Part II
- Data Auditor: Analyzing Data Quality Using Pattern Tableaux
- Schema AND Data: A Holistic Approach to Mapping, Resolution and Fusion in Information Integration
Conceptual Modeling
- A Generic Set Theory-Based Pattern Matching Approach for the Analysis of Conceptual Models
- An Empirical Study of Enterprise Conceptual Modeling
- Formalizing Linguistic Conventions for Conceptual Models
Requirements Engineering
- Monitoring and Diagnosing Malicious Attacks with Autonomic Software
- A Modeling Ontology for Integrating Vulnerabilities into Security Requirements Conceptual Foundations
- Modeling Domain Variability in Requirements Engineering with Contexts
Foundational Aspects
- Information Networking Model
- Towards an Ontological Modeling with Dependent Types: Application to Part-Whole Relations
- Inducing Metaassociations and Induced Relationships
Query Approaches
- Tractable Query Answering over Conceptual Schemata
- Query-By-Keywords (QBK): Query Formulation Using Semantics and Feedback
- Cluster-Based Exploration for Effective Keyword Search over Semantic Datasets
Space and Time Modeling
- Geometrically Enhanced Conceptual Modelling
- Anchor Modeling
- Evaluating Exceptions on Time Slices
Schema Matching and Integration
- A Strategy to Revise the Constraints of the Mediated Schema
- Schema Normalization for Improving Schema Matching
- Extensible User-Based XML Grammar Matching
Ontology-Based Approaches
- Modeling Associations through Intensional Attributes
- Modeling Concept Evolution: A Historical Perspective
- FOCIH: Form-Based Ontology Creation and Information Harvesting
- Specifying Valid Compound Terms in Interrelated Faceted Taxonomies
Application Contexts
- Conceptual Modeling in Disaster Planning Using Agent Constructs
- Modelling Safe Interface Interactions in Web Applications
- A Conceptual Modeling Approach for OLAP Personalization
- Creating User Profiles Using Wikipedia
Process and Service Modeling
- Hosted Universal Composition: Models, Languages and Infrastructure in mashArt
- From Static Methods to Role-Driven Service Invocation - A Metamodel for Active Content in Object Databases
- Business Process Modeling: Perceived Benefits
Industrial Session
- Designing Law-Compliant Software Requirements
- A Knowledge-Based and Model-Driven Requirements Engineering Approach to Conceptual Satellite Design
- Virtual Business Operating Environment in the Cloud: Conceptual Architecture and Challenges
Data and Knowledge Engineering | 2004
Juliano Palmieri Lage; Altigran Soares da Silva; Paulo Braz Golgher; Alberto H. F. Laender
As the Web grows, more and more data has become available under dynamic forms of publication, such as legacy databases accessed through an HTML form (the so-called hidden Web). In such situations, integrating this data relies more and more on the fast generation of agents that can automatically fetch pages for further processing. As a result, there is an increasing need for tools that can help users generate such agents. In this paper, we describe a method for automatically generating agents to collect hidden Web pages. This method uses a pre-existing data repository for identifying the contents of these pages and takes advantage of patterns that can be found among Web sites to identify the navigation paths to follow. To demonstrate the accuracy of our method, we discuss the results of a number of experiments carried out with sites from different domains.
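The core idea of driving a search form with values drawn from a pre-existing repository can be sketched as follows. The form URL, field names, and repository records below are hypothetical illustrations, not details from the paper.

```python
# Sketch: generate one hidden-Web request per repository record by
# mapping repository attributes onto (assumed) form input fields.
from urllib.parse import urlencode

# Hypothetical pre-existing data repository used to seed the form.
repository = [
    {"title": "Information Retrieval", "author": "Baeza-Yates"},
    {"title": "Modern Database Systems", "author": "Kim"},
]

def generate_form_requests(form_url, field_map, repository):
    """Build one GET request URL per record; field_map tells which
    repository attribute fills which form field."""
    requests = []
    for record in repository:
        params = {field: record[attr] for field, attr in field_map.items()}
        requests.append(form_url + "?" + urlencode(params))
    return requests

urls = generate_form_requests(
    "http://example.com/search",                # hypothetical form endpoint
    {"q_title": "title", "q_author": "author"}, # hypothetical field names
    repository,
)
```

An agent would then fetch each generated URL and follow the result pages, which is where the navigation-path patterns discussed in the abstract come into play.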
ACM/IEEE Joint Conference on Digital Libraries | 2009
Denilson Alves Pereira; Berthier A. Ribeiro-Neto; Nivio Ziviani; Alberto H. F. Laender; Marcos André Gonçalves; Anderson A. Ferreira
In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose here a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of the documents in the answer sets returned by the Web search engine, useful information that can help in the disambiguation process is extracted. Using this information, author names are disambiguated by leveraging a hierarchical clustering method that groups citations in the same document together in a bottom-up fashion. Experimental results show that our method outperforms two state-of-the-art unsupervised methods and is statistically comparable with a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method.
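The bottom-up grouping step can be illustrated in isolation: citations that co-occur in the same retrieved Web document (say, the same curriculum vitae page) are merged transitively into one author cluster. The querying and evidence-extraction steps are omitted, and the document identifiers are made up.

```python
# Sketch of bottom-up clustering by shared Web evidence: two citations
# that appear in the same retrieved document are assumed to belong to
# the same author. Simplified stand-in for the paper's method.

def cluster_by_shared_documents(citation_docs):
    """citation_docs: {citation_id: set of web-document ids}.
    Merge clusters transitively whenever two citations share a document."""
    clusters = [{c} for c in citation_docs]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                docs_i = set().union(*(citation_docs[c] for c in clusters[i]))
                docs_j = set().union(*(citation_docs[c] for c in clusters[j]))
                if docs_i & docs_j:
                    clusters[i] |= clusters.pop(j)  # bottom-up merge
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters]

# c1 and c2 co-occur on the same (hypothetical) CV page; c3 does not.
docs = {"c1": {"cv_page_7"}, "c2": {"cv_page_7", "pub_list_2"}, "c3": {"cv_page_9"}}
clusters = cluster_by_shared_documents(docs)
```

Here `c1` and `c2` end up in one cluster while `c3` stays alone, mirroring the intuition that a CV page listing two citations is strong evidence they share an author.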
ACM/IEEE Joint Conference on Digital Libraries | 2011
Cristiano Nascimento; Alberto H. F. Laender; Altigran Soares da Silva; Marcos André Gonçalves
As the number of research papers available on the Web has increased enormously over the years, paper recommender systems have been proposed to help researchers automatically find works of interest. The main problem with current approaches is that they assume recommendation algorithms are provided with a rich set of evidence (e.g., document collections, citations, profiles), which is normally not widely available. In this paper, we propose a novel source-independent framework for research paper recommendation. The framework requires as input only a single research paper and generates several potential queries using terms in that paper, which are then submitted to existing Web information sources that hold research papers. Once a set of candidate papers for recommendation is generated, the framework applies content-based recommendation algorithms to rank the candidates and recommend the ones most related to the input paper. This is done using only publicly available metadata (i.e., title and abstract). We evaluate the proposed framework through an extensive experimental evaluation in which we analyzed several strategies for query generation and several ranking strategies for paper recommendation. Our results show that good recommendations can be obtained with simple and low-cost strategies.
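The two stages can be sketched with deliberately simple strategies: build a query from the input paper's own most frequent terms, then rank candidates by cosine similarity over title and abstract words. This is a toy under stated assumptions (no stop-word removal, no real Web source), not the strategies evaluated in the paper.

```python
# Toy version of the two stages: query generation from the input paper's
# own terms, then content-based ranking over title+abstract only.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def generate_query(paper, k=5):
    # The k most frequent terms of title+abstract become the query.
    counts = Counter(tokenize(paper["title"] + " " + paper["abstract"]))
    return [term for term, _ in counts.most_common(k)]

def cosine(tokens_a, tokens_b):
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(paper, candidates):
    ref = tokenize(paper["title"] + " " + paper["abstract"])
    return sorted(candidates,
                  key=lambda c: -cosine(ref, tokenize(c["title"] + " " + c["abstract"])))

paper = {"title": "Web data extraction",
         "abstract": "Extracting data from Web pages"}
candidates = [  # hypothetical papers returned by a Web source
    {"title": "Routing protocols", "abstract": "Network routing in ad hoc networks"},
    {"title": "Wrapper induction for Web data",
     "abstract": "Learning to extract data from Web pages"},
]
best = rank_candidates(paper, candidates)[0]
```

The related paper ranks first because it shares many title and abstract terms with the input, while the unrelated one scores zero, which is the whole premise of ranking on public metadata alone.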
ACM/IEEE Joint Conference on Digital Libraries | 2010
Anderson A. Ferreira; Adriano Veloso; Marcos André Gonçalves; Alberto H. F. Laender
Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. Supervised methods that exploit training examples in order to distinguish ambiguous author names are among the most effective solutions to the problem, but they require skilled human annotators in a laborious and continuous process of manually labeling citations in order to provide enough training examples. Thus, such systems urgently need (i) automatic acquisition of examples and (ii) highly effective disambiguation even when only a few examples are available. In this paper, we propose a novel two-step disambiguation method, SAND (Self-training Associative Name Disambiguator), that deals with these two issues. The first step eliminates the need for any manual labeling effort by automatically acquiring examples using a clustering method that groups citation records based on the similarity among coauthor names. The second step uses a supervised disambiguation method that is able to detect unseen authors not included in any of the given training examples. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation (i.e., author names, work title, and publication venue), demonstrated that our proposed method outperforms representative unsupervised disambiguation methods that exploit similarities between citation records and is as effective as, and in some cases superior to, supervised ones, without manually labeling any training example.
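The two-step self-training idea can be sketched compactly: derive "free" training labels by grouping citations that share a coauthor name, then use those groups as a nearest-match classifier over citation terms. This is a heavy simplification (the paper uses associative classification, and step 1 below is a single-pass approximation of transitive grouping); all records are invented examples.

```python
# Compact sketch of SAND's two steps: (1) automatic example acquisition
# via coauthor-based grouping, (2) supervised-style assignment of a new
# citation, with None signalling a possible unseen author.

def step1_coauthor_clusters(citations):
    """Single-pass grouping: a citation joins the first existing cluster
    with which it shares a coauthor name, else starts a new cluster."""
    clusters = []
    for cid, rec in citations.items():
        hit = None
        for cl in clusters:
            if any(set(rec["coauthors"]) & set(citations[o]["coauthors"]) for o in cl):
                hit = cl
                break
        (hit.append(cid) if hit is not None else clusters.append([cid]))
    return clusters

def step2_classify(citations, clusters, new_rec):
    """Assign a new citation to the cluster with the most title-word
    overlap; return None when nothing matches (an 'unseen author')."""
    def words(r):
        return set(r["title"].lower().split())
    best, best_score = None, 0
    for cl in clusters:
        score = max(len(words(new_rec) & words(citations[c])) for c in cl)
        if score > best_score:
            best, best_score = cl, score
    return best

cits = {  # hypothetical ambiguous citation records
    "c1": {"coauthors": ["M Goncalves"], "title": "name disambiguation in citations"},
    "c2": {"coauthors": ["M Goncalves", "A Ferreira"], "title": "author name grouping"},
    "c3": {"coauthors": ["J Smith"], "title": "quantum error correction"},
}
clusters = step1_coauthor_clusters(cits)
new = {"coauthors": [], "title": "citations and name disambiguation"}
assigned = step2_classify(cits, clusters, new)
```

Note how step 2 can return `None` for a record overlapping no cluster, which is the simplified analogue of SAND's ability to flag authors absent from the training examples.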