Publication


Featured research published by Marcos André Gonçalves.


conference on information and knowledge management | 2003

Combining link-based and content-based methods for web document classification

Pável Calado; Marco Cristo; Edleno Silva de Moura; Nivio Ziviani; Berthier A. Ribeiro-Neto; Marcos André Gonçalves

This paper studies how link information can be used to improve classification results for Web collections. We evaluate four different measures of subject similarity, derived from the Web link structure, and determine how accurate they are in predicting document categories. Using a Bayesian network model, we combine these measures with the results obtained by traditional content-based classifiers. Experiments on a Web directory show that the best results are achieved when links from pages outside the directory are considered. Link information alone is able to obtain gains of up to 46 points in F1, when compared to a traditional content-based classifier. The combination with content-based methods can further improve the results, but too much noise may be introduced, since the text of Web pages is a much less reliable source of information. This work provides an important insight into which measures derived from links are more appropriate to compare Web documents and how these measures can be combined with content-based algorithms to improve the effectiveness of Web classification.


acm/ieee joint conference on digital libraries | 2009

Using web information for author name disambiguation

Denilson Alves Pereira; Berthier A. Ribeiro-Neto; Nivio Ziviani; Alberto H. F. Laender; Marcos André Gonçalves; Anderson A. Ferreira

In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose here a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of documents in the answer sets returned by the Web search engine, useful information that can help in the disambiguation process is extracted. Using this information, author names are disambiguated by leveraging a hierarchical clustering method that groups citations in the same document together in a bottom-up fashion. Experimental results show that our method yields results that outperform those of two state-of-the-art unsupervised methods and are statistically comparable with those of a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method.
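As a rough illustration of the pairwise F1 metric mentioned in this abstract, the Python sketch below computes it under the usual definition: precision and recall are measured over pairs of citations that end up in the same cluster. The function names and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose items share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, predicted_labels):
    """Pairwise F1 between a reference clustering and a predicted clustering."""
    true_pairs = same_cluster_pairs(true_labels)
    pred_pairs = same_cluster_pairs(predicted_labels)
    correct = len(true_pairs & pred_pairs)
    if correct == 0:
        # No pair is both predicted and correct, so F1 is zero by convention.
        return 0.0
    precision = correct / len(pred_pairs)  # correct pairs among predicted pairs
    recall = correct / len(true_pairs)     # correct pairs among reference pairs
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four citations, two real authors ("a" and "b").
reference = ["a", "a", "b", "b"]
predicted = [0, 0, 0, 1]
score = pairwise_f1(reference, predicted)
```

Because the metric counts pairs rather than single citations, merging two large author groups by mistake is penalized more heavily than misplacing one citation.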


acm/ieee joint conference on digital libraries | 2010

Effective self-training author name disambiguation in scholarly digital libraries

Anderson A. Ferreira; Adriano Veloso; Marcos André Gonçalves; Alberto H. F. Laender

Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. Supervised methods that exploit training examples in order to distinguish ambiguous author names are among the most effective solutions for the problem, but they require skilled human annotators in a laborious and continuous process of manually labeling citations in order to provide enough training examples. Thus, addressing the issues of (i) automatic acquisition of examples and (ii) highly effective disambiguation even when only a few examples are available is a pressing need for such systems. In this paper, we propose a novel two-step disambiguation method, SAND (Self-training Associative Name Disambiguator), that deals with these two issues. The first step eliminates the need for any manual labeling effort by automatically acquiring examples using a clustering method that groups citation records based on the similarity among coauthor names. The second step uses a supervised disambiguation method that is able to detect unseen authors not included in any of the given training examples. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation (i.e., author names, work title and publication venue), demonstrated that our proposed method outperforms representative unsupervised disambiguation methods that exploit similarities between citation records and is as effective as, and in some cases superior to, supervised ones, without manually labeling any training example.
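The first step described in this abstract, grouping citation records by coauthor similarity, could be sketched as follows. This is a minimal sketch under a strong assumption: two citations are treated as similar when they share at least one coauthor name, which is a simplification and not necessarily the similarity criterion SAND uses. The function name and the sample names are hypothetical.

```python
def cluster_by_shared_coauthor(citations):
    """Group citations that are linked by at least one common coauthor name.

    citations: list of sets of coauthor names, one set per citation record.
    Returns a sorted list of clusters, each a list of citation indices.
    """
    parent = list(range(len(citations)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # coauthor name -> index of first citation containing it
    for idx, coauthors in enumerate(citations):
        for name in coauthors:
            if name in first_seen:
                # Merge this citation with the cluster that already has the name.
                parent[find(idx)] = find(first_seen[name])
            else:
                first_seen[name] = idx

    clusters = {}
    for idx in range(len(citations)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


# Hypothetical citation records for one ambiguous author name.
records = [
    {"A. Silva"},
    {"A. Silva", "B. Costa"},
    {"C. Lima"},
    {"B. Costa"},
    set(),  # a citation with no coauthors stays in its own cluster
]
groups = cluster_by_shared_coauthor(records)
```

In a self-training setting, the resulting clusters would then serve as automatically labeled examples for the supervised second step.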


acm/ieee joint conference on digital libraries | 2003

The web-DL environment for building digital libraries from the web

Pável Calado; Marcos André Gonçalves; Edward A. Fox; Berthier A. Ribeiro-Neto; Alberto H. F. Laender; A.S. da Silva; Davi de Castro Reis; Paulo Roberto; Monique V. Vieira; Juliano Palmieri Lage

The Web contains a huge volume of unstructured data, which is difficult to manage. In digital libraries, on the other hand, information is explicitly organized, described, and managed. Community-oriented services are built to attend to specific information needs and tasks. In this paper, we describe an environment, Web-DL, that allows the construction of digital libraries from the Web. The Web-DL environment will allow us to collect data from the Web, standardize it, and publish it through a digital library system. It provides support to services and organizational structure normally available in digital libraries, but benefiting from the breadth of the Web contents. We experimented with applying the Web-DL environment to the Networked Digital Library of Theses and Dissertations (NDLTD), thus demonstrating that the rapid construction of DLs from the Web is possible. Also, Web-DL provides an alternative as a large-scale solution for interoperability between independent digital libraries.


International Journal on Digital Libraries | 2008

Towards a digital library theory: a formal digital library ontology

Marcos André Gonçalves; Edward A. Fox; Layne T. Watson

Digital libraries (DLs) have eluded definitional consensus and lack agreement on common theories and frameworks. This makes comparison of DLs extremely difficult, promotes ad-hoc development, and impedes interoperability. In this paper we propose a formal ontology for DLs that defines the fundamental concepts, relationships, and axiomatic rules that govern the DL domain, therefore providing a frame of reference for the discussion of essential concepts of DL design and construction. The ontology is an axiomatic, formal treatment of DLs, which distinguishes it from other approaches that informally define a number of architectural variants. The process of construction of the ontology was guided by 5S, a formal framework for digital libraries. To test its expressibility we have used the ontology to create a taxonomy of DL services and to reason about issues of reusability, extensibility, and composability. Some practical applications of the ontology are also described including: the definition of a digital library services taxonomy, the proposal of a modeling language for digital libraries, and the specification of quality metrics to evaluate digital libraries. We also demonstrate how to use the ontology to formally describe DL architectures and to prove some properties about them, thus helping to further validate the ontology.


Information Processing and Management | 2012

Cost-effective on-demand associative author name disambiguation

Adriano Veloso; Anderson A. Ferreira; Marcos André Gonçalves; Alberto H. F. Laender; Wagner Meira

Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only a few citations), may prevent such techniques from reaching their full potential. In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), which extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), which extends LAND with self-training capabilities, thus drastically reducing the amount of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical.
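The demand-driven idea behind lazy associative disambiguation can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only single-feature rules of the form "feature -> author" are considered, each rule is weighted by its confidence, and the author with the highest summed confidence wins. The function name, the feature encoding, and the sample data are hypothetical and do not reproduce the paper's actual rule mining or scoring.

```python
from collections import Counter, defaultdict


def lazy_associative_predict(training, test_features):
    """Predict an author for one citation using rules built on demand.

    training: list of (features, author) pairs, where features is a set of strings.
    test_features: set of feature strings for the citation to disambiguate.
    Returns the best-scoring author, or None if no training example matches.
    """
    scores = defaultdict(float)
    for feature in test_features:
        # Lazy step: look only at training examples containing this feature.
        matching = [author for feats, author in training if feature in feats]
        if not matching:
            continue
        for author, count in Counter(matching).items():
            # Confidence of the rule "feature -> author".
            scores[author] += count / len(matching)
    if not scores:
        return None
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    return max(sorted(scores), key=scores.get)


# Hypothetical training examples for one ambiguous name.
examples = [
    ({"coauthor=a. silva", "venue=jcdl"}, "author_1"),
    ({"coauthor=c. lima", "venue=jcdl"}, "author_2"),
    ({"coauthor=a. silva", "venue=cikm"}, "author_1"),
]
prediction = lazy_associative_predict(examples, {"coauthor=a. silva", "venue=jcdl"})
unknown = lazy_associative_predict(examples, {"coauthor=z. rocha"})
```

Here a citation whose features match no training example yields `None`; in this sketch that is only a placeholder for the unseen-author detection the abstract describes.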


acm/ieee joint conference on digital libraries | 2004

BDBComp: building a digital library for the Brazilian computer science community

Alberto H. F. Laender; Marcos André Gonçalves; Pablo A. Roberto

We report initial efforts towards building BDBComp, a digital library for the Brazilian computer science community. BDBComp is based on a number of standards (e.g., OAI, Dublin Core, SQL) as well as on new technologies (e.g., Web data extraction tools), which allowed fast and easy prototyping. We focus on architectural issues and specific challenges faced during the construction of this digital library as well as on proposed solutions.


acm/ieee joint conference on digital libraries | 2003

The XML log standard for digital libraries: analysis, evolution, and deployment

Marcos André Gonçalves; Ganesh Panchanathan; Unnikrishnan Ravindranathan; Aaron Krowne; Edward A. Fox; Filip Jagodzinski; Lillian N. Cassel

We describe current efforts and developments building on our proposal for an XML log standard format for digital library (DL) logging analysis and companion tools. Focus is given to the evolution of formats and tools, based on analysis of deployment in several DL systems and testbeds. Recent development of analysis tools also is discussed.


international conference on theory and practice of digital libraries | 2004

Prototyping Digital Libraries Handling Heterogeneous Data Sources – The ETANA-DL Case Study

Unni Ravindranathan; Rao Shen; Marcos André Gonçalves; Weiguo Fan; Edward A. Fox; James W. Flanagan

Information systems used in archaeology have several needs: interoperability among heterogeneous systems, making information available without significant delay, long-term preservation of data, and providing a suite of services to users. In this paper, we show how digital library techniques can be employed to provide solutions to three of these problems. We show this by describing a prototype for an archaeological Digital Library (ETANA-DL). First, ETANA-DL applies and extends the metadata harvesting approach to address some of these needs: interoperability, rapid access to data, and data preservation. Second, we show that availability of a pool of components that implement common DL services has helped in rapidly creating the prototype, which was subsequently used for requirements elicitation. However, understanding complex archaeological information systems is a difficult task. Third, therefore, we describe our efforts to model these systems using the 5S framework, and show how the partially developed model has been used to implement complex services helping users carry out key tasks with the integrated data.


acm/ieee joint conference on digital libraries | 2004

The effectiveness of automatically structured queries in digital libraries

Marcos André Gonçalves; Edward A. Fox; Aaron Krowne; Pável Calado; Alberto H. F. Laender; Altigran Soares da Silva; Berthier A. Ribeiro-Neto

Structured or fielded metadata is the basis for many digital library services, including searching and browsing. Yet, little is known about the impact of using structure on the effectiveness of such services. We investigate a key research question: do structured queries improve effectiveness in DL searching? To answer this question, we empirically compared the use of unstructured queries to the use of structured queries. We then tested the capability of a simple Bayesian network system, built on top of a DL retrieval engine, to infer the best structured queries from the keywords entered by the user. Experiments performed with 20 subjects working with a DL containing a large collection of computer science literature clearly indicate that structured queries, either manually constructed or automatically generated, perform better than their unstructured counterparts, in the majority of cases. Also, automatic structuring of queries appears to be an effective and viable alternative to manual structuring that may significantly reduce the burden on users.

Collaboration


Dive into Marcos André Gonçalves's collaborations.

Top Co-Authors

Alberto H. F. Laender

Universidade Federal de Minas Gerais

Anderson A. Ferreira

Universidade Federal de Ouro Preto

Adriano Veloso

Universidade Federal de Minas Gerais

Berthier A. Ribeiro-Neto

Universidade Federal de Minas Gerais

Nivio Ziviani

Universidade Federal de Minas Gerais

James W. Flanagan

Case Western Reserve University
