David Fernandes
Federal University of Amazonas
Publications
Featured research published by David Fernandes.
Conference on Information and Knowledge Management | 2007
David Fernandes; Edleno Silva de Moura; Berthier A. Ribeiro-Neto; Altigran Soares da Silva; Marcos André Gonçalves
In this paper we consider the problem of using the block structure of a Web page to improve ranking results when searching for information on Web sites. Given the block structure of the Web pages as input, we propose a method for computing the importance of each block (in the form of block weights) in a Web collection. As we show through experiments, deploying our method can significantly improve the quality of search results. Compared to a ranking method that treats pages as monolithic units, i.e., one that uses no structural information, our block-based ranking method improved the quality of search results in experiments with two sites with heterogeneous structures. Further, our method does not increase the cost of processing queries relative to systems that use no structural information.
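A minimal sketch of the block-weighted ranking idea, assuming each block has already been assigned a weight; the scorer, the weights, and the sample page below are illustrative stand-ins, not the paper's actual formulation.

```python
from collections import Counter

def block_score(query_terms, block_text):
    """Simple term-frequency score of a query against one block's text."""
    tf = Counter(block_text.lower().split())
    return sum(tf[t] for t in query_terms)

def page_score(query, blocks):
    """Score a page as a weighted sum of its blocks' scores.

    `blocks` is a list of (weight, text) pairs; in the paper the weights
    are computed over the collection, here they are simply given.
    """
    terms = query.lower().split()
    return sum(w * block_score(terms, text) for w, text in blocks)

# Hypothetical page: a heavy main-content block and a low-weight menu block.
page = [(1.0, "block structure improves ranking of web search results"),
        (0.1, "home products contact search")]
print(page_score("web search ranking", page))  # main content dominates
```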
Information Processing and Management | 2013
Flavio Figueiredo; Henrique Pinto; Fabiano Muniz Belém; Jussara M. Almeida; Marcos André Gonçalves; David Fernandes; Edleno Silva de Moura
Social media accounts for an increasingly significant fraction of the content retrieved daily by Web users. However, the potential lack of quality of user-generated content poses a challenge to information retrieval services, which rely mostly on textual features generated by users (particularly tags) commonly associated with the multimedia objects. This paper presents what, to the best of our knowledge, is currently the most comprehensive study of the relative quality of textual features in social media. We analyze four different features, namely title, tags, description, and comments posted by users, in four popular applications, namely YouTube, Yahoo! Video, LastFM, and CiteULike. Our study is based on an extensive characterization of data crawled from the four applications with respect to usage, amount and semantics of content, descriptive and discriminative power, as well as content and information diversity across features. It also includes a series of object classification and tag recommendation experiments as case studies of two important information retrieval tasks, aiming at analyzing how these tasks are affected by the quality of the textual features. Classification and recommendation effectiveness is analyzed in light of our characterization results. Our findings provide valuable insights for future research and design of Web 2.0 applications and services.
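A minimal sketch of one way to quantify the "discriminative power" of a textual feature, using an average inverse feature frequency over the collection; the metric, the function name, and the sample data are illustrative assumptions, not the paper's exact definitions.

```python
import math
from collections import Counter

def discriminative_power(objects_terms):
    """Average inverse feature frequency of the terms in each object.

    `objects_terms` maps an object id to the list of terms in one textual
    feature (e.g., its tags). Terms that are rare across the collection
    score higher, so objects described by rarer terms are easier to tell
    apart. This proxy is illustrative, not the paper's actual metric.
    """
    n = len(objects_terms)
    df = Counter()  # in how many objects each term appears
    for terms in objects_terms.values():
        df.update(set(terms))
    scores = {}
    for obj, terms in objects_terms.items():
        scores[obj] = (sum(math.log(n / df[t]) for t in terms) / len(terms)
                       if terms else 0.0)
    return scores

tags = {"v1": ["rock", "live"], "v2": ["rock", "indie"], "v3": ["cats"]}
print(discriminative_power(tags))  # "cats" makes v3 the most discriminable
```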
Conference on Information and Knowledge Management | 2009
Flavio Figueiredo; Fabiano Muniz Belém; Henrique Pinto; Jussara M. Almeida; Marcos André Gonçalves; David Fernandes; Edleno Silva de Moura; Marco Cristo
The growing popularity of Web 2.0 applications has greatly increased the amount of social media content available on the Internet. However, the unsupervised, user-oriented nature of this source of information, and thus its potential lack of quality, poses a challenge to information retrieval (IR) services. Previous work has focused mostly on tags, although no consensus about their effectiveness as supporting information for IR services has yet been reached. Moreover, other textual features of the Web 2.0 are generally overlooked by previous research. In this context, this work aims at assessing the relative quality of distinct textual features available on the Web 2.0. Towards this goal, we analyzed four features (title, tags, description, and comments) in four popular applications (CiteULike, Last.FM, Yahoo! Video, and YouTube). Firstly, we characterized data from these applications in order to extract evidence of the quality of each feature with respect to usage, amount of content, descriptive and discriminative power, as well as content diversity across features. Afterwards, a series of classification experiments was conducted as a case study for quality evaluation. Characterization and classification results indicate that: 1) when considered separately, tags are the most promising feature, achieving the best classification results, although their absence in a non-negligible fraction of objects may limit their use; and 2) each feature may bring different pieces of information, and combining their contents can improve classification.
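Finding 2 above suggests concatenating the features' contents before classification. A minimal sketch of that idea, assuming scikit-learn is available; the sample objects, labels, and the choice of TF-IDF plus naive Bayes are illustrative stand-ins for the paper's actual classifiers and collections.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical Web 2.0 objects with per-feature text and a category label.
objects = [
    {"title": "funny cat video", "tags": "cat humor",
     "description": "a cat plays piano", "label": "pets"},
    {"title": "guitar solo live", "tags": "rock concert",
     "description": "live show footage", "label": "music"},
]

def combined_text(obj):
    # Plain concatenation; weighting each feature differently is another option.
    return " ".join([obj["title"], obj["tags"], obj["description"]])

X_train = [combined_text(o) for o in objects]
y_train = [o["label"] for o in objects]

vec = TfidfVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(X_train), y_train)
print(clf.predict(vec.transform(["cat piano concert"])))  # -> ['pets']
```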
Journal of the Association for Information Science and Technology | 2010
Edleno Silva de Moura; David Fernandes; Berthier A. Ribeiro-Neto; Altigran Soares da Silva; Marcos André Gonçalves
A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches.
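A toy sketch of the lookup-table idea behind this family of methods: each term carries a sentiment strength, and a text's positive and negative strengths (on 1-5 scales) come from its strongest terms. The lexicon and rules below are invented; the real algorithm also handles boosters, negation, emoticons, and the repeated-letter emphasis common in informal text.

```python
# Toy lexicon of term sentiment strengths: positive 1..5, negative -1..-5.
LEXICON = {"love": 4, "great": 3, "good": 2, "hate": -4, "awful": -4, "bad": -2}

def sentiment_strength(text):
    """Return (positive, negative) strengths, each on a 1..5 scale."""
    words = text.lower().split()
    pos = max([LEXICON[w] for w in words if LEXICON.get(w, 0) > 0], default=1)
    neg = min([LEXICON[w] for w in words if LEXICON.get(w, 0) < 0], default=-1)
    return pos, -neg  # 1 means "no positive/negative emotion detected"

print(sentiment_strength("i love it but the ending was bad"))  # (4, 2)
```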
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011
David Fernandes; Edleno Silva de Moura; Altigran Soares da Silva; Berthier A. Ribeiro-Neto; Edisson Braga
Information about how to segment a Web page can be used by applications such as segment-aware Web search, classification, and link analysis. In this research, we propose a fully automatic method for page segmentation and evaluate its application through experiments with four separate Web sites. While the method may be used in other applications, our main focus in this article is to use it as input to segment-aware Web search systems. Our results indicate that the proposed method produces better segmentation results than the best segmentation method we found in the literature. Further, when applied as input to a segment-aware Web search method, it produces results close to those obtained with a manual page segmentation.
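A hedged sketch of what a page segmenter produces: page text split into blocks at common container tags, using only the standard-library HTML parser. The tag list is an assumption; the paper's fully automatic method is considerably more sophisticated than this illustration.

```python
from html.parser import HTMLParser

# Assumed set of tags that start a new visual block; purely illustrative.
BLOCK_TAGS = {"div", "td", "p", "li", "h1", "h2", "h3"}

class Segmenter(HTMLParser):
    """Collect text between block-level tags into separate blocks."""
    def __init__(self):
        super().__init__()
        self.blocks, self.current = [], []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if data.strip():
            self.current.append(data.strip())

    def flush(self):
        if self.current:
            self.blocks.append(" ".join(self.current))
            self.current = []

def segment(html):
    s = Segmenter()
    s.feed(html)
    s.flush()
    return s.blocks

print(segment("<div>menu home</div><div><p>Main article text.</p></div>"))
# -> ['menu home', 'Main article text.']
```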
International Conference on Conceptual Modeling | 2004
Keyla Ahnizeret; David Fernandes; João M. B. Cavalcanti; Edleno Silva de Moura; Altigran Soares da Silva
Design and maintenance of large corporate Web sites have become a challenging problem due to the continuing increase in their size and complexity. One particular feature present in the majority of this sort of Web sites is searching for information. However the solutions provided so far, which is based on the same techniques used for search in the open Web, have not provided a satisfactory performance to specific Web sites, often resulting in too much irrelevant content in a query answer. This paper proposes an approach to Web site modelling and generation of intrasite search engines, combining application modelling and information retrieval techniques. Our assumption is that giving search engines access to the information provided by conceptual representations of the Web site improves their performance and accuracy. We demonstrate our proposal by describing a Web site modelling language that represent both traditional modelling features and information retrieval aspects, as well as presenting experiments to evaluate the resulting intrasite search engine generated by our method.
Brazilian Symposium on Multimedia and the Web | 2012
Maisa Vidal; Guilherme Vale Menezes; Klessius Berlt; Edleno Silva de Moura; Karla Okada; Nivio Ziviani; David Fernandes; Marco Cristo
In this paper we present three new methods to extract keywords from Web pages using Wikipedia as an external source of information. The information used from Wikipedia includes the titles of articles, co-occurrence of keywords, and the categories associated with each Wikipedia definition. We compare our methods with three keyword extraction methods used as baselines: (i) all the terms of a web page, (ii) a TF-IDF implementation that extracts single weighted words of a web page, and (iii) a Wikipedia-based keyword extraction method previously presented in the literature. We compare our three keyword extraction methods with the baseline methods in three distinct scenarios, all related to our target application, which is the selection of ads in a context-based advertising system. In the first scenario, the target pages to place ads were extracted from Wikipedia articles, whereas the target pages in the other two scenarios were extracted from a news web site. Experimental results show that our methods, while simple and time-efficient, are quite competitive solutions for the task of selecting good keywords to represent target web pages. For instance, in the first scenario our best method for extracting keywords from Wikipedia articles achieved an improvement of 33% over the second-best baseline and a gain of 26% over the baseline that considers all the terms of a page.
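A hedged sketch of one ingredient named above: matching a page's word n-grams against Wikipedia article titles to produce keyword candidates. The title set is a toy stand-in for a real dump, and the function is an illustration, not the paper's actual method.

```python
# Toy stand-in for the set of Wikipedia article titles.
WIKI_TITLES = {"machine learning", "neural network", "search engine"}
MAX_N = 3  # longest n-gram to try

def wiki_keywords(text):
    """Return the n-grams of `text` that match Wikipedia article titles."""
    words = text.lower().split()
    found = set()
    for n in range(MAX_N, 0, -1):  # prefer longer matches first
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram in WIKI_TITLES:
                found.add(gram)
    return found

print(wiki_keywords("a search engine powered by machine learning models"))
# -> {'search engine', 'machine learning'}
```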
Information Processing and Management | 2016
Caio Moura Daoud; Edleno Silva de Moura; André Luiz da Costa Carvalho; Altigran Soares da Silva; David Fernandes; Cristian Rossi
We present a new query processing method for text search. We extend the BMW-CS algorithm to preserve the top-k results, proposing BMW-CSP. We show through experiments that the method is competitive with the baselines. In this paper we propose and evaluate the Block Max WAND with Candidate Selection and Preserving Top-K Results algorithm, or BMW-CSP. It is an extension of BMW-CS, a method previously proposed by us. Although very efficient, BMW-CS does not guarantee preserving the top-k results for a given query. Algorithms that do not preserve the top results may reduce the quality of ranking results in search systems. BMW-CSP extends BMW-CS to ensure that the top-k results will have their rankings preserved. In the experiments we performed for computing the top-10 results, the final average time required for processing queries with BMW-CSP was less than that required by the baselines adopted. For instance, when computing top-10 results, the average time achieved by MBMW, the best multi-tier baseline we found in the literature, was 36.29 ms per query, while the average time achieved by BMW-CSP was 19.64 ms per query. The price paid by BMW-CSP is the extra memory required to store partial scores of documents. As we show in the experiments, this price is not prohibitive and, in cases where it is acceptable, BMW-CSP may constitute an excellent alternative query processing method.
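A hedged sketch of the pruning idea common to the WAND family to which BMW-CS and BMW-CSP belong: before fully scoring a document, compare an upper bound built from per-term maximum impacts against the current k-th best score, and skip the document when it cannot qualify. The toy index and scores are invented; the real algorithms operate on block-level maxima over compressed posting lists.

```python
import heapq

# Toy impact index: term -> {doc_id: impact}.
index = {
    "web":    {1: 0.4, 2: 0.1, 3: 0.9},
    "search": {1: 0.5, 3: 0.2, 4: 0.8},
}

def top_k(query_terms, k=2):
    max_impact = {t: max(index[t].values()) for t in query_terms if t in index}
    candidates = set().union(*(index[t] for t in max_impact))
    heap = []  # min-heap of (score, doc_id), kept at size <= k
    for doc in candidates:
        # Upper bound: sum of max impacts of the terms whose lists contain doc.
        bound = sum(m for t, m in max_impact.items() if doc in index[t])
        if len(heap) == k and bound <= heap[0][0]:
            continue  # cannot enter the top-k: skip full scoring
        score = sum(index[t].get(doc, 0.0) for t in max_impact)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True)

print(top_k(["web", "search"]))  # [(1.1, 3), (0.9, 2 or 1 by score)]
```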
Journal of the Association for Information Science and Technology | 2012
André Luiz da Costa Carvalho; Cristian Rossi; Edleno Silva de Moura; Altigran Soares da Silva; David Fernandes
State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative way to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative for precomputing term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantage of producing high-quality results, but avoids the cost of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Unlike previous research on precomputing term impacts, our method provides a generic framework to precompute impacts using any set of relevance evidence on any text collection. The precomputed impact values are indexed and used later for computing document rankings at query-processing time. By doing so, our method effectively reduces query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to those of state-of-the-art ranking methods, can also lead to a significant decrease in computational costs during query processing.
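A minimal sketch of the precomputed-fusion idea: all ranking evidence for a (term, document) pair is folded into a single impact at indexing time, so query processing reduces to adding stored impacts. The fusion function below is a made-up stand-in for the function LePrEF learns with genetic programming.

```python
from collections import defaultdict

def build_impact_index(docs, fuse):
    """docs: doc_id -> {term: evidence dict}; fuse folds evidence into a number."""
    index = defaultdict(dict)  # term -> {doc_id: precomputed impact}
    for doc_id, terms in docs.items():
        for term, evidence in terms.items():
            index[term][doc_id] = fuse(evidence)
    return index

# Made-up fusion: term frequency, boosted when the term occurs in the title.
fuse = lambda ev: ev["tf"] * (2.0 if ev["in_title"] else 1.0)

docs = {1: {"web": {"tf": 3, "in_title": True}},
        2: {"web": {"tf": 5, "in_title": False}}}
index = build_impact_index(docs, fuse)

def score(query_terms, doc_id):
    # Query processing is now just additions of stored impacts.
    return sum(index.get(t, {}).get(doc_id, 0.0) for t in query_terms)

print(score(["web"], 1), score(["web"], 2))  # 6.0 5.0
```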
Brazilian Symposium on Multimedia and the Web | 2009
Flavio Figueiredo; Fabiano Muniz Belém; Henrique Pinto; Jussara M. Almeida; Marcos André Gonçalves; David Fernandes; Edleno Silva de Moura; Virgílio A. F. Almeida; Marco Cristo
Despite the large amount of multimedia content in Web 2.0 applications, most of their Information Retrieval (IR) services use only attributes associated with textual content (e.g., labels or tags). However, because they are typically generated by users, such attributes offer no quality guarantees for IR services. Here, we investigate evidence of the quality of textual attributes in popular Web 2.0 applications related to three aspects: usage, discriminative power, and descriptive power. We have characterized the use of four textual attributes (title, description, tags, and comments) in the following systems: YouTube, Yahoo! Video, LastFM, and CiteULike. Some of our results, which may be considered in the design of IR services for Web 2.0, are: (1) collaborative textual attributes, although not significantly exploited in some applications, contain the largest amount of information when present; (2) there is significant diversity of information between the textual attributes; and (3) the title and tags of objects seem to be the most promising attributes for IR services, since the former is almost always present and has strong descriptive power, while the latter, when used, have high discriminative and descriptive power.
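A small sketch of one way to read result (2): measuring diversity of information as the lack of vocabulary overlap between two attributes of the same object. The Jaccard measure and the sample object are illustrative assumptions, not the paper's exact methodology.

```python
def jaccard(a, b):
    """Vocabulary overlap between two attribute texts; low overlap = diverse."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical object with three textual attributes.
video = {"title": "funny cat compilation",
         "tags": "cat humor pets",
         "description": "the funniest cats on the internet"}

for f1, f2 in [("title", "tags"), ("title", "description"),
               ("tags", "description")]:
    print(f1, f2, round(jaccard(video[f1], video[f2]), 2))
```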