Célia Nunes | Researchain

Publication

Featured research published by Célia Nunes.

Conference on Information and Knowledge Management | 2012

GTE: a distributional second-order co-occurrence approach to improve the identification of top relevant dates in web snippets

Ricardo Campos; Gaël Dias; Alípio Mário Jorge; Célia Nunes

In this paper, we present an approach to identify the top relevant dates in Web snippets with respect to a given implicit temporal query. Our approach is two-fold. First, we propose a generic temporal similarity measure called GTE, which evaluates the temporal similarity between a query and a date. Second, we propose a classification model to accurately relate relevant dates to their corresponding query terms and discard irrelevant ones. We suggest two different solutions: a threshold-based classification strategy and a supervised classifier based on a combination of multiple similarity measures. We evaluate both strategies over a set of real-world text queries and compare the performance of our Web snippet approach with a query log approach over the same set of queries. Experiments show that determining the most relevant dates of any given implicit temporal query can be improved with GTE combined with the second-order similarity measure InfoSimba, the Dice coefficient and the threshold-based strategy, compared to (1) first-order similarity measures and (2) the query log based approach.
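The threshold-based strategy can be illustrated with a minimal sketch. It is not GTE itself: it uses only the first-order Dice coefficient over snippet co-occurrence counts, whereas the abstract combines Dice with the second-order measure InfoSimba. The function names, the substring matching and the threshold value of 0.6 are all assumptions made for illustration.

```python
def dice(count_x, count_y, count_xy):
    """Dice coefficient from occurrence counts of x, y, and their co-occurrence."""
    if count_x + count_y == 0:
        return 0.0
    return 2.0 * count_xy / (count_x + count_y)


def relevant_dates(snippets, query, candidate_years, threshold=0.6):
    """Keep candidate years whose Dice score with the query reaches the threshold.

    Hypothetical helper: a snippet "contains" a term if the lowercased term
    is a substring of the lowercased snippet.
    """
    lowered = [s.lower() for s in snippets]
    q = query.lower()
    count_q = sum(1 for s in lowered if q in s)
    kept = {}
    for year in candidate_years:
        count_y = sum(1 for s in lowered if year in s)
        count_both = sum(1 for s in lowered if q in s and year in s)
        score = dice(count_q, count_y, count_both)
        if score >= threshold:
            kept[year] = score
    return kept


snippets = [
    "Haiti earthquake 2010 relief",
    "Haiti earthquake struck in 2010",
    "Haiti earthquake anniversary 2011",
    "World Cup 2010",
]
result = relevant_dates(snippets, "haiti earthquake", ["2010", "2011"])
print(result)
```

In this toy input, 2010 scores 2/3 and is kept, while 2011 scores 0.5 and is filtered out.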

Web Intelligence | 2006

WISE: Hierarchical Soft Clustering of Web Page Search Results Based on Web Content Mining Techniques

Ricardo Campos; Gaël Dias; Célia Nunes

Typically, search engines show low precision in response to a query, retrieving many useless Web pages and missing other important ones. In this paper, we study the problem of the hierarchical clustering of Web page search results. In particular, we propose an architecture called WISE, a meta-search engine that automatically builds clusters of related Web pages embodying one meaning of the query. These clusters are then hierarchically organized and labeled with a phrase representing the key concept of the cluster and the corresponding Web documents. The system, which is a Web-based interface (soon available at wise.di.ubi.pt), introduces some interesting new ideas, such as the preselection of the retrieved Web pages, the capacity to statistically detect phrases within documents and the representation of documents based on their most relevant key concepts by using Web content mining techniques. The final step of the system is supported by a graph-based overlapping clustering algorithm which groups the selected documents into a hierarchy of clusters.

Proceedings of the 2nd Temporal Web Analytics Workshop on | 2012

Enriching temporal query understanding through date identification: how to tag implicit temporal queries?

Ricardo Campos; Gaël Dias; Alípio Mário Jorge; Célia Nunes

Generally, search engines fail to understand users' temporal intents when these are expressed as implicit temporal queries. This causes the retrieval of less relevant information and prevents users from being aware of the possible temporal dimension of the query results. In this paper, we aim to develop a language-independent model that tackles the temporal dimensions of a query and identifies its most relevant time periods. For this purpose, we propose a temporal similarity measure capable of associating one or more relevant dates with a given query and filtering out irrelevant ones. Our approach is based on the exploitation of temporal information from web content, particularly within the set of top-k web snippets returned in response to a query. We particularly focus on extracting years, which are a kind of temporal information that often appears in this type of collection. We evaluate our methodology using a set of real-world text temporal queries, which are clear concepts (i.e., queries which are non-ambiguous in concept and temporal in their purpose). Experiments show that, when compared to baseline methods, determining the most relevant dates relating to any given implicit temporal query can be improved with a new temporal similarity measure.

Journal of Applied Statistics | 2012

F-tests with a rare pathology

Célia Nunes; Dário Ferreira; Sandra S. Ferreira; João T. Mexia

ANOVA is routinely used to compare pathologies. Nevertheless, in many situations, the sample dimensions may not be known when planning the study. This is especially relevant when one of the pathologies is rare. Thus, the sample size for that pathology or for all pathologies must be considered as random. Sample selection for the non-rare pathologies may be carried out to increase the balance of the model. This leads to F-tests with random non-centrality parameters and random degrees of freedom for the errors. The distribution of such test statistics is obtained.

ICNAAM 2010: International Conference of Numerical Analysis and Applied Mathematics 2010 | 2010

F Tests with Random Sample Sizes

Célia Nunes; Dário Ferreira; Sandra S. Ferreira; João T. Mexia

ANOVA is routinely used to compare pathologies. We now want to consider the case in which one of the pathologies is rare, so that it may not be possible to know the dimension of the corresponding sample. In this case the distribution of the F test has random non-centrality parameters, when there are differences between the pathologies, and random degrees of freedom for the errors.

Journal of Statistical Computation and Simulation | 2014

Fixed effects ANOVA: an extension to samples with random size

Célia Nunes; Dário Ferreira; Sandra S. Ferreira; João T. Mexia

In many relevant situations, such as in medical research, sample sizes may not be known in advance. The aim of this paper is to extend one-way and higher-way analysis of variance to those situations and show how to compute correct critical values. The interest of this approach lies in avoiding false rejections obtained when using the classical fixed size F-tests. Sample sizes are assumed to be random, and we then apply this approach to a database on cancer.
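The effect of random sample sizes on critical values can be explored numerically. The sketch below is not the authors' method: it swaps in a plain Monte Carlo approximation of the null distribution of the one-way F statistic when each group size is itself drawn at random. The uniform size distribution on 2..12, the function names and the number of replications are assumptions chosen only for illustration.

```python
import random
import statistics


def one_way_f(groups):
    """Classical one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulated_critical_value(k, draw_size, alpha=0.05, reps=5000, seed=0):
    """Empirical (1 - alpha) quantile of F under the null with random group sizes.

    draw_size is a hypothetical callable taking a random.Random instance and
    returning a group size of at least 2.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        groups = [
            [rng.gauss(0.0, 1.0) for _ in range(draw_size(rng))]
            for _ in range(k)
        ]
        stats.append(one_way_f(groups))
    stats.sort()
    return stats[min(reps - 1, int((1.0 - alpha) * reps))]


# Assumed size distribution: uniform on 2..12 for each of three groups.
cv = simulated_critical_value(3, lambda rng: rng.randint(2, 12), reps=2000)
print(cv)
```

Comparing the simulated value with the fixed-size F quantile for a chosen design gives a rough feel for how far the two can differ.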

Journal of Statistical Computation and Simulation | 2012

Control of the truncation errors for generalized F distributions

Célia Nunes; Dário Ferreira; Sandra S. Ferreira; João T. Mexia

F-tests may not be used for all relevant hypotheses, even in rather simple models, which led to the introduction of generalized F-tests. The statistics of these tests are quotients of linear combinations of independent chi-squares, which may be non-central. When the observations are collected under non-standardized conditions, the non-centrality parameters may be random. The generalized F distributions are given by infinite sums. In this study, we show that there is excellent control of the truncation errors for those sums. The case in which the non-centrality parameters have Gamma distributions is singled out.
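The idea of controlling a truncation error can be shown on a simpler, standard series than the one in the paper. The distribution function of a non-central chi-square with non-centrality parameter delta is a Poisson(delta/2)-weighted sum of central chi-square distribution functions; since each of those is at most 1, truncating the sum after n terms leaves an error no larger than the discarded Poisson mass. The sketch below computes that bound. It does not reproduce the generalized F series, and the function names and tolerance are illustrative assumptions.

```python
import math


def poisson_tail(rate, n_terms):
    """Poisson(rate) mass left out when only terms j = 0 .. n_terms - 1 are kept."""
    term = math.exp(-rate)
    kept = 0.0
    for j in range(n_terms):
        kept += term
        term *= rate / (j + 1)
    return max(0.0, 1.0 - kept)


def terms_needed(rate, tol=1e-10, max_terms=10_000):
    """Smallest number of terms whose discarded Poisson mass is at most tol."""
    term = math.exp(-rate)
    kept = 0.0
    for j in range(max_terms):
        kept += term
        if 1.0 - kept <= tol:
            return j + 1
        term *= rate / (j + 1)
    raise ValueError("tolerance not reached within max_terms")


delta = 4.0                      # assumed non-centrality parameter
n = terms_needed(delta / 2.0)    # Poisson rate is delta / 2
print(n, poisson_tail(delta / 2.0, n))
```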

European Conference on Information Retrieval | 2014

GTE-Cluster: A Temporal Search Interface for Implicit Temporal Queries

Ricardo Campos; Gaël Dias; Alípio Mário Jorge; Célia Nunes

In this paper, we present GTE-Cluster, an online temporal search interface which consistently allows searching for topics in a temporal perspective by clustering relevant temporal Web search results. GTE-Cluster is designed to improve user experience by augmenting document relevance with temporal relevance. The rationale is that offering the user a comprehensive temporal perspective of a topic is intuitively more informative than retrieving a result that only contains topical information. Our system does not pose any constraint in terms of language or domain, so users can issue queries in any language and from any perspective, ranging from business, cultural and political to musical, to cite just a few. The ability to exploit this information in a temporal manner can be, from a user perspective, potentially useful for several tasks, including user query understanding or temporal clustering.

11th International Conference of Numerical Analysis and Applied Mathematics 2013: ICNAAM 2013 | 2013

ANOVA with random sample sizes: An application to a Brazilian database on cancer registries

Célia Nunes; Gilberto Capistrano; Dário Ferreira; Sandra S. Ferreira

We apply our results on random sample size ANOVA to a Brazilian database on cancer registries. The sample sizes will be considered as realizations of random variables. The interest of this approach lies in avoiding false rejections obtained when using the classical fixed size F-tests.

Information Processing and Management | 2016

GTE-Rank

Ricardo Campos; Gaël Dias; Alípio Mário Jorge; Célia Nunes

We propose a novel temporal re-ranking algorithm. We devise and provide new datasets for time-sensitive evaluation purposes. We conduct comparative experiments (including algorithms with a temporal focus). We investigate the effectiveness of GRank by running a crowdsourcing experiment. We build a prototype system that can be tested by the research community. In the web environment, most of the queries issued by users are implicit by nature. Inferring the different temporal intents of this type of query enhances the overall temporal part of the web search results. Previous works tackling this problem usually focused on news queries, where the retrieval of the most recent results related to the query is usually sufficient to meet the users' information needs. However, few works have studied the importance of time in queries such as "Philip Seymour Hoffman", where the results may require no recency at all. In this work, we focus on this type of query, called time-sensitive queries, where the results are preferably from a diversified time span, not necessarily the most recent one. Unlike related work, we follow a content-based approach to identify the most important time periods of the query and integrate time into a re-ranking model to boost the retrieval of documents whose contents match the query time period. For that purpose, we define a linear combination of topical and temporal scores, which reflects the relevance of any web document both in the topical and temporal dimensions, thus contributing to improve the effectiveness of the ranked results across different types of queries. Our approach relies on a novel temporal similarity measure that is capable of determining the most important dates for a query, while filtering out the non-relevant ones. Through extensive experimental evaluation over web corpora, we show that our model offers promising results compared to baseline approaches. As a result of our investigation, we publicly provide a set of web services and a web search interface so that the system can be graphically explored by the research community.
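The linear combination of topical and temporal scores mentioned in the abstract can be sketched as follows. This is only an illustration of the general form: the abstract does not give the weighting or the score definitions, so the parameter alpha, the assumption that both scores lie in [0, 1], and the function name are hypothetical.

```python
def rerank(docs, alpha=0.5):
    """Re-rank documents by a convex combination of two scores.

    docs: iterable of (doc_id, topical_score, temporal_score).
    alpha: assumed weight of the temporal score, between 0 and 1.
    Returns (doc_id, combined_score) pairs, best first.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scored = [
        (doc_id, (1.0 - alpha) * topical + alpha * temporal)
        for doc_id, topical, temporal in docs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    ("a", 0.9, 0.1),  # strong topical match, weak match to the query's dates
    ("b", 0.6, 0.8),  # weaker topical match, strong temporal match
]
print([doc_id for doc_id, _ in rerank(docs, alpha=0.5)])
print([doc_id for doc_id, _ in rerank(docs, alpha=0.0)])
```

With alpha set to 0 the ranking falls back to the purely topical order, so the weight makes the contribution of the temporal dimension explicit.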