José Borges | Researchain

Publication

Featured researches published by José Borges.

knowledge discovery and data mining | 1999

Data Mining of User Navigation Patterns

José Borges; Mark Levene

We propose a data mining model that captures user navigation behaviour patterns. The user navigation sessions are modelled as a hypertext probabilistic grammar whose higher-probability strings correspond to the users' preferred trails. An algorithm to efficiently mine such trails is given. We make use of the N-gram model, which assumes that the last N pages browsed affect the probability of the next page to be visited. The model is based on the theory of probabilistic grammars, providing it with a sound theoretical foundation for future enhancements. Moreover, we propose the use of entropy as an estimator of the grammar's statistical properties. Extensive experiments were conducted and the results show that the algorithm runs in linear time, the grammar's entropy is a good estimator of the number of mined trails, and the real data rules confirm the effectiveness of the model.
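The N-gram idea in the abstract above can be illustrated with a minimal sketch for the simplest case, N = 1, where only the current page affects the next one. The function names and the toy sessions are assumptions made for illustration, and the per-page entropy shown here is a simplified stand-in for the grammar entropy studied in the paper, not the authors' definition.

```python
from collections import Counter, defaultdict
import math


def transition_probabilities(sessions):
    """Estimate first-order (N = 1) next-page probabilities from sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every consecutive (current page, next page) pair.
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1
    return {
        page: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for page, followers in counts.items()
    }


def row_entropy(distribution):
    """Shannon entropy (in bits) of one page's next-page distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Hypothetical navigation sessions, one list of page identifiers per visit.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
probs = transition_probabilities(sessions)
print(probs["A"])               # next-page distribution after page A
print(row_entropy(probs["A"]))  # uncertainty about the link chosen from A
```

A low entropy for a page means that users leaving it concentrate on a few links, which is the intuition behind using entropy to anticipate how many trails will be mined.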

IEEE Transactions on Knowledge and Data Engineering | 2007

Evaluating Variable-Length Markov Chain Models for Analysis of User Web Navigation Sessions

José Borges; Mark Levene

Markov models have been widely used to represent and analyze user Web navigation data. In previous work, we have proposed a method to dynamically extend the order of a Markov chain model and a complementary method for assessing the predictive power of such a variable-length Markov chain. Herein, we review these two methods and propose a novel method for measuring the ability of a variable-length Markov model to summarize user Web navigation sessions up to a given length. Although the summarization ability of a model is important to enable the identification of user navigation patterns, the ability to make predictions is important in order to foresee the next link choice of a user after following a given trail so as, for example, to personalize a Web site. We present an extensive experimental evaluation providing strong evidence that prediction accuracy increases linearly with summarization ability.

Sigkdd Explorations | 2000

A fine grained heuristic to capture web navigation patterns

José Borges; Mark Levene

In previous work we have proposed a statistical model to capture the user behaviour when browsing the web. The user navigation information obtained from web logs is modelled as a hypertext probabilistic grammar (HPG) which is within the class of regular probabilistic grammars. The set of highest probability strings generated by the grammar corresponds to the users' preferred navigation trails. We have previously conducted experiments with a Breadth-First Search algorithm (BFS) to perform the exhaustive computation of all the strings with probability above a specified cut-point, which we call the rules. Although the algorithm’s running time varies linearly with the number of grammar states, it has the drawbacks of returning a large number of rules when the cut-point is small and a small set of very short rules when the cut-point is high. In this work, we present a new heuristic that implements an iterative deepening search wherein the set of rules is incrementally augmented by first exploring trails with high probability. A stopping parameter is provided which measures the distance between the current rule-set and its corresponding maximal set obtained by the BFS algorithm. When the stopping parameter takes the value zero the heuristic corresponds to the BFS algorithm and as the parameter takes values closer to one the number of rules obtained decreases accordingly. Experiments were conducted with both real and synthetic data and the results show that for a given cut-point the number of rules induced increases smoothly with the decrease of the stopping criterion. Therefore, by setting the value of the stopping criterion the analyst can determine the number and quality of rules to be induced; the quality of a rule is measured by both its length and probability.
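The exhaustive BFS step described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the names `mine_trails`, `start_probs` and `max_length` are assumptions, a rule is taken to be a trail that cannot be extended without dropping below the cut-point, and `max_length` is added only to guarantee termination on cycles with probability one.

```python
from collections import deque


def mine_trails(start_probs, transitions, cut_point, max_length=10):
    """Breadth-first search for maximal trails with probability >= cut_point."""
    rules = []
    # Seed the queue with every start page that already meets the cut-point.
    queue = deque(
        ((page,), p) for page, p in start_probs.items() if p >= cut_point
    )
    while queue:
        trail, prob = queue.popleft()
        extended = False
        if len(trail) < max_length:
            for nxt, p in transitions.get(trail[-1], {}).items():
                if prob * p >= cut_point:
                    queue.append((trail + (nxt,), prob * p))
                    extended = True
        if not extended:
            # No admissible extension exists, so the trail is reported as a rule.
            rules.append((trail, prob))
    return rules


# Hypothetical grammar: start probabilities and link-choice probabilities.
start_probs = {"A": 0.5, "B": 0.5}
transitions = {"A": {"B": 0.8, "C": 0.2}, "B": {"C": 1.0}}
for trail, prob in mine_trails(start_probs, transitions, cut_point=0.3):
    print(" -> ".join(trail), round(prob, 3))
```

Lowering `cut_point` in this toy example admits more and longer trails, which mirrors the drawback the abstract attributes to small cut-points.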

european conference on machine learning | 2005

Generating dynamic higher-order Markov models in web usage mining

José Borges; Mark Levene

Markov models have been widely used for modelling users’ web navigation behaviour. In previous work we have presented a dynamic clustering-based Markov model that accurately represents second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes into account higher-order conditional probabilities. The method makes use of the state cloning concept together with a clustering technique to separate the navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real-world data sets. The results show that some pages require a long history to understand the user's choice of link, while others require only a short history. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter.

electronic commerce and web technologies | 2000

An Heuristic to Capture Longer User Web Navigation Patterns

José Borges; Mark Levene

In previous work we have proposed a data mining model to capture user web navigation patterns, which models the navigation sessions as a hypertext probabilistic grammar. The grammar's higher-probability strings correspond to the users' preferred trails and an algorithm was given to find all strings with probability above a threshold. Herein, we propose a heuristic aimed at finding longer trails composed of links whose average probability is above the threshold. A dynamic threshold is provided whose value is at all times proportional to the length of the trail being evaluated. We report on experiments with both real and synthetic data which were conducted to assess the heuristic's utility.

International Journal of Information Technology and Decision Making | 2004

AN AVERAGE LINEAR TIME ALGORITHM FOR WEB USAGE MINING

José Borges; Mark Levene

In this paper, we study the complexity of a data mining algorithm for extracting patterns from user web navigation data that was proposed in previous work. The user web navigation sessions are inferred from log data and modeled as a Markov chain. The chain's higher-probability trails correspond to the preferred trails on the web site. The algorithm implements a depth-first search that scans the Markov chain for the high-probability trails. We show that the average behaviour of the algorithm is linear time in the number of web pages accessed.

International Journal of Biometeorology | 2015

Partitioning the grapevine growing season in the Douro Valley of Portugal: accumulated heat better than calendar dates

António C. Real; José Borges; J.A. Sarsfield Cabral; Gregory V. Jones

Temperature and water status profiles during the growing season are the most important factors influencing the ripening of wine grapes. To model weather influences on the quality and productivity of the vintages, it is necessary to partition the growing season into smaller growth intervals in which weather variables are evaluated. A significant part of past and ongoing research on the relationships between weather and wine quality uses calendar-defined intervals to partition the growing season. The phenology of grapevines is not determined by calendar dates but by several factors such as accumulated heat. To examine the accuracy of different approaches, this work analyzed the difference in average temperature and accumulated precipitation using growth intervals with boundaries defined by means of estimated historical phenological dates and intervals defined by means of accumulated heat or average calendar dates of the Douro Valley of Portugal. The results show that in situations where there is an absence of historical phenological dates and/or no available data that makes the estimation of those dates possible, it is more accurate to use grapevine heat requirements than calendar dates to define growth interval boundaries. Additionally, we analyzed the ability of the length of growth intervals with boundaries based on grapevine heat requirements to differentiate the best from the worst vintage years with the results showing that vintage quality is strongly related to the phenological events. Finally, we analyzed the variability of growth interval lengths in the Douro Valley during 1980–2009 with the results showing a tendency for earlier grapevine physiology.
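Partitioning a season by accumulated heat can be illustrated with growing degree days. The sketch below is only one common way to compute accumulated heat: the base temperature of 10 °C, the function names and the threshold values are assumptions, not figures taken from the paper.

```python
from itertools import accumulate


def growing_degree_days(t_min, t_max, base=10.0):
    """Daily heat units: mean temperature above the base, floored at zero."""
    return [max(0.0, (lo + hi) / 2 - base) for lo, hi in zip(t_min, t_max)]


def interval_boundaries(daily_gdd, thresholds):
    """Index of the first day on which accumulated heat reaches each threshold.

    Returns None for a threshold that is never reached.
    """
    cumulative = list(accumulate(daily_gdd))
    return [
        next((day for day, total in enumerate(cumulative) if total >= threshold), None)
        for threshold in thresholds
    ]


# Hypothetical daily minimum and maximum temperatures in degrees Celsius.
t_min = [8, 10, 12, 14]
t_max = [16, 20, 22, 26]
gdd = growing_degree_days(t_min, t_max)
print(gdd)
print(interval_boundaries(gdd, thresholds=[5, 20]))
```

Boundaries defined this way move earlier in warm years and later in cool ones, whereas calendar-defined boundaries stay fixed.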

soft computing | 2007

Testing the Predictive Power of Variable History Web Usage

José Borges; Mark Levene

We present two methods for testing the predictive power of a variable length Markov chain induced from a collection of user web navigation sessions. The collection of sessions is split into a training and a test set. The first method uses a χ² statistical test to measure the significance of the distance between the distribution of the probabilities assigned to the test trails by a Markov model built from the full collection of sessions and a model built from the training set. The statistical test measures the ability of the model to generalise its predictions to the unseen sessions from the test set. The second method evaluates the model's ability to predict the last page of a navigation session based on the preceding pages viewed by recording the mean absolute error of the rank of the last occurring page among the predictions provided by the model. Experiments conducted on both real and random data sets are reported and the results show that in most cases a second-order model is able to capture sufficient history to predict the next link choice with high accuracy.
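The second method can be sketched as follows for a first-order model, as a simplified illustration rather than the variable-length procedure used in the paper. The function names are assumptions, as are two conventions: the error of a prediction is its rank minus one, and a page the model never predicts is ranked one place after the last candidate.

```python
def rank_of_last_page(transitions, session):
    """1-based rank of the session's last page among the model's predictions."""
    context, actual = session[-2], session[-1]
    candidates = transitions.get(context, {})
    # Most probable page first; ties are broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda page: (-candidates[page], page))
    if actual in ranked:
        return ranked.index(actual) + 1
    return len(ranked) + 1  # assumed penalty for an unseen page


def mean_absolute_rank_error(transitions, test_sessions):
    """Average distance between the rank of the true last page and rank 1."""
    usable = [s for s in test_sessions if len(s) >= 2]
    if not usable:
        raise ValueError("need at least one session with two or more pages")
    return sum(rank_of_last_page(transitions, s) - 1 for s in usable) / len(usable)


# Hypothetical model built from a training set, and three test sessions.
transitions = {"A": {"B": 0.6, "C": 0.3, "D": 0.1}}
test_sessions = [["X", "A", "B"], ["A", "C"], ["A", "E"]]
print(mean_absolute_rank_error(transitions, test_sessions))
```

An error of zero means the model always ranks the page actually visited first.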

international world wide web conferences | 2006

Ranking Pages by Topology and Popularity within Web Sites

José Borges; Mark Levene

We compare two link analysis ranking methods of web pages in a site. The first, called Site Rank, is an adaptation of PageRank to the granularity of a web site and the second, called Popularity Rank, is based on the frequencies of user clicks on the outlinks in a page that are captured by navigation sessions of users through the web site. We ran experiments on artificially created web sites of different sizes and on two real data sets, employing the relative entropy to compare the distributions of the two ranking methods. For the real data sets we also employ a nonparametric measure, called Spearman's footrule, which we use to compare the top-ten web pages ranked by the two methods. Our main result is that the distributions of the Popularity Rank and Site Rank are surprisingly close to each other, implying that the topology of a web site is very instrumental in guiding users through the site. Thus, in practice, the Site Rank provides a reasonable first order approximation of the aggregate behaviour of users within a web site given by the Popularity Rank.
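The two comparison measures named in the abstract above can be sketched as below. The scores are made-up values, the function names are assumptions, and the footrule is computed over two complete rankings of the same pages; how the paper handles top-ten lists that contain different pages is not stated in the abstract and is not modelled here.

```python
import math


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; assumes q[page] > 0 where p[page] > 0."""
    return sum(p[page] * math.log2(p[page] / q[page]) for page in p if p[page] > 0)


def spearman_footrule(scores_a, scores_b):
    """Sum of absolute rank differences between two scorings of the same pages."""
    def ranks(scores):
        # Highest score gets rank 1; ties are broken alphabetically.
        ordered = sorted(scores, key=lambda page: (-scores[page], page))
        return {page: position for position, page in enumerate(ordered, start=1)}

    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    return sum(abs(ranks_a[page] - ranks_b[page]) for page in ranks_a)


# Hypothetical rank distributions over the pages of a small site.
site_rank = {"home": 0.5, "about": 0.25, "news": 0.25}
popularity_rank = {"home": 0.5, "about": 0.125, "news": 0.375}
print(relative_entropy(popularity_rank, site_rank))
print(spearman_footrule(site_rank, popularity_rank))
```

A relative entropy near zero indicates that the two distributions are close, and a footrule of zero indicates identical orderings.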

European Journal of Engineering Education | 2009

A new group-formation method for student projects

José Borges; Teresa Galvão Dias; João Falcão e Cunha

In BSc/MSc engineering programmes at the Faculty of Engineering of the University of Porto (FEUP), the need to provide students with teamwork experiences close to a real-world environment was identified as an important issue. A new group-formation method that aims to provide an enriching teamwork experience is proposed. Students are asked to answer a questionnaire to evaluate their teamwork profiles and are assigned to groups by an algorithm aiming to achieve maximum diversity within groups and homogeneity among groups. The profile diversity/complementarity within a group is an important factor to promote members’ commitment and coordination in order to achieve the proposed goals. The proposed method is compared to a standard self-selection method for three engineering programmes in three academic years. The results show that, with the new method, there is a higher number of medium-ranked groups which surpass the expectation and that, contrary to some students’ beliefs, the method does not have a negative impact on the overall final marks.
