José Hernández-Orallo
Polytechnic University of Valencia
Publications
Featured research published by José Hernández-Orallo.
Pattern Recognition Letters | 2009
César Ferri; José Hernández-Orallo; R. Modroiu
Performance metrics in classification are fundamental in assessing the quality of learning methods and learned models. However, many different measures have been defined in the literature with the aim of making better choices in general or for a specific application area, and the choices made by one metric are claimed to differ from those made by others. In this work, we analyse experimentally the behaviour of 18 different performance metrics in several scenarios, identifying clusters and relationships between measures. We also perform a sensitivity analysis for all of them in terms of several traits: class threshold choice, separability/ranking quality, calibration performance and sensitivity to changes in the prior class distribution. From the definitions and experiments, we make a comprehensive analysis of the relationships between metrics and propose a taxonomy and arrangement of them according to these traits. This can be useful for choosing the most adequate measure (or set of measures) for a specific application. Additionally, the study highlights some niches in which new measures might be defined, and shows that some supposedly innovative measures make the same (or almost the same) choices as existing ones. Finally, this work can also be used as a reference for comparing experimental results in the pattern recognition and machine learning literature when different measures are used.
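As a quick illustration of how such measures can disagree on the same predictions (not the paper's experimental setup, which covers 18 metrics, clustering and sensitivity analysis), a minimal sketch using scikit-learn:

```python
# Minimal sketch (not the paper's experiments): a few of the metrics the study
# compares, computed on the same scored predictions, to see how threshold-based,
# ranking-based and calibration-sensitive measures can behave differently.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             log_loss, brier_score_loss)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.6, 0.65, 0.8, 0.3, 0.45, 0.55, 0.2, 0.7, 0.5])
y_pred = (y_score >= 0.5).astype(int)                    # fixed class threshold at 0.5

print("accuracy :", accuracy_score(y_true, y_pred))      # threshold metric
print("F1       :", f1_score(y_true, y_pred))            # threshold metric
print("AUC      :", roc_auc_score(y_true, y_score))      # ranking metric
print("log loss :", log_loss(y_true, y_score))           # calibration-sensitive
print("Brier    :", brier_score_loss(y_true, y_score))   # calibration-sensitive
```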
European Conference on Machine Learning | 2003
César Ferri; Peter A. Flach; José Hernández-Orallo
In this work we investigate several issues in order to improve the performance of probabilistic estimation trees (PETs). First, we derive a new probability smoothing that takes into account the class distributions of all the nodes from the root to each leaf. Secondly, we introduce or adapt some new splitting criteria aimed at improving probability estimates rather than classification accuracy, and compare them with other accuracy-aimed splitting criteria. Thirdly, we analyse the effect of pruning methods and choose a cardinality-based pruning, which is able to significantly reduce the size of the trees without degrading the quality of the estimates. The effect of these three issues on the quality of probability estimates is evaluated with the 1-vs-1 multi-class extension of the Area Under the ROC Curve (AUC) measure, which is becoming widespread for evaluating probability estimators, in particular the ranking of predictions.
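The paper's smoothing combines the class distributions along the whole root-to-leaf path; as background only, the sketch below shows the standard corrections usually applied at a single leaf (raw frequencies, Laplace, m-estimate), which that smoothing goes beyond:

```python
# Illustrative background only: common corrections for leaf probability
# estimates; the paper's own smoothing also uses the ancestors' distributions.
import numpy as np

def leaf_probabilities(counts, smoothing="laplace", m=2.0, prior=None):
    """counts: per-class example counts reaching one leaf."""
    counts = np.asarray(counts, dtype=float)
    k, n = len(counts), counts.sum()
    if prior is None:
        prior = np.full(k, 1.0 / k)            # uniform prior over classes
    if smoothing == "none":
        return counts / n                       # raw relative frequencies
    if smoothing == "laplace":
        return (counts + 1.0) / (n + k)         # add-one correction
    if smoothing == "m":
        return (counts + m * prior) / (n + m)   # m-estimate with class prior
    raise ValueError(smoothing)

print(leaf_probabilities([8, 0], "none"))       # overconfident: [1.0, 0.0]
print(leaf_probabilities([8, 0], "laplace"))    # softened:      [0.9, 0.1]
```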
Journal of Logic, Language and Information | 2000
José Hernández-Orallo
The main factor of intelligence is defined as the ability to comprehend, formalising this ability with the help of new constructs based on descriptional complexity. The result is a comprehension test, or C-test, which is exclusively defined in computational terms. Due to its absolute and non-anthropomorphic character, it is equally applicable to both humans and non-humans. Moreover, it correlates with classical psychometric tests, thus establishing the first firm connection between information-theoretical notions and traditional IQ tests. The Turing Test is compared with the C-test and the combination of the two is questioned. In consequence, the idea of using the Turing Test as a practical test of intelligence should be surpassed, and substituted by computational and factorial tests of different cognitive abilities, a much more useful approach for artificial intelligence progress and for many other intriguing questions that present themselves beyond the Turing Test.
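The C-test itself is defined over a descriptional-complexity measure (Levin's Kt) on a reference machine; purely to convey the flavour of scoring by descriptional complexity, the toy sketch below uses off-the-shelf compression as a crude and unfaithful stand-in:

```python
# Toy illustration only: prefer the continuation that gives the whole sequence
# the shortest description. The real C-test uses Levin's Kt complexity over a
# reference machine; zlib compression is merely a rough stand-in here.
import zlib

def description_length(s: str) -> int:
    return len(zlib.compress(s.encode()))

sequence = "a b a b a b a b a b a b a b a b a b a b"   # made-up test item
candidates = ["a", "b", "z"]
scored = {c: description_length(sequence + " " + c) for c in candidates}
print(scored)
print("preferred:", min(scored, key=scored.get))   # usually the pattern-consistent one
```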
International Conference on Machine Learning | 2004
César Ferri; Peter A. Flach; José Hernández-Orallo
A sensible use of classifiers must be based on the estimated reliability of their predictions. A cautious classifier would delegate the difficult or uncertain predictions to other, possibly more specialised, classifiers. In this paper we analyse and develop this idea of delegating classifiers in a systematic way. First, we design a two-step scenario where a first classifier chooses which examples to classify and delegates the difficult examples to train a second classifier. Secondly, we present an iterated scenario involving an arbitrary number of chained classifiers. We compare these scenarios to classical ensemble methods, such as bagging and boosting. We show experimentally that our approach is not far behind these methods in terms of accuracy, but with several advantages: (i) improved efficiency, since each classifier learns from fewer examples than the previous one; (ii) improved comprehensibility, since each classification derives from a single classifier; and (iii) the possibility to simplify the overall multi-classifier by removing the parts that lead to delegation.
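A minimal two-step sketch in the spirit of the paper (the threshold policy, the models and the 50% delegation rate are illustrative assumptions, and the chained scenario is omitted):

```python
# Two-step delegation sketch (in the spirit of the paper, not its exact
# procedure): the first classifier keeps the examples it is confident about
# and delegates the rest, which are then used to train a second model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

first = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
conf = first.predict_proba(X_tr).max(axis=1)        # confidence on training data
tau = np.median(conf)                               # delegate the less confident half
delegated = conf <= tau
second = DecisionTreeClassifier(random_state=0).fit(X_tr[delegated], y_tr[delegated])

# At prediction time, each example is classified by exactly one of the two models.
conf_te = first.predict_proba(X_te).max(axis=1)
pred = np.where(conf_te > tau, first.predict(X_te), second.predict(X_te))
print("delegated fraction:", delegated.mean(), "test accuracy:", (pred == y_te).mean())
```

The iterated scenario in the paper essentially repeats this keep-or-delegate split along a chain of classifiers.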
Communications of the ACM | 2015
Sumit Gulwani; José Hernández-Orallo; Emanuel Kitzelmann; Stephen Muggleton; Ute Schmid; Benjamin G. Zorn
Inductive programming can liberate users from performing tedious and repetitive tasks.
Kluwer Academic Publishers | 2003
Peter A. Flach; Hendrik Blockeel; César Ferri; José Hernández-Orallo; Jan Struyf
In this chapter we give an introduction to ROC (‘receiver operating characteristics’) analysis and its applications to data mining. We argue that ROC analysis provides decision support for data mining in several ways. For model selection, ROC analysis establishes a method to determine the optimal model once the operating characteristics for the model deployment context are known. We also show how ROC analysis can aid in constructing and refining models in the modeling stage.
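As a small illustration of this kind of decision support (the data, costs and class distribution below are made up, not the chapter's own example), one can pick the ROC point that minimises expected cost once the operating condition is known:

```python
# Sketch of ROC-based model deployment support: given the operating condition
# (class distribution and misclassification costs), choose the ROC point, and
# hence the score threshold, with the lowest expected cost. Values are illustrative.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.3, 0.9, 0.5, 0.6, 0.7, 0.2])
fpr, tpr, thresholds = roc_curve(y_true, y_score)

pos_rate = y_true.mean()                      # operating condition: class distribution
c_fp, c_fn = 1.0, 5.0                         # operating condition: cost of FP vs. FN
expected_cost = (c_fp * fpr * (1 - pos_rate)          # contribution of false positives
                 + c_fn * (1 - tpr) * pos_rate)       # contribution of false negatives
best = np.argmin(expected_cost)
print("best threshold:", thresholds[best], "expected cost:", expected_cost[best])
```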
Pattern Recognition | 2013
José Hernández-Orallo
Receiver Operating Characteristic (ROC) analysis is one of the most popular tools for the visual assessment and understanding of classifier performance. Several efforts have been made to bring ROC analysis beyond (binary) classification, especially to regression, but these proposals do not correspond to what we expect from the analysis of operating conditions, dominance, hybrid methods, etc. In this paper we present a new representation of regression models in the so-called regression ROC (RROC) space. The basic idea is to represent over-estimation on the x-axis and under-estimation on the y-axis. The curves are drawn by adjusting a shift, a constant that is added to (or subtracted from) the predictions and plays a role similar to that of a threshold in classification. From here, we develop the notions of optimal operating condition, convexity and dominance, and explore several evaluation metrics that can be shown graphically, such as the area over the RROC curve (AOC). In particular, we show a novel and significant result: the AOC is equal to the error variance (multiplied by a factor that does not depend on the model). The derivation of RROC curves with non-constant shifts and soft regression models, and the relation with cost plots, are also discussed. We illustrate the application of RROC curves to resource estimation, namely the estimation of software project effort.
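A minimal numerical sketch of the construction (data are made up, and the closed-form value printed at the end is only a sanity check of the proportionality result stated in the paper, with the n²/2 normalisation assumed here):

```python
# RROC sketch: sweep a constant shift s added to the predictions and record
# total over-estimation (x-axis) and total under-estimation (y-axis).
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 9.0, 20.0, 14.0])   # illustrative data
y_pred = np.array([11.0, 10.5, 16.0, 9.5, 18.0, 15.5])
err = y_pred - y_true

shifts = np.linspace(-err.max(), -err.min(), 2001)   # range where the curve leaves the axes
over = np.array([np.clip(err + s, 0, None).sum() for s in shifts])
under = np.array([np.clip(err + s, None, 0).sum() for s in shifts])

# Area over the curve (AOC), integrated numerically; the paper relates the AOC
# to the error variance up to a factor independent of the model. We print an
# assumed n^2/2 normalisation alongside it purely for comparison.
aoc = np.trapz(-under, over)
print("AOC (numeric)      :", aoc)
print("n^2/2 * var(errors):", len(err) ** 2 / 2 * err.var())
```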
International Conference on Data Mining | 2010
Antonio Bella; César Ferri; José Hernández-Orallo; María José Ramírez-Quintana
Quantification is the name given to a novel machine learning task which deals with correctly estimating the number of elements of one class in a set of examples. The output of a quantifier is a real value; since training instances are the same as in a classification problem, a natural approach is to train a classifier and to derive a quantifier from it. Some previous works have shown that just classifying the instances and counting the examples belonging to the class of interest (classify & count) typically yields bad quantifiers, especially when the class distribution may vary between training and test. Hence, adjusted versions of classify & count have been developed by using modified thresholds. However, previous works have explicitly discarded (without a deep analysis) any possible approach based on the probability estimations of the classifier. In this paper, we present a method based on averaging the probability estimations of a classifier with a very simple scaling that does perform reasonably well, showing that probability estimators for quantification capture a richer view of the problem than methods based on a threshold.
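A simplified sketch of the quantifiers involved (the scaling shown is a stripped-down variant calibrated on training scores, not necessarily the exact formulation evaluated in the paper; data and model are illustrative):

```python
# Sketch: classify & count (CC), probability average (PA), and a scaled PA
# calibrated on the classifier's training scores. Simplified for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def quantify(clf, X_train, y_train, X_test, threshold=0.5):
    p_test = clf.predict_proba(X_test)[:, 1]
    cc = (p_test >= threshold).mean()                   # classify & count
    pa = p_test.mean()                                  # probability average
    p_train = clf.predict_proba(X_train)[:, 1]          # scores on training data
    pos_mean = p_train[y_train == 1].mean()             # avg score on actual positives
    neg_mean = p_train[y_train == 0].mean()             # avg score on actual negatives
    spa = np.clip((pa - neg_mean) / (pos_mean - neg_mean), 0, 1)   # scaled PA
    return cc, pa, spa

X, y = make_classification(n_samples=3000, weights=[0.7], random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X[:2000], y[:2000])
print(quantify(clf, X[:2000], y[:2000], X[2000:]), "true prevalence:", y[2000:].mean())
```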
Artificial General Intelligence | 2011
Javier Insa-Cabrera; David L. Dowe; Sergio España-Cubillo; M. Victoria Hernández-Lloreda; José Hernández-Orallo
Comparing humans and machines is one important source of information about both machine and human strengths and limitations. Most of these comparisons and competitions are performed on rather specific tasks such as calculus, speech recognition, translation, games, etc. The information conveyed by these experiments is limited, since it merely shows that machines are much better than humans in some domains and worse in others; indeed, CAPTCHAs exploit this. However, there have only been a few proposals of general intelligence tests in the last two decades and, to our knowledge, just a couple of implementations and evaluations. In this paper, we implement one of the most recent test proposals, devise an interface for humans, and use it to compare the intelligence of humans and Q-learning, a popular reinforcement learning algorithm. The results are highly informative in many ways, raising many questions about the use of a (universal) distribution of environments, the role of measuring knowledge acquisition, and other issues such as speed, duration of the test and scalability.
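For reference, a generic tabular Q-learning implementation on a toy environment (the paper's environment class, observation/reward protocol and evaluation are different and far more general; ToyChain below is a made-up example):

```python
# Generic tabular Q-learning, the algorithm compared against humans in the
# paper, shown on a made-up five-state chain where reaching the rightmost
# position yields reward 1.
import random
from collections import defaultdict

class ToyChain:
    actions = [0, 1]                                   # 0 = left, 1 = right
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, a):
        self.pos = max(0, min(4, self.pos + (1 if a == 1 else -1)))
        done = self.pos == 4
        return self.pos, (1.0 if done else 0.0), done

def greedy(Q, s, actions):
    best = max(Q[(s, a)] for a in actions)
    return random.choice([a for a in actions if Q[(s, a)] == best])

def q_learning(env, episodes=300, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                             # Q[(state, action)], starts at 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions) if random.random() < epsilon \
                else greedy(Q, s, env.actions)         # epsilon-greedy exploration
            s2, r, done = env.step(a)
            target = r + gamma * max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])  # Q-learning update
            s = s2
    return Q

Q = q_learning(ToyChain())
print([greedy(Q, s, ToyChain.actions) for s in range(4)])  # should mostly prefer 1 (right)
```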
International Symposium on Functional and Logic Programming | 2001
C. Ferri-Ramírez; José Hernández-Orallo; M. José Ramírez-Quintana
In this work, we consider the extension of the Inductive Functional Logic Programming (IFLP) framework in order to learn functions in an incremental way. In general, incremental learning is necessary when the number of examples is infinite, very large, or presented one by one. We have implemented this extension in the FLIP system, an implementation of the IFLP framework. Several examples of induced programs indicate that our extension pays off in practice. An experimental study of some parameters that affect its efficiency is performed, and some applications for programming practice are illustrated, especially small classification problems and the mining of semi-structured data.
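FLIP induces functional logic programs, which a short sketch cannot reproduce; purely to illustrate the one-example-at-a-time protocol mentioned above, here is a generic incremental learner updated with scikit-learn's partial_fit (the data stream and target concept are made up):

```python
# Generic illustration of incremental learning, unrelated to FLIP itself:
# the model is updated as each example arrives, without retraining from scratch.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])

for t in range(1000):                              # examples presented one by one
    x = rng.normal(size=(1, 5))
    y = np.array([int(x[0, 0] + x[0, 1] > 0)])     # hypothetical target concept
    model.partial_fit(x, y, classes=classes)       # incremental update

print(model.predict(rng.normal(size=(3, 5))))
```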