Rui Camacho
University of Porto
Publication
Featured research published by Rui Camacho.
Archive | 2005
Alípio Mário Jorge; Luís Torgo; Pavel Brazdil; Rui Camacho; João Gama
Invited Talks.- Data Analysis in the Life Sciences - Sparking Ideas -.- Machine Learning for Natural Language Processing (and Vice Versa?).- Statistical Relational Learning: An Inductive Logic Programming Perspective.- Recent Advances in Mining Time Series Data.- Focus the Mining Beacon: Lessons and Challenges from the World of E-Commerce.- Data Streams and Data Synopses for Massive Data Sets.- Long Papers.- k-Anonymous Patterns.- Interestingness is Not a Dichotomy: Introducing Softness in Constrained Pattern Mining.- Generating Dynamic Higher-Order Markov Models in Web Usage Mining.- Tree 2 - Decision Trees for Tree Structured Data.- Agglomerative Hierarchical Clustering with Constraints: Theoretical and Empirical Results.- Cluster Aggregate Inequality and Multi-level Hierarchical Clustering.- Ensembles of Balanced Nested Dichotomies for Multi-class Problems.- Protein Sequence Pattern Mining with Constraints.- An Adaptive Nearest Neighbor Classification Algorithm for Data Streams.- Support Vector Random Fields for Spatial Classification.- Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication.- A Correspondence Between Maximal Complete Bipartite Subgraphs and Closed Patterns.- Improving Generalization by Data Categorization.- Mining Model Trees from Spatial Data.- Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification.- Mining Paraphrases from Self-anchored Web Sentence Fragments.- M2SP: Mining Sequential Patterns Among Several Dimensions.- A Systematic Comparison of Feature-Rich Probabilistic Classifiers for NER Tasks.- Knowledge Discovery from User Preferences in Conversational Recommendation.- Unsupervised Discretization Using Tree-Based Density Estimation.- Weighted Average Pointwise Mutual Information for Feature Selection in Text Categorization.- Non-stationary Environment Compensation Using Sequential EM Algorithm for Robust Speech Recognition.- Hybrid Cost-Sensitive Decision Tree.- 
Characterization of Novel HIV Drug Resistance Mutations Using Clustering, Multidimensional Scaling and SVM-Based Feature Ranking.- Object Identification with Attribute-Mediated Dependences.- Weka4WS: A WSRF-Enabled Weka Toolkit for Distributed Data Mining on Grids.- Using Inductive Logic Programming for Predicting Protein-Protein Interactions from Multiple Genomic Data.- ISOLLE: Locally Linear Embedding with Geodesic Distance.- Active Sampling for Knowledge Discovery from Biomedical Data.- A Multi-metric Index for Euclidean and Periodic Matching.- Fast Burst Correlation of Financial Data.- A Propositional Approach to Textual Case Indexing.- A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston.- Efficient Classification from Multiple Heterogeneous Databases.- A Probabilistic Clustering-Projection Model for Discrete Data.- Short Papers.- Collaborative Filtering on Data Streams.- The Relation of Closed Itemset Mining, Complete Pruning Strategies and Item Ordering in Apriori-Based FIM Algorithms.- Community Mining from Multi-relational Networks.- Evaluating the Correlation Between Objective Rule Interestingness Measures and Real Human Interest.- A Kernel Based Method for Discovering Market Segments in Beef Meat.- Corpus-Based Neural Network Method for Explaining Unknown Words by WordNet Senses.- Segment and Combine Approach for Non-parametric Time-Series Classification.- Producing Accurate Interpretable Clusters from High-Dimensional Data.- Stress-Testing Hoeffding Trees.- Rank Measures for Ordering.- Dynamic Ensemble Re-Construction for Better Ranking.- Frequency-Based Separation of Climate Signals.- Efficient Processing of Ranked Queries with Sweeping Selection.- Feature Extraction from Mass Spectra for Classification of Pathological States.- Numbers in Multi-relational Data Mining.- Testing Theories in Particle Physics Using Maximum Likelihood and Adaptive Bin Allocation.- Improved Naive Bayes for Extremely Skewed Misclassification Costs.- 
Clustering and Prediction of Mobile User Routes from Cellular Data.- Elastic Partial Matching of Time Series.- An Entropy-Based Approach for Generating Multi-dimensional Sequential Patterns.- Visual Terrain Analysis of High-Dimensional Datasets.- An Auto-stopped Hierarchical Clustering Algorithm for Analyzing 3D Model Database.- A Comparison Between Block CEM and Two-Way CEM Algorithms to Cluster a Contingency Table.- An Imbalanced Data Rule Learner.- Improvements in the Data Partitioning Approach for Frequent Itemsets Mining.- On-Line Adaptive Filtering of Web Pages.- A Bi-clustering Framework for Categorical Data.- Privacy-Preserving Collaborative Filtering on Vertically Partitioned Data.- Indexed Bit Map (IBM) for Mining Frequent Sequences.- STochFS: A Framework for Combining Feature Selection Outcomes Through a Stochastic Process.- Speeding Up Logistic Model Tree Induction.- A Random Method for Quantifying Changing Distributions in Data Streams.- Deriving Class Association Rules Based on Levelwise Subspace Clustering.- An Incremental Algorithm for Mining Generators Representation.- Hybrid Technique for Artificial Neural Network Architecture and Weight Optimization.
International Journal of Legal Medicine | 2011
Luísa Pereira; Farida Alshamali; Rune Andreassen; Ruth Ballard; Wasun Chantratita; Nam Soo Cho; Clotilde Coudray; Jean-Michel Dugoujon; Marta Espinoza; Fabricio González-Andrade; Sibte Hadi; Uta-Dorothee Immel; Catalin Marian; Antonio González-Martín; Gerhard Mertens; Walther Parson; Carlos Perone; Lourdes Prieto; Haruo Takeshita; Héctor Rangel Villalobos; Zhaoshu Zeng; Rui Camacho; Nuno A. Fonseca
Because of their sensitivity and high level of discrimination, short tandem repeat (STR) marker systems are currently the method of choice in routine forensic casework and data banking, usually in multiplexes of up to 15–17 loci. Constraints related to sample amount and quality, frequently encountered in forensic casework, will not allow this picture to change in the near future, notwithstanding technological developments. In this study, we present a free online calculator named PopAffiliator (http://cracs.fc.up.pt/popaffiliator) for individual population affiliation in the three main population groups, Eurasian, East Asian and sub-Saharan African, based on genotype profiles for the common set of STRs used in forensics. This calculator performs affiliation based on a model constructed using machine learning techniques. The model was constructed using a data set of approximately fifteen thousand individuals collected for this work. The accuracy of individual population affiliation is approximately 86%, showing that the common set of STRs routinely used in forensics provides a considerable amount of information for population assignment, in addition to being excellent for individual identification.
Proteins | 2007
Nuno A. Fonseca; Rui Camacho; Alexandre L. Magalhães
A systematic survey was carried out in an unbiased sample of 815 protein chains with a maximum of 20% homology selected from the Protein Data Bank, whose structures were solved at a resolution higher than 1.6 Å and with an R‐factor lower than 25%. A set of 5556 subsequences with α‐helix or 3₁₀‐helix motifs was extracted from the protein chains considered. Global and local propensities were then calculated for all possible amino acid pairs of the type (i, i + 1), (i, i + 2), (i, i + 3), and (i, i + 4), starting at the relevant helical positions N1, N2, N3, C3, C2, C1, and N‐int (interior positions), and also at the first nonhelical positions in both termini of the helices, namely, N‐cap and C‐cap. The statistical analysis of the propensity values has shown that pairing is significantly dependent on the type of the amino acids and on the position of the pair. A few sequences of three and four amino acids were selected and their high prevalence in helices is outlined in this work. The Glu‐Lys‐Tyr‐Pro sequence shows a peculiar distribution in proteins, which may suggest a relevant structural role in α‐helices when Pro is located at the C‐cap position. A bioinformatics tool was developed, which automatically and periodically updates the results and makes them available on a web site. Proteins 2008.
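As a rough illustration of the kind of pair statistic described above, the sketch below counts (i, i + k) amino acid pairs in a set of helical subsequences and divides each pair's observed frequency by the frequency expected from the single-residue frequencies. This observed/expected ratio is an assumed, commonly used definition of propensity and may differ from the one used in the paper; the function name `pair_propensity` and the toy sequences are hypothetical.

```python
from collections import Counter

def pair_propensity(sequences, k):
    """Observed/expected ratio for ordered residue pairs at separation k.

    Assumption (not taken from the paper): propensity = f(a, b) / (f(a) * f(b)),
    where f(a, b) is the relative frequency of the pair (a at i, b at i + k)
    and f(a), f(b) are single-residue relative frequencies in the same data.
    """
    residue_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        for i in range(len(seq) - k):
            pair_counts[(seq[i], seq[i + k])] += 1

    total_residues = sum(residue_counts.values())
    total_pairs = sum(pair_counts.values())
    propensities = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        expected = (residue_counts[a] / total_residues) * (residue_counts[b] / total_residues)
        propensities[(a, b)] = observed / expected
    return propensities

# Made-up helical subsequences in one-letter amino acid codes (illustration only).
toy_helices = ["AEKLA", "EEKLA", "AEKYP"]
result = pair_propensity(toy_helices, k=1)
```

A ratio above 1 means the pair occurs more often than the single-residue frequencies alone would predict. A position-specific (local) variant would restrict the counting to pairs starting at one helical position, such as N1 or C-cap.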
Journal of Logic Programming | 1999
Ashwin Srinivasan; Rui Camacho
Using problem-specific background knowledge, computer programs developed within the framework of Inductive Logic Programming (ILP) have been used to construct restricted first-order logic solutions to scientific problems. However, their approach to the analysis of data with substantial numerical content has been largely limited to constructing clauses that: (a) provide qualitative descriptions (“high”, “low” etc.) of the values of response variables; and (b) contain simple inequalities restricting the ranges of predictor variables. This has precluded the application of such techniques to scientific and engineering problems requiring a more sophisticated approach. A number of specialised methods have been suggested to remedy this. In contrast, we have chosen to take advantage of the fact that the existing theoretical framework for ILP places very few restrictions on the nature of the background knowledge. We describe two issues of implementation that make it possible to use background predicates that implement well-established statistical and numerical analysis procedures. Any improvements in analytical sophistication that result are evaluated empirically using artificial and real-life data. Experiments utilising artificial data are concerned with extracting constraints for response variables in the text-book problem of balancing a pole on a cart. They illustrate the use of clausal definitions of arithmetic and trigonometric functions, inequalities, multiple linear regression, and numerical derivatives. A non-trivial problem concerning the prediction of mutagenic activity of nitroaromatic molecules is also examined. In this case, expert chemists have been unable to devise a model for explaining the data. The result demonstrates the combined use by an ILP program of logical and numerical capabilities to achieve an analysis that includes linear modelling, clustering and classification. In all experiments, the predictions obtained compare favourably against benchmarks set by more traditional quantitative methods, namely, regression and neural networks.
European Conference on Machine Learning | 1998
Rui Camacho
A new model of human control skills is proposed and empirically evaluated. It is called the incremental correction model and is more adequate for reverse engineering human control skills than any previously proposed model. The experimental results show a considerable increase in robustness of the controllers that use the new model. The new model also attenuates the problem of unbalanced classes, already noticed in previous experiments. By means of Parameterised Decision Trees, propositional learners are still usable within the new model's framework.
European Conference on Logics in Artificial Intelligence | 2006
Nuno A. Fonseca; Fernando M. A. Silva; Rui Camacho
Inductive Logic Programming (ILP) is a Machine Learning research field that has been quite successful in knowledge discovery in relational domains. ILP systems use a set of pre-classified examples (positive and negative) and prior knowledge to learn a theory in which positive examples succeed and the negative examples fail. In this paper we present a novel ILP system called April, capable of exploring several parallel strategies in distributed and shared memory machines.
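The learning criterion stated above, a theory in which positive examples succeed and negative examples fail, can be sketched as a simple check. This is a minimal illustration, not April's implementation: clauses are modelled as plain Python predicates rather than first-order clauses, and the names `covers`, `accepts_all_pos_rejects_all_neg`, and the toy examples are hypothetical.

```python
def covers(theory, example):
    """An example succeeds if at least one clause of the theory accepts it."""
    return any(clause(example) for clause in theory)

def accepts_all_pos_rejects_all_neg(theory, positives, negatives):
    """True when every positive example succeeds and every negative one fails."""
    all_positives_succeed = all(covers(theory, e) for e in positives)
    no_negative_succeeds = not any(covers(theory, e) for e in negatives)
    return all_positives_succeed and no_negative_succeeds

# Toy examples: each example is a dict of attributes (illustration only).
positives = [{"rings": 3, "charge": 0.5}, {"rings": 4, "charge": 0.1}]
negatives = [{"rings": 1, "charge": 0.5}, {"rings": 2, "charge": 0.0}]

# A candidate theory with a single clause: "active if the molecule has 3+ rings".
good_theory = [lambda e: e["rings"] >= 3]
# A theory that is too general: it also accepts a negative example.
bad_theory = [lambda e: e["charge"] > 0.2]

good_ok = accepts_all_pos_rejects_all_neg(good_theory, positives, negatives)
bad_ok = accepts_all_pos_rejects_all_neg(bad_theory, positives, negatives)
```

An ILP system searches a space of candidate clauses for a theory that passes a check of this kind. Because the check is repeated for many candidates over many examples, it is a natural target for parallel evaluation.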
Machine Learning | 2009
Nuno A. Fonseca; Ashwin Srinivasan; Fernando M. A. Silva; Rui Camacho
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard; and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state-of-the-art on parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, one that simply constructs models in parallel on each processor using a subset of data and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found.
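Finding (1) above describes a data-parallel scheme: each processor builds a model from a subset of the data, and the models are then combined into one. The sketch below illustrates that shape under loud assumptions: `learn_rules` is a hypothetical stand-in learner (a "rule" is a single feature seen in positives but never in negatives), threads stand in for processors, and the combination step (union of rules, then removal of rules contradicted by the full data) is one plausible choice rather than the one used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def learn_rules(examples):
    """Hypothetical learner: keep features seen in positives but in no negative."""
    pos_features, neg_features = set(), set()
    for features, label in examples:
        (pos_features if label else neg_features).update(features)
    return pos_features - neg_features

def partition(examples, n_parts):
    """Split the examples round-robin into n_parts subsets."""
    return [examples[i::n_parts] for i in range(n_parts)]

def parallel_learn(examples, n_workers=2):
    """Learn one model per subset in parallel, then combine them."""
    subsets = partition(examples, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_models = list(pool.map(learn_rules, subsets))

    # Combine: take the union of all locally learned rules...
    candidate_rules = set().union(*partial_models)
    # ...and drop any rule that covers a negative example in the full data set.
    all_neg_features = set()
    for features, label in examples:
        if not label:
            all_neg_features.update(features)
    return candidate_rules - all_neg_features

# Toy data: (set of features, is_positive), illustration only.
data = [
    ({"a", "b"}, True),
    ({"b", "c"}, False),
    ({"a", "d"}, True),
    ({"d", "e"}, False),
]
combined = parallel_learn(data, n_workers=2)
sequential = learn_rules(data)
```

In this toy run one subset contains only positive examples, so its local model is over-general, and the combination step is what removes the rules that fail on the full data. A real system would distribute the subsets across processes or machines rather than threads.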
Inductive Logic Programming | 2005
Nuno A. Fonseca; Fernando M. A. Silva; Rui Camacho
It is well known among Inductive Logic Programming (ILP) practitioners that ILP systems usually take a long time to find valuable models (theories). The problem is especially critical for large datasets, preventing ILP systems from scaling up to larger applications. One approach to reducing the execution time has been the parallelization of ILP systems. In this paper we survey the state of the art in parallel ILP implementations and present work on the evaluation of some major parallelization strategies for ILP. Conclusions about the applicability of each strategy are presented.
Robotica | 1991
Eugénio C. Oliveira; Rui Camacho; Carlos Ramos
The use of Multi-Agent Systems as a Distributed AI paradigm for Robotics is the principal aim of our present work. In this paper we consider the concepts needed and a suitable architecture for a set of Agents that makes it possible for them to cooperate in solving non-trivial tasks. Agents are sets of different software modules, each one implementing a function required for cooperation. A Monitor, Acquaintance and Self-knowledge Modules, an Agenda and an Input queue, built on top of each Intelligent System, are the fundamental modules that guarantee the process of cooperation, while the overall aim is devoted to the community of cooperative Agents. The Agents in our testbed include Vision, Planner, World Model and the Robot itself.
Practical Aspects of Declarative Languages | 2013
Nicos Angelopoulos; Vítor Santos Costa; João Azevedo; Jan Wielemaker; Rui Camacho; Lodewyk F. A. Wessels
We present r..eal, a library that integrates the R statistical environment with Prolog. Due to R's functional programming affinity, the interface introduced has a minimalistic feel. Programs utilising the library syntax are elegant and succinct, with intuitive semantics and clear integration. In effect, the library enhances logic programming with the ability to tap into the vast wealth of statistical and probabilistic reasoning available in R. The software is a useful addition to the efforts towards the integration of statistical reasoning and knowledge representation within an AI context. Furthermore, it can be used to open up new application areas for logic programming and AI techniques, such as bioinformatics, computational biology, text mining, psychology and neurosciences, where R has a particularly strong presence.