Jorge Alberto Jaramillo-Garzón
National University of Colombia
Publication
Featured research published by Jorge Alberto Jaramillo-Garzón.
BMC Bioinformatics | 2013
Jorge Alberto Jaramillo-Garzón; Joan Josep Gallardo-Chacón; César Germán Castellanos-Domínguez; Alexandre Perera-Lluna
Background: Proteins are the key elements on the path from genetic information to the development of life. The roles played by different proteins are difficult to uncover experimentally, as this process involves complex procedures such as genetic modifications, injection of fluorescent proteins, gene knock-out methods and others. The knowledge learned about each protein is usually annotated in databases through different methods, such as those proposed by the Gene Ontology (GO) Consortium. Different methods have been proposed to predict GO terms from primary structure information, but very few are available for large-scale functional annotation of plants, and reported success rates are much lower than those reported by non-plant predictors. This paper explores the predictability of GO annotations on proteins belonging to the Embryophyta group from a set of features extracted solely from their primary amino acid sequence.
Results: High predictability of several GO terms was found for Molecular Function and Cellular Component. As expected, a lower degree of predictability was found for Biological Process ontology annotations, although a few biological processes were easily predicted. Proteins related to transport and transcription were particularly well predicted from primary structure information. The most discriminant features for prediction were those related to the electric charges of the amino acid sequence and hydropathicity-derived features.
Conclusions: An analysis of the predictability of GO-slim terms in plants was carried out in order to determine single categories or groups of functions that are most closely related to primary structure information. For each highly predictable GO term, the features responsible for that success were identified and discussed. Unlike most published studies, which focus on a few categories or single ontologies, the results in this paper comprise a complete landscape of GO predictability from primary structure, encompassing 75 GO terms at the molecular, cellular and phenotypical levels. Thus, it provides a valuable guide for researchers interested in further advances in protein function prediction on Embryophyta plants.
Iberoamerican Congress on Pattern Recognition | 2013
Andrés Felipe Giraldo-Forero; Jorge Alberto Jaramillo-Garzón; José Francisco Ruiz-Muñoz; César Germán Castellanos-Domínguez
Multi-label learning has become an increasingly active area in the machine learning community, since a wide variety of real-world problems are naturally multi-labeled. However, it is not uncommon to find disparities in the number of samples of each class, which poses an additional challenge for the learning algorithm. SMOTE is an oversampling technique that has been successfully applied to balance single-label data sets, but it has not been used in multi-label frameworks so far. In this work, several strategies are proposed and compared in order to generate synthetic samples for balancing data sets in the training of multi-label algorithms. Results show that a correct selection of seed samples for oversampling improves the classification performance of multi-label algorithms. Uniform generation oversampling provides an efficient methodology for a wide scope of real-world problems.
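The oversampling idea described above can be illustrated with a minimal Python sketch. It is not the paper's method: the function names, the toy data, the uniform choice of seed samples, and the decision to give each synthetic sample only the target label are all assumptions made for illustration. The interpolation step itself follows the usual SMOTE recipe of placing a new point on the segment between a seed and one of its nearest same-label neighbors.

```python
import random

def smote_sample(seed, neighbor, rng):
    """Place a synthetic point on the segment between seed and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(seed, neighbor)]

def nearest_neighbor(seed, candidates):
    """Closest candidate to seed by squared Euclidean distance, excluding seed itself."""
    others = [c for c in candidates if c is not seed]
    return min(others, key=lambda c: sum((a - b) ** 2 for a, b in zip(seed, c)))

def oversample_label(X, Y, label, n_new, rng):
    """Generate n_new synthetic samples for one label of a multi-label data set.

    X is a list of feature vectors, Y a list of label sets.
    Assumption: seeds are drawn uniformly from the samples carrying `label`,
    and each synthetic sample is tagged with that label only.
    """
    seeds = [x for x, y in zip(X, Y) if label in y]
    new_X, new_Y = [], []
    for _ in range(n_new):
        seed = rng.choice(seeds)
        neighbor = nearest_neighbor(seed, seeds)
        new_X.append(smote_sample(seed, neighbor, rng))
        new_Y.append({label})
    return new_X, new_Y

# Toy data (hypothetical): label "b" is the minority label.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
Y = [{"a"}, {"a"}, {"a"}, {"a", "b"}, {"b"}]

rng = random.Random(0)
new_X, new_Y = oversample_label(X, Y, "b", 3, rng)
print(new_X)
```

How the seeds are chosen and which labels the synthetic samples inherit are exactly the design choices that a multi-label oversampling strategy has to settle; the sketch fixes one simple option for each.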
International Conference of the IEEE Engineering in Medicine and Biology Society | 2010
Jorge Alberto Jaramillo-Garzón; A. Perera-Lluna; César Germán Castellanos-Domínguez
An analysis of the predictability of subcellular locations is performed using simple pattern recognition techniques, in an attempt to capture the real dimensions of the problem at hand. Results show that some particular locations do not need high-complexity classification models to be predicted with high accuracy, and some partial biological explanations are formulated. All the experiments were carried out on a set of Arabidopsis thaliana proteins, and classes were defined according to the plant GO slim.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2011
G. A. Arango-Argoty; Jorge Alberto Jaramillo-Garzón; S. Röthlisberger; César Germán Castellanos-Domínguez
Predicting the function of unknown proteins is one of the principal goals in computational biology. The subcellular localization of a protein allows a further understanding of its structure and molecular function. Numerous prediction techniques have been developed, usually focusing on global information of the protein. However, predictions can also be made through the identification of functional sub-sequence patterns known as motifs. For the motif discovery problem, many methods require a predefined fixed window size and aligned sequences. To address these problems, we propose a method based on variable-length motif characterization and detection using the continuous wavelet transform (CWT) and a dissimilarity space representation. To analyze the motifs generated by our approach, we divide the entire data set into training (60%) and validation (40%) sets. A Support Vector Machine (SVM) classifier is used as the predictor for the validation set. The highest values, sensitivity Sn = 82.58% and specificity Sp = 92.86%, across 10-fold cross-validation, are obtained for endosome proteins. Average results of Sn = 74% and Sp = 75.58% are comparable to the current state of the art. For data sets with low sequence identity (< 40%), motif characterization and localization based on the CWT shows good performance and good interpretability of the subsequences in each subcellular localization.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2014
F. J. Martinez-Tabares; Jorge Alberto Jaramillo-Garzón; Germán Castellanos-Domínguez
The use of wearable user interfaces has been growing extensively in medical monitoring applications. However, producing and manufacturing prototypes without automation tools may lead to non-viable results, since it is common to face an optimization problem in which several variables conflict with each other. Thus, it is necessary to design a strategy for balancing the variables and constraints, systematizing the design in order to reduce the risks that arise when it is guided exclusively by the intuition of the developer. This paper proposes a framework for designing wearable ECG monitoring systems using multi-objective optimization. The main contribution of this work is a model to automate the design process, including a mathematical expression relating the principal variables that make up the criteria of functionality and wearability. We also introduce a novel yardstick for deciding the location of electrodes, based on reducing interference from ECG by maximizing the electrode-skin contact.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2013
Andrés Felipe Giraldo-Forero; Jorge Alberto Jaramillo-Garzón; César Germán Castellanos-Domínguez
A comparative analysis of four multi-label classification methods is performed in order to determine the best topology for the problem of protein function prediction, using support vector machines as base classifiers. Comparisons are made in terms of the performance and computational cost of parallelized versions of the algorithms, in order to determine their applicability in high-throughput scenarios. Results show that the performance of the binary relevance strategy, together with a class-balancing technique, remains above that of several recently proposed techniques for the problem at hand, while incurring the lowest computational cost when parallelized. However, stacked classifiers and chain classifications can be conveniently used in pipelines, due to the low number of false positives reported.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2012
G. A. Arango-Argoty; José Francisco Ruiz-Muñoz; Jorge Alberto Jaramillo-Garzón; César Germán Castellanos-Domínguez
Predicting the sub-cellular localization of a protein can provide useful information to uncover its molecular functions. Numerous prediction techniques have been developed for this purpose, usually focused on global information of the protein or on sequence alignments. However, several studies have shown that the functional nature of proteins is ruled by conserved sub-sequence patterns known as domains. In this paper, an alternative methodology (PfamFeat) for sub-cellular localization of gram-positive bacterial proteins was developed. PfamFeat is based on information provided by the Pfam database, which stores a series of HMM profiles describing common protein domains. The likelihood that a sequence was generated by a given HMM profile can be used to characterize sequences so that pattern recognition techniques can be applied. Success rates obtained with a simple one-nearest-neighbor classifier demonstrate that this method is competitive with popular sub-cellular prediction algorithms and that it constitutes a promising research trend.
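As a rough illustration of the classification step described above, the following Python sketch applies a one-nearest-neighbor rule to vectors of per-profile scores. Everything specific here is hypothetical: the function name, the two-dimensional score vectors (standing in for log-likelihoods under two HMM profiles), the numeric values, and the class labels are toy placeholders, not data or code from PfamFeat.

```python
def one_nn(train_X, train_y, query):
    """Return the label of the training vector closest to query (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], query))
    return train_y[best]

# Hypothetical feature vectors: one score per HMM profile (toy values).
train_X = [[-10.0, -50.0], [-12.0, -48.0], [-45.0, -9.0], [-50.0, -11.0]]
train_y = ["membrane", "membrane", "extracellular", "extracellular"]

pred_a = one_nn(train_X, train_y, [-11.0, -47.0])
pred_b = one_nn(train_X, train_y, [-48.0, -10.0])
print(pred_a, pred_b)
```

In practice each vector would have one component per profile considered, so the representation is fixed-length regardless of sequence length, which is what makes a distance-based classifier applicable.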
International Conference on Bioinformatics and Biomedical Engineering | 2015
Andrés Felipe Giraldo-Forero; Jorge Alberto Jaramillo-Garzón; César Germán Castellanos-Domínguez
This work presents an analysis of six example-based metrics conventionally used to measure classification performance in multi-label problems. ROC curves are used to depict the different trade-offs generated by each measure. The results show that the measures diverge as performance decreases, which demonstrates the importance of selecting the right performance measure for the application at hand. Hamming loss proved to be the wrong choice when sensitive classifiers are wanted, since it does not take into account the imbalance between classes. In turn, the geometric mean showed a higher affinity for identifying true positives. Additionally, the Matthews correlation coefficient and the F-measure showed comparable results in most cases.
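The contrast between Hamming loss and the geometric mean can be seen in a small Python sketch. The label matrices are toy values invented for illustration, and the example-based geometric mean is computed here as the per-example square root of sensitivity times specificity, averaged over examples; this is one common formulation and is assumed rather than taken from the paper. The sketch also assumes that every example has at least one relevant and one irrelevant label.

```python
import math

def hamming_loss(y_true, y_pred):
    """Fraction of label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total

def example_gmean(row_true, row_pred):
    """sqrt(sensitivity * specificity) for a single example."""
    tp = sum(1 for t, p in zip(row_true, row_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(row_true, row_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(row_true, row_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(row_true, row_pred) if t == 0 and p == 1)
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))

def geometric_mean(y_true, y_pred):
    """Average of the per-example geometric means."""
    scores = [example_gmean(rt, rp) for rt, rp in zip(y_true, y_pred)]
    return sum(scores) / len(scores)

# Toy imbalanced problem: each example has 1 relevant label out of 5.
y_true = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]
# Predictor A never predicts a positive label.
pred_a = [[0, 0, 0, 0, 0] for _ in y_true]
# Predictor B finds every relevant label but adds one false positive per example.
pred_b = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 1, 1, 0], [1, 0, 0, 0, 1]]

hl_a, hl_b = hamming_loss(y_true, pred_a), hamming_loss(y_true, pred_b)
gm_a, gm_b = geometric_mean(y_true, pred_a), geometric_mean(y_true, pred_b)
print(hl_a, hl_b, gm_a, gm_b)
```

Both predictors make four errors in twenty label slots, so their Hamming loss is identical, yet predictor A recovers no relevant label at all while predictor B recovers all of them. Only the geometric mean separates the two.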
International Conference on Bioinformatics and Biomedical Engineering | 2015
Jorge Alberto Jaramillo-Garzón; Jacobo Castro-Ceballos; Germán Castellanos-Domínguez
Sub-cellular localization prediction is an important step for inferring protein functions. Several strategies have been developed in recent years to solve this problem, from alignment-based solutions to feature-based solutions. However, below some identity thresholds, these kinds of approaches fail to detect homologous sequences, yielding predictions with low specificity and sensitivity. Here, a novel methodology is proposed for classifying proteins with low identity levels. This approach implements a simple yet powerful assumption that employs hierarchical clustering and hidden Markov models, obtaining high performance in the prediction of four different sub-cellular localizations.
2015 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA) | 2015
A. F. Cardona-Escobar; J. C. Pineda-Iral; N. Guarnizo-Cutiva; Jorge Alberto Jaramillo-Garzón
This work implements a type of string kernel called the mismatch kernel, together with a methodology involving Support Vector Machines (SVM), for solving 14 molecular function classification problems in land plants (Embryophyta). The implemented methodology uses bio-inspired metaheuristic algorithms to find optimal hyperparameters of the SVM. To address the problem of imbalanced data, class weights are also treated as hyperparameters, which avoids the need for sampling methods. The results were compared with those of the RBF (radial basis function) kernel under the same methodology. The geometric mean of specificity and sensitivity was used as the performance measure, showing that string kernels are the most suitable choice for the problem at hand.
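For readers unfamiliar with the mismatch kernel, the following Python sketch shows a naive version of the standard (k, m) definition: each sequence is mapped to counts over all k-mers that lie within m substitutions of one of its own k-mers, and the kernel value is the dot product of two such count vectors. The function names, the parameter defaults and the explicit neighborhood enumeration are illustrative assumptions; the abstract does not state the values of k and m used, and practical implementations rely on faster data structures.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def neighborhood(kmer, m, alphabet=ALPHABET):
    """All strings within Hamming distance <= m of kmer."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result

def mismatch_features(seq, k, m):
    """Count, for each k-mer beta, the k-mers of seq within m mismatches of beta."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for beta in neighborhood(seq[i:i + k], m):
            feats[beta] += 1
    return feats

def mismatch_kernel(x, y, k=3, m=1):
    """Dot product of the (k, m)-mismatch feature vectors of x and y."""
    fx, fy = mismatch_features(x, k, m), mismatch_features(y, k, m)
    return sum(count * fy[beta] for beta, count in fx.items())

k_same = mismatch_kernel("ACD", "ACD")
k_near = mismatch_kernel("ACD", "ACE")
k_exact = mismatch_kernel("ACD", "ACE", k=3, m=0)
print(k_same, k_near, k_exact)
```

With m = 0 the two toy sequences share no 3-mer and the kernel is zero, whereas allowing one mismatch makes them similar. This tolerance to substitutions is what makes the kernel attractive for protein sequences. The resulting Gram matrix could then be passed to any SVM implementation that accepts precomputed kernels.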