Publication


Featured research published by Axel J. Soto.


Molecular Informatics | 2011

Target-Driven Subspace Mapping Methods and Their Applicability Domain Estimation.

Axel J. Soto; Gustavo E. Vazquez; Marc Strickert; Ignacio Ponzoni

This work describes a methodology for assisting virtual screening of drugs during the early stages of the drug development process. The methodology is proposed to improve the reliability of in silico property prediction and is structured in two steps. Firstly, a transformation is sought for mapping a high-dimensional space defined by potentially redundant or irrelevant molecular descriptors into a low-dimensional application-related space. For this task we evaluate three different target-driven subspace mapping methods, out of which we highlight the recent Correlative Matrix Mapping (CMM) as the most stable. Secondly, we apply an applicability domain model on the low-dimensional space to assess the confidence of compound classification. Using a probabilistic framework, the applicability domain approach identifies compounds that are poorly represented in the training set (extrapolation problems) and regions of the space where the uncertainty about the correct class is higher than normal (interpolation problems). This two-step approach contributes to the development of reliable prediction tools in chemoinformatics, a field in need of both interpretable models and methods that estimate the confidence of predictions.


Journal of Cheminformatics | 2015

Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods

María Jimena Martínez; Ignacio Ponzoni; Mónica F. Díaz; Gustavo E. Vazquez; Axel J. Soto

Background: The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain experts' knowledge into the selection process is needed to increase confidence in the final set of descriptors. Results: In this paper we propose a software tool, named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The capabilities of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use the tool to choose one subset of descriptors from a group of candidate subsets, or to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. Conclusions: The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise into the descriptor selection process with low cognitive effort, in contrast with the alternative of an ad-hoc manual analysis of the selected descriptors.


ambient intelligence | 2013

Detection of daily living activities using a two-stage Markov model

Love P. Kalra; Xinghui Zhao; Axel J. Soto; Evangelos E. Milios

A supervised statistical model for detecting the activities of daily living (ADL) from sensor data streams is proposed in this paper. The method works in two stages, aiming to capture temporal intra- and inter-activity relationships. In the first stage each activity is modeled separately by a Markov model where sensors correspond to states. By modeling each sensor as a state we capture the absolute and relational temporal features within the activities. A novel data segmentation approach is proposed for accurate inference at the first stage. To boost the accuracy, a second stage consisting of a Hidden Markov Model is added that serves two purposes. Firstly, it acts as a corrective stage, as it learns the probability of each activity being incorrectly inferred by the first stage, so that such errors can be corrected at the second stage. Secondly, it introduces inter-activity transition information to capture possible time-dependent relationships between two contiguous activities. We applied our method to three smart house datasets. Comparison of the results to other traditional approaches for ADL identification shows competitive or better performance. The paper also proposes a deployment of our methodology using an agent-based architecture.
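The first stage described above can be illustrated with a minimal sketch, assuming a first-order Markov chain per activity with sensors as states and Laplace smoothing. The sensor names, the toy training sequences, and the function names (`train_chain`, `classify`) are hypothetical and not taken from the paper; the data segmentation step and the corrective Hidden Markov Model of the second stage are not shown.

```python
import math
from collections import defaultdict


def train_chain(sequences, sensors, alpha=1.0):
    """Estimate smoothed sensor-to-sensor transition probabilities for one activity."""
    counts = {s: defaultdict(float) for s in sensors}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for s in sensors:
        # Laplace smoothing (alpha) keeps unseen transitions at non-zero probability.
        total = sum(counts[s].values()) + alpha * len(sensors)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in sensors}
    return probs


def log_likelihood(seq, probs):
    """Log-probability of the observed sensor transitions under one activity model."""
    return sum(math.log(probs[p][n]) for p, n in zip(seq, seq[1:]))


def classify(seq, models):
    """Pick the activity whose Markov chain best explains the sensor sequence."""
    return max(models, key=lambda activity: log_likelihood(seq, models[activity]))


# Hypothetical toy data: each activity is a list of sensor-firing sequences.
SENSORS = ["stove", "fridge", "tap", "bed", "lamp"]
TRAINING = {
    "cooking": [["fridge", "stove", "tap", "stove"], ["fridge", "tap", "stove", "stove"]],
    "sleeping": [["lamp", "bed", "bed"], ["lamp", "bed", "lamp", "bed"]],
}
MODELS = {activity: train_chain(seqs, SENSORS) for activity, seqs in TRAINING.items()}
```

Calling `classify(["fridge", "stove", "tap"], MODELS)` scores the segment under each activity's chain and returns the most likely activity label. In the two-stage design, a sequence of such labels would then be passed to the second stage for correction.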


Molecules | 2012

QSPR Models for Predicting Log P_liver Values for Volatile Organic Compounds Combining Statistical Methods and Domain Knowledge

Damián Palomba; María Jimena Martínez; Ignacio Ponzoni; Mónica Fátima Díaz; Gustavo E. Vazquez; Axel J. Soto

Volatile organic compounds (VOCs) are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. Therefore, it is important to model blood-to-liver partition coefficients (log P_liver) for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR) models for the prediction of log P_liver, and we also propose a hybrid approach for the selection of the descriptors. This hybrid methodology combines a machine learning method with a manual selection based on expert knowledge, which yields a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated using an external test set. Results show high prediction accuracy compared to previous log P_liver models, and the descriptor selection approach provides a means to get a small set of descriptors that agrees with the theoretical understanding of the target property.


evolutionary computation machine learning and data mining in bioinformatics | 2008

A wrapper-based feature selection method for ADMET prediction using evolutionary computing

Axel J. Soto; Rocío L. Cecchini; Gustavo E. Vazquez; Ignacio Ponzoni

Wrapper methods seek a subset of features or variables in a data set such that these features are the most relevant for predicting a target value. In the chemoinformatics context, determining the most significant set of descriptors is of great importance because of its contribution to improving ADMET prediction models. In this paper, a comprehensive analysis of descriptor selection aimed at physicochemical property prediction is presented. In addition, we propose an evolutionary approach where different fitness functions are compared. The comparison consists of establishing which method selects the subset of descriptors that best predicts a given property while keeping the cardinality of the subset to a minimum. The performance of the proposal was assessed for predicting hydrophobicity, using an ensemble of neural networks for the prediction task. The results showed that the evolutionary approach using a nonlinear fitness function constitutes a novel and promising technique for this bioinformatic application.
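The idea of an evolutionary wrapper can be sketched as follows. This is only an illustration under stated assumptions: the fitness function here is the leave-one-out error of a 1-nearest-neighbour predictor plus a penalty on subset size, swapped in for the fitness functions and the neural network ensemble of the paper, which the abstract does not specify. The toy data, the `penalty` value, and all function names are hypothetical.

```python
import random


def loo_1nn_error(X, y, mask):
    """Leave-one-out mean absolute error of a 1-NN predictor on the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty subset cannot predict anything
    err = 0.0
    for i, xi in enumerate(X):
        best_d, best_y = float("inf"), 0.0
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_y = d, y[k]
        err += abs(y[i] - best_y)
    return err / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Lower is better: prediction error plus a cost per selected descriptor."""
    return loo_1nn_error(X, y, mask) + penalty * sum(mask)


def evolve(X, y, n_features, pop_size=12, generations=15, seed=0):
    """Small genetic algorithm over binary feature masks (requires n_features >= 2)."""
    rng = random.Random(seed)
    pop = [tuple([1] * n_features)]  # the all-features mask serves as a baseline
    pop += [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m))
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            flip = rng.randrange(n_features)  # single-bit mutation
            child[flip] = 1 - child[flip]
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=lambda m: fitness(X, y, m))


# Hypothetical toy data: the target depends only on the first two descriptors.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(5)] for _ in range(30)]
y = [2.0 * row[0] - row[1] for row in X]
best = evolve(X, y, n_features=5)
```

Because the best half of each generation is kept, the returned mask never scores worse than the all-features baseline seeded into the initial population. Replacing `fitness` is the natural place to compare alternative fitness functions.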


document engineering | 2015

Similarity-Based Support for Text Reuse in Technical Writing

Axel J. Soto; Abidalrahman Mohammad; Andrew Albert; Aminul Islam; Evangelos E. Milios; Michael Doyle; Rosane Minghim; Maria Cristina Ferreira de Oliveira

Technical writing in professional environments, such as user manual authoring for new products, is a task that relies heavily on reuse of content. Therefore, technical content is typically created following a strategy where modular units of text have references to each other. One of the main challenges faced by technical authors is to avoid duplicating existing content, as this adds unnecessary effort, generates undesirable inconsistencies, and dramatically increases maintenance and translation costs. However, there are few computational tools available to support this activity. This paper investigates the use of different similarity methods for the task of identification of reuse opportunities in technical writing. We evaluated our results using existing ground truth as well as feedback from technical authors. Finally, we also propose a tool that combines text similarity algorithms with interactive visualizations to aid authors in understanding differences in a collection of topics and identifying reuse opportunities.
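As a rough illustration of how a similarity method can surface reuse opportunities, the sketch below flags pairs of topics whose bag-of-words cosine similarity exceeds a threshold. Cosine similarity on token counts is just one generic choice and is not claimed to be among the methods evaluated in the paper; the topic texts, the threshold of 0.8, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the token-count vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def reuse_candidates(topics, threshold=0.8):
    """Return (id_a, id_b, score) for every topic pair at or above the threshold."""
    ids = sorted(topics)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(topics[a], topics[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs


# Hypothetical modular units of text from a user manual.
TOPICS = {
    "t1": "Press the power button to turn on the device.",
    "t2": "Press the power button to turn the device on.",
    "t3": "Replace the battery when the indicator is red.",
}
```

Here `reuse_candidates(TOPICS)` flags only the pair `t1`/`t2`, which use the same words in a different order. A bag-of-words measure ignores word order, which is one reason to compare several similarity methods on real data.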


KSII Transactions on Internet and Information Systems | 2015

Exploratory Visual Analysis and Interactive Pattern Extraction from Semi-Structured Data

Axel J. Soto; Ryan Kiros; Vlado Keselj; Evangelos E. Milios

Semi-structured documents are a common type of data containing free text in natural language (unstructured data) as well as additional information about the document, or meta-data, typically following a schema or controlled vocabulary (structured data). Simultaneous analysis of unstructured and structured data enables the discovery of hidden relationships that cannot be identified from either of these sources when analyzed independently of each other. In this work, we present a visual text analytics tool for semi-structured documents (ViTA-SSD), that aims to support the user in the exploration and finding of insightful patterns in a visual and interactive manner in a semi-structured collection of documents. It achieves this goal by presenting to the user a set of coordinated visualizations that allows the linking of the metadata with interactively generated clusters of documents in such a way that relevant patterns can be easily spotted. The system contains two novel approaches in its back end: a feature-learning method to learn a compact representation of the corpus and a fast-clustering approach that has been redesigned to allow user supervision. These novel contributions make it possible for the user to interact with a large and dynamic document collection and to perform several text analytical tasks more efficiently. Finally, we present two use cases that illustrate the suitability of the system for in-depth interactive exploration of semi-structured document collections, two user studies, and results of several evaluations of our text-mining components.


ISAmI | 2012

A Two-Stage Corrective Markov Model for Activities of Daily Living Detection

Love P. Kalra; Xinghui Zhao; Axel J. Soto; Evangelos E. Milios

In this paper we propose a two-stage, supervised statistical model for detecting the activities of daily living (ADL) from sensor data streams. In the first stage each activity is modeled separately by a Markov model where sensors correspond to states. By modeling each sensor as a state we capture the absolute and relational temporal features of the atomic activities. A novel data segmentation approach is proposed for accurate inference at the first stage. To boost the accuracy, a second stage consisting of a Hidden Markov Model is added that serves two purposes. Firstly, it acts as a corrective stage, as it learns the probability of each activity being incorrectly inferred by the first stage, so that such errors can be corrected at the second stage. Secondly, it introduces inter-activity transition information to capture possible time-dependent relationships between two contiguous activities. We applied our method to three ADL datasets to show its suitability to this domain.


document engineering | 2013

A graph-based topic extraction method enabling simple interactive customization

Ajitesh Srivastava; Axel J. Soto; Evangelos E. Milios

It is often desirable to identify the concepts that are present in a corpus. A popular way to pursue this objective is to discover clusters of words, or topics, for which many algorithms exist in the literature. Yet most of these methods lack the interpretability that would enable interaction with a user not familiar with their inner workings. The paper proposes a graph-based topic extraction algorithm, which can also be viewed as a soft clustering of the words present in a given corpus. Each topic, in the form of a set of words, represents an underlying concept in the corpus. The method allows easy interpretation of the clustering process, and hence allows user involvement at various steps. For a quantitative evaluation of the extracted topics, we use them as features to get a compact representation of documents for classification tasks. We compare the classification accuracy achieved by a reduced feature set obtained with our method against other topic extraction techniques, namely Latent Dirichlet Allocation and Non-negative Matrix Factorization. While the results from all three algorithms are comparable, the speed and easy interpretability of our algorithm make it more appropriate for interactive use by lay users.


international conference on machine learning and applications | 2015

Active Information Retrieval for Linking Twitter Posts with Political Debates

Raheleh Makki; Axel J. Soto; Stephen Brooks; Evangelos E. Milios

Users of microblogging social networks produce millions of short messages every day. Retrieving information relevant to a particular event from this sheer volume of data is not a trivial task. In this paper, we present a framework for the retrieval of Twitter posts that are relevant to a set of political debates. Our main contribution is a set of strategies for involving the user in the retrieval process, so that by presenting her with meaningful posts to be labeled, the method achieves a noticeably higher accuracy. The correct retrieval or labeling could be provided by an external information source such as a domain expert, or simulated with an oracle. A key aspect of active retrieval methods is to request the labels of the instances that help improve the retrieval accuracy the most, while keeping the number of labeling requests to a minimum. The proposed strategies for selecting labeling requests make use of the textual content of tweets and their structural information. The experimental results show the advantages of the proposed methods and the effectiveness of the selection strategies for involving the user in the retrieval process.
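The selection of labeling requests can be illustrated with uncertainty sampling, a generic active learning strategy that queries the item the current model is least sure about. This is a swapped-in stand-in, not one of the strategies proposed in the paper, which also exploit tweet structure. The relevance scores, post ids, and the simulated oracle below are hypothetical, and the sketch covers only the selection step: a full loop would re-score the pool after each new label.

```python
def pick_query(scores, labeled):
    """Return the id of the unlabeled post whose relevance score is closest to 0.5."""
    pool = [pid for pid in scores if pid not in labeled]
    if not pool:
        return None
    # Ties are broken by post id so that the choice is deterministic.
    return min(pool, key=lambda pid: (abs(scores[pid] - 0.5), pid))


def active_loop(scores, oracle, budget):
    """Request at most `budget` labels, always querying the most uncertain post."""
    labeled = {}
    for _ in range(budget):
        pid = pick_query(scores, labeled)
        if pid is None:
            break
        labeled[pid] = oracle(pid)  # a domain expert or a simulated oracle answers
    return labeled


# Hypothetical relevance scores in [0, 1] from some retrieval model.
SCORES = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.40}
RELEVANT = {"a", "b"}  # ground truth used only by the simulated oracle
LABELS = active_loop(SCORES, lambda pid: pid in RELEVANT, budget=2)
```

With a budget of two requests, the posts with scores 0.52 and 0.40 are queried, since the confidently scored posts would add little information. Keeping `budget` small mirrors the goal of minimizing labeling requests.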

Collaboration


Top Co-Authors

Gustavo E. Vazquez

Universidad Católica del Uruguay Dámaso Antonio Larrañaga

Ignacio Ponzoni

Universidad Nacional del Sur
