Abraham Yosipof | Researchain

Publication

Featured research published by Abraham Yosipof.

Journal of Chemical Information and Modeling | 2014

Optimization of Molecular Representativeness

Abraham Yosipof; Hanoch Senderowitz

Representative subsets selected from within larger data sets are useful in many chemoinformatics applications including the design of information-rich compound libraries, the selection of compounds for biological evaluation, and the development of reliable quantitative structure-activity relationship (QSAR) models. Such subsets can overcome many of the problems typical of diverse subsets, most notably the tendency of the latter to focus on outliers. Yet only a few algorithms for the selection of representative subsets have been reported in the literature. Here we report on the development of two algorithms for the selection of representative subsets from within parent data sets based on the optimization of a newly devised representativeness function either alone or simultaneously with the MaxMin function. The performances of the new algorithms were evaluated using several measures representing their ability to produce (1) subsets which are, on average, close to data set compounds; (2) subsets which, on average, span the same space as spanned by the entire data set; (3) subsets mirroring the distribution of biological indications in a parent data set; and (4) test sets which are well predicted by qualitative QSAR models built on data set compounds. We demonstrate that for three data sets (containing biological indication data, logBBB permeation data, and Plasmodium falciparum inhibition data), subsets obtained using the new algorithms are more representative than subsets obtained by hierarchical clustering, k-means clustering, or the MaxMin optimization at least in three of these measures.
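
The MaxMin baseline named in the abstract can be sketched as a greedy selection. This is a minimal illustration, not the authors' representativeness algorithm, whose objective function the abstract does not give. The function name `maxmin_select`, the Euclidean distance, and the toy coordinates are all assumptions made for the example.

```python
import math

def maxmin_select(points, k, seed_index=0):
    """Greedy MaxMin pick: repeatedly add the point farthest from the chosen set.

    Illustrative sketch only; distance metric and seeding are assumptions.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [seed_index]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[seed_index]) for p in points]
    while len(selected) < k:
        candidates = [i for i in range(len(points)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

# Hypothetical 2D descriptor space: a tight cluster plus two far-away points.
points = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (10, 0)]
subset = maxmin_select(points, k=3)
print(subset)
```

On this toy data the pick goes to the two isolated points right after the seed, which is the outlier-seeking behavior of diverse subsets that the abstract describes.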

Journal of Computational Chemistry | 2015

k-Nearest neighbors optimization-based outlier removal.

Abraham Yosipof; Hanoch Senderowitz

Datasets of molecular compounds often contain outliers, that is, compounds which are different from the rest of the dataset. Outliers, while often interesting, may affect data interpretation, model generation, and decision making, and therefore should be removed from the dataset prior to modeling efforts. Here, we describe a new method for the iterative identification and removal of outliers based on a k‐nearest neighbors optimization algorithm. We demonstrate for three different datasets that the removal of outliers using the new algorithm provides filtered datasets which are better than those provided by four alternative outlier removal procedures as well as by random compound removal in two important aspects: (1) they better maintain the diversity of the parent datasets; (2) they give rise to quantitative structure activity relationship (QSAR) models with much better prediction statistics. The new algorithm is, therefore, suitable for the pretreatment of datasets prior to QSAR modeling.

Molecular Informatics | 2015

Data Mining and Machine Learning Tools for Combinatorial Material Science of All-Oxide Photovoltaic Cells

Abraham Yosipof; Oren E. Nahum; Assaf Y. Anderson; Hannah-Noa Barad; Arie Zaban; Hanoch Senderowitz

Growth in energy demands, coupled with the need for clean energy, is likely to make solar cells an important part of future energy resources. In particular, cells entirely made of metal oxides (MOs) have the potential to provide clean and affordable energy if their power conversion efficiencies are improved. Such improvements require the development of new MOs which could benefit from combining combinatorial material sciences for producing solar cell libraries with data mining tools to direct synthesis efforts. In this work we developed a data mining workflow and applied it to the analysis of two recently reported solar cell libraries based on Titanium and Copper oxides. Our results demonstrate that QSAR models with good prediction statistics for multiple solar cell properties could be developed and that these models highlight important factors affecting these properties in accord with experimental findings. The resulting models are therefore suitable for designing better solar cells.

Journal of Cheminformatics | 2017

RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells

Omer Kaspi; Abraham Yosipof; Hanoch Senderowitz

An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a “one stop shop” algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and predictions for test set samples using applicability domain. For “future” predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
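
The core RANSAC idea (fit on a small random sample, count the points that agree, keep the largest consensus set) can be sketched for a one-descriptor linear model. This is a generic textbook form, not the authors' QSAR workflow: descriptor selection and the applicability domain are not covered, and the names `ransac_line` and `fit_line`, the threshold, the iteration count, and the data are all assumptions.

```python
import random

def fit_line(pairs):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx
    return a, my - a * mx

def ransac_line(pairs, n_iter=200, threshold=0.5, seed=0):
    """Generic RANSAC sketch: keep the largest set of points consistent with a 2-point line."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(pairs, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in pairs if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    # Refit on the consensus set only, so outliers no longer bias the model.
    a, b = fit_line(best_inliers)
    return a, b, best_inliers

# Hypothetical data: ten points on y = 2x + 1 plus two gross outliers.
data = [(x, 2 * x + 1) for x in range(10)] + [(3, 30), (7, -20)]
slope, intercept, inliers = ransac_line(data)
print(slope, intercept, len(inliers))
```

Outlier removal falls out of the consensus step here: the two gross outliers never join the winning inlier set, so the final fit is computed without them.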

Frontiers in Chemistry | 2018

Data Mining and Machine Learning Models for Predicting Drug Likeness and Their Disease or Organ Category

Abraham Yosipof; Rita C. Guedes; Alfonso T. García-Sosa

Data mining approaches can uncover underlying patterns in chemical and pharmacological property space decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques in order to reduce multi-dimension data into 2D or 3D representations with a minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both models take advantage of the different and deep relationships that can exist between features of compounds, and helpfully provide classification of compounds based on such features or, in the case of visualization methods, uncover underlying patterns in the feature space. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for the visualization and the representation of chemical space, and the use of different machine learning methods separately and together to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forests (RF), support vector machine (SVM), artificial neural network (ANN), k-nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves the predictive force but also decreases bias. This resulted in a corrected classification rate of over 0.81, as well as higher sensitivity and specificity rates for the models. In addition, separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others. Such models can be used to guide decisions on the feature landscape of compounds and their likeness to either drugs or other characteristics, such as specific or multiple disease-category(ies) or organ(s) of action of a molecule.

Journal of Physical Chemistry A | 2013

Nucleophilic and electrophilic reactions of polyynes catalyzed by an electric field: toward barcoding of carbon nanotubes like long homogeneous substrates.

Abraham Yosipof; Harold Basch; Shmaryahu Hoz

Computational studies at the B3LYP/6-31+G* level were carried out on the addition of pyridine to polyynes (C6-C18) and on the protonation of polyynes by methyl ammonium fluoride under electric fields of 2.5 and 5 MV/cm. The electric field in each case was oriented along the polyyne axis in a direction that enhances the reaction by stabilizing the incipient dipole. It was found that the reaction of pyridine addition is endothermic with a late transition state. The longer the polyyne and the stronger the field, the more efficient the electric field catalysis. Extrapolation of the data to long polyynes shows that at 1000 nm an electric field of 50 000 V/cm will reduce the barrier by 10 kcal/mol. This reduction is equivalent to 7 orders of magnitude in rate enhancement. A similar barrier reduction could be achieved with a 2.5 MV/cm field at a polyyne length of 20 nm. Protonation reactions were found to be much more affected by the electric field. A reduction of the reaction barrier by 10 kcal/mol using a 2.5 MV/cm electric field could be achieved at a polyyne length of 10 nm. Thus the electric field along the long axis of a substrate could induce a gradient of reactivity which could, in principle, enable the barcoding of substrates by using a sequence of reactants having different reactivities.
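
The link between a 10 kcal/mol barrier reduction and roughly 7 orders of magnitude in rate can be checked with a short calculation. The sketch assumes an Arrhenius-type rate with an unchanged pre-exponential factor and a temperature of 298.15 K. The abstract states neither, so both are assumptions, as is the function name `rate_enhancement`.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def rate_enhancement(barrier_drop_kcal, temperature_k=298.15):
    """Rate ratio k_new / k_old = exp(dEa / (R*T)) for a barrier lowered by dEa.

    Assumes the pre-exponential factor is unchanged (an assumption, not from the source).
    """
    return math.exp(barrier_drop_kcal / (R_KCAL * temperature_k))

factor = rate_enhancement(10.0)   # barrier lowered by 10 kcal/mol
orders = math.log10(factor)       # number of orders of magnitude
print(f"rate ratio ~ {factor:.1e}, i.e. about {orders:.1f} orders of magnitude")
```

Under these assumptions the result comes out slightly above 7 orders of magnitude, consistent with the figure quoted in the abstract.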

Molecular Informatics | 2016

Visualization Based Data Mining for Comparison Between Two Solar Cell Libraries

Abraham Yosipof; Omer Kaspi; Koushik Majhi; Hanoch Senderowitz

Material informatics may provide meaningful insights and powerful predictions for the development of new and efficient Metal Oxide (MO) based solar cells. The main objective of this paper is to establish the usefulness of data reduction and visualization methods for analyzing data sets emerging from multiple all‐MOs solar cell libraries. For this purpose, two libraries, TiO2|Co3O4 and TiO2|Co3O4|MoO3, differing only by the presence of a MoO3 layer in the latter were analyzed with Principal Component Analysis and Self‐Organizing Maps. Both analyses suggest that the addition of the MoO3 layer to the TiO2|Co3O4 library has affected the overall photovoltaic (PV) activity profile of the solar cells making the two libraries clearly distinguishable from one another. Furthermore, while MoO3 had an overall favorable effect on PV parameters, a sub‐population of cells was identified which were either indifferent to its presence or even demonstrated a reduction in several parameters.

Molecular Informatics | 2016

Materials Informatics: Statistical Modeling in Material Science

Abraham Yosipof; Klimentiy Shimanovich; Hanoch Senderowitz

Material informatics is engaged with the application of informatic principles to materials science in order to assist in the discovery and development of new materials. Central to the field is the application of data mining techniques and in particular machine learning approaches, often referred to as Quantitative Structure Activity Relationship (QSAR) modeling, to derive predictive models for a variety of materials‐related “activities”. Such models can accelerate the development of new materials with favorable properties and provide insight into the factors governing these properties. Here we provide a comparison between medicinal chemistry/drug design and materials‐related QSAR modeling and highlight the importance of developing new, materials‐specific descriptors. We survey some of the most recent QSAR models developed in materials science with focus on energetic materials and on solar cells. Finally we present new examples of material‐informatic analyses of solar cells libraries produced from metal oxides using combinatorial material synthesis. Different analyses lead to interesting physical insights as well as to the design of new cells with potentially improved photovoltaic parameters.

Molecular Informatics | 2018

PV Analyzer: A Decision Support System for Photovoltaic Solar Cells Libraries

Omer Kaspi; Abraham Yosipof; Hanoch Senderowitz

This work describes the integration of several data mining and machine learning tools for researching Photovoltaic (PV) solar cells libraries into a unified workflow embedded within a GUI‐supported Decision Support System (DSS), named PV Analyzer. The analyzer's workflow is composed of several data analysis components including basic statistical and visualization methods as well as an algorithm for building predictive machine learning models. The analyzer allows for the identification of interesting trends within the libraries, not easily observable using simple bi‐parametric correlations. This may lead to new insights into factors affecting solar cell performance, with the ultimate goal of designing better solar cells. The analyzer was developed using MATLAB version R2014a and consequently could be easily extended by adding additional tools and algorithms. Furthermore, while in our hands the analyzer has been primarily used in the area of PV cells, it is equally applicable to the analysis of any other dataset composed of activities as dependent variables and descriptors as independent variables.

Archive | 2016

Optimization Algorithms for Chemoinformatics and Material-informatics

Abraham Yosipof; Hanoch Senderowitz

Modeling complex phenomena in chemoinformatics and material-informatics can often be formulated as single-objective or multi-objective optimization problems (SOOPs or MOOPs). For example, the design of new drugs or new materials is inherently a MOOP since drugs/materials require the simultaneous optimization of multiple parameters. In this chapter, we present several algorithms based on global stochastic optimization. These algorithms are applicable to multiple tasks in chemoinformatics and material-informatics including the following: (1) representativeness analysis, namely the selection of a representative subset from within a parent data set. (2) Derivation of quantitative structure–activity relationship models. Such models are used in multiple areas to predict activities from structures and to provide insight into factors (e.g., descriptors) governing activities. (3) Outlier removal to clean a parent data set from objects (e.g., compounds) that may demonstrate abnormal behavior. The performances of the new algorithms were evaluated using different data sets and multiple measures and were found to outperform previously reported methods. Due to the modular nature of the algorithms, they could be combined into machine-learning workflows. In the final section, we provide an example of one such workflow and apply it to the development of predictive models in pharmaceutical and material sciences.