Giorgos Borboudakis | Researchain

Publication

Featured research published by Giorgos Borboudakis.

npj Computational Materials | 2017

Chemically intuited, large-scale screening of MOFs by machine learning techniques

Giorgos Borboudakis; Taxiarchis Stergiannakos; Maria G. Frysali; Emmanuel Klontzas; Ioannis Tsamardinos; George E. Froudakis

A novel computational methodology for large-scale screening of MOFs is applied to gas storage with the use of machine learning technologies. This approach is a promising trade-off between the accuracy of ab initio methods and the speed of classical approaches, strategically combined with chemical intuition. The results demonstrate that the chemical properties of MOFs are indeed predictable (stochastically, not deterministically) using machine learning methods and automated analysis protocols, with the accuracy of predictions increasing with sample size. Our initial results indicate that this methodology is promising not only for gas storage in MOFs but also for many other materials science projects.

Machine learning: Quickly screening materials for effective gas storage

The gas storage properties of metal-organic frameworks can now be quickly and accurately predicted by artificial intelligence. George Froudakis at the University of Crete has developed a machine learning approach to predict the H2/CO2 adsorption properties of metal-organic frameworks (MOFs), highly porous materials promising for catalysis and gas storage, based on their chemical structure. Previous methods were either too slow or not accurate enough. Here, Froudakis and his team encoded ‘chemical intuition’ into their algorithm by training it to recognize certain structural features in MOFs with known properties. Then, when they applied the method to large-scale screening tests of new MOFs, they found that their predictions matched experimental data. With this technique, it is hoped that new materials for CO2 sequestration or hydrogen storage will be discovered more quickly.

knowledge discovery and data mining | 2016

Towards Robust and Versatile Causal Discovery for Business Applications

Giorgos Borboudakis; Ioannis Tsamardinos

Causal discovery algorithms can induce some of the causal relations from the data, commonly in the form of a causal network such as a causal Bayesian network. Arguably, however, all such algorithms lag far behind what is necessary for a true business application. We develop an initial version of a new, general causal discovery algorithm called ETIO with many features suitable for business applications. These include (a) the ability to accept prior causal knowledge (e.g., taking senior driving courses improves driving skills), (b) admitting the presence of latent confounding factors, (c) admitting the possibility of (a certain type of) selection bias in the data (e.g., clients sampled mostly from a given region), (d) the ability to analyze data with missing-by-design (i.e., not planned to measure) values (e.g., if two companies merge and their databases measure different attributes), and (e) the ability to analyze data from different interventions (e.g., before and after an advertising campaign). ETIO is an instance of the logical approach to integrative causal discovery that has been relatively recently introduced and enables the solution of complex reverse-engineering problems in causal discovery. ETIO is compared against the state-of-the-art and is shown to be more effective in terms of speed, with only a slight degradation in terms of learning accuracy, while incorporating all the features above. The code is available on the mensxmachina.org website.

Journal of data science | 2018

Constraint-based causal discovery with mixed data

Michail Tsagris; Giorgos Borboudakis; Vincenzo Lagani; Ioannis Tsamardinos

We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial, and ordinal variables. We use likelihood-ratio tests based on appropriate regression models and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs, respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data and show that the proposed approach outperforms alternatives in terms of learning accuracy.
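As a rough illustration of the idea in this abstract, the sketch below turns the log-likelihoods of two nested regression models into a likelihood-ratio p-value, then merges the p-values obtained in the two directions (Y regressed on X given Z, and X regressed on Y given Z) into one symmetric p-value. The function names are hypothetical, the test is limited to a single extra parameter (1 degree of freedom), and the combination rule min(2·min(p1, p2), max(p1, p2)) is an assumption chosen because it stays valid under arbitrary dependence between the two tests; the abstract does not state which rule the authors use.

```python
import math


def lr_test_pvalue(loglik_null, loglik_full):
    """P-value of a likelihood-ratio test with ONE extra parameter.

    The statistic 2 * (loglik_full - loglik_null) is compared against a
    chi-square distribution with 1 degree of freedom, whose survival
    function is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_null))
    return math.erfc(math.sqrt(stat / 2.0))


def symmetric_pvalue(p_xy, p_yx):
    """Combine the p-values of the two test directions (assumed rule)."""
    return min(2.0 * min(p_xy, p_yx), max(p_xy, p_yx))


# Hypothetical log-likelihoods of fitted models, for illustration only.
p_y_on_x = lr_test_pvalue(loglik_null=-120.4, loglik_full=-117.9)  # Y ~ Z vs Y ~ Z + X
p_x_on_y = lr_test_pvalue(loglik_null=-98.2, loglik_full=-96.5)    # X ~ Z vs X ~ Z + Y
print(symmetric_pvalue(p_y_on_x, p_x_on_y))
```

The result does not depend on which variable is treated as the regression outcome, which is the property a constraint-based algorithm such as PC needs from its conditional independence test.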

Machine Learning | 2018

A greedy feature selection algorithm for Big Data of high dimensionality

Ioannis Tsamardinos; Giorgos Borboudakis; Pavlos Katsogridakis; Polyvios Pratikakis; Vassilis Christophides

We present the Parallel, Forward–Backward with Pruning (PFBP) algorithm for feature selection (FS) for Big Data of high dimensionality. PFBP partitions the data matrix in terms of both rows and columns. By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP relies only on computations local to a partition while minimizing communication costs, thus massively parallelizing computations. Similar techniques for combining local computations are also employed to create the final predictive model. PFBP employs asymptotically sound heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, and linear scalability with respect to the number of features and processing cores. An extensive comparative evaluation also demonstrates the effectiveness of PFBP against other algorithms in its class. The heuristics presented are general and could potentially be applied to other greedy-type FS algorithms. An application on simulated Single Nucleotide Polymorphism (SNP) data with 500K samples is provided as a use case.
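To make the "local p-values plus meta-analysis" idea concrete, here is a minimal sketch of Fisher's combined probability test, one standard way to merge p-values computed independently on separate row partitions. The abstract does not say which meta-analysis technique PFBP uses, so treat the choice of Fisher's method, the function name, and the example values as assumptions.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form
        exp(-x / 2) * sum_{j=0}^{k-1} (x / 2)**j / j!
    so no statistics library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0) when a local test reports p == 0.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for one feature, each computed on a different
# row partition of the data without any communication between workers.
local_p_values = [0.08, 0.03, 0.20, 0.05]
print(fisher_combine(local_p_values))
```

Only one number per feature and partition has to be communicated, which is what keeps this kind of scheme cheap in a distributed setting.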

Machine Learning | 2018

Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation

Ioannis Tsamardinos; Elissavet Greasidou; Giorgos Borboudakis

Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV’s main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely the nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822–829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we again employ the idea of bootstrapping the out-of-sample predictions to speed up the CV process. Specifically, using a bootstrap-based statistical criterion, we stop training models on new folds for configurations that are inferior with high probability. We name this method Bootstrap Bias Corrected with Dropping CV (BBCD-CV); it is both efficient and provides accurate performance estimates.
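The core of BBC-CV as described above can be sketched in a few lines: resample the rows of the out-of-sample prediction matrix, pick the winning configuration on the in-bag rows, and score that winner on the out-of-bag rows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the accuracy metric, the number of bootstrap repetitions, and the simulated random-guess configurations are all illustrative.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def score(y_true, preds, rows, metric):
    """Performance of one configuration restricted to the given rows."""
    return metric([y_true[i] for i in rows], [preds[i] for i in rows])


def bbc_cv(y_true, oos_preds, n_boot=200, metric=accuracy, seed=0):
    """Bootstrap-corrected estimate for the best configuration.

    oos_preds[c][i] is configuration c's pooled out-of-sample (CV)
    prediction for sample i. No model is retrained here.
    """
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_set = set(in_bag)
        out_bag = [i for i in range(n) if i not in in_set]
        if not out_bag:  # practically impossible unless n is tiny
            continue
        # Select the configuration on in-bag rows, evaluate it out-of-bag.
        best = max(range(len(oos_preds)),
                   key=lambda c: score(y_true, oos_preds[c], in_bag, metric))
        estimates.append(score(y_true, oos_preds[best], out_bag, metric))
    return sum(estimates) / len(estimates)


# Simulated example: 30 configurations that all guess at random, so the
# true accuracy of every configuration is 0.5.
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(100)]
preds = [[rng.randint(0, 1) for _ in range(100)] for _ in range(30)]
naive = max(accuracy(y, p) for p in preds)  # optimistic: best-of-30 score
print("naive:", naive, "bias-corrected:", bbc_cv(y, preds))
```

Because selection and evaluation never share rows within a bootstrap repetition, the averaged out-of-bag score does not inherit the optimism of simply reporting the best cross-validated score.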

Journal of data science | 2018

Correction to: Constraint-based causal discovery with mixed data

Michail Tsagris; Giorgos Borboudakis; Vincenzo Lagani; Ioannis Tsamardinos

The article “Constraint-based causal discovery with mixed data,” written by Michail Tsagris, Giorgos Borboudakis, Vincenzo Lagani, and Ioannis Tsamardinos, was originally published electronically on the publisher’s internet portal (currently SpringerLink) on February 2, 2018, without open access.

npj Computational Materials | 2017