Publication


Featured research published by Rok Blagus.


BMC Bioinformatics | 2010

Class prediction for high-dimensional class-imbalanced data

Rok Blagus; Lara Lusa

Background: The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate if the high-dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also investigate the effectiveness of some strategies that are available to overcome the effect of class imbalance. Results: Our results show that the evaluated classifiers are highly sensitive to class imbalance and that variable selection introduces an additional bias towards classification into the majority class. Most new samples are assigned to the majority class from the training set, unless the difference between the classes is very large. As a consequence, the class-specific predictive accuracies differ considerably. When the class imbalance is not too severe, down-sizing and asymmetric bagging embedding variable selection work well, while over-sampling does not. Variable normalization can further worsen the performance of the classifiers. Conclusions: Our results show that matching the prevalence of the classes in training and test set does not guarantee good performance of classifiers and that the problems related to classification with class-imbalanced data are exacerbated when dealing with high-dimensional data. Researchers using class-imbalanced data should be careful in assessing the predictive accuracy of the classifiers and, unless the class imbalance is mild, they should always use an appropriate method for dealing with the class imbalance problem.
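The down-sizing strategy mentioned in this abstract can be illustrated with a minimal sketch. It assumes down-sizing means randomly discarding samples until every class is as large as the smallest one; the function name `downsize` and its arguments are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

def downsize(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Size of the smallest (minority) class; all classes are cut down to this.
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)
    new_samples = [x for x, _ in balanced]
    new_labels = [y for _, y in balanced]
    return new_samples, new_labels
```

With 90 majority and 10 minority samples, the result holds 10 samples of each class, and every minority sample is retained.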


BMC Bioinformatics | 2013

SMOTE for high-dimensional class-imbalanced data

Rok Blagus; Lara Lusa

Background: Classification using class-imbalanced data is biased in favor of the majority class. The bias is even larger for high-dimensional data, where the number of variables greatly exceeds the number of samples. The problem can be attenuated by undersampling or oversampling, which produce class-balanced data. Generally undersampling is helpful, while random oversampling is not. Synthetic Minority Oversampling TEchnique (SMOTE) is a very popular oversampling method that was proposed to improve random oversampling but its behavior on high-dimensional data has not been thoroughly investigated. In this paper we investigate the properties of SMOTE from a theoretical and empirical point of view, using simulated and real high-dimensional data. Results: While in most cases SMOTE seems beneficial with low-dimensional data, it does not attenuate the bias towards classification into the majority class for most classifiers when data are high-dimensional, and it is less effective than random undersampling. SMOTE is beneficial for k-NN classifiers for high-dimensional data if the number of variables is reduced by performing some type of variable selection; we explain why, otherwise, the k-NN classification is biased towards the minority class. Furthermore, we show that on high-dimensional data SMOTE does not change the class-specific mean values while it decreases the data variability and it introduces correlation between samples. We explain how our findings impact the class-prediction for high-dimensional data. Conclusions: In practice, in the high-dimensional setting only k-NN classifiers based on the Euclidean distance seem to benefit substantially from the use of SMOTE, provided that variable selection is performed before using SMOTE; the benefit is larger if more neighbors are used. SMOTE for k-NN without variable selection should not be used, because it strongly biases the classification towards the minority class.
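For readers unfamiliar with SMOTE, the following is a simplified sketch of its core idea: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours (Euclidean distance). The function name `smote` and its parameters are illustrative assumptions, and the sketch picks seed samples at random rather than reproducing any particular published implementation.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because every synthetic point lies between two existing minority samples, the new points cannot spread beyond the original ones, which is consistent with the abstract's observation that SMOTE reduces variability and introduces correlation between samples.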


Neurourology and Urodynamics | 2011

Stroke patients who regain urinary continence in the first week after acute first‐ever stroke have better prognosis than patients with persistent lower urinary tract dysfunction

Melita Rotar; Rok Blagus; Miran Jeromel; Miha Škrbec; Bojan Tršinar; David B. Vodušek

Urinary incontinence (UI) is a predictor of greater mortality and poor functional recovery; however, published studies have failed to evaluate lower urinary tract (LUT) function immediately after stroke. The aim of our study was to evaluate the course of LUT function in the first week after stroke, and its impact on prognosis.


Histopathology | 2014

Evaluation of a new grading system for laryngeal squamous intraepithelial lesions—a proposed unified classification

Nina Gale; Rok Blagus; Samir K. El-Mofty; Tim Helliwell; Manju L. Prasad; Ann Sandison; Metka Volavšek; Bruce M. Wenig; Nina Zidar; Antonio Cardesa

To verify the applicability, reproducibility and predictive value of a proposed unified classification (amended Ljubljana classification) for laryngeal squamous intraepithelial lesions (SILs).


International Conference on Machine Learning and Applications | 2012

Evaluation of SMOTE for High-Dimensional Class-Imbalanced Microarray Data

Rok Blagus; Lara Lusa

Synthetic Minority Oversampling TEchnique (SMOTE) is a popular oversampling method that was proposed to improve random oversampling but its behavior on high-dimensional data has not been thoroughly investigated. In this paper we evaluate the performance of SMOTE on high-dimensional data, using gene expression microarray data. We observe that SMOTE does not attenuate the bias towards classification into the majority class for most classifiers, and it is less effective than random undersampling. SMOTE is beneficial for k-NN classifiers based on the Euclidean distance if the number of variables is reduced by performing some type of variable selection, and the benefit is larger if more neighbors are used. If variable selection is not performed, then the k-NN classification is counterintuitively biased towards the minority class, so SMOTE for k-NN without variable selection should not be used in practice.


BMC Veterinary Research | 2014

Prevalence and molecular characterization of Clostridium difficile isolated from European Barn Swallows (Hirundo rustica) during migration

Petra Bandelj; Tomi Trilar; Rok Blagus; Matjaz Ocepek; Joyce Rousseau; J. Scott Weese; Modest Vengust

Background: Clostridium difficile is an important bacterial pathogen of humans and a variety of animal species. Birds, especially migratory passerine species, can play a role in the spread of many pathogens, including Clostridium difficile. Barn Swallows (Hirundo rustica) nest in close proximity to human habitats and their biology is closely associated with cattle farming. Therefore, we hypothesized that Barn Swallows can be the reservoir of Clostridium difficile. Results: Barn Swallows (n = 175) were captured on their autumn migration across Europe to sub-Saharan Africa. Droppings were collected from juvenile (n = 152) and adult birds (n = 23). Overall prevalence of Clostridium difficile was 4% (7/175); 4.6% (7/152) in juvenile birds and 0/23 in adults. Clostridium difficile ribotypes 078, 002 and 014 were identified, which are commonly found in farm animals and humans. Three new Clostridium difficile ribotypes were also identified: SB3, SB159 and SB166, one of which was toxigenic, harbouring genes for toxins A and B. Conclusions: Results of this study indicate that Barn Swallows might play a role in national and international dissemination of Clostridium difficile and could serve as a source for human and animal infection. Clostridium difficile ribotype 078 was identified, which has been reported as an emerging cause of community-associated Clostridium difficile infection in humans. Based on this and other studies, however, it is more likely that Barn Swallows have a more indicative than perpetuating role in Clostridium difficile epidemiology.


Muscle & Nerve | 2016

Single fiber EMG as a prognostic tool in myasthenia gravis.

Mateja Baruca; Lea Leonardis; Simon Podnar; Tanja Hojs‐Fabjan; Anton Grad; Aleš Jerin; Rok Blagus; Saša Šega‐Jazbec

Introduction: Single fiber electromyography (SFEMG) is the most sensitive diagnostic tool for diagnosis of myasthenia gravis (MG). Its prognostic value is not known. Methods: We retrospectively analyzed the clinical course of 232 MG patients who presented with only mild symptoms and had SFEMG of the orbicularis oculi muscle. We correlated their SFEMG results with the severity of their later clinical course. Results: During the observation period 39 patients (17%) developed severe disease exacerbations, and 193 (83%) remained stable. Patients with severe disease exacerbation had a significantly higher mean jitter value (P < 0.0001), a greater percentage of fibers with increased jitter (P < 0.0001), and/or impulse blocking (P < 0.0001) on SFEMG. Conclusions: The extent of the SFEMG abnormalities in this study correlated with the later clinical course of MG. Muscle Nerve 54: 1034–1040, 2016


BMC Bioinformatics | 2013

Improved shrunken centroid classifiers for high-dimensional class-imbalanced data

Rok Blagus; Lara Lusa

Background: PAM, a nearest shrunken centroid method (NSC), is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice the amount of shrinkage is estimated by minimizing the overall cross-validated (CV) error rate. Results: We show that when data are class-imbalanced the three NSC classifiers are biased towards the majority class. The bias is larger when the number of variables or class-imbalance is larger and/or the differences between classes are smaller. To diminish the class-imbalance problem of the NSC classifiers we propose to estimate the amount of shrinkage by maximizing the CV geometric mean of the class-specific predictive accuracies (g-means). Conclusions: The results obtained on simulated and real high-dimensional class-imbalanced data show that our approach outperforms the currently used strategy based on the minimization of the overall error rate when NSC classifiers are biased towards the majority class. The number of variables included in the NSC classifiers when using our approach is much smaller than with the original approach. This result is supported by experiments on simulated and real high-dimensional class-imbalanced data.
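The g-means criterion described in this abstract, the geometric mean of the class-specific predictive accuracies, can be sketched as follows. The helper `pick_shrinkage` and its input format (a dictionary mapping each candidate shrinkage value to cross-validated true and predicted labels) are hypothetical illustrations, not the authors' code.

```python
import math

def class_accuracies(y_true, y_pred):
    """Return {class: fraction of that class predicted correctly}."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {c: correct[c] / totals[c] for c in totals}

def g_means(y_true, y_pred):
    """Geometric mean of the class-specific accuracies."""
    accs = list(class_accuracies(y_true, y_pred).values())
    return math.prod(accs) ** (1.0 / len(accs))

def pick_shrinkage(cv_predictions):
    """Pick the shrinkage value whose CV predictions maximize g-means.

    cv_predictions: {shrinkage: (y_true, y_pred)} (hypothetical input format).
    """
    return max(cv_predictions, key=lambda t: g_means(*cv_predictions[t]))
```

On data with 90 majority and 10 minority samples, a rule that assigns everything to the majority class has 90% overall accuracy but a g-means of 0, so a selection based on g-means would not choose it.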


BMC Bioinformatics | 2015

Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models

Rok Blagus; Lara Lusa

Background: Prediction models are used in clinical research to develop rules that can be used to accurately predict the outcome of the patients based on some of their characteristics. They represent a valuable tool in the decision making process of clinicians and health policy makers, as they enable them to estimate the probability that patients have or will develop a disease, will respond to a treatment, or that their disease will recur. The interest devoted to prediction models in the biomedical community has been growing in the last few years. Often the data used to develop the prediction models are class-imbalanced as only few patients experience the event (and therefore belong to minority class). Results: Prediction models developed using class-imbalanced data tend to achieve sub-optimal predictive accuracy in the minority class. This problem can be diminished by using sampling techniques aimed at balancing the class distribution. These techniques include under- and oversampling, where a fraction of the majority class samples are retained in the analysis or new samples from the minority class are generated. The correct assessment of how the prediction model is likely to perform on independent data is of crucial importance; in the absence of an independent data set, cross-validation is normally used. While the importance of correct cross-validation is well documented in the biomedical literature, the challenges posed by the joint use of sampling techniques and cross-validation have not been addressed. Conclusions: We show that care must be taken to ensure that cross-validation is performed correctly on sampled data, and that the risk of overestimating the predictive accuracy is greater when oversampling techniques are used. Examples based on the re-analysis of real datasets and simulation studies are provided. We identify some results from the biomedical literature where incorrect cross-validation was performed and where we expect that the performance of oversampling techniques was heavily overestimated.
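One way to read the abstract's warning is that sampling should be applied inside each cross-validation fold, to the training part only, so that copies of held-out samples never reach the training set. The sketch below illustrates that arrangement under stated assumptions: it uses plain random oversampling and unstratified folds, and the names `random_oversample`, `cross_validate` and `one_nn` are hypothetical.

```python
import random

def random_oversample(X, y, rng):
    """Duplicate samples at random until all classes are the same size."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_max = max(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(n_max - len(xs))]
        for xi in xs + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

def cross_validate(X, y, fit_predict, n_folds=5, seed=0):
    """K-fold CV in which oversampling sees only the training part of each fold."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    y_pred = [None] * len(X)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        # Oversample AFTER the split, so no held-out sample is duplicated into training.
        X_tr, y_tr = random_oversample(
            [X[i] for i in train_idx], [y[i] for i in train_idx], rng
        )
        preds = fit_predict(X_tr, y_tr, [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return y_pred

def one_nn(X_train, y_train, X_test):
    """Toy 1-nearest-neighbour classifier (squared Euclidean distance)."""
    preds = []
    for x in X_test:
        nearest = min(
            range(len(X_train)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, X_train[j])),
        )
        preds.append(y_train[nearest])
    return preds
```

A call such as `cross_validate(X, y, one_nn)` returns one held-out prediction per sample. If oversampling were applied to the whole data set before splitting, a 1-NN classifier could find an exact duplicate of a held-out minority sample in its training set, which would inflate the estimated accuracy.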


Scientometrics | 2012

Effects of international collaboration and status of journal on impact of papers

Stojan Pečlin; Primož Južnič; Rok Blagus; Mojca Čižek Sajko; Janez Stare

This study examines the effect of international collaboration of Slovenian authors and the status of journals where papers are published (as determined by their impact factors) on the impact of papers as measured by the number of citations papers receive. Research programme groups working in Slovenia in the 2004–2008 period in the fields of physics, chemistry, biology, biotechnology, and medical science were used for analyses. The results of the analyses show that the effects of the two factors differ among the fields. We discuss possible reasons for this, including the possibility that differences are the result of Slovenia’s science policy.

Collaboration


Top Co-Authors

Lara Lusa

University of Ljubljana

Gaj Vidmar

University of Ljubljana

Aleš Jerin

University of Ljubljana
