Selcuk Korkmaz
Hacettepe University
Publication
Featured research published by Selcuk Korkmaz.
Computer Methods and Programs in Biomedicine | 2014
Selcuk Korkmaz; Gokmen Zararsiz; Dincer Goksuluk
With advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used to screen large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data consist of 631 compounds in the training set and 216 compounds in a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to the reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally performed better than a single SVM, and Subset Selection outperformed the other feature selection methods. We tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
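The correlation filter mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the greedy keep-or-drop rule, the 0.9 threshold, the function names `pearson` and `correlation_filter`, and the toy feature columns are all assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / (sd_x * sd_y)

def correlation_filter(columns, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature only if its absolute
    correlation with every already-kept feature is below the threshold."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept

# Toy descriptor columns (hypothetical values, one list per feature).
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of feature 0, so redundant
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
kept = correlation_filter(features, threshold=0.9)
print(kept)  # indices of the retained features
```

The retained columns would then be passed to the classifier; the choice of threshold is a tuning decision that the abstract does not specify.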
Clinical Ophthalmology | 2014
Zeynep Alkin; Irfan Perente; Abdullah Ozkaya; Dilek Alp; Alper Agca; Ebru Demet Aygit; Selcuk Korkmaz; Ahmet Taylan Yazici; Ahmet Demirok
Purpose: To compare the efficacy of low-fluence photodynamic therapy (PDT) and PDT with half-dose verteporfin in chronic central serous chorioretinopathy (CSC). Patients and methods: The medical records of 64 eyes from 60 patients with chronic CSC were retrospectively reviewed; 36 eyes received low-fluence PDT (25 J/cm²) and 28 eyes received half-dose verteporfin PDT (3 mg/m²). The primary outcome measure was the proportion of eyes with complete resolution of subretinal fluid. Secondary outcome measures were the changes in best corrected visual acuity (BCVA) and central foveal thickness, and the proportion of eyes that showed an increase of ≥5 letters in BCVA at the last visit. Results: The mean follow-up period was 12.5±4.3 months and 13.1±4 months in the low-fluence group and half-dose group, respectively (P=0.568). Thirty-three eyes (91.6%) in the low-fluence group and 26 eyes (92.8%) in the half-dose verteporfin group showed complete resolution of subretinal fluid (P=0.703). BCVA increased by a mean of 7.4 letters and 4.8 letters in the low-fluence group and half-dose group, respectively (P=0.336). Seventeen eyes (52.8%) in the low-fluence group and 14 eyes (50%) in the half-dose group experienced a gain of ≥5 letters in BCVA (P=0.825). In the low-fluence and half-dose verteporfin groups, the mean baseline central foveal thickness was 351±90 μm and 341±96 μm, and significantly decreased to 188±61 μm and 181±47 μm, respectively (P<0.01). Conclusion: Both treatments resulted in complete subretinal fluid resolution in most of the eyes, with significantly better visual acuity outcomes compared to baseline at the last visit.
PLOS ONE | 2015
Selcuk Korkmaz; Gokmen Zararsiz; Dincer Goksuluk
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like from nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool that can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; the ten best performing algorithms were then selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create a heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.
bioRxiv | 2014
Gokmen Zararsiz; Dincer Goksuluk; Selcuk Korkmaz; Vahap Eldem; İzzet Paruğ Duru; Ahmet Öztürk; Turgay Unver
Background: RNA sequencing (RNA-Seq) is a powerful technique for transcriptome profiling of organisms that uses the capabilities of next-generation sequencing (NGS) technologies. Recent advances in NGS make it possible to measure the expression levels of tens to thousands of transcripts simultaneously. Using such information, developing expression-based classification algorithms is an emerging and powerful method for diagnosis, disease classification and monitoring at the molecular level, as well as for providing potential markers of disease. Here, we present bagging support vector machines (bagSVM), a machine learning approach based on bagged ensembles of support vector machines (SVM), for classification of RNA-Seq data. The bagSVM uses the bootstrap technique and trains each single SVM separately; it then combines the results of the SVM models using majority voting. Results: We demonstrate the performance of bagSVM on simulated and real datasets. Simulated datasets are generated from the negative binomial distribution under different scenarios, and real datasets are obtained from publicly available resources. DESeq normalization and a variance stabilizing transformation (vst) were applied to all datasets. We compared the results with several classifiers, including Poisson linear discriminant analysis (PLDA), single SVM, classification and regression trees (CART), and random forests (RF). In slightly overdispersed data, all methods except the CART algorithm performed well. PLDA appeared to perform best, with RF second best, for very slightly and substantially overdispersed datasets. As the data became more dispersed, bagSVM turned out to be the best classifier. In the overall results, bagSVM and PLDA had the highest accuracies. Conclusions: According to our results, the bagSVM algorithm after vst transformation can be a good choice of classifier for RNA-Seq datasets, mostly for overdispersed ones.
Thus, we recommend that researchers use the bagSVM algorithm for classification of RNA-Seq data. The PLDA algorithm should be a method of choice for slightly and moderately overdispersed datasets. An R/BIOCONDUCTOR package, MLSeq, with a vignette is freely available at http://www.bioconductor.org/packages/2.14/bioc/html/MLSeq.html
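The bootstrap-then-majority-vote scheme described in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the MLSeq code: the base learner here is a nearest-centroid classifier standing in for an SVM (to keep the example dependency-free), and the function names, the number of models, the seed and the toy data are all hypothetical.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Stand-in base learner (NOT an SVM): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(model, row):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])))

def bagging_fit(X, y, n_models=25, seed=0):
    """Train one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy, well-separated two-class data (hypothetical values).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [9, 9], [9, 10], [10, 9], [10, 10]]
y = ["low"] * 4 + ["high"] * 4
models = bagging_fit(X, y)
pred_low = bagging_predict(models, [0.5, 0.5])
pred_high = bagging_predict(models, [9.5, 9.5])
print(pred_low, pred_high)
```

Swapping `fit_centroids` and `predict_centroid` for an SVM trainer and predictor would give the ensemble the abstract describes; the resampling and voting steps stay the same.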
PLOS ONE | 2017
Gokmen Zararsiz; Dincer Goksuluk; Selcuk Korkmaz; Vahap Eldem; Gozde Erturk Zararsiz; İzzet Paruğ Duru; Ahmet Öztürk
RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging and powerful method for diagnosis, disease classification and monitoring at the molecular level, as well as for providing potential markers of disease. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data, since doing so violates both the data structure and the distributional assumptions. However, it is possible to apply these algorithms to RNA-Seq data with appropriate modifications. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers, including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters, such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method, on model performance. A comprehensive simulation study was conducted, and the results were compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion.
We conclude that, as a count-based classifier, the power-transformed PLDA and, as microarray-based classifiers, vst- or rlog-transformed RF and SVM classifiers may be good choices for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
bioRxiv | 2018
Sumeet Pal Singh; Sharan Janjuha; Samata Chaudhuri; Susanne Reinhardt; Sevina Dietz; Anne Eugster; Halil Bilgin; Selcuk Korkmaz; John Reid; Gokmen Zararsiz; Nikolay Ninov
Age-associated deterioration of cellular physiology leads to pathological conditions. The ability to detect premature aging could provide a window for preventive therapies against age-related diseases. However, the techniques for determining cellular age are limited, as they rely on a small set of histological markers and lack predictive power. Here, we implement GERAS (GEnetic Reference for Age of Single-cell), a machine learning based framework capable of assigning individual cells to chronological stages based on their transcriptomes. GERAS displays greater than 90% accuracy in classifying the chronological stage of zebrafish and human pancreatic cells. The framework demonstrates robustness against biological and technical noise, as evaluated by its performance on independent samplings of single cells. Additionally, GERAS determines the impact of differences in calorie intake and BMI on the aging of zebrafish and human pancreatic cells, respectively. We further harness the predictive power of GERAS to identify genome-wide molecular factors that correlate with aging. We show that one of these factors, junb, is necessary to maintain the proliferative state of juvenile beta-cells. Our results showcase the applicability of a machine learning framework to classify the chronological stage of heterogeneous cell populations, while enabling the detection of pro-aging factors and candidate genes associated with aging.
bioRxiv | 2017
Selcuk Korkmaz; Jose M. Duarte; Andreas Prlić; Dincer Goksuluk; Gokmen Zararsiz; Osman Saracbasi; Stephen K. Burley; Peter W. Rose
The Protein Data Bank (PDB) is the single worldwide archive of experimentally-determined three-dimensional (3D) structures of proteins and nucleic acids. As of January 2017, the PDB housed more than 125,000 structures and was growing by more than 11,000 structures annually. Since the 3D structure of a protein is vital to understand the mechanisms of biological processes, diseases, and drug design, correct oligomeric assembly information is of critical importance. For example, it makes a difference if the protein is normally a dimer and not a monomer or a trimer or a tetramer or a hexamer in nature. Unfortunately, the biologically relevant oligomeric form of a 3D structure is not directly obtainable by X-ray crystallography. Instead, this information may be provided by the PDB Depositor as metadata coming from additional experiments, be inferred by sequence-sequence comparisons with similar proteins of known oligomeric state, or predicted using software, such as PISA (Proteins, Interfaces, Structures and Assemblies) or EPPIC (Evolutionary Protein Protein Interface Classifier). Despite significant efforts by professional PDB Biocurators during data deposition, there remain a number of structures in the archive with incorrect quaternary structure descriptions (or annotations). Further investigation is, therefore, needed to evaluate the correctness of quaternary structure annotations. In this study, we aim to identify the most probable oligomeric states for proteins represented in the PDB. Our approach evaluated the performance of four independent prediction methods, including text mining of primary publications, inference from homologous protein structures, and two computational methods (PISA and EPPIC). Aggregating predictions to give consensus results outperformed all four of the independent prediction methods, yielding 86% correct, 9% incorrect, and 5% inconclusive predictions, when tested with a well-curated benchmark dataset. 
We have developed a freely-available web-based tool to make this approach accessible to researchers and PDB Biocurators (http://quatstruct.rcsb.org).
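The idea of aggregating four independent predictions into a consensus can be illustrated with a simple plurality vote. The abstract does not state the actual aggregation rule, so everything here is an assumption: the `consensus` function, the treatment of ties and missing predictions as inconclusive, and the example inputs are hypothetical.

```python
from collections import Counter

def consensus(predictions):
    """Plurality vote over per-method oligomeric-state predictions.

    `predictions` maps a method name to a predicted stoichiometry
    (e.g. 2 for a dimer) or None when the method gives no prediction.
    Returns the winning state, or None when the vote is inconclusive
    (no predictions at all, or a tie for first place). Assumed rule.
    """
    votes = Counter(v for v in predictions.values() if v is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two states
    return ranked[0][0]

# Three of four methods agree on a dimer (hypothetical inputs).
agreed = consensus({"text_mining": 2, "homology": 2, "pisa": 4, "eppic": 2})
# Two against two: inconclusive under this rule.
tied = consensus({"text_mining": 2, "homology": 4, "pisa": 4, "eppic": 2})
print(agreed, tied)
```

A production scheme might weight methods by their individual benchmark accuracy rather than counting each vote equally; that choice is outside what the abstract specifies.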
Journal of Clinical Research in Pediatric Endocrinology | 2017
Cengiz Bal; Ahmet Öztürk; Betül Çiçek; Ahmet Ozdemir; Gokmen Zararsiz; Demet Ünalan; Gozde Erturk Zararsiz; Selcuk Korkmaz; Dincer Goksuluk; Vahap Eldem; Sevda Ismailogullari; Emine Erdem; M. Mümtaz Mazıcıoğlu; Selim Kurtoglu
Objective: As in adults, hypertension is an important risk factor for cardiovascular disease in children. We aimed to evaluate the effect of sleep duration on blood pressure in normal weight Turkish children aged 11-17 years. Methods: This cross-sectional study was conducted in the primary and secondary schools of the two central and ten outlying districts of Kayseri, Turkey. Subjects were 2860 children and adolescents (1385 boys, 1475 girls). Systolic and diastolic blood pressures were measured according to the recommendations of the Fourth Report of the National High Blood Pressure Education Program Working Group on High Blood Pressure in Children and Adolescents. Sleep duration was classified as follows: ≤8 hours, 8.1-8.9 hours, 9.0-9.9 hours or ≥10 hours. Results: For short sleeper boys and girls (participants with a sleep duration ≤8 h), the prevalence of prehypertension and hypertension was 35.0% and 30.8%, respectively. In univariate binary logistic regression analyses (age-adjusted), each unit increment in sleep duration (hours) decreased the prehypertension and hypertension risk in boys and girls, with odds ratios (OR) of 0.89 [confidence interval (CI): 0.82-0.98] and 0.88 (CI: 0.81-0.97), respectively (p<0.05). In multiple binary logistic regression analyses [age- and body mass index (BMI)-adjusted], the location of the school and sleep duration categories were shown to be the most important factors for prehypertension and hypertension in both genders, while household income was the most important factor only in boys. Conclusions: A sleep duration ≤8 h is an independent risk factor for prehypertension and hypertension in Turkish children aged 11-17 years.
Computers in Biology and Medicine | 2017
Selcuk Korkmaz; Dincer Goksuluk; Gokmen Zararsiz; Sevilay Karahan
Survival analysis methods are often used in cancer studies. It has been shown that combining clinical data with genomics increases the predictive performance of survival analysis methods. However, this leads to a high-dimensional data problem. Fortunately, new methods have been developed in the last decade to overcome this problem. Still, there is a strong need for an easily accessible, user-friendly and interactive tool to perform survival analysis in the presence of genomics data. We developed an open-source and freely available web-based tool, geneSurv, for survival analysis methods that can deal with high-dimensional data. This tool includes classical methods, such as Kaplan-Meier and Cox proportional hazards regression, and advanced methods, such as penalized Cox regression and Random Survival Forests. It also offers an optimal cutoff determination method based on maximizing several test statistics. The tool has a simple and interactive interface, and it can handle high-dimensional data through feature selection and ensemble methods. To dichotomize gene expressions, geneSurv can identify optimal cutoff points. Users can upload their microarray, RNA-Seq, ChIP-Seq, proteomics, metabolomics or clinical data as an n × p dimensional data matrix, where n refers to samples and p refers to genes. This tool is freely available at www.biosoft.hacettepe.edu.tr/geneSurv. All source code is available at https://github.com/selcukorkmaz/geneSurv under the GPL-3 license.
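Choosing a cutoff by maximizing a test statistic, as the abstract above describes, can be sketched with a two-group log-rank statistic. This is a hedged illustration rather than the geneSurv code: the abstract does not name the statistics used, so the log-rank choice, the function names, the minimum group size and the toy data are all assumptions.

```python
def logrank_statistic(times, events, in_high):
    """Two-group log-rank chi-square statistic, (O - E)^2 / V, for the 'high' group."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_high[i])
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # the variance term is undefined for a single subject at risk
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0

def best_cutoff(expression, times, events, min_group=2):
    """Scan midpoints between sorted expression values; return the one that
    maximizes the log-rank statistic, subject to a minimum group size."""
    values = sorted(set(expression))
    candidates = []
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        in_high = [x > cut for x in expression]
        n_high = sum(in_high)
        if min(n_high, len(in_high) - n_high) >= min_group:
            candidates.append((logrank_statistic(times, events, in_high), cut))
    return max(candidates)[1] if candidates else None

# Toy data (hypothetical): high expression goes with short survival.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
times = [20, 22, 21, 23, 3, 4, 2, 5]
events = [1] * 8  # 1 = event observed, 0 = censored
cutoff = best_cutoff(expression, times, events)
print(cutoff)
```

Because the cutoff is selected to maximize the statistic, the nominal p-value of the winning split is optimistic; maximally selected statistics generally need a multiplicity correction before being interpreted.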
R Journal | 2014
Selcuk Korkmaz; Dincer Goksuluk; Gokmen Zararsiz