Harun Pirim
King Fahd University of Petroleum and Minerals
Publications
Featured research published by Harun Pirim.
Computers & Operations Research | 2012
Harun Pirim; Burak Eksioglu; Andy D. Perkins; Cetin Yuceer
High-throughput biological data need to be processed, analyzed, and interpreted to address problems in the life sciences. Bioinformatics, computational biology, and systems biology tackle biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Although clustering can be used in many areas of biological data analysis, this paper reviews current clustering algorithms designed specifically for analyzing gene expression data. It also aims to introduce one of the main problems in bioinformatics, clustering gene expression data, to the operations research community.
Computers in Biology and Medicine | 2015
Harun Pirim; Burak Eksioglu; Andy D. Perkins
To address important challenges in bioinformatics, data from high-throughput technologies must be interpreted efficiently and reliably. Clustering is widely used as a first step in interpreting high-dimensional biological data, such as gene expression data measured by microarrays. A good clustering algorithm should be efficient, reliable, and effective, as demonstrated by its ability to find biologically relevant clusters. This paper proposes B-MST, a new minimum spanning tree (MST) based heuristic guided by a novel objective function: the tightness and separation index (TSI). The TSI yields biologically meaningful clusters by making use of co-expression network topology, and the paper develops a local search procedure to minimize the TSI value. B-MST is evaluated using (1) the adjusted Rand index (ARI) for microarray data sets with known object classes and (2) Gene Ontology (GO) annotations for data sets without documented object classes.
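The TSI objective and the local search of B-MST are not spelled out in the abstract, so the sketch below shows only the classic MST-clustering idea that MST-based heuristics build on: build a minimum spanning tree over the genes and cut its k-1 heaviest edges. The distance 1 - Pearson correlation, the function name `mst_clusters`, and the fixed k are assumptions for illustration, not the paper's method.

```python
import numpy as np

def mst_clusters(data: np.ndarray, k: int) -> list[int]:
    """Cluster the rows of `data` (genes x conditions) into k groups by
    cutting the k-1 heaviest edges of a minimum spanning tree.

    Assumption: dissimilarity between genes is 1 - Pearson correlation."""
    n = data.shape[0]
    dist = 1.0 - np.corrcoef(data)          # n x n dissimilarity matrix
    # Prim's algorithm on the complete graph.
    in_tree = [False] * n
    best = np.full(n, np.inf)               # cheapest known edge into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []                              # MST edges as (weight, u, v)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((dist[u, parent[u]], u, parent[u]))
        for v in range(n):
            if not in_tree[v] and dist[u, v] < best[v]:
                best[v] = dist[u, v]
                parent[v] = u
    # Keep the n-k lightest MST edges (i.e. drop the k-1 heaviest).
    edges.sort()
    kept = edges[: n - k]
    root = list(range(n))                   # union-find over kept edges
    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x
    for _, u, v in kept:
        root[find(u)] = find(v)
    roots = [find(i) for i in range(n)]
    ids = {r: j for j, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]          # cluster label per gene
```

When true classes are known, the resulting labels can be scored with the ARI, for example via scikit-learn's `sklearn.metrics.adjusted_rand_score(labels_true, labels_pred)`.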
Archive | 2008
Harun Pirim; Engin Bayraktar; Burak Eksioglu
Problems encountered in fields such as scheduling, assignment, and vehicle routing are mostly NP-hard, and they need efficient solution procedures. When confronted with an NP-hard problem, one has three options: apply an enumerative method that yields an optimal solution, apply an approximation algorithm that runs in polynomial time, or resort to some type of heuristic technique without any a priori guarantee on solution quality or computing time (Aarts & Lenstra, 2003). Heuristics fall under the general heading of local search approaches, and local search techniques are widely used to find “close-to-optimum” solutions to these problems in a “reasonable” amount of time. Tabu search (TS) is one of the most efficient heuristic techniques in the sense that it finds high-quality solutions in relatively short running times. This chapter provides a basic description of TS for novice readers, introduces its application areas, and, for readers with more experience in local search procedures, compares TS with other meta-heuristic procedures. The chapter is organized as follows. The second section introduces the basic terminology, for example, definitions of global optimization, local search, heuristics, and meta-heuristics. It also briefly describes TS and the meta-heuristics to which TS will be compared: simulated annealing (SA), genetic algorithms (GA), ant colony optimization (ACO), greedy randomized adaptive search procedure (GRASP), and particle swarm optimization (PSO). This section is intended to give readers a good overall view of the “local search” area and to let them know that TS will be compared with several other meta-heuristic procedures. The third section describes the basic steps of TS, SA, GA, ACO, GRASP, and PSO.
As the mechanisms of these procedures are explained, the differences and similarities between TS and each of the other procedures are pointed out, so that section three familiarizes readers with the various meta-heuristic procedures discussed throughout the chapter. The fourth section identifies the different problems for which TS has been used to generate solutions; for example, TS has been used to solve scheduling, routing, and assignment problems. We aim to compile a comprehensive list of the problems to which TS has been applied, giving the reader an understanding of how TS has been used. In the fifth section, the efficiency and effectiveness of TS are compared with those of other meta-heuristic procedures. Reasons why TS is more efficient and/or effective than some of the …
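To make the basic TS loop concrete, here is a minimal sketch for minimizing a cost over binary vectors. The 1-flip neighbourhood, the fixed tabu tenure, and the aspiration rule (a tabu move is allowed if it beats the best cost so far) are common textbook choices assumed here; the chapter's own description may differ. The key contrast with plain local search is that the best admissible move is taken even when it worsens the cost.

```python
from collections import deque
from typing import Callable, Sequence

def tabu_search(cost: Callable[[list[int]], float], x0: Sequence[int],
                iters: int = 100, tenure: int = 3) -> tuple[list[int], float]:
    """Minimize `cost` over binary vectors using 1-flip moves.

    A flipped position stays tabu for `tenure` iterations unless flipping it
    would beat the best cost found so far (aspiration criterion)."""
    x = list(x0)
    best_x, best_c = x[:], cost(x)
    tabu = deque(maxlen=tenure)              # recently flipped positions
    for _ in range(iters):
        move, move_c = None, float("inf")
        for i in range(len(x)):
            x[i] ^= 1                        # try flipping bit i
            c = cost(x)
            x[i] ^= 1                        # undo the trial flip
            if (i not in tabu or c < best_c) and c < move_c:
                move, move_c = i, c
        if move is None:                     # every move is tabu
            break
        x[move] ^= 1                         # take the best admissible move,
        tabu.append(move)                    # even if it worsens the cost
        if move_c < best_c:
            best_x, best_c = x[:], move_c
    return best_x, best_c
```

For example, with a Hamming-distance cost to a target vector, `tabu_search(cost, [0] * 5, iters=20, tenure=2)` recovers the target with cost 0.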
International Symposium on Neural Networks | 2010
Harun Pirim; Dilip Gautam; Tanmay Bhowmik; Andy D. Perkins; Burak Eksioglu
Biological networks, social networks, and the World Wide Web are examples of real-world networks that exhibit community structure. We present a concise review of community structure finding (CSF) algorithms and their applications. We apply a CSF algorithm and several other algorithms to three different microarray data sets, and we calculate modularity and C-rand indices as indicators of the quality of each clustering. We compare the performance of the CSF algorithm with that of four other algorithms, hierarchical clustering (HC), K-means, the dynamic tree cut (DTC) algorithm, and Naive Bayes Clustering (NBC), using both C-rand and modularity values. We find that the CSF algorithm detects clusters with high modularity; however, its clusters do not have high C-rand values compared with those of the other methods.
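Modularity, one of the two quality measures above, can be computed directly from a partition. The sketch below uses the standard Newman-Girvan form for an unweighted, undirected graph, Q = Σ_c [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. How the paper built graphs from microarray data is not stated, so the input format here is an assumption.

```python
from collections import defaultdict

def modularity(edges: list[tuple[int, int]], labels: list[int]) -> float:
    """Newman-Girvan modularity Q of a partition.

    edges: undirected, unweighted edges (u, v) without duplicates or self-loops.
    labels: labels[node] is the community of that node."""
    m = len(edges)
    inside = defaultdict(int)        # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)    # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting them into their natural communities gives Q = 5/14 ≈ 0.357, while putting all nodes in one community gives Q = 0.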
Archive | 2018
Harun Pirim
Large biological data sets require powerful analysis tools such as co-expression network construction. Clustering the gene co-expression data of a species is a crucial step in mining the relevant information to identify key genes or groups (modules) of key genes. In other words, clustering the expression data helps identify the genes that are significantly co-expressed in the species of interest. Similarly expressed genes may share a common function: they may reside in the same pathway or take part in the same regulatory and signaling mechanisms, and their products may form complexes. Clusters of highly interacting genes can be identified by constructing and analyzing co-expression networks. Furthermore, each cluster may be summarized by an eigengene or a hub gene. Network analysis can relate clusters to each other or to external experimental traits. The network may also be used to calculate cluster membership quality measures. By applying graph mining algorithms, tight clusters of co-expressed genes may be discovered, leading to new gene functions, biomarkers, and disease-related genes. The chapter reviews state-of-the-art gene co-expression network construction studies and discusses recent applications while explaining the network concepts related to gene co-expression network analysis.
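The network concepts mentioned above (adjacency, hub gene, eigengene) can be sketched in a few lines. This follows the widely used WGCNA-style conventions as an assumption: an unsigned soft-threshold adjacency |cor|^β, connectivity as the row sum of the adjacency, and the eigengene as the first principal component of the standardized module expression. The value β = 6 and the function names are placeholders, not taken from the chapter.

```python
import numpy as np

def adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|^beta.

    expr has genes in rows and samples in columns; beta is a placeholder
    that would normally be chosen from the data."""
    a = np.abs(np.corrcoef(expr)) ** beta
    np.fill_diagonal(a, 0.0)                 # no self-connections
    return a

def hub_gene(adj: np.ndarray) -> int:
    """Index of the gene with the largest connectivity k_i = sum_j a_ij."""
    return int(np.argmax(adj.sum(axis=1)))

def eigengene(module_expr: np.ndarray) -> np.ndarray:
    """First principal component of the standardized module expression,
    returned as one value per sample."""
    z = module_expr - module_expr.mean(axis=1, keepdims=True)
    z /= z.std(axis=1, keepdims=True)        # assumes no gene has zero variance
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    v = vt[0]
    # The SVD sign is arbitrary: orient the eigengene with the average expression.
    if v @ z.mean(axis=0) < 0:
        v = -v
    return v
```

In a module whose genes all follow one expression pattern, the eigengene is highly correlated with that pattern, and the hub gene is one of the module members rather than an unrelated gene.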
Journal of Optimization Theory and Applications | 2018
Harun Pirim; Burak Eksioglu; Fred Glover
Integer programming models for clustering have applications in diverse fields, addressing problems such as market segmentation and facility location. Integer programming models can flexibly express objectives subject to special constraints of the clustering problem. They are also important for guiding clustering algorithms that can handle high-dimensional data. Here, we present a novel mixed integer linear programming model specifically for clustering relational networks, which have important applications in the social sciences and bioinformatics. We apply our model to several social network data sets to demonstrate its ability to detect natural network structures.
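The paper's own mixed integer model is not reproduced here. For orientation only, the well-known clique partitioning formulation illustrates how integer programming can express network clustering: x_ij = 1 if nodes i and j are placed in the same cluster, w_ij is an edge weight (positive for similar pairs, negative for dissimilar ones), and the triangle inequalities enforce that "same cluster" is transitive. The number of clusters is not fixed in advance.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i<j} w_{ij}\, x_{ij} \\
\text{s.t.} \quad & x_{ij} + x_{jk} - x_{ik} \le 1, \quad
                    x_{ij} - x_{jk} + x_{ik} \le 1, \quad
                    -x_{ij} + x_{jk} + x_{ik} \le 1 && \forall\, i<j<k, \\
& x_{ij} \in \{0, 1\} && \forall\, i<j.
\end{aligned}
```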
Comparative and Functional Genomics | 2018
Ali Nabi Duman; Harun Pirim
Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although a few papers refer to TDA methods in microarray analysis, to the best of our knowledge persistent homology has not previously been used to compare weighted gene co-expression networks (WGCNs). We compute the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and success of this approach in distinguishing stress factors. We quantify the multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are the pairwise bottleneck distances between the networks. Our method distinguishes the immune responses to different stress factors. Networks with similar immune responses are found to be close with respect to bottleneck distance, indicating that the corresponding WGCNs have similar topological features. This computationally efficient technique for analyzing networks provides a quick preliminary test for more advanced studies.
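As a rough illustration of what persistent homology measures on a weighted network, the sketch below computes only the 0-dimensional persistence diagram (connected components). It assumes edge weights in [0, 1] are turned into a filtration by the distance 1 - w, with every node born at 0 and a component dying when an edge merges it into another; the paper's exact filtration and homology dimensions are not given in the abstract.

```python
import numpy as np

def h0_persistence(weights: np.ndarray) -> list[tuple[float, float]]:
    """0-dimensional persistence diagram of a weighted network.

    Assumptions: weights is symmetric with entries in [0, 1], zero means no
    edge, and edges enter the filtration in order of distance 1 - w."""
    n = weights.shape[0]
    parent = list(range(n))
    def find(x: int) -> int:                 # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    edges = sorted((float(1.0 - weights[i, j]), i, j)
                   for i in range(n) for j in range(i + 1, n) if weights[i, j] > 0)
    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                         # edge merges two components:
            parent[ri] = rj                  # one of them dies at distance d
            diagram.append((0.0, d))
    # Components that never merge persist for the whole filtration.
    roots = {find(i) for i in range(n)}
    diagram.extend((0.0, float("inf")) for _ in roots)
    return diagram
```

Diagrams for different networks could then be compared with a bottleneck distance (libraries such as GUDHI provide one), and the resulting pairwise distance matrix fed to hierarchical clustering, for example SciPy's `linkage` on the condensed matrix from `squareform`.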
Archive | 2014
Harun Pirim; Umar M. Al-Turki; B.S. Yilbas
Transportation and facility location decisions are crucial in strategic supply chain design. Optimization models guide location decisions by giving the optimal site selection under certain assumptions and constraints. Deciding which model to use, and how to adapt its results to the needs of a company, is an art. This chapter presents some of the important optimization models in supply chain design, along with their mathematical formulations and solution procedures. The models can be extended to multi-echelon supply chains and/or multiple products.
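One classic location model of the kind described above is the p-median problem: open exactly p candidate sites and assign each customer to its cheapest open site so that total demand-weighted cost is minimized. Whether the chapter uses this exact model is an assumption. The exhaustive sketch below is only practical for tiny instances; realistic sizes call for an integer programming solver or a heuristic.

```python
from itertools import combinations

def p_median(cost: list[list[float]], demand: list[float], p: int):
    """Exhaustively solve a small p-median instance.

    cost[i][j] is the unit cost of serving customer i from candidate site j,
    demand[i] is customer i's demand. Returns (total_cost, open_sites)."""
    n_sites = len(cost[0])
    best_total, best_sites = float("inf"), ()
    for sites in combinations(range(n_sites), p):
        # each customer is served by its cheapest open site
        total = sum(d * min(row[j] for j in sites) for row, d in zip(cost, demand))
        if total < best_total:
            best_total, best_sites = total, sites
    return best_total, best_sites
```

With cost `[[0, 5, 9], [5, 0, 4], [9, 4, 0]]` and demand `[10, 1, 10]`, one site is best placed in the middle (site 1, cost 90), but two sites are best placed at the two heavy-demand ends (sites 0 and 2, cost 4).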
Archive | 2014
Harun Pirim; Umar M. Al-Turki; B.S. Yilbas
This chapter introduces scheduling models in supply chains. Models of scheduling within production units are discussed for different shop structures and objectives. Such models and solution methods serve as a base for further development across production units, with the objective of increasing the synergy that results from coordinated or integrated scheduling. The chapter briefly introduces some of the basic models in scheduling theory that are most closely related to supply chain models, followed by some of the basic models in supply chain scheduling.
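As an example of a basic model from scheduling theory (not necessarily one the chapter covers), Johnson's rule solves the two-machine flow shop with makespan objective exactly: jobs with p1 ≤ p2 go first in increasing order of p1, and the remaining jobs follow in decreasing order of p2. The sketch below also computes the resulting makespan.

```python
def johnson(jobs: list[tuple[float, float]]) -> tuple[list[int], float]:
    """Johnson's rule for a two-machine flow shop (minimize makespan).

    jobs[j] = (p1, p2): processing times of job j on machines 1 and 2.
    Returns the job order and its makespan."""
    first = sorted((j for j, (a, b) in enumerate(jobs) if a <= b),
                   key=lambda j: jobs[j][0])                 # increasing p1
    last = sorted((j for j, (a, b) in enumerate(jobs) if a > b),
                  key=lambda j: jobs[j][1], reverse=True)    # decreasing p2
    order = first + last
    t1 = t2 = 0
    for j in order:
        a, b = jobs[j]
        t1 += a                  # machine 1 finishes job j at t1
        t2 = max(t2, t1) + b     # machine 2 starts once both are ready
    return order, t2
```

For jobs `[(3, 6), (5, 2), (1, 2), (6, 6), (7, 5)]` the rule gives the order `[2, 0, 3, 4, 1]` with makespan 24, which matches the lower bound of total machine-1 time (22) plus the smallest machine-2 time (2).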
International Conference on Social Computing | 2013
Harun Pirim
Most clustering algorithms require the number of clusters to be determined. The number of clusters in a gene co-expression network is not known a priori. In this study, the maximum independent set concept from graph theory is applied to a gene expression data set. The results indicate that using an independent set approach to approximate the number of clusters is promising.
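The intuition is that genes in an independent set are pairwise not co-expressed, so each can represent a different cluster. A minimal sketch, under assumptions not stated in the abstract: genes are connected when their absolute Pearson correlation is at least a hypothetical threshold `tau`, and since maximum independent set is NP-hard, a greedy minimum-degree heuristic stands in for an exact solver (it returns a maximal set, i.e. a lower bound on the maximum). Both function names are hypothetical.

```python
import numpy as np

def greedy_independent_set(adj: np.ndarray) -> list[int]:
    """Greedy min-degree heuristic for a large independent set.

    Returns a maximal (not necessarily maximum) independent set."""
    remaining = set(range(adj.shape[0]))
    chosen = []
    while remaining:
        # pick the vertex with the fewest neighbours still in the graph
        v = min(remaining, key=lambda u: sum(bool(adj[u, w]) for w in remaining))
        chosen.append(v)
        # remove v and all of its neighbours
        remaining -= {v} | {w for w in remaining if adj[v, w]}
    return chosen

def estimate_num_clusters(expr: np.ndarray, tau: float = 0.8) -> int:
    """Connect genes (rows) whose |Pearson correlation| >= tau and use the
    size of a greedy independent set as the cluster-count estimate."""
    adj = np.abs(np.corrcoef(expr)) >= tau
    np.fill_diagonal(adj, False)
    return len(greedy_independent_set(adj))
```

For two groups of three genes that are co-expressed within each group but uncorrelated across groups, the graph is two disjoint triangles and the estimate is 2.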