Publications


Featured research published by Selim Mimaroglu.


Pattern Recognition | 2011

Combining multiple clusterings using similarity graph

Selim Mimaroglu; Ertunc Erdil

Multiple clusterings are produced for various needs and reasons in both distributed and local environments. Combining multiple clusterings into a final clustering which has better overall quality has gained importance recently. It is also expected that the final clustering is novel, robust, and scalable. In order to solve this challenging problem we introduce a new graph-based method. Our method uses the evidence accumulated in the previously obtained clusterings, and produces a very good quality final clustering. The number of clusters in the final clustering is obtained automatically; this is another important advantage of our technique. Experimental test results on real and synthetically generated data sets demonstrate the effectiveness of our new method.
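
To make the evidence-accumulation idea concrete, here is a minimal sketch (not the paper's algorithm): the fraction of input clusterings that place two objects together becomes an edge weight, edges above a threshold form a graph, and the connected components of that graph become the final clusters, so their number emerges automatically. The 0.5 threshold and the use of plain connected components are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def coassociation_matrix(clusterings):
    """Fraction of input clusterings that place each pair of objects together."""
    clusterings = np.asarray(clusterings)          # shape: (n_clusterings, n_objects)
    n = clusterings.shape[1]
    coassoc = np.zeros((n, n))
    for labels in clusterings:
        coassoc += (labels[:, None] == labels[None, :])
    return coassoc / len(clusterings)

def consensus_by_graph(clusterings, threshold=0.5):
    """Keep edges whose co-association exceeds the threshold, then take the
    connected components as the final clusters (their number emerges
    automatically from the graph structure)."""
    coassoc = coassociation_matrix(clusterings)
    adjacency = csr_matrix(coassoc > threshold)
    _, labels = connected_components(adjacency, directed=False)
    return labels

# Example: three base clusterings of six objects
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 2],
]
print(consensus_by_graph(base))   # -> [0 0 0 1 1 1]
```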


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

DICLENS: Divisive Clustering Ensemble with Automatic Cluster Number

Selim Mimaroglu; Emin Aksehirli

Clustering has a long and rich history in a variety of scientific fields. Finding natural groupings of a data set is a hard task as attested by hundreds of clustering algorithms in the literature. Each clustering technique makes some assumptions about the underlying data set. If the assumptions hold, good clusterings can be expected. It is hard, in some cases impossible, to satisfy all the assumptions. Therefore, it is beneficial to apply different clustering methods on the same data set, or the same method with varying input parameters, or both. We propose a novel method, DICLENS, which combines a set of clusterings into a final clustering having better overall quality. Our method produces the final clustering automatically and does not take any input parameters, a feature missing in many existing algorithms. Extensive experimental studies on real, artificial, and gene expression data sets demonstrate that DICLENS produces very good quality clusterings in a short amount of time. The DICLENS implementation is scalable, consumes very little memory and CPU, and runs on standard personal computers.
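
DICLENS itself is not reproduced here; the sketch below only illustrates the divisive, parameter-free flavor: build a minimum spanning tree over co-association distances and cut its unusually heavy edges, so the number of final clusters falls out of the data rather than being supplied as input. The mean-plus-one-standard-deviation cutoff is an assumption made purely for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def divisive_consensus(clusterings):
    """Build an MST over co-association distances and drop edges that are
    clearly heavier than the rest; the surviving components are the final
    clusters, so their number is determined automatically.
    (A simplified stand-in for the divisive idea, not DICLENS itself.)"""
    labels = np.asarray(clusterings)                       # (n_clusterings, n_objects)
    coassoc = sum((l[:, None] == l[None, :]).astype(float) for l in labels) / len(labels)
    dist = 1.0 - coassoc                                   # low distance = often co-clustered
    mst = minimum_spanning_tree(csr_matrix(dist + 1e-9)).toarray()
    weights = mst[mst > 0]
    cutoff = weights.mean() + weights.std()                # data-driven heuristic threshold
    mst[mst > cutoff] = 0.0                                # cut the unusually heavy edges
    _, part = connected_components(csr_matrix(mst), directed=False)
    return part

base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [0, 0, 0, 1, 1, 2]]
print(divisive_consensus(base))                            # -> [0 0 0 1 1 1]
```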


Expert Systems With Applications | 2012

CLICOM: Cliques for combining multiple clusterings

Selim Mimaroglu; A. Murat Yagci

Clustering has a long and rich history in a variety of scientific fields. Finding natural groupings of a data set is a hard task as attested by hundreds of clustering algorithms in the literature. Each clustering technique makes some assumptions about the underlying data set. If the assumptions hold, good clusterings can be expected. It is hard, in some cases impossible, to satisfy all the assumptions. Therefore, it is beneficial to apply different clustering methods on the same data set, or the same method with varying input parameters, or both. The clusterings obtained can then be combined into a final clustering with better overall quality, a problem that has gained significant importance recently. Our contributions are a novel method for combining a collection of clusterings into a final clustering which is based on cliques, and a novel output-sensitive clique finding algorithm which works on large and dense graphs and produces output in a short amount of time. Extensive experimental studies on real and artificial data sets demonstrate the effectiveness of our contributions.
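
A rough sketch of the clique idea, assuming networkx for maximal-clique enumeration: treat every cluster from every input clustering as a graph node, connect clusters whose member sets overlap strongly, and merge each maximal clique of clusters into a candidate final cluster. The Jaccard threshold and the first-clique-wins overlap handling are simplifications, not CLICOM's actual rules.

```python
import networkx as nx

def jaccard(a, b):
    """Overlap between two clusters given as sets of object ids."""
    return len(a & b) / len(a | b)

def clique_consensus(clusterings, threshold=0.4):
    # one node per cluster drawn from any of the input clusterings
    clusters = []
    for labels in clusterings:
        for c in set(labels):
            clusters.append({i for i, l in enumerate(labels) if l == c})
    g = nx.Graph()
    g.add_nodes_from(range(len(clusters)))
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if jaccard(clusters[i], clusters[j]) >= threshold:
                g.add_edge(i, j)
    # each maximal clique of similar clusters becomes one candidate final cluster
    assignment = [None] * len(clusterings[0])
    for final_id, clique in enumerate(nx.find_cliques(g)):
        for node in clique:
            for obj in clusters[node]:
                if assignment[obj] is None:      # crude overlap handling: first clique wins
                    assignment[obj] = final_id
    return assignment

base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [0, 0, 0, 1, 1, 2]]
print(clique_consensus(base))   # groups roughly {0,1,2} and {3,4,5}; labels depend on clique order
```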


International Conference on Information Systems | 2009

A binary method for fast computation of inter and intra cluster similarities for combining multiple clusterings

Selim Mimaroglu; A. Murat Yagci

In this paper, we introduce a novel binary method for fast computation of an objective function to measure inter and intra class similarities, which is used for combining multiple clusterings. Our method has the advantages of using less memory and CPU time. Moreover, compared with the conventional technique, we reduce the time complexity of the problem considerably. Experimental test results demonstrate the effectiveness of our new method.
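
The objective function itself is not reproduced here, but the bit-vector trick behind such speedups can be shown in a few lines: encode each cluster as an integer bit vector and count shared objects with a bitwise AND plus a popcount, avoiding set construction entirely.

```python
def to_bitvector(members):
    """Encode a cluster as an integer bit vector: bit i is set iff object i belongs to it."""
    bv = 0
    for i in members:
        bv |= 1 << i
    return bv

def shared_objects(bv_a, bv_b):
    """Objects common to two clusters: popcount of the bitwise AND."""
    return (bv_a & bv_b).bit_count()   # Python >= 3.10; use bin(x).count("1") on older versions

a = to_bitvector([0, 1, 2, 5])
b = to_bitvector([2, 3, 5, 7])
print(shared_objects(a, b))            # 2  (objects 2 and 5)
```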


Bioinformatics | 2010

Obtaining better quality final clustering by merging a collection of clusterings

Selim Mimaroglu; Ertunc Erdil

MOTIVATION: Clustering methods including k-means, SOM, UPGMA, DAA, CLICK, GENECLUSTER, CAST, DHC, PMETIS and KMETIS have been widely used in biological studies for gene expression, protein localization, sequence recognition and more. All these clustering methods have some benefits and drawbacks. We propose a novel graph-based clustering software called COMUSA for combining the benefits of a collection of clusterings into a final clustering having better overall quality.

RESULTS: COMUSA implementation is compared with PMETIS, KMETIS and k-means. Experimental results on artificial, real and biological datasets demonstrate the effectiveness of our method. COMUSA produces very good quality clusters in a short amount of time.

AVAILABILITY: http://www.cs.umb.edu/~smimarog/comusa

CONTACT: [email protected]
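
The paper's own evaluation protocol is not spelled out here; the snippet below only illustrates one common way such comparisons against k-means and other baselines are scored, using the adjusted Rand index against a known ground truth on synthetic data (the data set and the metric are assumptions made for illustration).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data with a known ground truth, and a k-means baseline.
X, truth = make_blobs(n_samples=300, centers=4, random_state=0)
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print("k-means vs. ground truth ARI:", adjusted_rand_score(truth, kmeans_labels))

# A consensus labelling (e.g. from a graph-based merge of several clusterings)
# would be scored the same way: adjusted_rand_score(truth, consensus_labels)
```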


Pattern Recognition Letters | 2011

Improving DBSCAN's execution time by using a pruning technique on bit vectors

Selim Mimaroglu; Emin Aksehirli

Clustering is the process of assigning a set of physical or abstract objects into previously unknown groups. The goal of clustering is to group similar objects into the same clusters and dissimilar objects into different clusters. Similarities between objects are evaluated by using the attribute values of objects. There are many clustering algorithms in the literature; among them, DBSCAN is a well-known density-based clustering algorithm. We improve DBSCAN's execution time performance for binary data sets and Hamming distances. We achieve considerable speed gains by using a novel pruning technique, as well as bit vectors and binary operations. Our novel method effectively discards distant neighbors of an object and computes only the distances between an object and its possible neighbors. By discarding distant neighbors, we avoid unnecessary distance computations and use less CPU time when compared with the conventional DBSCAN algorithm. However, the accuracy of our method is identical to that of the original DBSCAN. Experimental test results on real and synthetic data sets demonstrate that, by using our pruning technique, we obtain considerably faster execution times than DBSCAN.
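
The exact pruning rule of the paper is not reproduced here; the sketch below shows the general flavor for binary data: Hamming distance as XOR plus popcount on integer bit vectors, with a cheap popcount-difference lower bound used to discard distant candidates before the exact check.

```python
def hamming(a, b):
    """Exact Hamming distance between two bit vectors stored as Python ints."""
    return (a ^ b).bit_count()                      # Python >= 3.10

def neighbors_within(query, candidates, eps):
    """Candidates within Hamming distance eps of query. A cheap lower bound,
    |popcount(a) - popcount(b)| <= hamming(a, b), discards hopeless candidates
    before the exact XOR-and-popcount check."""
    q_ones = query.bit_count()
    hits = []
    for c in candidates:
        if abs(q_ones - c.bit_count()) > eps:       # pruned: cannot be within eps
            continue
        if hamming(query, c) <= eps:
            hits.append(c)
    return hits

print(neighbors_within(0b101100, [0b101101, 0b010011, 0b101100], eps=1))
# -> [45, 44]  (0b101101 and 0b101100)
```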


Engineering Applications of Artificial Intelligence | 2011

ASOD: Arbitrary shape object detection

Selim Mimaroglu; Ertunc Erdil

Arbitrary shape object detection, which is mostly related to computer vision and image processing, deals with detecting objects from an image. In this paper, we consider the problem of detecting arbitrary shape objects as a clustering application by decomposing images into representative data points, and then performing clustering on these points. Our method for arbitrary shape object detection is based on COMUSA which is an efficient algorithm for combining multiple clusterings. Extensive experimental evaluations on real and synthetically generated data sets demonstrate that our method is very accurate and efficient.
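
A minimal sketch of the decompose-then-cluster idea: foreground pixels become 2-D points, and a density-based clustering groups them into arbitrarily shaped objects. DBSCAN stands in here for the COMUSA-based combining step described in the paper; the image, eps, and min_samples values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_objects(binary_image, eps=1.5, min_samples=4):
    """Turn foreground pixels into 2-D points and cluster them; each cluster is
    one detected object, whatever its shape."""
    points = np.argwhere(binary_image > 0)          # (row, col) of foreground pixels
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return points, labels

# Toy image with two separate blobs
img = np.zeros((20, 20), dtype=int)
img[2:6, 2:6] = 1
img[12:18, 10:16] = 1
points, labels = detect_objects(img)
print(len(set(labels) - {-1}))                      # 2 detected objects
```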


Engineering Applications of Artificial Intelligence | 2013

An efficient and scalable family of algorithms for combining clusterings

Selim Mimaroglu; Ertunc Erdil

Clustering is the process of grouping objects that are similar, where similarity between objects is usually measured by a distance metric. The groups formed by a clustering method are referred to as clusters. Clustering is a widely used activity with multiple applications ranging from biology to economics. Each clustering technique has some advantages and disadvantages. Some clustering algorithms may even require input parameters which strongly affect the result. In most cases, it is not possible to choose the best distance metric, the best clustering method, and the best input argument values for an input data set. Therefore, multiple clusterings can be obtained with several distance metrics, several clustering methods, and several input argument values, and these multiple clusterings can be combined into a new final clustering of better quality. We propose a family of algorithms for combining multiple clusterings that are memory efficient, scalable, robust, and intuitive. Our new algorithms offer tremendous speed gains and low memory requirements by working at the cluster level, while producing very good quality final clusters. Extensive experimental evaluations on some very challenging artificially generated and real data sets from a diverse set of domains establish the usefulness of our methods.
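
Back-of-the-envelope arithmetic shows why working at the cluster level pays off: an object-level co-association matrix grows with the square of the number of objects, while a cluster-level matrix grows only with the square of the total number of input clusters. The sizes below are hypothetical.

```python
# Rough memory comparison for dense 8-byte float matrices, with hypothetical sizes:
n_objects, n_clusterings, clusters_each = 1_000_000, 10, 50
object_level = n_objects ** 2 * 8                        # full object-object co-association matrix
cluster_level = (n_clusterings * clusters_each) ** 2 * 8  # cluster-cluster similarity matrix
print(f"object-level : {object_level / 1e9:.1f} GB")      # 8000.0 GB
print(f"cluster-level: {cluster_level / 1e3:.1f} KB")     # 2000.0 KB
```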


The Journal of Supercomputing | 2012

Approximative distance computation by random hashing

Selim Mimaroglu; A. Murat Yagci; Dan A. Simovici

We propose an approximate computation technique for inter-object distances of binary data sets. Our approach is based on locality-sensitive hashing. We randomly select a number of projections of the data set and group objects into buckets based on the hash values of these projections. For each pair of objects, occurrences in the same bucket are counted, and the exact Hamming distance is approximated from the number of co-occurrences across all buckets. We parallelize the computation using two main schemes. The first assigns each random subspace to a processor for calculating a local co-occurrence matrix; all the local co-occurrence matrices are then combined into the final co-occurrence matrix. The second method provides the same distance approximation in longer runtimes by limiting the total message size in a parallel computing environment, which is especially useful for very large data sets that generate immense message traffic. Our methods produce very accurate results, scale up well with the number of objects, and tolerate processor failures. Experimental evaluations on supercomputers and workstations with several processors demonstrate the usefulness of our methods.
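
A serial toy version of the bucket-counting estimate (the parallel schemes are not shown): each table samples a few random bit positions, two objects collide when they agree on all of them, and the observed collision rate is inverted through (1 - d/n)^k to approximate the Hamming distance d over n bits. The table count and bits per hash are illustrative choices.

```python
import random

def estimate_hamming(a, b, n_bits, n_tables=200, bits_per_hash=4, seed=0):
    """Approximate the Hamming distance between two bit vectors (Python ints) by
    random bit sampling: two objects collide in a table when they agree on all
    sampled bits, which happens with probability about (1 - d/n)**k."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(n_tables):
        positions = rng.sample(range(n_bits), bits_per_hash)   # one random projection
        if all(((a >> p) & 1) == ((b >> p) & 1) for p in positions):
            collisions += 1
    p_hat = collisions / n_tables
    if p_hat == 0:
        return float(n_bits)               # never collided: report the maximum distance
    return n_bits * (1 - p_hat ** (1 / bits_per_hash))

a = 0b1011001110101100
b = 0b1011011110001100
print(bin(a ^ b).count("1"))                               # exact distance: 2
print(round(estimate_hamming(a, b, n_bits=16), 1))         # close to 2
```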


DMIN | 2008

Approximate Computation of Object Distances by Locality-Sensitive Hashing

Selim Mimaroglu; Dan A. Simovici

Collaboration


Dive into Selim Mimaroglu's collaboration.

Top Co-Authors


Dan A. Simovici

University of Massachusetts Boston
