Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Osman Abul is active.

Publication


Featured research published by Osman Abul.


international conference on data engineering | 2008

Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases

Osman Abul; Francesco Bonchi; Mirco Nanni

Preserving individual privacy when publishing data is a problem that is receiving increasing attention. According to the k-anonymity principle, each release of data must be such that each individual is indistinguishable from at least k - 1 other individuals. In this paper we study the problem of anonymity-preserving data publishing in moving objects databases. We propose a novel concept of k-anonymity based on co-localization that exploits the inherent uncertainty of the moving objects' whereabouts. Due to sampling and positioning system (e.g., GPS) imprecision, the trajectory of a moving object is no longer a polyline in a three-dimensional space; instead it is a cylindrical volume, where its radius delta represents the possible location imprecision: we know that the trajectory of the moving object is within this cylinder, but we do not know exactly where. If another object moves within the same cylinder, the two are indistinguishable from each other. This leads to the definition of (k, delta)-anonymity for moving objects databases. We first characterize the (k, delta)-anonymity problem and discuss techniques to solve it. Then we focus on the most promising technique from the point of view of information preservation, namely space translation. We develop a suitable measure of the information distortion introduced by space translation, and we prove that the problem of achieving (k, delta)-anonymity by space translation with minimum distortion is NP-hard. Faced with the hardness of our problem, we propose a greedy algorithm based on clustering and enhanced with ad hoc pre-processing and outlier removal techniques. The resulting method, named NWA (Never Walk Alone), is empirically evaluated in terms of data quality and efficiency. Data quality is assessed both by means of objective measures of information distortion, and by comparing the results of the same spatio-temporal range queries executed on the original database and on the (k, delta)-anonymized one. Experimental results show that for a wide range of values of delta and k, the relative error introduced is kept low, confirming that NWA produces high-quality (k, delta)-anonymized data.
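The co-localization idea behind (k, delta)-anonymity lends itself to a compact check: two sampled trajectories are indistinguishable when, at every common timestamp, their positions lie within the same uncertainty cylinder. Below is a minimal sketch of that check, assuming trajectories are given as aligned lists of (t, x, y) samples; the function name and data layout are illustrative, not taken from NWA.

```python
import math

def co_localized(traj_a, traj_b, delta):
    """Return True if two aligned trajectories stay within distance delta
    of each other at every shared timestamp (illustrative check only).

    traj_a, traj_b: lists of (t, x, y) tuples sampled at the same timestamps.
    delta: radius of the uncertainty cylinder around each trajectory.
    """
    positions_b = {t: (x, y) for t, x, y in traj_b}
    for t, xa, ya in traj_a:
        if t not in positions_b:
            return False  # no common sample at this timestamp
        xb, yb = positions_b[t]
        if math.hypot(xa - xb, ya - yb) > delta:
            return False
    return True

# Example: two objects moving roughly together within delta = 1.0
a = [(0, 0.0, 0.0), (1, 1.0, 1.0), (2, 2.0, 2.0)]
b = [(0, 0.3, 0.1), (1, 1.2, 0.8), (2, 2.4, 1.9)]
print(co_localized(a, b, delta=1.0))  # True
```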


systems, man and cybernetics | 2000

Multiagent reinforcement learning using function approximation

Osman Abul; Faruk Polat; Reda Alhajj

Learning in a partially observable and nonstationary environment is still one of the challenging problems in the area of multiagent (MA) learning. Reinforcement learning is a generic method that suits the needs of MA learning in many aspects. This paper presents two new multiagent-based, domain-independent coordination mechanisms for reinforcement learning; multiple agents do not require explicit communication among themselves to learn coordinated behavior. The first coordination mechanism is the perceptual coordination mechanism, where other agents are included in state descriptions and coordination information is learned from state transitions. The second is the observing coordination mechanism, which also includes other agents in state descriptions and additionally observes the rewards of nearby agents from the environment. The observed rewards and the agent's own reward are used to construct an optimal policy. This way, the latter mechanism tends to increase region-wide joint rewards. The selected experimental domain is the adversarial food-collecting world (AFCW), which can be configured as both a single-agent and a multiagent environment. Function approximation and generalization techniques are used because of the huge state space. Experimental results show the effectiveness of these mechanisms.
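The observing coordination mechanism can be illustrated with a small sketch: a Q-learning update with linear function approximation in which the learning signal blends the agent's own reward with rewards observed from nearby agents. The feature map, the blending weight and the update below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def q_update(w, phi_sa, phi_next_best, own_reward, neighbor_rewards,
             alpha=0.1, gamma=0.95, mix=0.5):
    """One Q-learning step with linear function approximation.

    w:                weight vector, Q(s, a) ~= w . phi(s, a)
    phi_sa:           feature vector of the taken (state, action) pair
    phi_next_best:    feature vector of the greedy action in the next state
    own_reward:       reward received by this agent
    neighbor_rewards: rewards observed from nearby agents (may be empty)
    mix:              weight given to the neighbors' average reward (assumed)
    """
    observed = np.mean(neighbor_rewards) if len(neighbor_rewards) else 0.0
    joint_reward = (1.0 - mix) * own_reward + mix * observed
    td_error = joint_reward + gamma * np.dot(w, phi_next_best) - np.dot(w, phi_sa)
    return w + alpha * td_error * phi_sa

# Toy usage with a 4-dimensional feature space
w = np.zeros(4)
w = q_update(w, np.array([1.0, 0.0, 1.0, 0.0]),
             np.array([0.0, 1.0, 0.0, 1.0]),
             own_reward=1.0, neighbor_rewards=[0.5, 0.0])
print(w)
```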


Information Systems | 2010

Anonymization of moving objects databases by clustering and perturbation

Osman Abul; Francesco Bonchi; Mirco Nanni

Preserving individual privacy when publishing data is a problem that is receiving increasing attention. Thanks to its simplicity, the concept of k-anonymity, introduced by Samarati and Sweeney [1], established itself as one fundamental principle for privacy-preserving data publishing. According to the k-anonymity principle, each release of data must be such that each individual is indistinguishable from at least k-1 other individuals. In this article we tackle the problem of anonymization of moving objects databases. We propose a novel concept of k-anonymity based on co-localization that exploits the inherent uncertainty of the moving objects' whereabouts. Due to sampling and imprecision of the positioning systems (e.g., GPS), the trajectory of a moving object is no longer a polyline in a three-dimensional space; instead it is a cylindrical volume, where its radius δ represents the possible location imprecision: we know that the trajectory of the moving object is within this cylinder, but we do not know exactly where. If another object moves within the same cylinder, the two are indistinguishable from each other. This leads to the definition of (k, δ)-anonymity for moving objects databases. We first characterize the (k, δ)-anonymity problem, then we recall NWA (Never Walk Alone), a method that we introduced in [2] based on clustering and spatial perturbation. Starting from a discussion on the limits of NWA, we develop a novel clustering method that, being based on the EDR distance [3], has the important feature of being time-tolerant. As a consequence, it perturbs trajectories both in space and time. The novel method, named W4M (Wait for Me), is empirically shown to produce higher quality anonymization than NWA, at the price of higher computational requirements. Therefore, in order to make W4M scalable to large datasets, we introduce two variants based on a novel (and computationally cheaper) time-tolerant distance function, and on chunking. All the variants of W4M are empirically evaluated in terms of data quality and efficiency, and thoroughly compared to their predecessor NWA. Data quality is assessed both by means of objective measures of information distortion, and by more usability-oriented measures, i.e., by comparing the results of (i) spatio-temporal range queries and (ii) frequent pattern mining, executed on the original database and on the (k, δ)-anonymized one. Experimental results over both real-world and synthetic mobility data confirm that, for a wide range of values of δ and k, the relative distortion introduced by our anonymization methods is kept low. Moreover, the techniques introduced to make W4M scalable to large datasets achieve their goal without giving up data quality in the anonymization process.
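The time tolerance of W4M comes from clustering trajectories under the EDR distance cited as [3] (Edit Distance on Real sequences). A minimal dynamic-programming sketch of EDR over 2D point sequences is shown below; the matching threshold eps and the point layout are illustrative choices, not the paper's parameters.

```python
def edr_distance(r, s, eps):
    """Edit Distance on Real sequences between two trajectories.

    r, s: lists of (x, y) points; eps: matching tolerance.
    Two points 'match' (substitution cost 0) when both coordinates
    differ by at most eps; otherwise substitution costs 1, and every
    insertion or deletion costs 1.
    """
    n, m = len(r), len(s)
    # dp[i][j] = EDR distance between r[:i] and s[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = (abs(r[i - 1][0] - s[j - 1][0]) <= eps and
                     abs(r[i - 1][1] - s[j - 1][1]) <= eps)
            sub_cost = 0 if match else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub_cost,  # match / substitute
                           dp[i - 1][j] + 1,             # delete from r
                           dp[i][j - 1] + 1)             # insert from s
    return dp[n][m]

print(edr_distance([(0, 0), (1, 1), (2, 2)],
                   [(0, 0.1), (1.2, 1.0), (5, 5)], eps=0.5))  # 1
```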


BMC Bioinformatics | 2007

Improved benchmarks for computational motif discovery

Geir Kjetil Sandve; Osman Abul; Vegard Walseng; Finn Drabløs

Background: An important step in annotation of sequenced genomes is the identification of transcription factor binding sites. More than a hundred different computational methods have been proposed, and it is difficult to make an informed choice. Therefore, robust assessment of motif discovery methods becomes important, both for validation of existing tools and for identification of promising directions for future research.

Results: We use a machine learning perspective to analyze collections of transcription factors with known binding sites. Algorithms are presented for finding position weight matrices (PWMs), IUPAC-type motifs and mismatch motifs with optimal discrimination of binding sites from remaining sequence. We show that for many data sets in a recently proposed benchmark suite for motif discovery, none of the common motif models can accurately discriminate the binding sites from remaining sequence. This may obscure the distinction between the potential performance of the motif discovery tool itself versus the intrinsic complexity of the problem we are trying to solve. Synthetic data sets may avoid this problem, but we show on some previously proposed benchmarks that there may be a strong bias towards a presupposed motif model. We also propose a new approach to benchmark data set construction. This approach is based on collections of binding site fragments that are ranked according to the optimal level of discrimination achieved with our algorithms. This allows us to select subsets with specific properties. We present one benchmark suite with data sets that allow good discrimination between positive and negative instances with the common motif models. These data sets are suitable for evaluating algorithms for motif discovery that rely on these models. We present another benchmark suite where PWM, IUPAC and mismatch motif models are not able to discriminate reliably between positive and negative instances. This suite could be used for evaluating more powerful motif models.

Conclusion: Our improved benchmark suites have been designed to differentiate between the performance of motif discovery algorithms and the power of motif models. We provide a web server where users can download our benchmark suites, submit predictions and visualize scores on the benchmarks.
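Position weight matrices are central to this line of work: a PWM assigns each position a per-base score, and a candidate site is scored by summing log-odds entries along the sequence. The sketch below shows standard log-odds scoring of sequence windows against a PWM built from aligned binding sites, with pseudocounts and a uniform background as assumed simplifications.

```python
import math

def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds PWM from equal-length binding sites (uniform background)."""
    length = len(sites[0])
    bases = "ACGT"
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in bases}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in bases})
    return pwm

def score_window(pwm, window):
    """Log-odds score of one sequence window under the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (best score, offset) over all windows of the PWM's length."""
    width = len(pwm)
    scores = [(score_window(pwm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scores)

pwm = build_pwm(["TATAAT", "TATGAT", "TACAAT"])
print(best_hit(pwm, "GGCTATAATGCC"))  # highest-scoring window starts at offset 3
```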


BMC Bioinformatics | 2008

Assessment of composite motif discovery methods

Kjetil Klepper; Geir Kjetil Sandve; Osman Abul; Jostein Johansen; Finn Drabløs

Background: Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery – discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery.

Results: We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked both to predict the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise.

Conclusion: Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges to most methods for module discovery.
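Benchmarks of this kind typically score predictions by how well predicted module intervals overlap the annotated ones at the nucleotide level. The sketch below computes nucleotide-level sensitivity and positive predictive value from predicted and true intervals; the interval representation and the two metrics are illustrative, not the exact scoring used in the study.

```python
def to_positions(intervals):
    """Expand [start, end) intervals into the set of covered positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered

def overlap_scores(predicted, annotated):
    """Nucleotide-level sensitivity and positive predictive value."""
    pred = to_positions(predicted)
    true = to_positions(annotated)
    tp = len(pred & true)
    sensitivity = tp / len(true) if true else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sensitivity, ppv

# Predicted module at 100-180 vs. an annotated module at 120-200
print(overlap_scores([(100, 180)], [(120, 200)]))  # (0.75, 0.75)
```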


mobile data management | 2012

Privacy-Preserving Sharing of Sensitive Semantic Locations under Road-Network Constraints

Emre Yigitoglu; Maria Luisa Damiani; Osman Abul; Claudio Silvestri

This paper presents a privacy-preserving framework for the protection of sensitive positions in real-time trajectories. We assume a scenario in which the sensitivity of users' positions is space-varying, and so depends on the spatial context, while the users' movement is confined to road networks and places. Typical users are the non-anonymous members of a geo-social network who agree to share their exact position whenever that position does not fall within a sensitive place, e.g. a hospital. Suspending location sharing while the user is inside a sensitive place is not an appropriate solution, because the user's stopovers can be easily inferred from the user's trace. In this paper we present an extension of the semantic location cloaking model [1], originally developed for the cloaking of non-correlated positions in an unconstrained space. We investigate different algorithms for the generation of cloaked regions over the graph representing the urban setting. We also integrate methods to prevent velocity-based linkage attacks. Finally, we evaluate the algorithms experimentally using a real data set.
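One way to picture cloaking over a road-network graph: starting from the vertex of a sensitive place, grow a connected region until it covers at least a minimum number of distinct places, so an observer cannot pin the user to the sensitive one. The breadth-first expansion below is an illustrative sketch of that idea, not the specific algorithms evaluated in the paper.

```python
from collections import deque

def grow_cloaked_region(graph, start, min_places, place_of):
    """Grow a connected cloaked region around a sensitive vertex.

    graph:      adjacency dict, vertex -> iterable of neighbouring vertices
    start:      vertex of the sensitive place the user is visiting
    min_places: minimum number of distinct places the region must cover
    place_of:   dict mapping a vertex to the place it belongs to (or None)
    """
    region, places = {start}, {place_of[start]}
    queue = deque([start])
    while queue and len(places) < min_places:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in region:
                region.add(neighbour)
                queue.append(neighbour)
                if place_of.get(neighbour) is not None:
                    places.add(place_of[neighbour])
    return region

# Tiny road network: vertices 0-4 in a chain, three places along it
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
place_of = {0: "hospital", 1: None, 2: "cafe", 3: None, 4: "library"}
print(grow_cloaked_region(graph, start=0, min_places=3, place_of=place_of))
```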


BMC Bioinformatics | 2011

Identifying elemental genomic track types and representing them uniformly

Sveinung Gundersen; Matúš Kalaš; Osman Abul; Arnoldo Frigessi; Eivind Hovig; Geir Kjetil Sandve

Background: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated.

Results: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0.

Conclusions: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.
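The four informational properties discussed here (gaps, lengths, values and interconnections) can be made concrete with a small data model: each track element may or may not carry a length, a value, and links to other elements, and a track type is the combination of which of these are present. The dataclass below is an illustrative encoding of that idea, not the GTrack 1.0 or BioXSD 1.1 specification.

```python
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class TrackElement:
    """One element of a genomic track (illustrative model, not GTrack/BioXSD)."""
    seqid: str                      # chromosome or contig name
    start: int                      # 0-based start position
    end: Optional[int] = None       # present only for length-carrying types
    value: Optional[float] = None   # present only for valued types
    edges: List[str] = field(default_factory=list)  # ids of linked elements
    element_id: Optional[str] = None

def track_type(elements):
    """Name the track type by which properties its elements carry."""
    has_length = any(e.end is not None for e in elements)
    has_value = any(e.value is not None for e in elements)
    has_links = any(e.edges for e in elements)
    parts = [name for flag, name in [(has_length, "segments"),
                                     (has_value, "valued"),
                                     (has_links, "linked")] if flag]
    return " ".join(parts) or "points"

track = [TrackElement("chr1", 100, end=200, value=0.8),
         TrackElement("chr1", 500, end=650, value=0.3)]
print(track_type(track))  # "segments valued"
```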


international conference on data mining | 2007

Hiding Sensitive Trajectory Patterns

Osman Abul; Maurizio Atzori; Francesco Bonchi; Fosca Giannotti

Spatio-temporal traces left behind by moving individuals are increasingly available. On the one hand, mining this kind of data is expected to produce interesting behavioral knowledge enabling novel classes of mobility applications; but on the other hand, due to the peculiar nature of position data, mining it creates important privacy concerns. Thus, studying privacy-preserving data mining methods for moving object data is interesting and challenging. In this paper, we address the problem of hiding sensitive trajectory patterns from moving objects databases. The aim is to modify the database such that a given set of sensitive trajectory patterns can no longer be extracted, while the others are preserved as much as possible. We provide the formal problem statement and show that it is NP-hard; so we devise heuristics and a polynomial sanitization algorithm. We discuss a possible attack on our model, which exploits knowledge of the underlying road network, and we enhance our model to protect against this kind of attack. Experimental results show the effectiveness of our proposal.
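A trajectory "supports" a sensitive pattern when the pattern's locations occur along it in order; hiding the pattern means editing trajectories until its support drops below the mining threshold. The subsequence check and the naive point-dropping loop below sketch that idea under simplified assumptions; they are not the paper's heuristics or its polynomial sanitization algorithm.

```python
def supports(trajectory, pattern):
    """True if the pattern's locations appear in order within the trajectory."""
    it = iter(trajectory)
    return all(loc in it for loc in pattern)

def naive_sanitize(database, pattern, max_support):
    """Drop one pattern location from supporting trajectories until the
    pattern's support falls to max_support (illustrative heuristic only)."""
    sanitized = [list(t) for t in database]
    for traj in sanitized:
        if sum(supports(t, pattern) for t in sanitized) <= max_support:
            break
        if supports(traj, pattern):
            traj.remove(pattern[0])  # crude edit: drop the first matched location
    return sanitized

db = [["A", "B", "C", "D"], ["A", "C", "D"], ["B", "D"]]
print(naive_sanitize(db, pattern=["A", "C", "D"], max_support=1))
```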


systems, man and cybernetics | 2003

Cluster validity analysis using subsampling

Osman Abul; Anthony Chiu Wa Lo; Reda Alhajj; Faruk Polat; Ken Barker

Cluster validity investigates whether generated clusters are true clusters or due to chance. This is usually done based on subsampling stability analysis. Related to this problem is estimating the true number of clusters in a given dataset. There are a number of methods described in the literature to handle both purposes. In this paper, we propose three methods for estimating confidence in the validity of a clustering result. The first method validates the clustering result by employing supervised classifiers: the dataset is divided into training and test sets, and the accuracy of the classifier is evaluated on the test set. This method computes confidence in the generalization capability of the clustering. The second method is based on the fact that if a clustering is valid, then each of its subsets should be valid as well. The third method is similar to the second, but takes the dual approach, i.e., each cluster is expected to be stable and compact. Confidence is estimated by repeating the process a number of times on subsamples. Experimental results illustrate the effectiveness of the proposed methods.
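The first method's idea, treating cluster labels as class labels and checking how well a supervised classifier reproduces them on held-out data, is easy to sketch. Below is a minimal version using scikit-learn's k-means, a nearest-neighbour classifier and repeated subsampling; the specific estimator choices and the accuracy-based score are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def clustering_confidence(X, n_clusters, n_rounds=20, seed=0):
    """Estimate confidence in a clustering by subsampling plus supervised classification.

    Each round: cluster a subsample, train a classifier on half of it using the
    cluster labels as classes, and measure accuracy on the other half.
    """
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_rounds):
        idx = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
        sample = X[idx]
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=int(rng.integers(1_000_000))).fit_predict(sample)
        X_tr, X_te, y_tr, y_te = train_test_split(sample, labels, test_size=0.5)
        clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
        accuracies.append(clf.score(X_te, y_te))
    return float(np.mean(accuracies))

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
print(clustering_confidence(X, n_clusters=3))  # close to 1.0 for well-separated blobs
```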


BMC Bioinformatics | 2008

Compo: composite motif discovery using discrete models

Geir Kjetil Sandve; Osman Abul; Finn Drabløs

Background: Computational discovery of motifs in biomolecular sequences is an established field, with applications both in the discovery of functional sites in proteins and regulatory sites in DNA. In recent years there has been increased attention towards the discovery of composite motifs, typically occurring in cis-regulatory regions of genes.

Results: This paper describes Compo: a discrete approach to composite motif discovery that supports richer modeling of composite motifs and a more realistic background model compared to previous methods. Furthermore, multiple parameter and threshold settings are tested automatically, and the most interesting motifs across settings are selected. This avoids reliance on single hard thresholds, which has been a weakness of previous discrete methods. Comparison of motifs across parameter settings is made possible by the use of p-values as a general significance measure. Compo can either return an ordered list of motifs, ranked according to the general significance measure, or a Pareto front corresponding to a multi-objective evaluation on sensitivity, specificity and spatial clustering.

Conclusion: Compo performs very competitively compared to several existing methods on a collection of benchmark data sets. These benchmarks include a recently published, large benchmark suite where the use of support across sequences allows Compo to correctly identify binding sites even when the relevant PWMs are mixed with a large number of noise PWMs. Furthermore, the possibility of parameter-free running offers high usability, the support for multi-objective evaluation allows a rich view of potential regulators, and the discrete model allows flexibility in modeling and interpretation of motifs.
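Compo's multi-objective output can be pictured as a Pareto front over per-motif scores: a candidate is kept when no other candidate is at least as good on every objective and strictly better on one. The helper below computes such a front for (sensitivity, specificity, spatial clustering) triples; the scores and the simple quadratic-time filter are illustrative, not Compo's implementation.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on all objectives
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    candidates: dict mapping a motif name to a tuple of objective scores,
    e.g. (sensitivity, specificity, spatial clustering).
    """
    return {name: scores for name, scores in candidates.items()
            if not any(dominates(other, scores)
                       for other_name, other in candidates.items()
                       if other_name != name)}

motifs = {
    "motif_a": (0.90, 0.60, 0.50),
    "motif_b": (0.70, 0.80, 0.70),
    "motif_c": (0.65, 0.75, 0.65),  # dominated by motif_b
}
print(pareto_front(motifs))  # motif_a and motif_b remain
```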

Collaboration


Dive into Osman Abul's collaborations.

Top Co-Authors

Faruk Polat, Middle East Technical University
Francesco Bonchi, Institute for Scientific Interchange
Finn Drabløs, Norwegian University of Science and Technology
Harun Gökçe, TOBB University of Economics and Technology
Fosca Giannotti, Istituto di Scienza e Tecnologie dell'Informazione
Davut Deniz Yavuz, TOBB University of Economics and Technology
Claudio Silvestri, Ca' Foscari University of Venice