Hakan Ferhatosmanoglu

Publication

Featured research published by Hakan Ferhatosmanoglu.

international conference on peer-to-peer computing | 2003

Peer-to-peer spatial queries in sensor networks

Murat Demirbas; Hakan Ferhatosmanoglu

Sensor networks, which consist of potentially several thousand nodes, each with sensing (heat, sound, light, magnetism, etc.) and wireless communication capabilities, provide great opportunities for monitoring spatial information about a region of interest. Although spatial query execution has been studied extensively in the context of database systems (e.g., indexing technologies), these solutions are not directly applicable to sensor networks because of the networks' decentralized nature and the limited computational power and energy of individual sensor nodes. We present a peer-to-peer indexing structure, namely the peer-tree, to address the problem of energy- and time-efficient execution of spatial queries (such as nearest-neighbor queries) in sensor networks. Loosely speaking, our peer-tree structure can be interpreted as a peer-to-peer version of the centralized R-tree index structure. Using the peer-tree as a building block, we present a peer-to-peer query processing model in which a query can be posed at any node of the network without the need for a central server. To achieve minimal energy consumption and minimal response time, our query processing model ensures that only the nodes relevant to the correct execution of a query are involved in its execution.

conference on information and knowledge management | 2003

High dimensional reverse nearest neighbor queries

Amit Singh; Hakan Ferhatosmanoglu; Ali Şaman Tosun

Reverse Nearest Neighbor (RNN) queries are of particular interest in a wide range of applications such as decision support systems, profile-based marketing, data streaming, document databases, and bioinformatics. Earlier approaches to this problem mostly deal with two-dimensional data. However, most of the above applications inherently involve high dimensions, and the high-dimensional RNN problem is still unexplored. In this paper, we propose an approximate solution for answering RNN queries in high dimensions. Our approach is based on the strong correlation in practice between k-NN and RNN. It works in two phases. In the first phase, the k-NN of a query point are found; in the second phase, they are further analyzed using a novel type of query, the Boolean Range Query (BRQ). Experimental results show that BRQ is much more efficient than both NN and range queries, and can be effectively used to answer RNN queries. Performance is further improved by running multiple BRQs simultaneously. The proposed approach can also be used to answer other variants of RNN queries such as RNN of order k, bichromatic RNN, and the Matching Query, which has many applications of its own. Our technique can efficiently answer NN, RNN, and its variants with approximately the same number of I/Os as running an NN query.
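
The two-phase idea in this abstract can be illustrated with a minimal in-memory sketch. It assumes Euclidean distance and a plain list of points with no index, so it shows only the logic, not the I/O behavior the paper measures. The function names `boolean_range_query` and `approximate_rnn` are hypothetical and are not taken from the paper.

```python
import math

def boolean_range_query(points, center_idx, radius):
    """Return True if any point other than the center lies strictly
    within `radius` of it. Only a yes/no answer is needed, so the scan
    stops at the first hit."""
    center = points[center_idx]
    return any(
        i != center_idx and math.dist(p, center) < radius
        for i, p in enumerate(points)
    )

def approximate_rnn(points, query, k):
    """Approximate reverse nearest neighbors of `query`.

    Phase 1: take the k nearest neighbors of the query as candidates.
    Phase 2: keep a candidate only if no data point is closer to it
    than the query is (checked with a boolean range query).
    True RNNs outside the k-NN set are missed, hence 'approximate'.
    """
    candidates = sorted(
        range(len(points)), key=lambda i: math.dist(points[i], query)
    )[:k]
    return [
        i for i in candidates
        if not boolean_range_query(points, i, math.dist(points[i], query))
    ]
```

For example, with points `[(0, 0), (1, 0), (5, 5), (5, 6)]` and query `(0.4, 0)`, the first two points have the query as their nearest neighbor, while `(5, 5)` is rejected because `(5, 6)` is closer to it than the query.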

symposium on large spatial databases | 2001

Constrained Nearest Neighbor Queries

Hakan Ferhatosmanoglu; Ioanna Stanoi; Divyakant Agrawal; Amr El Abbadi

In this paper, we introduce the notion of constrained nearest neighbor (CNN) queries and propose a series of methods to answer them. This class of queries can be thought of as nearest neighbor queries with range constraints. Although both nearest neighbor and range queries have been analyzed extensively in previous literature, the implications of constrained nearest neighbor queries have not been discussed. Due to their versatility, CNN queries are suitable for a wide range of applications, from GIS systems to reverse nearest neighbor queries and multimedia applications. We develop methods for answering CNN queries with different properties and advantages. We prove the optimality (with respect to I/O cost) of one of the techniques proposed in this paper. The superiority of the proposed technique is shown by a performance analysis.
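
To make the query semantics concrete, here is a minimal sketch of a constrained nearest neighbor query as a linear scan. It assumes Euclidean distance and an axis-aligned rectangular constraint given by hypothetical corner parameters `lo` and `hi`. It is a brute-force baseline that defines the expected answer, not one of the index-based methods the paper proposes.

```python
import math

def constrained_nn(points, query, lo, hi):
    """Nearest neighbor of `query` among the points that fall inside
    the axis-aligned box [lo, hi]. Returns None if the box is empty."""
    inside = [
        p for p in points
        if all(l <= x <= h for x, l, h in zip(p, lo, hi))
    ]
    return min(inside, key=lambda p: math.dist(p, query), default=None)
```

With points `[(1, 1), (2, 2), (6, 6)]`, query `(0, 0)`, and the box from `(1.5, 1.5)` to `(10, 10)`, the unconstrained nearest neighbor `(1, 1)` is excluded and the answer is `(2, 2)`.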

international conference on data engineering | 2001

Approximate nearest neighbor searching in multimedia databases

Hakan Ferhatosmanoglu; Ertem Tuncel; Divyakant Agrawal; A. El Abbadi

We develop a general framework for approximate nearest-neighbor queries. We categorize the current approaches for nearest-neighbor query processing based on either their ability to reduce the data set that needs to be examined, or their ability to reduce the representation size of each data object. We first propose modifications to well-known techniques to support the progressive processing of approximate nearest-neighbor queries. A user may therefore stop the retrieval process once enough information has been returned. We then develop a new technique based on clustering that merges the benefits of the two general classes of approaches. Our cluster-based approach allows a user to progressively explore the approximate results with increasing accuracy. We propose a new metric for evaluation of approximate nearest-neighbor searching techniques. Using both the proposed and the traditional metrics, we analyze and compare several techniques with a detailed performance evaluation. We demonstrate the feasibility and efficiency of approximate nearest-neighbor searching. We perform experiments on several real data sets and establish the superiority of the proposed cluster-based technique over the existing techniques for approximate nearest-neighbor searching.

web search and data mining | 2012

A large-scale sentiment analysis for Yahoo! answers

Onur Küçüktunç; Berkant Barla Cambazoglu; Ingmar Weber; Hakan Ferhatosmanoglu

Sentiment extraction from online web documents has recently been an active research topic due to its potential use in commercial applications. By sentiment analysis, we refer to the problem of assigning a quantitative positive/negative mood to a short bit of text. Most studies in this area are limited to the identification of sentiments and do not investigate the interplay between sentiments and other factors. In this work, we use a sentiment extraction tool to investigate the influence of factors such as gender, age, education level, the topic at hand, or even the time of the day on sentiments in the context of a large online question answering site. We start our analysis by looking at direct correlations, e.g., we observe more positive sentiments on weekends, very neutral ones in the Science & Mathematics topic, a trend for younger people to express stronger sentiments, or people in military bases to ask the most neutral questions. We then extend this basic analysis by investigating how properties of the (asker, answerer) pair affect the sentiment present in the answer. Among other things, we observe a dependence on the pairing of some inferred attributes estimated by a user's ZIP code. We also show that the best answers differ in their sentiments from other answers, e.g., in the Business & Finance topic, best answers tend to have a more neutral sentiment than other answers. Finally, we report results for the task of predicting the attitude that a question will provoke in answers. We believe that understanding factors influencing the mood of users is not only interesting from a sociological point of view, but also has applications in advertising, recommendation, and search.

conference on information and knowledge management | 2000

Vector approximation based indexing for non-uniform high dimensional data sets

Hakan Ferhatosmanoglu; Ertem Tuncel; Divyakant Agrawal; Amr El Abbadi

With the proliferation of multimedia data, there is increasing need to support the indexing and searching of high dimensional data. Recently, a vector approximation based technique called VA-file has been proposed for indexing high dimensional data. It has been shown that the VA-file is an effective technique compared to the current approaches based on space and data partitioning. The VA-file gives good performance especially when the data set is uniformly distributed. Real data sets are not uniformly distributed, are often clustered, and the dimensions of the feature vectors in real data sets are usually correlated. More careful analysis for nonuniform or correlated data is needed for effectively indexing high dimensional data. We propose a solution to these problems and propose the VA+-file, a new technique for indexing high dimensional data sets based on vector approximations. We conclude with an evaluation of nearest neighbor queries and show that the VA+-file technique results in significant improvements over the current VA-file approach for several real data sets.
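
As background for the vector approximation idea this abstract builds on, the following is a minimal sketch of filter-and-refine nearest neighbor search over quantized vectors. It assumes data in the unit hypercube, a uniform grid with the same number of bits per dimension, and Euclidean distance. All function names are hypothetical. It illustrates the baseline uniform scheme only and does not implement the non-uniform handling that the abstract proposes.

```python
import math

def approximate(vec, bits):
    """Quantize each coordinate in [0, 1] to one of 2**bits grid cells."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in vec)

def lower_bound(query, approx, bits):
    """Smallest possible distance from `query` to any vector whose
    approximation is `approx` (distance to the grid cell itself)."""
    cells = 1 << bits
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c / cells, (c + 1) / cells
        gap = lo - q if q < lo else q - hi if q > hi else 0.0
        total += gap * gap
    return math.sqrt(total)

def va_nearest(vectors, query, bits=2):
    """Exact nearest neighbor using approximations as a filter:
    visit vectors in order of lower bound and stop once the bound
    can no longer beat the best exact distance found so far."""
    approxs = [approximate(v, bits) for v in vectors]
    bounds = [lower_bound(query, a, bits) for a in approxs]
    best, best_d = None, math.inf
    for i in sorted(range(len(vectors)), key=bounds.__getitem__):
        if bounds[i] >= best_d:
            break  # every remaining vector is at least this far away
        d = math.dist(vectors[i], query)  # refine with the full vector
        if d < best_d:
            best, best_d = i, d
    return best
```

Because the lower bound never exceeds the true distance, the early exit does not change the result. It only reduces how many full vectors must be read.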

Bioinformatics | 2008

CellTrack: an open-source software for cell tracking and motility analysis

Ahmet Sacan; Hakan Ferhatosmanoglu; Huseyin Coskun

MOTIVATION: Cell motility is a critical part of many important biological processes. Automated and sensitive cell tracking is essential to cell motility studies where the tracking results can be used for diagnostic or curative decisions and where mathematical models can be developed to deepen our understanding of the mechanisms underlying cell motility. RESULTS: We have developed CellTrack: a self-contained, extensible, and cross-platform software package for cell tracking and motility analysis. Besides the general purpose image enhancement, object segmentation and tracking algorithms, we have implemented a novel edge-based method for sensitive tracking of the cell boundaries, and constructed an ensemble of methods that achieves refined tracking results even under large displacements or deformations of the cells. AVAILABILITY: CellTrack is an Open Source project and is freely available at http://db.cse.ohio-state.edu/CellTrack.

acm multimedia | 2002

VQ-index: an index structure for similarity searching in multimedia databases

Ertem Tuncel; Hakan Ferhatosmanoglu; Kenneth Rose

In this paper, we introduce a novel indexing technique based on efficient compression of the feature space for approximate similarity searching in large multimedia databases. Its main novelty is that state-of-the-art tools from the discipline of data compression are adopted to optimize the complexity-performance tradeoff in large data sets. The design procedure optimizes the query access time by jointly accounting for both database distribution and query statistics. We achieve efficient compression by using appropriate vector quantization (VQ) techniques, namely, multi-stage VQ and split-VQ, which are especially suited for limited memory applications. We partition the data set using the accumulated query history, and each partition of data points is separately compressed using a vector quantizer tailored to its distribution. The employed VQ techniques inherently provide a spectrum of points to choose from on the time/accuracy plane. This property is especially crucial for large multimedia databases where I/O time is a bottleneck, because it offers the flexibility to trade time for better accuracy. Our experiments demonstrate speedups of 20 to 35 over a VA-file technique that has been adapted for approximate nearest neighbor searching.

international conference on data engineering | 2005

Compressing bitmap indices by data reorganization

Ali Pinar; Tao Tao; Hakan Ferhatosmanoglu

Many scientific applications generate massive volumes of data through observations or computer simulations, bringing up the need for effective indexing methods for efficient storage and retrieval of scientific data. Unlike conventional databases, scientific data is mostly read-only and its volume can reach the order of petabytes, making a compact index structure vital. Bitmap indexing has been successfully applied to scientific databases by exploiting the fact that scientific data are enumerated or numerical. Bitmap indices can be compressed with variants of run-length encoding for a compact index structure. However, even this may not be enough for the enormous data generated in some applications such as high energy physics. In this paper, we study how to reorganize bitmap tables for improved compression rates. Our algorithms are used just as a preprocessing step, thus there is no need to revise the current indexing techniques and the query processing algorithms. We introduce the tuple reordering problem, which aims to reorganize database tuples for optimal compression rates. We propose a Gray code ordering algorithm for this NP-complete problem, which is an in-place algorithm and runs in time linear in the size of the database. We also discuss how the tuple reordering problem can be reduced to the traveling salesperson problem. Our experimental results on real data sets show that the compression ratio can be improved by a factor of 2 to 10.
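
The effect of Gray code ordering on run-length compressibility can be sketched as follows. This version assumes rows are tuples of 0/1 bits and simply sorts them by their position in the reflected Gray code sequence. That is a comparison sort, not the in-place linear-time algorithm the abstract describes, and the helper names are hypothetical.

```python
def gray_rank(bits):
    """Position of a bit tuple in the reflected Gray code sequence
    (Gray-to-binary conversion via a running XOR of the bits)."""
    rank, prev = 0, 0
    for g in bits:
        prev ^= g
        rank = (rank << 1) | prev
    return rank

def gray_code_order(rows):
    """Reorder bitmap rows so consecutive rows follow Gray code order,
    which tends to lengthen the runs in each column."""
    return sorted(rows, key=gray_rank)

def count_runs(rows):
    """Total number of runs over all bitmap columns; fewer runs means
    run-length encoding needs fewer entries."""
    return sum(
        1 + sum(1 for a, b in zip(col, col[1:]) if a != b)
        for col in zip(*rows)
    )
```

For the four rows `(0, 0), (1, 1), (0, 1), (1, 0)`, the reordered table is `(0, 0), (0, 1), (1, 1), (1, 0)`, and the total run count over both columns drops from 7 to 5.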

bioinformatics and bioengineering | 2004

A time series analysis of microarray data

Selnur Erdal; Ozgur Ozturk; David Armbruster; Hakan Ferhatosmanoglu; William C. Ray

As the capture and analysis of single-time-point microarray expression data becomes routine, investigators are turning to time-series expression data to investigate complex gene regulation schemes and metabolic pathways. These investigations are facilitated by algorithms that can extract and cluster related behaviors from the full population of time-series behaviors observed. Although traditional clustering techniques have been shown to be effective for certain types of expression analysis, they do not take the biological nature of the process into account, and therefore are clearly not optimized for this purpose. Moreover, the current approaches provide internal comparisons for the experiments utilized for clustering, but cross-comparisons between clustered results are qualitative and subjective. We present a combination of current and novel methods for the analysis of time-series gene expression data. We focus on an actual study we have performed for Haemophilus influenzae, which is a major cause of otitis media in children. We first perform a discretization of the gene expression data that takes both positive and negative correlations into consideration and then develop a clustering algorithm optimized for such data that allows elucidation and searching of time-series patterns. The resulting approach allows time-series data to be usefully compared across multiple experiments. We demonstrate the success of our algorithm by showing that some of the genes it finds to be co-regulated are not detected by current methods. As a result, we are able to identify several signal pathways that initiate competence development, and to characterize the transcriptomes of wild-type and adenylate cyclase mutant (cya) strains under both nutrient-limiting and nutrient-complete growth conditions.
