Cyrus Shahabi

University of Southern California

Publication

Featured research published by Cyrus Shahabi.

international conference on management of data | 2008

Private queries in location based services: anonymizers are not necessary

Gabriel Ghinita; Panos Kalnis; Ali Khoshgozaran; Cyrus Shahabi; Kian-Lee Tan

Mobile devices equipped with positioning capabilities (e.g., GPS) can ask location-dependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (i) All users must trust the third party anonymizer, which is a single point of attack. (ii) A large number of cooperating, trustworthy users is needed. (iii) Privacy is guaranteed only for a single snapshot of user locations; users are not protected against correlation attacks (e.g., history of user movement). We propose a novel framework to support private location-dependent queries, based on the theoretical work on Private Information Retrieval (PIR). Our framework does not require a trusted third party, since privacy is achieved via cryptographic techniques. Compared to existing work, our approach achieves stronger privacy for snapshots of user locations; moreover, it is the first to provide provable privacy guarantees against correlation attacks. We use our framework to implement approximate and exact algorithms for nearest-neighbor search. We optimize query execution by employing data mining techniques, which identify redundant computations. Contrary to common belief, the experimental results suggest that PIR approaches incur reasonable overhead and are applicable in practice.

very large data bases | 2004

Voronoi-based K nearest neighbor search for spatial network databases

Mohammad R. Kolahdouzan; Cyrus Shahabi

A frequent type of query in spatial networks (e.g., road networks) is to find the K nearest neighbors (KNN) of a given query object. With these networks, the distances between objects depend on their network connectivity and it is computationally expensive to compute the distances (e.g., shortest paths) between objects. In this paper, we propose a novel approach to efficiently and accurately evaluate KNN queries in spatial network databases using a first-order Voronoi diagram. This approach is based on partitioning a large network into small Voronoi regions, and then pre-computing distances both within and across the regions. By localizing the precomputation within the regions, we save on both storage and computation, and by performing across-the-network computation for only the border points of the neighboring regions, we avoid global pre-computation between every node-pair. Our empirical experiments with several real-world data sets show that our proposed solution outperforms approaches that are based on on-line distance computation by up to one order of magnitude, and provides a factor of four improvement in the selectivity of the filter step as compared to the index-based approaches.
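The partitioning step described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the network is a small in-memory adjacency dictionary, uses a multi-source Dijkstra search to assign each node to its nearest generator (point of interest), and reports edges that cross region boundaries as a stand-in for the border points. The names `network_voronoi` and `border_edges` are hypothetical.

```python
import heapq


def network_voronoi(graph, generators):
    """Assign every reachable node to its nearest generator by network distance.

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs
    generators: nodes that host a point of interest (assumed input format)
    Returns (owner, dist): the generator owning each node and the distance to it.
    """
    dist = {}
    owner = {}
    # Seed the search with all generators at distance zero (multi-source Dijkstra).
    heap = [(0.0, g, g) for g in generators]
    heapq.heapify(heap)
    while heap:
        d, node, gen = heapq.heappop(heap)
        if node in dist:
            continue  # already settled by a closer generator
        dist[node] = d
        owner[node] = gen
        for nbr, weight in graph.get(node, []):
            if nbr not in dist:
                heapq.heappush(heap, (d + weight, nbr, gen))
    return owner, dist


def border_edges(graph, owner):
    """Edges whose endpoints lie in different regions (a proxy for border points)."""
    crossing = set()
    for u in graph:
        for v, _ in graph[u]:
            if owner[u] != owner[v]:
                crossing.add(tuple(sorted((u, v))))
    return sorted(crossing)


# A five-node path a-b-c-d-e with generators at both ends.
graph = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 1)],
    "c": [("b", 1), ("d", 2)],
    "d": [("c", 2), ("e", 1)],
    "e": [("d", 1)],
}
owner, dist = network_voronoi(graph, ["a", "e"])
crossing = border_edges(graph, owner)
```

In this toy network, nodes a, b, and c fall in the region of generator a, nodes d and e in the region of generator e, and the only crossing edge is (c, d). Precomputing distances only inside each region and between such border locations is the idea the abstract describes.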

Communications of The ACM | 2014

Big data and its technical challenges

H. V. Jagadish; Johannes Gehrke; Alexandros Labrinidis; Yannis Papakonstantinou; Jignesh M. Patel; Raghu Ramakrishnan; Cyrus Shahabi

Exploring the inherent technical challenges in realizing the potential of Big Data.

symposium on large spatial databases | 2007

Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy

Ali Khoshgozaran; Cyrus Shahabi

In this paper we propose a fundamental approach to perform the class of Nearest Neighbor (NN) queries, the core class of queries used in many of the location-based services, without revealing the origin of the query in order to preserve the privacy of this information. The idea behind our approach is to utilize one-way transformations to map the space of all static and dynamic objects to another space and resolve the query blindly in the transformed space. However, in order to become a viable approach, the transformation used should be able to resolve NN queries in the transformed space accurately and more importantly prevent malicious use of transformed data by untrusted entities. Traditional encryption based techniques incur expensive O(n) computation cost (where n is the total number of points in space) and possibly logarithmic communication cost for resolving a KNN query. This is because such approaches treat points as vectors in space and do not exploit their spatial properties. In contrast, we use Hilbert curves as efficient one-way transformations and design algorithms to evaluate a KNN query in the Hilbert transformed space. Consequently, we reduce the complexity of computing a KNN query to O(K × 2^{2N}/n) and transferring the results to the client in O(K), respectively, where N, the Hilbert curve degree, is a small constant. Our results show that we very closely approximate the result set generated from performing KNN queries in the original space while enforcing our new location privacy metrics termed u-anonymity and a-anonymity, which are stronger and more generalized privacy measures than the commonly used K-anonymity and cloaked region size measures.
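To make the idea of resolving a query in Hilbert-transformed space concrete, here is a minimal sketch. It uses the standard iterative grid-to-Hilbert-index conversion on an n × n grid (n a power of two) and then picks the K stored entries whose Hilbert values are closest to the query's value. The secret curve parameters that make the transformation one-way in the paper are deliberately left out, and the names `xy2d` and `approx_knn` as well as the sample points are assumptions made for illustration.

```python
import bisect


def _rotate(n, x, y, rx, ry):
    """Rotate or flip a quadrant so the curve orientation stays consistent."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Map grid cell (x, y) on an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def approx_knn(sorted_index, query_h, k):
    """Return the k (hilbert_value, name) entries closest to query_h in 1-D."""
    pos = bisect.bisect_left(sorted_index, (query_h,))
    lo = max(0, pos - k)
    hi = min(len(sorted_index), pos + k)
    window = sorted_index[lo:hi]
    return sorted(window, key=lambda entry: abs(entry[0] - query_h))[:k]


# Hypothetical points of interest on a 4 x 4 grid.
pois = {"A": (0, 0), "B": (1, 0), "C": (3, 3), "D": (3, 0)}
index = sorted((xy2d(4, x, y), name) for name, (x, y) in pois.items())
nearest = approx_knn(index, xy2d(4, 1, 1), 2)
```

Because the server only compares one-dimensional Hilbert values, the answer is approximate: cells that are close on the curve are close in space, but the converse does not always hold.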

advances in geographic information systems | 2012

GeoCrowd: enabling query answering with spatial crowdsourcing

Leyla Kazemi; Cyrus Shahabi

With the ubiquity of mobile devices, spatial crowdsourcing is emerging as a new platform, enabling spatial tasks (i.e., tasks related to a location) to be assigned to and performed by human workers. In this paper, for the first time we introduce a taxonomy for spatial crowdsourcing. Subsequently, we focus on one class of this taxonomy, in which workers send their locations to a centralized server and thereafter the server assigns to every worker his nearby tasks with the objective of maximizing the overall number of assigned tasks. We formally define this maximum task assignment (or MTA) problem in spatial crowdsourcing, and identify its challenges. We propose alternative solutions to address these challenges by exploiting the spatial properties of the problem space. Finally, our experimental evaluations on both real-world and synthetic data verify the applicability of our proposed approaches and compare them by measuring both the number of assigned tasks and the travel cost of the workers.
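The assignment objective in this abstract can be sketched for a single time instance as a bipartite matching problem. The simplifications below are assumptions, not the paper's formulation: each worker accepts at most one task, each task needs one worker, and a worker can take a task only if it lies within a circular service radius. The function names are hypothetical, and the matching uses the classic augmenting-path method.

```python
import math


def eligible(workers, tasks):
    """Map each worker to the tasks inside its (assumed circular) service radius.

    workers: dict name -> (x, y, radius); tasks: dict name -> (x, y)
    """
    return {
        w: [t for t, (tx, ty) in tasks.items() if math.hypot(tx - wx, ty - wy) <= r]
        for w, (wx, wy, r) in workers.items()
    }


def max_assignment(workers, tasks):
    """Maximize the number of assigned tasks with augmenting paths."""
    options = eligible(workers, tasks)
    task_owner = {}  # task -> worker currently holding it

    def try_assign(worker, seen):
        for task in options[worker]:
            if task in seen:
                continue
            seen.add(task)
            # Take a free task, or push its current owner onto another task.
            if task not in task_owner or try_assign(task_owner[task], seen):
                task_owner[task] = worker
                return True
        return False

    for worker in workers:
        try_assign(worker, set())
    return {worker: task for task, worker in task_owner.items()}


workers = {"w1": (0.0, 0.0, 5.0), "w2": (0.0, 1.0, 2.0)}
tasks = {"t1": (1.0, 0.0), "t2": (4.0, 0.0)}
assignment = max_assignment(workers, tasks)
```

Here a nearest-task-first rule would give t1 to w1 and leave w2 idle, while the matching reassigns w1 to t2 so that both tasks are covered.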

Geoinformatica | 2003

A Road Network Embedding Technique for K-Nearest Neighbor Search in Moving Object Databases

Cyrus Shahabi; Mohammad R. Kolahdouzan; Mehdi Sharifzadeh

A very important class of queries in GIS applications is the class of K-nearest neighbor queries. Most of the current studies on the K-nearest neighbor queries utilize spatial index structures and hence are based on the Euclidean distances between the points. In real-world road networks, however, the shortest distance between two points depends on the actual path connecting the points and cannot be computed accurately using one of the Minkowski metrics. Thus, the Euclidean distance may not properly approximate the real distance. In this paper, we apply an embedding technique to transform a road network to a high dimensional space in order to utilize computationally simple Minkowski metrics for distance measurement. Subsequently, we extend our approach to dynamically transform new points into the embedding space. Finally, we propose an efficient technique that can find the actual shortest path between two points in the original road network using only the embedding space. Our empirical experiments indicate that the Chessboard distance metric (L∞) in the embedding space preserves the ordering of the distances between a point and its neighbors more precisely as compared to the Euclidean distance in the original road network.

acm international workshop on multimedia databases | 2004

A PCA-based similarity measure for multivariate time series

Kiyoung Yang; Cyrus Shahabi

Multivariate time series (MTS) datasets are common in various multimedia, medical and financial applications. We propose a similarity measure for MTS datasets, Eros (Extended Frobenius norm), which is based on Principal Component Analysis (PCA). Eros applies PCA to MTS datasets represented as matrices to generate principal components and associated eigenvalues. These principal components and eigenvalues are then used to compare the similarity between MTS matrices. Though Eros in itself does not satisfy the triangle inequality, without which existing multidimensional indexing structures may not be utilized, the lower and upper bounds to satisfy the triangle inequality are obtained. In order to show the validity of Eros for similarity search on MTS datasets, we performed several experiments on three datasets (2 real-world and 1 synthetic). The results show the superiority of our approaches as compared to the traditional similarity measures for MTS datasets, such as Euclidean Distance (ED), Dynamic Time Warping (DTW), Weighted Sum SVD (WSSVD) and PCA similarity factor (S_PCA) in precision/recall.

ACM Transactions on Sensor Networks | 2007

The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks

Sunhee Yoon; Cyrus Shahabi

Sensed data in Wireless Sensor Networks (WSN) reflect the spatial and temporal correlations of physical attributes existing intrinsically in the environment. In this article, we present the Clustered AGgregation (CAG) algorithm that forms clusters of nodes sensing similar values within a given threshold (spatial correlation), and these clusters remain unchanged as long as the sensor values stay within a threshold over time (temporal correlation). With CAG, only one sensor reading per cluster is transmitted whereas with Tiny AGgregation (TAG) all the nodes in the network transmit the sensor readings. Thus, CAG provides energy efficient and approximate aggregation results with small and often negligible and bounded error. In this article we extend our initial work in CAG in five directions: First, we investigate the effectiveness of CAG that exploits the temporal as well as spatial correlations using both the measured and modeled data. Second, we design CAG for two modes of operation (interactive and streaming) to enable CAG to be used in different environments and for different purposes. Interactive mode provides mechanisms for one-shot queries, whereas the streaming mode provides those for continuous queries. Third, we propose a fixed range clustering method, which makes the performance of our system independent of the magnitude of the sensor readings and the network topology. Fourth, using mica2 motes, we perform a large-scale measurement of real environmental data (temperature and light, both indoor and outdoor) and the wireless radio reliability, which were used for both analytical modeling and simulation experiments. Fifth, we model the spatially correlated data using the properties of our real world measurements. 
Our experimental results show that when we compute the average of sensor readings in the network using the CAG interactive mode with the user-provided error threshold of 20%, we can save 68.25% of transmissions over TAG with only 2.46% inaccuracy in the result. The streaming mode of CAG can save even more transmissions (up to 70.24% in our experiments) over TAG, when data shows high spatial and temporal correlations. We expect these results to hold in reality, because we used the mica2 radio profile and empirical datasets for our simulation study. CAG is the first system that leverages spatial and temporal correlations to improve energy efficiency of in-network aggregation. This study analytically and empirically validates CAG's effectiveness.
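The spatial clustering idea in this abstract can be sketched as follows. This is a simplified illustration, not the CAG protocol itself: it assumes a fixed routing tree, a relative threshold, and that a node joins its parent's cluster when its reading is within the threshold of the cluster head's reading. Weighting each head's reading by cluster size when averaging is also an assumption, and the function names are hypothetical.

```python
def form_clusters(parent, readings, root, threshold):
    """Assign each node a cluster head while walking down the routing tree.

    parent: dict child -> parent; readings: dict node -> sensed value
    threshold: relative tolerance, e.g. 0.1 for 10% (assumed interpretation)
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    head = {root: root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            h = head[node]
            if abs(readings[child] - readings[h]) <= threshold * abs(readings[h]):
                head[child] = h  # similar enough: join the parent's cluster
            else:
                head[child] = child  # too different: start a new cluster
            stack.append(child)
    return head


def approximate_average(head, readings):
    """Average using only cluster-head readings, weighted by cluster size."""
    sizes = {}
    for h in head.values():
        sizes[h] = sizes.get(h, 0) + 1
    total = sum(readings[h] * count for h, count in sizes.items())
    return total / len(head), len(sizes)


parent = {"b": "a", "c": "a", "d": "c"}
readings = {"a": 20.0, "b": 21.0, "c": 30.0, "d": 31.0}
head = form_clusters(parent, readings, "a", 0.1)
estimate, transmissions = approximate_average(head, readings)
```

With these toy readings only two of the four nodes report a value, and the estimate of 25.0 differs from the true average of 25.5 by about 2%, which illustrates the trade-off between transmissions and accuracy that the abstract quantifies.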

IEEE Transactions on Knowledge and Data Engineering | 2005

Feature subset selection and feature ranking for multivariate time series

Hyunjin Yoon; Kiyoung Yang; Cyrus Shahabi

Feature subset selection (FSS) is a known technique to preprocess the data before performing any data mining tasks, e.g., classification and clustering. FSS provides both cost-effective predictors and a better understanding of the underlying process that generated the data. We propose a family of novel unsupervised methods for feature subset selection from multivariate time series (MTS) based on common principal component analysis, termed CLeVer. Traditional FSS techniques, such as recursive feature elimination (RFE) and Fisher criterion (FC), have been applied to MTS data sets, e.g., brain computer interface (BCI) data sets. However, these techniques may lose the correlation information among features, while our proposed techniques utilize the properties of the principal component analysis to retain that information. In order to evaluate the effectiveness of our selected subset of features, we employ classification as the target data mining task. Our exhaustive experiments show that CLeVer outperforms RFE, FC, and random selection by up to a factor of two in terms of the classification accuracy, while taking up to 2 orders of magnitude less processing time than RFE and FC.

statistical and scientific database management | 2000

TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data

Cyrus Shahabi; Xiaoming Tian; Wugang Zhao

We introduce a novel wavelet based tree structure, termed TSA-tree, which improves the efficiency of multi-level trend and surprise queries on time sequence data. With the explosion of scientific observation data conceptualized as time sequences, we are facing the challenge of efficiently storing, retrieving and analyzing this data. Frequent queries on this data set are to find trends (e.g., global warming) or surprises (e.g., undersea volcano eruption) within the original time series. The challenge, however, is that these trend and surprise queries are needed at different levels of abstraction. To support these multi-level trend and surprise queries, sometimes a huge subset of raw data needs to be retrieved and processed. To expedite this process, we utilize our TSA-tree. Each node of the TSA-tree contains pre-computed trends and surprises at different levels. A wavelet transform is used recursively to construct TSA nodes. As a result, each node of the TSA-tree is readily available for visualization of trends and surprises. In addition, the size of each node is significantly smaller than that of the original time series, resulting in faster I/O operations. However, a limitation of the TSA-tree is that its size is larger than the original time series. To address this shortcoming, first we prove that the storage space required to store the optimal subtree of TSA-tree (OTSA-tree) is no more than that required to store the original time series without losing any information. Next, we propose two alternative techniques to reduce the size of the OTSA-tree even further while maintaining an acceptable query precision as compared to querying the original time sequences. Utilizing real and synthetic time sequence databases, we compare our techniques with some well known algorithms.
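The recursive construction described in this abstract can be sketched with an unnormalized Haar-style transform, chosen here as an assumption for readability (the paper's exact wavelet and normalization may differ). Each step splits a sequence into a half-length trend (pairwise averages) and a half-length surprise (pairwise half-differences). The sequence length is assumed to be divisible by 2 raised to the chosen depth, and all function names are hypothetical.

```python
def haar_step(x):
    """Split a sequence into a trend (averages) and a surprise (half-differences)."""
    trend = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    surprise = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return trend, surprise


def build_levels(x, depth):
    """Apply the split recursively to the trend; return (trend, surprise) per level."""
    levels = []
    current = list(x)
    for _ in range(depth):
        current, surprise = haar_step(current)
        levels.append((current, surprise))
    return levels


def reconstruct(coarsest_trend, surprises):
    """Rebuild the original sequence from the coarsest trend and all surprises."""
    current = coarsest_trend
    for surprise in reversed(surprises):
        finer = []
        for avg, diff in zip(current, surprise):
            finer.extend([avg + diff, avg - diff])
        current = finer
    return current


series = [2, 4, 6, 8, 10, 12, 14, 16]
levels = build_levels(series, 3)
coarsest = levels[-1][0]
surprises = [s for _, s in levels]
stored = len(coarsest) + sum(len(s) for s in surprises)
restored = reconstruct(coarsest, surprises)
```

Keeping only the coarsest trend and the per-level surprises stores eight values for an eight-point series and still restores it exactly, which mirrors the abstract's claim that an optimal subtree needs no more space than the original series.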
