Ning An
Pennsylvania State University
Publication
Featured research published by Ning An.
international conference on data engineering | 2001
Ning An; Zhenyu Yang; Anand Sivasubramaniam
Spatial joins are important and time-consuming operations in spatial database management systems. It is crucial to be able to accurately estimate the performance of these operations so that one can derive efficient query execution plans, and even develop/refine data structures to improve their performance. While estimation techniques for analyzing the performance of other operations, such as range queries, on spatial data have come under scrutiny, the problem of estimating selectivity for spatial joins has been little explored. The limited forays into this area have used parametric techniques, which are largely restrictive on the datasets that they can be used for since they tend to make simplifying assumptions about the nature of the datasets to be joined. Sampling and histogram-based techniques, on the other hand, are much less restrictive. However, there has been no prior attempt at understanding the accuracy of sampling techniques, or developing histogram-based techniques to estimate the selectivity of spatial joins. Apart from extensively evaluating the accuracy of sampling techniques for the very first time, this paper presents two novel histogram-based solutions for spatial join estimation. Using a wide spectrum of both real and synthetic datasets, it is shown that one of our proposed schemes, called Geometric Histograms (GH), can accurately quantify the selectivity of spatial joins.
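As a rough illustration of the sampling approach mentioned in this abstract, the sketch below estimates join selectivity by joining random samples of the two datasets. It is a minimal sketch under stated assumptions, not the paper's evaluated method: the rectangle layout `(xmin, ymin, xmax, ymax)`, the function names, and the uniform random sampling strategy are all hypothetical choices made for the example.

```python
import random

def intersects(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax); touching edges count as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def join_selectivity(r, s):
    """Exact selectivity: intersecting pairs divided by all possible pairs."""
    hits = sum(1 for a in r for b in s if intersects(a, b))
    return hits / (len(r) * len(s))

def sampled_join_selectivity(r, s, fraction, seed=0):
    """Estimate selectivity by joining uniform random samples of r and s.

    Assumption: plain uniform sampling without replacement; other sampling
    strategies are possible and may behave differently.
    """
    rng = random.Random(seed)
    kr = max(1, int(len(r) * fraction))
    ks = max(1, int(len(s) * fraction))
    return join_selectivity(rng.sample(r, kr), rng.sample(s, ks))
```

The estimate costs roughly `fraction` squared of the full nested-loop join, at the price of sampling error that grows as the samples shrink.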
very large data bases | 2003
Ning An; Ravi Kanth V. Kothuri; Siva Ravada
Spatial indexes play a major role in fast access to spatial and location data. Most commercial applications insert new data in bulk: in batches or arrays. In this paper, we propose a novel bulk insertion technique for R-Trees that is fast and does not compromise on the quality of the resulting index. We present our experiences with incorporating the proposed bulk insertion strategies into Oracle 10i. Experiments with real datasets show that our bulk insertion strategy improves performance of insert operations by 50%-90%.
very large data bases | 2002
Ning An; Sudhanva Gurumurthi; Anand Sivasubramaniam; Narayanan Vijaykrishnan; Mahmut T. Kandemir; Mary Jane Irwin
The proliferation of mobile and pervasive computing devices has brought energy constraints into the limelight. Energy-conscious design is important at all levels of system architecture, and the software has a key role to play in conserving battery energy on these devices. With the increasing popularity of spatial database applications, and their anticipated deployment on mobile devices (such as road atlases and GPS-based applications), it is critical to examine the energy implications of spatial data storage and access methods for memory-resident datasets. While there has been extensive prior research on spatial access methods in resource-rich environments, this is, perhaps, the first study to examine their suitability for resource-constrained environments. Using a detailed cycle-accurate energy estimation framework and four different datasets, this paper examines the pros and cons of three previously proposed spatial indexing alternatives from both the energy and performance angles. Specifically, the Quadtree, Packed R-tree, and Buddy-Tree structures are evaluated and compared with a brute-force approach that does not use an index. The results show that there are both performance and energy trade-offs between the indexing schemes for the different queries. The nature of the query also plays an important role in determining the energy-performance trade-offs. Further, technological trends and architectural enhancements are influencing factors on the relative behavior of the index structures. The work in the query has a bearing on how and where (on a mobile client and/or on a server) it should be performed for performance and energy savings. The results from this study will be beneficial for the design and implementation of embedded spatial databases, accelerating their deployment on numerous mobile devices.
international parallel and distributed processing symposium | 2003
Sudhanva Gurumurthi; Ning An; Anand Sivasubramaniam; Narayanan Vijaykrishnan; Mahmut T. Kandemir; Mary Jane Irwin
A seamless infrastructure for information access and data processing is the backbone for the successful development and deployment of the envisioned ubiquitous/mobile applications of the near future. The development of such an infrastructure is a challenge due to the resource-constrained nature of mobile devices in terms of computational power, storage capacity, wireless connectivity and battery energy. With spatial data and location-aware applications widely recognized as being significant beneficiaries of mobile computing, this paper examines an important topic with respect to spatial query processing from the resource-constrained perspective. Specifically, when faced with the task of answering different location-based queries on spatial data from a mobile device, this paper investigates the benefits of partitioning the work between the resource-constrained mobile device (client) and a resource-rich server, connected by a wireless network, for energy and performance savings. This study considers two different scenarios, one where all the spatial data and associated index can fit in client memory and the other where client memory is insufficient. For each of these scenarios, several work partitioning schemes are identified. It is found that work partitioning is a good choice from both energy and performance perspectives in several situations, and these perspectives can have differential effects on the relative benefits of work-partitioning techniques.
IEEE Transactions on Knowledge and Data Engineering | 2003
Ning An; Ji Jin; Anand Sivasubramaniam
Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the data sets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, whose creation is a one-time cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information is looked up. It is shown that one of the proposed schemes, called cumulative density (CD), gives very accurate results (usually less than 5 percent error) using a diverse suite of point and rectangular data sets, that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 percent of the time that it would take to execute the query, regardless of data set or query window parameters.
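The idea of a precomputed density file that answers a range query in constant time can be sketched with a cumulative (prefix-sum) grid over point data. This is only an illustrative sketch: the grid resolution `n`, the square `extent`, the function names, and the snapping of query windows to cell boundaries are assumptions made here, and the paper's CD scheme (which also covers rectangular data) may differ in its details.

```python
def build_cumulative_grid(points, n, extent=1.0):
    """Build an (n+1) x (n+1) table where cum[i][j] counts the points
    falling in grid cells [0, i) x [0, j). This is a one-time cost."""
    counts = [[0] * n for _ in range(n)]
    for x, y in points:
        i = min(int(x / extent * n), n - 1)
        j = min(int(y / extent * n), n - 1)
        counts[i][j] += 1
    cum = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            cum[i + 1][j + 1] = (counts[i][j] + cum[i][j + 1]
                                 + cum[i + 1][j] - cum[i][j])
    return cum

def estimate_range_count(cum, n, x1, y1, x2, y2, extent=1.0):
    """Estimate the number of points in the window with four table lookups.
    The window is snapped to the nearest cell boundaries, so the result is
    exact only for cell-aligned windows."""
    def idx(v):
        return max(0, min(n, round(v / extent * n)))
    i1, i2, j1, j2 = idx(x1), idx(x2), idx(y1), idx(y2)
    return cum[i2][j2] - cum[i1][j2] - cum[i2][j1] + cum[i1][j1]
```

Dividing the estimated count by the dataset size gives a selectivity estimate; the lookup cost is independent of both the data size and the query window, which is the property the abstract highlights.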
Cluster Computing | 1999
Ning An; R. Lu; Liujian Qian; Anand Sivasubramaniam; Tom Keefe
It is becoming increasingly important that a Geographical Information System delivers high performance to efficiently store, retrieve and process the voluminous data that it needs to handle. It is necessary to employ processing and storage parallelism for scalable long‐term solutions. With the demise of many custom‐built parallel machines, it is imperative that we use off‐the‐shelf technology to provide this parallelism. A closely‐coupled network of workstations is a viable alternative. This paper shows that a distributed index structure spanning the workstations can provide an efficient shared storage structure that can be used to get to the geographic information distributed amongst the individual disks and memories of the workstations. This goal can be attained without significantly compromising on the time taken to build this structure.
international conference on data engineering | 2004
Ravi Kothuri; Siva Ravada; Ning An
Much research has been devoted to scalable storage and retrieval techniques for domain databases such as spatial, text, XML and gene sequence data. Many efficient indexing techniques have been developed in this context. Given the improvement in the underlying technology, database applications are increasingly using domain data with transactional semantics. We examine the issue of when, during the lifetime of a transaction, it is better to incorporate updates in domain indexes. We present our experiences with R-tree indexes in Oracle. We examine two approaches for incorporating updates in spatial R-tree indexes: the first at update time, and the second at commit time. The first approach, referred to as the immediate-incorporate approach, incorporates changes in the index right away using system transactions and makes them visible to other transactions at commit time. The second approach, referred to as the deferred-incorporate approach, defers the updates in a secondary table and incorporates the changes in the index only at commit time. In experiments on real data sets, we compare the performance of the two approaches. For most transactions with a reasonable number of update operations, we observe that the deferred approach significantly outperforms the immediate-incorporate approach for update operations and, with appropriate optimizations, achieves comparable query performance.
advances in geographic information systems | 1998
Ning An; Liujian Qian; Anand Sivasubramaniam; Tom Keefe
Several GIS applications are characterized by the vast amount of information that needs to be stored, retrieved and analyzed. The volume and complexity of the data will continue to grow in the future, as is apparent from the expected geo-spatial petabyte data set for NASA’s EOSDIS project, which will hold a collection of raster images arriving at a rate of 3-5 Mbytes/second for 10 years from satellites orbiting the earth. In addition to just being able to handle these large data sets, a GIS should also be able to perform queries on this data efficiently to meet certain real-time constraints. Queries to a GIS are not necessarily limited to spatial searches or selections. The response times for all queries should be kept as low as possible. To summarize, there are three main requirements for a GIS to be successful in handling the demands of current and emerging applications:
international world wide web conferences | 2009
Ning An; Raja Chatterjee; M. Horhammer; Siva Ravada
In this paper, we briefly describe the implementation of various Open Geospatial Consortium (OGC) Web Service Interface Standards in Oracle Spatial 11g. We highlight how we utilize Oracle's implementation of OASIS Web Services Security (WSS) to provide a robust security framework for these OGC Web Services. We also discuss our future direction in supporting OGC Web Service Interface Standards.
Archive | 2004
Ravikanth V. Kothuri; Siva Ravada; Ning An