Ablimit Aji
Emory University
Publication
Featured research published by Ablimit Aji.
very large data bases | 2013
Ablimit Aji; Fusheng Wang; Hoang Vo; Rubao Lee; Qiaoling Liu; Xiaodong Zhang; Joel H. Saltz
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a library for processing spatial queries, and as an integrated software package in Hive.
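The partitioning and boundary-object handling described in this abstract can be illustrated with a small sketch. It is not the Hadoop-GIS or RESQUE implementation: it assumes a fixed uniform grid, plain bounding rectangles, and a nested-loop join inside each tile, and every function name (`tiles_for`, `intersects`, `partitioned_join`) is hypothetical. Objects that cross a tile boundary are replicated into every tile they touch, and the duplicate pairs this produces are removed when results are merged.

```python
from collections import defaultdict
from itertools import product

# A rectangle is a tuple (id, xmin, ymin, xmax, ymax).
# The uniform grid and all names here are illustrative assumptions.

def tiles_for(rect, tile_size):
    """Return every grid tile (column, row) that the rectangle overlaps."""
    _, xmin, ymin, xmax, ymax = rect
    cols = range(int(xmin // tile_size), int(xmax // tile_size) + 1)
    rows = range(int(ymin // tile_size), int(ymax // tile_size) + 1)
    return list(product(cols, rows))

def intersects(a, b):
    """Axis-aligned rectangle intersection test (touching edges count)."""
    return a[1] <= b[3] and b[1] <= a[3] and a[2] <= b[4] and b[2] <= a[4]

def partitioned_join(left, right, tile_size):
    """Spatial join by tile, with boundary objects replicated and deduplicated."""
    # "Map" phase: send each object to every tile it overlaps, so an object
    # that straddles a tile boundary is replicated into several tiles.
    buckets = defaultdict(lambda: ([], []))
    for rect in left:
        for tile in tiles_for(rect, tile_size):
            buckets[tile][0].append(rect)
    for rect in right:
        for tile in tiles_for(rect, tile_size):
            buckets[tile][1].append(rect)

    # "Reduce" phase: join locally inside each tile. The set removes pairs
    # that were found more than once because of replication.
    pairs = set()
    for lefts, rights in buckets.values():
        for a in lefts:
            for b in rights:
                if intersects(a, b):
                    pairs.add((a[0], b[0]))
    return sorted(pairs)
```

With a tile size of 10, a rectangle spanning x from 0 to 12 lands in two tiles; if it intersects another rectangle that also spans both tiles, the pair is found twice and reported once.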
conference on information and knowledge management | 2010
Ablimit Aji; Yu Wang; Eugene Agichtein; Evgeniy Gabrilovich
The generative process underlies many information retrieval models, notably statistical language models. Yet these models only examine one (current) version of the document, effectively ignoring the actual document generation process. We posit that a considerable amount of information is encoded in the document authoring process, and this information is complementary to the word occurrence statistics upon which most modern retrieval models are based. We propose a new term weighting model, Revision History Analysis (RHA), which uses the revision history of a document (e.g., the edit history of a page in Wikipedia) to redefine term frequency - a key indicator of document topic/relevance for many retrieval models and text processing tasks. We then apply RHA to document ranking by extending two state-of-the-art text retrieval models, namely, BM25 and the generative statistical language model (LM). To the best of our knowledge, our paper is the first attempt to directly incorporate document authoring history into retrieval models. Empirical results show that RHA provides consistent improvements for state-of-the-art retrieval models, using standard retrieval tasks and benchmarks.
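To make the idea of a revision-aware term frequency concrete, here is a minimal sketch. The weighting scheme is an assumption chosen for illustration and is not the RHA formula from the paper: each revision's raw count is weighted by its position in the history, and the result is blended with the count in the current version. The names `revision_tf`, `blended_tf`, and the parameters `decay` and `lam` are hypothetical. Only `bm25_term` follows a standard definition, the usual BM25 per-term score.

```python
def revision_tf(term, revisions, decay=1.0):
    """Weighted average of per-revision term counts.

    `revisions` is a list of token lists, oldest first. Revision i (1-based)
    gets weight i**decay; this weighting is an illustrative assumption.
    """
    weights = [(i + 1) ** decay for i in range(len(revisions))]
    counts = [rev.count(term) for rev in revisions]
    return sum(w * c for w, c in zip(weights, counts)) / sum(weights)

def blended_tf(term, revisions, lam=0.5):
    """Mix the current-version count with the history-based count."""
    current = revisions[-1].count(term)
    return lam * current + (1 - lam) * revision_tf(term, revisions)

def bm25_term(tf, idf, doc_len, avg_len, k1=1.2, b=0.75):
    """Standard BM25 contribution of one term, given any term frequency."""
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
```

Passing `blended_tf(...)` as the `tf` argument of `bm25_term` gives a history-aware score. In this sketch a term that was prominent in earlier revisions still contributes even if it is absent from the current text.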
advances in geographic information systems | 2014
Hoang Vo; Ablimit Aji; Fusheng Wang
Scalable spatial query processing relies on effective spatial data partitioning for query parallelization, data pruning, and load balancing. These are often challenged by the intrinsic characteristics of spatial data, such as high skew in data distribution and high complexity of irregular multi-dimensional objects. In this demo, we present SATO, a spatial data partitioning framework that can quickly analyze and partition spatial data with an optimal spatial partitioning strategy for scalable query processing. SATO works in the following steps: 1) Sample, which samples a small fraction of input data for analysis, 2) Analyze, which quickly analyzes sampled data to find an optimal partition strategy, 3) Tear, which provides data skew aware partitioning and supports MapReduce based scalable partitioning, and 4) Optimize, which collects succinct partition statistics for potential query optimization. SATO also provides multiple level partitioning, which can be used to significantly improve window based queries in cloud based spatial query processing systems. SATO comes with a visualization component that provides heat maps and histograms for qualitative evaluation. SATO has been implemented within Hadoop-GIS, a high performance spatial data warehousing system over MapReduce. SATO is also released as an independent software package to support various scalable spatial query processing systems. Our experiments have demonstrated that SATO can generate much more balanced partitioning that can significantly improve spatial query performance with MapReduce compared to traditional spatial partitioning approaches.
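The four steps named in this abstract can be sketched on 2D points as follows. This is not SATO's code: the Analyze step here assumes one particular strategy (recursive median splits on the sample, alternating axes), cells are half-open rectangles `(xmin, ymin, xmax, ymax)`, and all function names and parameters are hypothetical.

```python
import random

def contains(cell, p):
    """Half-open containment test, so adjacent cells never share a point."""
    xmin, ymin, xmax, ymax = cell
    return xmin <= p[0] < xmax and ymin <= p[1] < ymax

def sample(points, fraction, seed=0):
    """Step 1, Sample: draw a small random fraction of the input."""
    rng = random.Random(seed)
    return rng.sample(points, max(1, int(len(points) * fraction)))

def analyze(sampled, bounds, max_per_cell, depth=0):
    """Step 2, Analyze: split at the sample median until cells are small.

    Median splitting is an assumed strategy; dense regions get more cells.
    """
    inside = [p for p in sampled if contains(bounds, p)]
    if len(inside) <= max_per_cell or depth >= 32:
        return [bounds]
    axis = depth % 2  # 0 splits on x, 1 splits on y
    cut = sorted(p[axis] for p in inside)[len(inside) // 2]
    if cut <= bounds[axis]:  # degenerate split, stop here
        return [bounds]
    low, high = list(bounds), list(bounds)
    low[axis + 2] = cut   # lower cell ends at the cut
    high[axis] = cut      # upper cell starts at the cut
    return (analyze(inside, tuple(low), max_per_cell, depth + 1)
            + analyze(inside, tuple(high), max_per_cell, depth + 1))

def tear(points, cells):
    """Step 3, Tear: assign every input point to the cell containing it."""
    assignment = {i: [] for i in range(len(cells))}
    for p in points:
        for i, cell in enumerate(cells):
            if contains(cell, p):
                assignment[i].append(p)
                break
    return assignment

def optimize(assignment):
    """Step 4, Optimize: collect per-partition counts as simple statistics."""
    return {i: len(ps) for i, ps in assignment.items()}
```

Running `sample`, `analyze`, `tear`, and `optimize` in sequence on a skewed point set yields smaller cells in the dense region, and the per-cell counts from `optimize` show how even the resulting partitions are. The linear scan in `tear` is kept for brevity; a distributed version would perform this assignment in parallel.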
advances in geographic information systems | 2013
Ablimit Aji; Xiling Sun; Hoang Vo; Qiaoling Liu; Rubao Lee; Xiaodong Zhang; Joel H. Saltz; Fusheng Wang
The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS -- a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for work-load specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
international conference on management of data | 2012
Ablimit Aji; Fusheng Wang
Analyzing and querying large volumes of spatially derived data from scientific experiments has posed major challenges in the past decade. For example, the systematic analysis of imaged pathology specimens results in rich spatially derived information with GIS characteristics at cellular and sub-cellular scales, with nearly a million derived markups and a hundred million features per image. This provides critical information for evaluation of experimental results, support of biomedical studies and pathology image based diagnosis. However, the vast amount of spatially oriented morphological information poses major challenges for analytical medical imaging. The major challenges I attack include: i) How can we provide cost effective, scalable spatial query support for medical imaging GIS? ii) How can we provide fast response queries on analytical imaging data to support biomedical research and clinical diagnosis? and iii) How can we provide expressive queries to support spatial queries and spatial pattern discoveries for end users? In my thesis, I work towards developing a MapReduce based framework MIGIS to support expressive, cost effective and high performance spatial queries. The framework includes a real-time spatial query engine RESQUE consisting of a variety of optimized access methods, boundary and density aware spatial data partitioning, a declarative query language interface, a query translator which automates translation of the spatial queries into MapReduce programs and an execution engine which parallelizes and executes queries on Hadoop. Our preliminary experiments demonstrate that MIGIS is a cost effective architecture which achieves high performance spatial query execution. MIGIS is extensible and can be adapted to support similar complex spatial queries for large scale spatial data in other scientific domains.
international workshop on analytics for big geospatial data | 2014
Ablimit Aji; George Teodoro; Fusheng Wang
Spatial query processing involves complex multidimensional objects and compute intensive spatial operations, and therefore requires a high performance approach to meet the rapid data analytics requirements of modern spatial applications. Recently, MapReduce based spatial query systems have become a viable solution for many data intensive query tasks, and gained widespread adoption in both academia and industry. At the same time, GPUs have been successfully utilized in many applications that require high performance computation. Both approaches, GPU and MapReduce, have their own limitations and advantages, and have been separately utilized in spatial query processing tasks to boost application performance. However, it is unclear how MapReduce and GPU, two vastly different parallelization techniques, can be fused together to effectively deal with the spatial big data challenges. In this paper, we explore such synergy of parallelization techniques for large scale spatial query processing. We extend Hadoop-GIS, a MapReduce based spatial query system, and provide GPU accelerated spatial query processing capability at the engine level. We evaluate the system on a real world dataset, and demonstrate that the GPU accelerated system can gain considerable performance improvements. We also show how other factors, such as partition granularity and task scheduling between CPU and GPU, can impact the query performance.
international workshop on analytics for big geospatial data | 2014
Xin Chen; Hoang Vo; Ablimit Aji; Fusheng Wang
The growth of spatial big data has been explosive thanks to cost-effective and ubiquitous positioning technologies, and the generation of data from multiple sources in multiple forms. Such emerging spatial data has high potential to create new insights and value for our lives through spatial analytics. However, spatial data analytics faces two major challenges. First, spatial data is both data- and compute-intensive due to the massive amounts of data and the multi-dimensional nature, which requires high performance spatial computing infrastructure and methods. Second, spatial big data sources are often isolated; for example, OpenStreetMap, census data and Twitter tweets are independent data sources. This leads to incompleteness of information and sometimes limited data accuracy, and thus limited value from the data. Integrating spatial big data analytics by consolidating multiple data sources provides significant potential for data quality improvement in terms of completeness and accuracy, and much increased value derived from the data. In this paper, we present our vision of a high performance integrated spatial big data analytics framework. We provide a scalable spatial query based data integration engine with MapReduce, and demonstrate integrated spatial data analytics through a few use cases in our preliminary work. We then present our future plan on integrated spatial big data analytics for improving public health research and applications.
Sigspatial Special | 2015
Fusheng Wang; Ablimit Aji; Hoang Vo
Support of high performance queries on large volumes of spatial data has become increasingly important in many application domains, including geospatial problems in numerous disciplines, location based services, and emerging medical imaging applications. There are two major challenges for managing massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. Our goal is to develop a general framework to support high performance spatial queries and analytics for spatial big data on MapReduce and CPU-GPU hybrid platforms. In this paper, we introduce Hadoop-GIS -- a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through skew-aware spatial partitioning, on-demand indexing, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. To accelerate compute-intensive geometric operations, GPU based geometric computation algorithms are integrated into MapReduce pipelines. Our experiments have demonstrated that Hadoop-GIS is highly efficient and scalable, and outperforms parallel spatial DBMS for compute-intensive spatial queries.
social computing behavioral modeling and prediction | 2010
Ablimit Aji; Eugene Agichtein
Online knowledge sharing sites have recently exploded in popularity, and have begun to play an important role in online information seeking. Unfortunately, many factors that influence the effectiveness of the information exchange in these communities are not well understood. This paper is an attempt to fill this gap by exploring the dynamics of information sharing in such sites - that is, identifying the factors that can explain how people respond to information requests. As a case study, we use Yahoo! Answers, one of the leading knowledge sharing portals on the web with millions of active participants. We follow the progress of thousands of questions, from posting until resolution. We examine contextual factors such as the topical area of the questions, as well as intrinsic factors of question wording, subjectivity, sentiment, and other characteristics that could influence how a community responds to an information request. Our findings could be useful for improving existing collaborative question answering systems, and for designing the next generation of knowledge sharing communities.
advances in geographic information systems | 2016
Yanhui Liang; Hoang Vo; Ablimit Aji; Jun Kong; Fusheng Wang
3D analytical pathology imaging examines high resolution 3D image volumes of human tissues to facilitate biomedical research and provide potentially effective diagnostic assistance. Such an approach - quantitative analysis of large-scale 3D pathology image volumes - generates tremendous amounts of spatially derived 3D micro-anatomic objects, such as 3D blood vessels and nuclei. Spatial exploration of such massive 3D spatial data requires effective and efficient querying methods. In this paper, we present a scalable and efficient 3D spatial query system for querying massive 3D spatial data based on MapReduce. The system provides an on-demand spatial querying engine which can be executed with as many instances as needed on MapReduce at runtime. Our system supports multiple types of spatial queries on MapReduce through 3D spatial data partitioning, a customizable 3D spatial query engine, and implicit parallel spatial query execution. We utilize multi-level spatial indexing to achieve efficient query processing, including global partition indexing for data retrieval and on-demand local spatial indexing for spatial query processing. We evaluate our system with two representative queries: 3D spatial joins and 3D k-nearest neighbor query. Our experiments demonstrate that our system scales to a large number of computing nodes, and efficiently handles data-intensive 3D spatial queries that are challenging in analytical pathology imaging.