Thanaa M. Ghanem | Researchain

Publication

Featured research published by Thanaa M. Ghanem.

ACM Transactions on Database Systems | 2010

Supporting views in data stream management systems

Thanaa M. Ghanem; Ahmed K. Elmagarmid; Per Åke Larson; Walid G. Aref

In relational database management systems, views supplement basic query constructs to cope with the demand for “higher-level” views of data. Moreover, in traditional query optimization, answering a query using a set of existing materialized views can yield a more efficient query execution plan. Due to their effectiveness, views are attractive to data stream management systems. In order to support views over streams, a data stream management system should employ a closed (or composable) continuous query language. A closed query language is a language in which query inputs and outputs are interpreted in the same way, hence allowing query composition. This article introduces the Synchronized SQL (or SyncSQL) query language that defines a data stream as a sequence of modify operations against a relation. SyncSQL enables query composition through the unified interpretation of query inputs and outputs. An important issue in continuous queries over data streams is the frequency by which the answer gets refreshed and the conditions that trigger the refresh. Coarser periodic refresh requirements are typically expressed as sliding windows. In this article, the sliding window approach is generalized by introducing the synchronization principle that empowers SyncSQL with a formal mechanism to express queries with arbitrary refresh conditions. After introducing the semantics and syntax, we lay the algebraic foundation for SyncSQL and propose a query-matching algorithm for deciding containment of SyncSQL expressions. Then, the article introduces the Nile-SyncSQL prototype to support SyncSQL queries. Nile-SyncSQL employs a pipelined incremental evaluation paradigm in which the query pipeline consists of a set of differential operators. A cost model is developed to estimate the cost of SyncSQL query execution pipelines and to choose the best execution plan from a set of different plans for the same query. 
An experimental study is conducted to evaluate the performance of Nile-SyncSQL. The experimental results illustrate the effectiveness of Nile-SyncSQL and the significant performance gains when views are enabled in data stream management systems.
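The closure property described in this abstract (query inputs and outputs are both interpreted as modify operations against a relation) can be illustrated with a small Python sketch. This is not SyncSQL syntax and not the Nile-SyncSQL implementation; `ModifyOp`, `apply_stream`, and `select_view` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class ModifyOp:
    """One stream element: a modify operation against a keyed relation (assumed shape)."""
    kind: str                      # "insert", "update", or "delete"
    key: str
    value: Optional[float] = None  # unused for deletes


def apply_stream(ops: Iterable[ModifyOp]) -> Dict[str, Optional[float]]:
    """Replay a stream of modify operations to obtain the current relation."""
    relation: Dict[str, Optional[float]] = {}
    for op in ops:
        if op.kind in ("insert", "update"):
            relation[op.key] = op.value
        elif op.kind == "delete":
            relation.pop(op.key, None)
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
    return relation


def select_view(ops: Iterable[ModifyOp],
                predicate: Callable[[float], bool]) -> Iterator[ModifyOp]:
    """A selection view whose output is again a stream of modify operations,
    so one view can be fed directly into another."""
    visible = set()  # keys currently present in the view's output relation
    for op in ops:
        qualifies = op.kind != "delete" and predicate(op.value)
        if qualifies:
            kind = "update" if op.key in visible else "insert"
            visible.add(op.key)
            yield ModifyOp(kind, op.key, op.value)
        elif op.key in visible:
            visible.discard(op.key)
            yield ModifyOp("delete", op.key)
```

Because `select_view` both consumes and produces `ModifyOp` streams, a call such as `select_view(select_view(ops, p1), p2)` composes two views without any conversion step, which is the sense of "closed" used in the abstract.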

IEEE Transactions on Knowledge and Data Engineering | 2007

Incremental Evaluation of Sliding-Window Queries over Data Streams

Thanaa M. Ghanem; Moustafa A. Hammad; Mohamed F. Mokbel; Walid G. Aref; Ahmed K. Elmagarmid

Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query reevaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. On the other hand, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow in the query pipeline and tuple expiration is based on the timestamps of the newly inserted tuples. On the other hand, in the negative tuples approach, tuple expiration is separated from tuple insertion where a tuple flows in the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether an operator can avoid the processing of negative tuples or not. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains of the negative tuples approach over the input-triggered approach when accompanied by the proposed optimizations.
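A minimal Python sketch of the negative tuples idea follows, assuming integer timestamps, arrivals sorted by time, and a discrete clock. The function names and tuple layout are hypothetical and are not taken from the paper's prototype. The window operator emits a "+" tuple on arrival and a "-" tuple on expiration, and a downstream COUNT operator updates its answer from either kind.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[int, str]        # (arrival timestamp, payload)
Signed = Tuple[str, int, str]  # (sign, emission time, payload)


def negative_tuple_window(events: Iterable[Event],
                          window_size: int,
                          clock_end: int) -> Iterator[Signed]:
    """Emit '+' when a tuple arrives and '-' when its age reaches window_size.
    Expiration is driven by the clock, not by the next arrival."""
    pending: Deque[Event] = deque(events)  # assumed sorted by timestamp
    live: Deque[Event] = deque()           # tuples currently inside the window
    for now in range(clock_end + 1):
        # Expire old tuples first: one negative tuple per expired tuple.
        while live and live[0][0] + window_size <= now:
            _, payload = live.popleft()
            yield ("-", now, payload)
        # Then admit arrivals: one positive tuple per inserted tuple.
        while pending and pending[0][0] <= now:
            ts, payload = pending.popleft()
            live.append((ts, payload))
            yield ("+", now, payload)


def incremental_count(signed: Iterable[Signed]) -> Iterator[Tuple[int, int]]:
    """Maintain COUNT(*) over the window from the previous answer only."""
    count = 0
    for sign, now, _ in signed:
        count += 1 if sign == "+" else -1
        yield (now, count)
```

With arrivals at times 0, 1, and 5 and a window of size 3, the count drops at times 3 and 4 in this sketch. Under an input-triggered scheme as described in the abstract, those expirations would only be detected when the tuple at time 5 arrives, which illustrates the output delay; the price is that every tuple passes through the pipeline twice.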

International Conference on Management of Data | 2006

Exploiting predicate-window semantics over data streams

Thanaa M. Ghanem; Walid G. Aref; Ahmed K. Elmagarmid

The continuous sliding-window query model is used widely in data stream management systems where the focus of a continuous query is limited to a set of the most recent tuples. In this paper, we show that an interesting and important class of queries over data streams cannot be answered using the sliding-window query model. Thus, we introduce a new model for continuous window queries, termed the predicate-window query model that limits the focus of a continuous query to the stream tuples that qualify a certain predicate. Predicate-window queries have some distinguishing characteristics, e.g., (1) The window predicate can be defined over any attribute in the stream tuple (ordered or unordered). (2) Stream tuples qualify and disqualify the window predicate in an out-of-order manner. In this paper, we discuss the applicability of the predicate-window query model. We will show how the existing sliding-window query models fail to answer some of the predicate-window queries. Finally, we discuss the challenges in supporting the predicate-window query model in data stream management systems.
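To make the predicate-window idea concrete, here is a hedged Python sketch in which the window holds the latest reading of every key whose value satisfies a predicate. The stream layout `(key, value)` and the name `predicate_window` are assumptions for illustration, not the paper's model definition.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def predicate_window(stream: Iterable[Tuple[str, float]],
                     predicate: Callable[[float], bool]
                     ) -> Iterator[Dict[str, float]]:
    """Yield a snapshot of the window after each incoming tuple.
    A key enters the window when its latest value satisfies the predicate
    and leaves when a later value for the same key does not."""
    window: Dict[str, float] = {}
    for key, value in stream:
        if predicate(value):
            window[key] = value    # qualify, or refresh an existing entry
        else:
            window.pop(key, None)  # disqualify if currently present
        yield dict(window)         # copy so callers see a stable snapshot
```

For a stream of room temperatures and the predicate `value > 80`, a room that entered the window after another may leave before it, so membership is not first-in-first-out. That is the out-of-order behavior the abstract contrasts with sliding windows.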

Advances in Geographic Information Systems | 2014

Taghreed: a system for querying, analyzing, and visualizing geotagged microblogs

Amr Magdy; Louai Alarabi; Saif Al-Harthi; Mashaal Musleh; Thanaa M. Ghanem; Sohaib Ghani; Mohamed F. Mokbel

This paper presents Taghreed, a full-fledged system for efficient and scalable querying, analyzing, and visualizing geotagged microblogs, e.g., tweets. Taghreed supports arbitrary queries on a large number (billions) of microblogs that go up to several months in the past. Taghreed consists of four main components: (1) indexer, (2) query engine, (3) recovery manager, and (4) visualizer. Taghreed indexer efficiently digests incoming microblogs with high arrival rates in light memory-resident indexes. When the memory becomes full, a flushing policy manager transfers the memory contents to disk indexes, which manage billions of microblogs for several months. On memory failure, the recovery manager restores the system status from replicated copies of the main-memory content. Taghreed query engine consists of two modules: a query optimizer and a query processor. The query optimizer generates an optimal query plan to be executed by the query processor through efficient retrieval techniques to provide low query response times, i.e., on the order of milliseconds. Taghreed visualizer allows end users to issue a wide variety of spatio-temporal queries. Then, it graphically presents the answers and allows interactive exploration through them. Taghreed is the first system that addresses all these challenges collectively for microblogs data. In the paper, each system component is described in detail.

International Conference on Data Engineering | 2004

Bulk operations for space-partitioning trees

Thanaa M. Ghanem; Rahul Shah; Mohamed F. Mokbel; Walid G. Aref; Jeffrey Scott Vitter

The emergence of extensible index structures, e.g., GiST (generalized search tree) [J.M. Hellerstein et al. (1995)] and SP-GiST (space-partitioning generalized search tree) [W. G. Aref et al. (2001)], calls for a set of extensible algorithms to support different operations (e.g., insertion, deletion, and search). Extensible bulk operations (e.g., bulk loading and bulk insertion) are of the same importance and need to be supported in these index engines. In this paper, we propose two extensible buffer-based algorithms for bulk operations in the class of space-partitioning trees, a class of hierarchical data structures that recursively decompose the space into disjoint partitions. The main idea of these algorithms is to build an in-memory tree of the target space-partitioning index. Then, data items are recursively partitioned into disk-based buffers using the in-memory tree. Although the second algorithm is designed for bulk insertion, it can be used in bulk loading as well. The proposed extensible algorithms are implemented inside SP-GiST, a framework for supporting the class of space-partitioning trees. Both algorithms have I/O bound O(NH/B), where N is the number of data items to be bulk loaded/inserted, B is the number of tree nodes that can fit in one disk page, and H is the tree height in terms of pages after applying a clustering algorithm. Experimental results are provided to show the scalability and applicability of the proposed algorithms for the class of space-partitioning trees. A comparison of the two proposed algorithms shows that the first algorithm performs better in the case of bulk loading. However, the second algorithm is more general and can be used for efficient bulk insertion.

ACM International Workshop on Multimedia Databases | 2003

Video query processing in the VDBMS testbed for video database research

Walid G. Aref; Moustafa A. Hammad; Ann Christine Catlin; Ihab F. Ilyas; Thanaa M. Ghanem; Ahmed K. Elmagarmid; Mirette S. Marzouk

The increased use of video data sets for multimedia-based applications has created a demand for strong video database support, including efficient methods for handling the content-based query and retrieval of video data. Video query processing presents significant research challenges, mainly associated with the size, complexity and unstructured nature of video data. A video query processor must support video operations for search by content and streaming, new query types, and the incorporation of video methods and operators in generating, optimizing and executing query plans. In this paper, we address these query processing issues in two contexts, first as applied to the video data type and then as applied to the stream data type. We first present the query processing functionality of the VDBMS video database management system as a framework designed to support the full range of functionality for video as an abstract data type. We describe two query operators for the video data type which implement the rank-join and stop-after algorithms. As videos may be considered streams of consecutive image frames, video query processing can be expressed as continuous queries over video data streams. The stream data type was therefore introduced into the VDBMS system, and system functionality was extended to support general data streams. From this viewpoint, we present an approach for defining and processing streams, including video, through the query execution engine. We describe the implementation of several algorithms for video query processing expressed as continuous queries over video streams, such as fast forward, region-based blurring and left outer join. We include a description of the window-join algorithm as a core operator for continuous query systems, and discuss shared execution as an optimization approach for stream query processing.

International Conference on Data Engineering | 2015

Demonstration of Taghreed: A system for querying, analyzing, and visualizing geotagged microblogs

Amr Magdy; Louai Alarabi; Saif Al-Harthi; Mashaal Musleh; Thanaa M. Ghanem; Sohaib Ghani; Saleh M. Basalamah; Mohamed F. Mokbel

This paper demonstrates Taghreed, a full-fledged system for efficient and scalable querying, analyzing, and visualizing geotagged microblogs, such as tweets. Taghreed supports a wide variety of queries on all microblog attributes. In addition, it is able to manage a large number (billions) of microblogs for relatively long periods, e.g., months. Taghreed consists of four main components: (1) indexer, (2) query engine, (3) recovery manager, and (4) visualizer. Taghreed indexer efficiently digests incoming microblogs with high arrival rates in light main-memory indexes. When the memory becomes full, the memory contents are flushed to disk indexes, which manage billions of microblogs efficiently. On memory failure, the recovery manager restores the memory contents from backup copies. Taghreed query engine consists of two modules: a query optimizer and a query processor. The query optimizer generates an optimized query plan to be executed by the query processor to provide low query response times. Taghreed visualizer offers its users a wide variety of spatiotemporal queries and presents the answers on a map-based user interface that allows interactive exploration. Taghreed is the first system that addresses all these challenges collectively for geotagged microblogs data. The system is demonstrated based on a real system implementation through different scenarios that show system functionality and internals.

Advances in Geographic Information Systems | 2014

VisCAT: spatio-temporal visualization and aggregation of categorical attributes in twitter data

Thanaa M. Ghanem; Amr Magdy; Mashaal Musleh; Sohaib Ghani; Mohamed F. Mokbel

In the last few years, Twitter data has become so popular that it is used in a rich set of new applications, e.g., real-time event detection, demographic analysis, and news extraction. As user-generated data, the plethora of Twitter data motivates several analysis tasks that make use of the activity of 271+ million Twitter users. This demonstration presents VisCAT, a tool for aggregating and visualizing categorical attributes in Twitter data. VisCAT outputs visual reports that provide spatial analysis through interactive map-based visualization for categorical attributes (such as tweet language or source operating system) at different zoom levels. The visual reports are built based on user-selected data in arbitrary spatial and temporal ranges. For this data, VisCAT employs a hierarchical spatial data structure to materialize the count of each category at multiple spatial levels. We demonstrate VisCAT using a real Twitter dataset. The demonstration includes use cases on tweet language and tweet source attributes in the region of Gulf Arab states, which can be used for deducing thoughtful conclusions on demographics and living levels in local societies.
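The abstract does not specify which hierarchical spatial structure VisCAT uses. As one possible reading, the sketch below materializes per-category counts in a simple grid pyramid over coordinates normalized to [0, 1), with level `z` holding a `2^z` by `2^z` grid. The class name `CategoryPyramid` and its methods are hypothetical, and temporal ranges are left out.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Tuple

Cell = Tuple[int, int, int]  # (level, column, row)


class CategoryPyramid:
    """Per-category counts materialized at every level of a grid pyramid."""

    def __init__(self, max_level: int) -> None:
        self.max_level = max_level
        self.counts: DefaultDict[Cell, Counter] = defaultdict(Counter)

    def add(self, x: float, y: float, category: str) -> None:
        """Count one item with normalized coordinates at every zoom level."""
        for level in range(self.max_level + 1):
            cells = 1 << level                    # grid is cells x cells
            col = min(int(x * cells), cells - 1)  # clamp x == 1.0 into range
            row = min(int(y * cells), cells - 1)
            self.counts[(level, col, row)][category] += 1

    def query(self, level: int, col: int, row: int) -> Dict[str, int]:
        """Return the materialized category counts for one cell."""
        return dict(self.counts.get((level, col, row), Counter()))
```

Answering a map request at a given zoom level then becomes a lookup of the cells in view rather than a scan of the underlying tweets, at the cost of `max_level + 1` counter updates per inserted item.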

International Conference on Management of Data | 2014

Exploiting Geo-tagged Tweets to Understand Localized Language Diversity

Amr Magdy; Thanaa M. Ghanem; Mashaal Musleh; Mohamed F. Mokbel

Social media services have been the fastest-growing online communities in the last few years. Among those, Twitter has become the de facto standard among microblogging services, with millions of tweets posted every day. In this paper, we present an analytical study of localized language usage and diversity in Twitter data using half a billion geotagged tweets. We first identify local Twitter communities on a country level. For the identified communities, we examine (1) the language diversity, (2) the language dominance within the community and how this differs from local to global views, (3) the demographic representativeness of tweets for real population demographics, and (4) the spatial distribution of different cultural groups within the countries. To this end, we group the tweets on two levels. First, we group tweets per country to identify the local communities. Second, we group tweets within each local community based on the tweet language. Our study shows useful insights about language usage on Twitter which provide important information for language-based applications on top of Twitter data, e.g., lingual analysis and disaster management. In addition, we present an interactive exploration tool for the spatial distribution of cultural groups, which provides a low-effort and high-precision localization of different cultural groups inside a certain country.

ACM Conference on Hypertext | 2016

Understanding Language Diversity in Local Twitter Communities

Amr Magdy; Thanaa M. Ghanem; Mashaal Musleh; Mohamed F. Mokbel

Twitter has been one of the fastest-growing online communities in recent years. In this poster, we study the language usage and diversity in local Twitter communities. We identify local communities in Twitter on a country level. For each community, we examine: (1) the language diversity, (2) the language dominance and how it differs from local to global views, (3) demographic representativeness of tweets, and (4) the spatial distribution of different cultural groups within the community. We show fruitful insights about language usage on Twitter which can be exploited in language-based applications on top of tweets, e.g., lingual analysis and disaster management. In addition, we provide an interactive tool to explore the spatial distribution of cultural groups, which provides a low-effort and high-precision localization of different cultural groups.
