Amr Magdy
University of Minnesota
Publication
Featured research published by Amr Magdy.
international conference on management of data | 2012
Mohamed Sarwat; Jie Bao; Ahmed Eldawy; Justin J. Levandoski; Amr Magdy; Mohamed F. Mokbel
This demo presents Sindbad, a location-based social networking system. Sindbad supports three new services beyond traditional social networking services, namely location-aware news feed, location-aware recommender, and location-aware ranking. These new services consider not only social relevance for their users but also spatial relevance. Since location-aware social networking systems have to deal with a large number of users, a large number of messages, and user mobility, efficiency and scalability are important issues. To this end, Sindbad encapsulates its three main services inside the query processing engine of PostgreSQL. The usage and internal functionality of Sindbad, implemented with PostgreSQL and the Google Maps API, are demonstrated through a user GUI (i.e., web/phone) and a system analyzer GUI, respectively.
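The abstract does not specify how Sindbad blends social and spatial relevance, so the following is only a minimal sketch under stated assumptions: a message's score is a weighted sum (hypothetical parameter `alpha`) of a social score taken from a hypothetical `friends_weight` map and a spatial score that decays linearly to zero at a hypothetical `max_dist_km` cutoff. Sindbad itself runs its services inside PostgreSQL's query engine, not in application code like this.

```python
import math
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    author: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def location_aware_feed(user_lat, user_lon, friends_weight, messages, k,
                        alpha=0.5, max_dist_km=50.0):
    """Return ids of the k messages with the highest social + spatial score.

    Hypothetical scoring: friends_weight maps author -> social relevance in
    [0, 1] (missing authors score 0); spatial relevance decays linearly from
    1 at the user's location to 0 at max_dist_km. Ties go to the lower id.
    """
    scored = []
    for m in messages:
        social = friends_weight.get(m.author, 0.0)
        dist = haversine_km(user_lat, user_lon, m.lat, m.lon)
        spatial = max(0.0, 1.0 - dist / max_dist_km)
        scored.append((alpha * social + (1 - alpha) * spatial, m.msg_id))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [msg_id for _, msg_id in scored[:k]]
```

With `alpha=0.5`, a nearby post by a close friend outranks both a nearby stranger's post and a distant friend's post, which is the behavior the abstract describes: neither relevance dimension alone decides the ranking.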
advances in geographic information systems | 2014
Amr Magdy; Louai Alarabi; Saif Al-Harthi; Mashaal Musleh; Thanaa M. Ghanem; Sohaib Ghani; Mohamed F. Mokbel
This paper presents Taghreed, a full-fledged system for efficient and scalable querying, analyzing, and visualizing geotagged microblogs, e.g., tweets. Taghreed supports arbitrary queries on a large number (billions) of microblogs spanning up to several months in the past. Taghreed consists of four main components: (1) indexer, (2) query engine, (3) recovery manager, and (4) visualizer. The Taghreed indexer efficiently digests incoming microblogs with high arrival rates into lightweight memory-resident indexes. When the memory becomes full, a flushing policy manager transfers the memory contents to disk indexes, which manage billions of microblogs spanning several months. On memory failure, the recovery manager restores the system state from replicated copies of the main-memory content. The Taghreed query engine consists of two modules: a query optimizer and a query processor. The query optimizer generates an optimal query plan, which the query processor executes using efficient retrieval techniques to provide low query response times, i.e., on the order of milliseconds. The Taghreed visualizer allows end users to issue a wide variety of spatio-temporal queries. It then presents the answers graphically and allows interactive exploration of them. Taghreed is the first system that addresses all these challenges collectively for microblog data. In the paper, each system component is described in detail.
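To make the memory-to-disk flow concrete, here is a hypothetical sketch of a two-tier keyword index in which the memory tier is flushed wholesale to a simulated disk tier once a capacity (counted in microblogs) is reached. The class name, the flush-everything policy, and the use of an in-process dict as the "disk" are assumptions; the abstract does not describe Taghreed's flushing policy manager or disk index layout at this level of detail.

```python
from collections import defaultdict


class TwoTierKeywordIndex:
    """Hypothetical sketch: a memory-resident inverted index that is flushed
    wholesale to a (simulated) disk index when its capacity is reached."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity  # max microblogs held in memory
        self.memory = defaultdict(list)         # keyword -> [microblog ids]
        self.memory_count = 0
        self.disk = defaultdict(list)           # stand-in for on-disk indexes
        self.flushes = 0

    def insert(self, microblog_id, keywords):
        # Make room first if the memory tier is already full.
        if self.memory_count >= self.memory_capacity:
            self._flush()
        for kw in keywords:
            self.memory[kw].append(microblog_id)
        self.memory_count += 1

    def _flush(self):
        # Move every in-memory posting list to the disk tier, then clear memory.
        for kw, ids in self.memory.items():
            self.disk[kw].extend(ids)
        self.memory.clear()
        self.memory_count = 0
        self.flushes += 1

    def search(self, keyword):
        # Answers combine both tiers, newest first: memory holds the most
        # recent microblogs, and each list is stored in arrival order.
        in_memory = list(reversed(self.memory.get(keyword, [])))
        on_disk = list(reversed(self.disk.get(keyword, [])))
        return in_memory + on_disk
```

Flushing everything at once keeps the bookkeeping trivial; the kFlushing entry further down this page addresses exactly the hit-ratio cost of such relevance-blind flushing.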
international conference on data engineering | 2014
Amr Magdy; Mohamed F. Mokbel; Sameh Elnikety; Suman Nath; Yuxiong He
This paper presents Mercury, a system for real-time support of top-k spatio-temporal queries on microblogs, where users are able to browse recent microblogs near their locations. With high arrival rates of microblogs, Mercury ensures real-time query response within a tight memory-constrained environment. Mercury bounds its search space to include only those microblogs that have arrived within certain spatial and temporal boundaries, and only the top-k microblogs within these boundaries, according to a spatio-temporal ranking function, are returned in the search results. Mercury employs: (a) a scalable dynamic in-memory index structure that is capable of digesting all incoming microblogs, (b) an efficient query processor that exploits the in-memory index through spatio-temporal pruning techniques that reduce the number of microblogs visited to compute the final answer, (c) an index size tuning module that dynamically finds and adjusts the minimum index size needed to ensure that incoming queries are answered accurately, and (d) a load shedding technique that trades a slight decrease in query accuracy for significant storage savings. Extensive experimental results based on a real-time Twitter Firehose feed and actual locations of Bing search queries show that Mercury supports high arrival rates of up to 64K microblogs/second and an average query latency of 4 msec.
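The bounded top-k search can be illustrated with a short sketch. Assumptions: planar coordinates, and a hypothetical ranking function that blends normalized spatial proximity and temporal recency with a weight `alpha`. The abstract gives neither Mercury's actual ranking function nor its index structure, so a linear scan with boundary pruning stands in for the in-memory index here.

```python
import heapq
import math
from typing import NamedTuple


class Microblog(NamedTuple):
    mid: int
    x: float  # planar coordinates (assumption; real systems use lat/lon)
    y: float
    t: float  # arrival timestamp in seconds


def topk_spatiotemporal(blogs, qx, qy, now, k, radius, time_window, alpha=0.5):
    """Return ids of the k best microblogs inside the spatial and temporal bounds.

    Hypothetical ranking: alpha * spatial proximity + (1 - alpha) * recency,
    each normalised to [0, 1] over the query's boundaries. Ties go to lower ids.
    """
    candidates = []
    for b in blogs:
        dist = math.hypot(b.x - qx, b.y - qy)
        age = now - b.t
        # Bounded search space: skip anything outside the query boundaries.
        if dist > radius or age < 0 or age > time_window:
            continue
        score = alpha * (1 - dist / radius) + (1 - alpha) * (1 - age / time_window)
        candidates.append((score, -b.mid, b.mid))
    best = heapq.nlargest(k, candidates)
    return [mid for _, _, mid in best]
```

`heapq.nlargest` keeps only k candidates in play, which mirrors the point of top-k processing: the full candidate set never needs to be sorted.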
symposium on large spatial databases | 2013
Mohamed F. Mokbel; Louai Alarabi; Jie Bao; Ahmed Eldawy; Amr Magdy; Mohamed Sarwat; Ethan Waytas; Steven Yackel
Road network traffic datasets have attracted significant attention in the past decade. For instance, in the spatio-temporal databases area, researchers harness road network traffic data to evaluate and validate their research. Collecting real traffic datasets is tedious, as it usually takes a significant amount of time and effort. Alternatively, many researchers opt to generate synthetic traffic data using existing traffic generation tools, e.g., Brinkhoff and BerlinMOD. Unfortunately, existing road network traffic generators require a significant amount of time and effort to install, configure, and run. Moreover, it is not trivial to generate traffic data in arbitrary spatial regions using existing traffic generators. In this paper, we propose the Minnesota Traffic Generator (MNTG), an extensible web-based road network traffic generator that overcomes the hurdles of using existing traffic generators. MNTG does not provide a new way to simulate traffic data. Instead, it serves as a wrapper over existing traffic generators, making them easy to use, configure, and run for any arbitrary spatial road region. To generate traffic data, MNTG users just need to use its user-friendly web interface to specify an arbitrary spatial range on the map, select a traffic generator method, and submit the traffic generation request to the server. The dedicated MNTG server receives and processes the submitted traffic generation request and notifies the user via email when finished. MNTG users can then download their generated data and/or visualize it on the MNTG map interface. MNTG is extensible on two fronts: (1) It can be easily extended to support various traffic generators. It is already shipped with the two most common traffic generators, Brinkhoff and BerlinMOD, and it also provides an interface for adding new traffic generators. (2) It can be easily extended to support various road network sources. It is shipped with U.S. Tiger files and OpenStreetMap, and it also provides an interface for adding other sources. MNTG is launched as a web service for public use; a prototype can be accessed via http://mntg.cs.umn.edu.
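MNTG's extensibility amounts to a plug-in interface over existing generators. The sketch below shows one way such an interface might look in Python; `TrafficGenerator`, `register`, `GENERATORS`, `handle_request`, and the toy random generator are all hypothetical names, not MNTG's actual code, and a real adapter would invoke Brinkhoff or BerlinMOD rather than sampling random points.

```python
import random
from abc import ABC, abstractmethod


class TrafficGenerator(ABC):
    """Hypothetical plug-in interface: each adapter wraps one external generator."""

    @abstractmethod
    def generate(self, bbox, num_objects, seed):
        """Return a list of (object_id, timestep, x, y) tuples inside bbox."""


GENERATORS = {}  # generator name -> adapter class


def register(name):
    """Class decorator that adds a generator adapter to the registry."""
    def wrap(cls):
        GENERATORS[name] = cls
        return cls
    return wrap


@register("toy-random")
class ToyRandomGenerator(TrafficGenerator):
    # Placeholder adapter: a real one would call Brinkhoff or BerlinMOD.
    def generate(self, bbox, num_objects, seed):
        min_x, min_y, max_x, max_y = bbox
        rng = random.Random(seed)
        return [(oid, 0, rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
                for oid in range(num_objects)]


def handle_request(generator_name, bbox, num_objects, seed=0):
    """Dispatch a traffic generation request to the selected adapter."""
    if generator_name not in GENERATORS:
        raise ValueError(f"unknown generator: {generator_name}")
    return GENERATORS[generator_name]().generate(bbox, num_objects, seed)
```

Adding a generator then means writing one subclass and decorating it; the request-handling path does not change. Road network sources could be made pluggable the same way.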
mobile data management | 2015
Amr Magdy; Mohamed F. Mokbel
This paper advocates the need to build a Microblogs Data Management System (MDMS) as an end-to-end data management system to support indexing, querying, and analyzing microblogs, e.g., tweets, comments, or check-ins. We identify a set of characteristics of microblogging environments that distinguish them from any other data management environment. Then, we propose a system architecture for the first Microblogs Data Management System, which includes indexing, querying, and recovery components. The indexing component is responsible for indexing recent data in memory, indexing older data on disk, and synchronizing the flow of data from memory to disk without affecting query response time. The querying component is responsible for retrieving the query answer from both memory and disk storage, as well as employing online selectivity estimation techniques tuned to the behavior of microblog data. The recovery component allows incoming microblogs to be stored and processed efficiently in memory without worrying about data loss.
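One payoff of keeping recent data in memory is that many queries never need to touch disk. The following minimal sketch rests on three assumptions: queries ask for the k most recent microblogs containing a keyword, posting lists are stored newest-first, and every in-memory microblog is newer than every on-disk one. Under those assumptions, disk is read only when memory alone cannot supply k answers. The function and parameter names are illustrative.

```python
def topk_recent(keyword, k, memory_index, disk_index, stats=None):
    """Return the k most recent microblog ids for `keyword`.

    Hypothetical setting: ranking is by recency, posting lists are
    newest-first, and all in-memory data is newer than all on-disk data,
    so the disk tier is consulted only when memory has fewer than k hits.
    `stats`, if given, counts disk accesses under the key "disk_reads".
    """
    result = list(memory_index.get(keyword, []))[:k]
    if len(result) < k:
        if stats is not None:
            stats["disk_reads"] = stats.get("disk_reads", 0) + 1
        result.extend(disk_index.get(keyword, [])[: k - len(result)])
    return result
```

The early-termination check is what ties memory hit ratio to query latency: every query answered entirely from memory skips the disk tier.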
international conference on data engineering | 2015
Amr Magdy; Louai Alarabi; Saif Al-Harthi; Mashaal Musleh; Thanaa M. Ghanem; Sohaib Ghani; Saleh M. Basalamah; Mohamed F. Mokbel
This paper demonstrates Taghreed, a full-fledged system for efficient and scalable querying, analyzing, and visualizing geotagged microblogs, such as tweets. Taghreed supports a wide variety of queries on all microblog attributes. In addition, it is able to manage a large number (billions) of microblogs for relatively long periods, e.g., months. Taghreed consists of four main components: (1) indexer, (2) query engine, (3) recovery manager, and (4) visualizer. The Taghreed indexer efficiently digests incoming microblogs with high arrival rates into lightweight main-memory indexes. When the memory becomes full, the memory contents are flushed to disk indexes, which manage billions of microblogs efficiently. On memory failure, the recovery manager restores the memory contents from backup copies. The Taghreed query engine consists of two modules: a query optimizer and a query processor. The query optimizer generates an optimized query plan to be executed by the query processor to provide low query response times. The Taghreed visualizer offers its users a wide variety of spatio-temporal queries and presents the answers on a map-based user interface that allows interactive exploration. Taghreed is the first system that addresses all these challenges collectively for geotagged microblog data. The system is demonstrated on a real system implementation through different scenarios that show system functionality and internals.
international conference on data engineering | 2014
Amr Magdy; Ahmed M. Aly; Mohamed F. Mokbel; Sameh Elnikety; Yuxiong He; Suman Nath
The Mars demonstration exploits microblog location information to support a wide variety of important spatio-temporal queries on microblogs. Supported queries include range, nearest-neighbor, and aggregate queries. Mars works in a challenging environment where streams of microblogs arrive at high rates. Mars distinguishes itself with three novel contributions: (1) Efficient in-memory digestion/expiration techniques that can handle microblogs at arrival rates of up to 64,000 microblogs/sec. This also includes highly accurate and efficient hopping-window-based aggregation of incoming microblog keywords. (2) Smart memory optimization and load shedding techniques that adjust in-memory contents based on the expected query load, trading significant storage savings for a slight and bounded accuracy loss. (3) Scalable real-time query processing that exploits the Zipf distribution of microblog data for efficient top-k aggregate query processing. In addition, Mars employs a scalable real-time nearest-neighbor and range query processing module that uses various pruning techniques to serve heavy query workloads in real time. Mars is demonstrated using a stream of real tweets obtained from the Twitter firehose, with a production query workload obtained from Bing web search. We show that Mars serves incoming queries with an average latency of less than 4 msec and with 99% answer accuracy, while saving up to 70% of storage overhead for different query loads.
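Hopping-window aggregation can be sketched by keeping one counter per hop and summing the counters that fall inside the window. Expiring a hop then costs a single bucket drop rather than one deletion per microblog. This is a generic illustration with hypothetical names (`HoppingWindowCounter`, `hop_seconds`, `num_hops`), not the Mars implementation, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import Counter, deque


class HoppingWindowCounter:
    """Keyword counts over the last `num_hops` hops, each `hop_seconds` long.

    Counts are kept per hop, so expiring old data drops one whole bucket
    instead of deleting individual microblogs. Timestamps are assumed to
    arrive in non-decreasing order.
    """

    def __init__(self, hop_seconds, num_hops):
        self.hop_seconds = hop_seconds
        self.hops = deque([Counter()], maxlen=num_hops)
        self.current_hop = None  # hop index covered by the newest bucket

    def add(self, timestamp, keywords):
        hop = int(timestamp // self.hop_seconds)
        if self.current_hop is None:
            self.current_hop = hop
        elif hop > self.current_hop:
            # Open one empty bucket per elapsed hop (at most a full window);
            # deque(maxlen=num_hops) silently discards the expired buckets.
            for _ in range(min(hop - self.current_hop, self.hops.maxlen)):
                self.hops.append(Counter())
            self.current_hop = hop
        self.hops[-1].update(keywords)

    def top_k(self, k):
        """Return the k most frequent keywords in the current window."""
        total = Counter()
        for bucket in self.hops:
            total.update(bucket)
        return total.most_common(k)
```

The window length is `hop_seconds * num_hops`. Smaller hops make expiration finer-grained at the cost of more buckets to sum per query.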
advances in geographic information systems | 2014
Thanaa M. Ghanem; Amr Magdy; Mashaal Musleh; Sohaib Ghani; Mohamed F. Mokbel
In the last few years, Twitter data has become so popular that it is used in a rich set of new applications, e.g., real-time event detection, demographic analysis, and news extraction. As user-generated data, the plethora of Twitter data motivates several analysis tasks that make use of the activity of more than 271 million Twitter users. This demonstration presents VisCAT, a tool for aggregating and visualizing categorical attributes in Twitter data. VisCAT outputs visual reports that provide spatial analysis through interactive map-based visualization of categorical attributes, such as tweet language or source operating system, at different zoom levels. The visual reports are built from user-selected data in arbitrary spatial and temporal ranges. For this data, VisCAT employs a hierarchical spatial data structure to materialize the count of each category at multiple spatial levels. We demonstrate VisCAT using a real Twitter dataset. The demonstration includes use cases on the tweet language and tweet source attributes in the region of the Gulf Arab states, which can be used to draw insightful conclusions about demographics and living standards in local societies.
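A hierarchical structure that materializes category counts at multiple spatial levels can be sketched as a quadtree-style pyramid. Level `l` divides the bounding box into a 2**l by 2**l grid, and each inserted tweet increments exactly one cell per level, so any zoom level can be served by reading precomputed counts. The class and its parameters are hypothetical; the abstract does not name VisCAT's actual data structure.

```python
from collections import Counter, defaultdict


class CategoryPyramid:
    """Hypothetical quadtree-style pyramid that materialises category counts.

    Level l splits the bounding box into 2**l x 2**l cells; each point
    updates exactly one cell per level, so a map at any zoom level reads
    precomputed counts instead of rescanning tweets.
    """

    def __init__(self, bbox, max_level):
        self.min_x, self.min_y, self.max_x, self.max_y = bbox
        self.max_level = max_level
        self.cells = defaultdict(Counter)  # (level, col, row) -> category counts

    def _cell(self, level, x, y):
        n = 2 ** level
        col = int((x - self.min_x) / (self.max_x - self.min_x) * n)
        row = int((y - self.min_y) / (self.max_y - self.min_y) * n)
        # Points on the max edge belong to the last cell, not a phantom one.
        return min(col, n - 1), min(row, n - 1)

    def insert(self, x, y, category):
        for level in range(self.max_level + 1):
            col, row = self._cell(level, x, y)
            self.cells[(level, col, row)][category] += 1

    def counts(self, level, col, row):
        """Category counts for one cell; empty dict if nothing fell there."""
        return dict(self.cells.get((level, col, row), Counter()))
```

The trade-off is write amplification (one update per level per tweet) in exchange for constant-time reads per displayed cell.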
international conference on data engineering | 2014
Mohamed F. Mokbel; Louai Alarabi; Jie Bao; Ahmed Eldawy; Amr Magdy; Mohamed Sarwat; Ethan Waytas; Steven Yackel
This demo presents the Minnesota Traffic Generator (MNTG), an extensible web-based road network traffic generator. MNTG enables its users to generate traffic data on arbitrary road networks with different traffic generators. Unlike existing traffic generators that require a lot of time and effort to install, configure, and run, MNTG is a web service with a user-friendly interface where users can specify an arbitrary spatial region, select a traffic generator, and submit their traffic generation request. Once the traffic data is generated by MNTG, users can download and/or visualize it. MNTG can be extended to support: (1) various traffic generators. It is already shipped with the two most common traffic generators, Brinkhoff and BerlinMOD, but other generators can be easily added. (2) various road network sources. It is shipped with U.S. Tiger files and OpenStreetMap, but other sources can also be added. A beta version of MNTG is launched at: http://mntg.cs.umn.edu.
international conference on data engineering | 2016
Amr Magdy; Rami Alghamdi; Mohamed F. Mokbel
Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. Given the continuous arrival and sheer volume of microblogs, it is infeasible to keep data in main memory for long periods. Thus, once the allocated memory budget is filled, a portion of the data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques suffer from either a low memory hit ratio, due to flushing items regardless of their relevance to incoming queries, or significant overhead from tracking individual data items; in either case, the scalability of microblog systems is limited. In this paper, we propose kFlushing, a policy that exploits the popularity of top-k queries in microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase the memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is used to accumulate more useful data, so that more queries can be answered from memory contents. When all memory is utilized for useful data, kFlushing flushes data that is less likely to degrade the memory hit ratio. In addition, kFlushing incurs little overhead, which preserves high system scalability in terms of high digestion rates of fast incoming data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing, improving the main-memory hit ratio by 26–330% while coping with fast microblog streams of up to 100K microblogs/second.
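The core idea can be illustrated with a simplified sketch. It assumes, hypothetically, that queries are single-keyword top-k queries ranked by recency and that memory is measured in postings. Phase 1 flushes everything beyond each keyword's k newest postings, since those can never appear in such an answer. Phase 2, used only if memory is still over budget, flushes the globally oldest postings. The function name `k_flush` and the details of both phases are assumptions, not the published kFlushing algorithm.

```python
from collections import defaultdict


def k_flush(memory, k, budget):
    """Trim an in-memory inverted index to at most `budget` postings.

    memory: dict keyword -> list of (timestamp, microblog_id), newest first;
    it is modified in place. Returns the flushed postings as a dict
    keyword -> list, ready to be appended to the disk index.
    """
    flushed = defaultdict(list)
    # Phase 1: postings beyond a keyword's k newest cannot appear in any
    # recency-ranked single-keyword top-k answer, so flush them first.
    for kw, postings in memory.items():
        if len(postings) > k:
            flushed[kw].extend(postings[k:])
            del postings[k:]
    size = sum(len(p) for p in memory.values())
    # Phase 2 (fallback): still over budget, so flush the oldest postings overall.
    while size > budget:
        victim = min((kw for kw, p in memory.items() if p),
                     key=lambda kw: memory[kw][-1][0])
        flushed[victim].append(memory[victim].pop())
        size -= 1
    return dict(flushed)
```

Phase 1 needs only per-keyword list lengths, not per-item access tracking, which is in the spirit of the low-overhead goal the abstract emphasizes.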