Tamer Elsayed | Researchain

Publication

Featured research published by Tamer Elsayed.

Meeting of the Association for Computational Linguistics | 2008

Pairwise Document Similarity in Large Collections with MapReduce

Tamer Elsayed; Jimmy J. Lin; Douglas W. Oard

This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in computing document similarity into separate multiplication and summation stages in a way that is well matched to efficient disk access patterns across several machines. On a collection consisting of approximately 900,000 newswire articles, our algorithm exhibits linear growth in running time and space in terms of the number of documents.
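The decomposition described in this abstract can be illustrated with a small in-memory sketch. It is not the authors' implementation: the function names, the toy documents, and the term weights are all hypothetical, and the three stages run in one process rather than on a cluster. It only shows how an inner product splits into a multiplication stage (one partial product per shared term) and a summation stage (one total per document pair).

```python
from collections import defaultdict
from itertools import combinations

def build_postings(docs):
    """Indexing stage: map each term to a list of (doc_id, weight) postings."""
    postings = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, w in weights.items():
            postings[term].append((doc_id, w))
    return postings

def map_products(postings):
    """Multiplication stage: emit one partial product per document pair per shared term."""
    for plist in postings.values():
        for (d1, w1), (d2, w2) in combinations(sorted(plist), 2):
            yield (d1, d2), w1 * w2

def reduce_sums(pairs):
    """Summation stage: add up the partial products for each document pair."""
    sims = defaultdict(float)
    for key, product in pairs:
        sims[key] += product
    return dict(sims)

# Hypothetical toy collection: term weights per document.
docs = {
    "d1": {"a": 2.0, "b": 1.0},
    "d2": {"a": 1.0, "c": 3.0},
    "d3": {"b": 4.0, "c": 1.0},
}
similarities = reduce_sums(map_products(build_postings(docs)))
```

Only document pairs that share at least one term ever produce a partial product, so pairs with no terms in common are never touched.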

IEEE International Conference on Cloud Computing Technology and Science | 2011

iHadoop: Asynchronous Iterations for MapReduce

Eslam Elnikety; Tamer Elsayed; Hany E. Ramadan

MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as Web hyperlink structures and social network graphs. Yet the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications; tasks are scheduled according to a policy that optimizes the execution of a single iteration, which wastes bandwidth, I/O, and CPU cycles when compared with an optimal execution of a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing a fast local data transfer. For those iterative applications that require satisfying certain criteria before termination, iHadoop runs the check concurrently during the execution of the subsequent iteration to further reduce the application's latency. This paper also describes our implementation of the iHadoop model, and evaluates its performance against Hadoop, the widely used open source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a variant Hadoop implementation that caches invariant data between iterations, reduces execution time by 38% on average.

Conference on Information and Knowledge Management | 2011

When close enough is good enough: approximate positional indexes for efficient ranked retrieval

Tamer Elsayed; Jimmy J. Lin; Donald Metzler

Previous research has shown that features based on term proximity are important for effective retrieval. However, they incur substantial costs in terms of larger inverted indexes and slower query execution times as compared to term-based features. This paper explores whether term proximity features based on approximate term positions are as effective as those based on exact term positions. We introduce the novel notion of approximate positional indexes based on dividing documents into coarse-grained buckets and recording term positions with respect to those buckets. We propose different approaches to defining the buckets and compactly encoding bucket ids. In the context of linear ranking functions, experimental results show that features based on approximate term positions are able to achieve effectiveness comparable to exact term positions, but with smaller indexes and faster query evaluation.
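The bucket idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: it assumes fixed-width buckets (the paper proposes several ways to define buckets, which are not detailed here), and the function names, the sample text, and the proximity measure are hypothetical.

```python
from collections import defaultdict

def build_bucket_index(doc_tokens, bucket_size):
    """Map each term to the sorted list of bucket ids in which it occurs.

    Assumption: buckets are fixed-width, so a token at position p
    falls in bucket p // bucket_size.
    """
    index = defaultdict(set)
    for position, term in enumerate(doc_tokens):
        index[term].add(position // bucket_size)
    return {term: sorted(buckets) for term, buckets in index.items()}

def min_bucket_distance(index, term_a, term_b):
    """Approximate proximity: smallest bucket-id gap between two terms, or None."""
    if term_a not in index or term_b not in index:
        return None
    return min(abs(a - b) for a in index[term_a] for b in index[term_b])

# Hypothetical sample document.
tokens = "the quick brown fox jumps over the lazy dog near the quick river".split()
index = build_bucket_index(tokens, bucket_size=4)
```

Because several occurrences of a term inside one bucket collapse to a single bucket id, the postings are shorter than exact position lists, at the cost of knowing each position only to within a bucket.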

Information Processing and Management | 2016

Unsupervised adaptive microblog filtering for broad dynamic topics

Walid Magdy; Tamer Elsayed

Information filtering has been a major task of study in the field of information retrieval (IR) for a long time, focusing on filtering well-formed documents such as news articles. Recently, more interest has been directed towards applying filtering tasks to user-generated content such as microblogs. Several earlier studies investigated microblog filtering for focused topics. Another vital filtering scenario in microblogs targets the detection of posts that are relevant to long-standing broad and dynamic topics, i.e., topics spanning several subtopics that change over time. This type of filtering in microblogs is essential for many applications such as social studies on large events and news tracking of temporal topics. In this paper, we introduce an adaptive microblog filtering task that focuses on tracking topics of broad and dynamic nature. We propose an entirely unsupervised approach that adapts to new aspects of the topic to retrieve relevant microblogs. We evaluated our filtering approach using 6 broad topics, each tested on 4 different time periods over 4 months. Experimental results showed that, on average, our approach achieved an 84% increase in recall relative to the baseline approach, while maintaining an acceptable precision that showed a drop of about 8%. Our filtering method is currently implemented on TweetMogaz, a news portal generated from tweets. The website compiles the stream of Arabic tweets and detects the tweets relevant to different regions in the Middle East, presenting them in the form of comprehensive reports that include top stories and news in each region.

Broad topics on Twitter are highly dynamic. Boolean filtering retrieves a high-precision but limited number of tweets. The proposed adaptive filtering achieved an 84% gain in recall with a slight drop in precision. The proposed method showed robustness over time, across domains, and across query formulations. Our method is currently adopted in a live service that follows news from Twitter.

Future Generation Computer Systems | 2015

CloudFlow: A data-aware programming model for cloud workflow applications on modern HPC systems

F. Zhang; Qutaibah M. Malluhi; Tamer Elsayed; Samee Ullah Khan; Keqin Li; Albert Y. Zomaya

NPRP grant # 09-1116-1-172 from the Qatar National Research Fund (a member of Qatar Foundation). Ministry of Science and Technology of China under National 973 Basic Research Program (Grant No. 2013CB228206), National Natural Science Foundation of China (Grant Nos. 61472200 and 61233016).

Information Processing and Management | 2018

Intelligent topic selection for low-cost information retrieval evaluation: A New perspective on deep vs. shallow judging

Mucahid Kutlu; Tamer Elsayed; Matthew Lease

While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today’s massive document collections (e.g., ClueWeb12’s 700M+ Webpages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). While prior work on intelligent topic selection has never been evaluated against shallow judging baselines, prior work on deep vs. shallow judging has largely argued for shallow judging, but assuming random topic selection. We argue that for evaluating any topic selection method, ultimately one must ask whether it is actually useful to select topics, or should one simply perform shallow judging over many topics? In seeking a rigorous answer to this over-arching question, we conduct a comprehensive investigation over a set of relevant factors never previously studied together: 1) method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resultant quality of judgments. Experiments on NIST TREC Robust 2003 and Robust 2004 test collections show that not only can we reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom in showing that deep judging is often preferable to shallow judging when topics are selected intelligently.

Midwest Symposium on Circuits and Systems | 2003

ATP: autonomous transport protocol

Tamer Elsayed; Mohamed E. Hussein; Moustafa Youssef; Tamer Nadeem; Adel M. Youssef; Liviu Iftode

In this paper we present the design of the autonomous transport protocol (ATP). The basic service provided by the ATP is to maintain a reliable transport connection between two endpoints, identified by content identifiers, independent of their physical locations. Autonomy allows dynamic endpoint relocation on different hosts without disrupting the transport connection between them. The ATP depends on the existence of an underlying enhanced content-based network to achieve its goals. Data is transferred by a combination of active and passive operations, where the ATP layer of a node can decide whether to actively push the data to the destination or to passively wait for the destination endpoint to pull the data. The decision to use either the active mode or the passive mode can be taken by a local policy on the node running the ATP.

North American Chapter of the Association for Computational Linguistics | 2016

QU-IR at SemEval 2016 Task 3: Learning to Rank on Arabic Community Question Answering Forums with Word Embedding

Rana Malhas; Marwan Torki; Tamer Elsayed

Resorting to community question answering (CQA) websites for finding answers has gained momentum in the past decade with the explosive rate at which social media has been proliferating. With many questions left unanswered on those websites, automatic and smart question answering systems have emerged. One of the main objectives of such systems is to harness the plethora of existing answered questions, hence transforming the problem into finding good answers to newly posed questions from similar previously answered ones. As SemEval 2016 Task 3 “Community Question Answering” focused on this problem, we participated in the Arabic Subtask. Our system adopted a supervised learning approach in which a learning-to-rank model is trained over data (questions and answers) extracted from Arabic CQA forums using word2vec features generated from that data. Our primary submission achieved a 29.7% improvement over the MAP score of the baseline. Post-submission experiments were further conducted to integrate variations of the word2vec features into our system. Integrating covariance word embedding features raised the improvement over the baseline to 37.9%.
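For readers unfamiliar with the metric, the sketch below shows one common way to compute mean average precision (MAP) and a relative improvement over a baseline. It is a generic illustration, not the official SemEval scorer: it assumes binary relevance labels and that every relevant item appears somewhere in the ranking, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query, given 0/1 relevance labels in rank order.

    Assumption: every relevant item appears in the ranking, so the sum of
    precision values is divided by the number of relevant items seen.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

def relative_improvement(system_score, baseline_score):
    """Relative improvement of a system over a baseline, in percent."""
    return 100.0 * (system_score - baseline_score) / baseline_score
```

Under this reading, a reported improvement of 29.7% is relative to the baseline's MAP, not a gain of 29.7 absolute MAP points.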

Meeting of the Association for Computational Linguistics | 2009

Arabic cross-document coreference detection

Asad B. Sayeed; Tamer Elsayed; Nikesh Garera; David Alexander; Tan Xu; Douglas W. Oard; David Yarowsky; Christine D. Piatko

We describe a set of techniques for Arabic cross-document coreference resolution. We compare a baseline system of exact mention string-matching to ones that include local mention context information as well as information from an existing machine translation system. It turns out that the machine translation-based technique outperforms the baseline, but local entity context similarity does not. This helps to point the way for future cross-document coreference work in languages with few existing resources for the task.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

EveTAR: A New Test Collection for Event Detection in Arabic Tweets

Hind Almerekhi; Maram Hasanain; Tamer Elsayed

Research on event detection in Twitter is often obstructed by the lack of publicly available evaluation mechanisms such as test collections; this problem is more severe given their scarcity in languages other than English. In this paper, we present EveTAR, the first publicly available test collection for event detection in Arabic tweets. The collection includes a crawl of 590M Arabic tweets posted over a one-month period and covers 66 significant events (in 8 different categories) for which more than 134k relevance judgments were gathered using crowdsourcing with high average inter-annotator agreement (Kappa value of 0.6). We demonstrate the usability of the collection by evaluating 3 state-of-the-art event detection algorithms. The collection is also designed to support other retrieval tasks, as we show in our experiments with ad-hoc search systems.
