Ayman Farahat
PARC
Publication
Featured research published by Ayman Farahat.
international acm sigir conference on research and development in information retrieval | 2003
Thorsten Brants; Francine Chen; Ayman Farahat
We present a new method and system for performing the New Event Detection task, i.e., in one or multiple streams of news stories, all stories on a previously unseen (new) event are marked. The method is based on an incremental TF-IDF model. Our extensions include: generation of source-specific models, similarity score normalization based on document-specific averages, similarity score normalization based on source-pair specific averages, term reweighting based on inverse event frequencies, and segmentation of the documents. We also report on extensions that did not improve results. The system performs very well on TDT3 and TDT4 test data and scored second in the TDT-2002 evaluation.
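The core idea in this abstract, an incremental TF-IDF model that flags a story as new when it resembles no earlier story, can be sketched roughly as follows. This is a minimal illustration and not the authors' system: the class and function names, the smoothed IDF formula, and the `threshold` value are all assumptions made for the example, and the source-specific models, score normalization, and term reweighting listed in the abstract are left out.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """TF-IDF model whose document frequencies grow as stories arrive."""

    def __init__(self):
        self.doc_freq = Counter()
        self.n_docs = 0

    def add(self, tokens):
        # Update the corpus statistics with one new story.
        self.n_docs += 1
        self.doc_freq.update(set(tokens))

    def vector(self, tokens):
        # Unit-length TF-IDF vector under the current statistics.
        # The smoothed IDF below is an assumed formula, chosen so that
        # every weight stays positive.
        tf = Counter(tokens)
        vec = {
            t: f * math.log((self.n_docs + 1) / (self.doc_freq[t] + 0.5))
            for t, f in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(u, v):
    # Both inputs are unit length, so the dot product is the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def detect_new_events(stories, threshold=0.2):
    """Return one flag per story: True if no earlier story is similar."""
    model = IncrementalTfidf()
    seen = []
    flags = []
    for tokens in stories:
        model.add(tokens)
        vec = model.vector(tokens)
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)  # hypothetical cutoff
        seen.append(vec)
    return flags
```

Calling `detect_new_events` on a list of tokenized stories returns a list of booleans in arrival order. Earlier vectors are not recomputed when the statistics change, which keeps the sketch simple at the cost of some accuracy.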
international acm sigir conference on research and development in information retrieval | 2001
Joel C. Miller; Gregory Rae; Fred Schaefer; Lesley Ward; Thomas LoFaro; Ayman Farahat
Kleinberg’s HITS algorithm, a method of link analysis, uses the link structure of a network of webpages to assign authority and hub weights to each page. These weights are used to rank sources on a particular topic. We have found that certain tree-like web structures can lead the HITS algorithm to return either arbitrary or non-intuitive results. We give a characterization of these web structures. We present two modifications to the adjacency matrix input to the HITS algorithm. Exponentiated Input, our first modification, includes information not only on direct links but also on longer paths between pages. It resolves both limitations mentioned above. Usage Weighted Input, our second modification, weights links according to how often they were followed by users in a given time period; it incorporates user feedback without requiring direct user querying.
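To illustrate how including longer paths can change the HITS authority weights, here is a small sketch. The abstract does not give the formula for Exponentiated Input; the code assumes the modified input is the truncated series A + A²/2! + A³/3! + ..., and the function names and the three-page chain are hypothetical choices for the example.

```python
import math


def matmul(a, b):
    # Plain-Python product of two square matrices stored as nested lists.
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def exponentiated_input(adj, terms=10):
    """Sum of A^k / k! for k = 1..terms (assumed form of the modified input)."""
    n = len(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        # After this step, power holds A^k / k!.
        power = [[x / k for x in row] for row in matmul(power, adj)]
        result = [
            [r + p for r, p in zip(r_row, p_row)]
            for r_row, p_row in zip(result, power)
        ]
    return result


def hits_authority(matrix, iters=100):
    """Authority weights from repeated hub and authority updates."""
    n = len(matrix)
    auth = [1.0] * n
    for _ in range(iters):
        hub = [sum(matrix[i][j] * auth[j] for j in range(n)) for i in range(n)]
        auth = [sum(matrix[i][j] * hub[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in auth))
        if norm == 0:
            break
        auth = [x / norm for x in auth]
    return auth


# Toy chain of pages: 0 -> 1 -> 2.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
plain = hits_authority(chain)
weighted = hits_authority(exponentiated_input(chain))
```

On this chain, plain HITS gives pages 1 and 2 equal authority, while the exponentiated matrix also credits page 2 for the two-step path from page 0 and so ranks it higher.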
SIAM Journal on Scientific Computing | 2005
Ayman Farahat; Thomas LoFaro; Joel C. Miller; Gregory Rae; Lesley Ward
Algorithms such as Kleinberg’s HITS algorithm, the PageRank algorithm of Brin and Page, and the SALSA algorithm of Lempel and Moran use the link structure of a network of web pages to assign weights to each page in the network. The weights can then be used to rank the pages as authoritative sources. These algorithms share a common underpinning; they find a dominant eigenvector of a nonnegative matrix that describes the link structure of the given network and use the entries of this eigenvector as the page weights. We use this commonality to give a unified treatment, proving the existence of the required eigenvector for the PageRank, HITS, and SALSA algorithms, the uniqueness of the PageRank eigenvector, and the convergence of the algorithms to these eigenvectors. However, we show that the HITS and SALSA eigenvectors need not be unique. We examine how the initialization of the algorithms affects the final weightings produced. We give examples of networks that lead the HITS and SALSA algorithms to return nonunique or nonintuitive rankings. We characterize all such networks in terms of the connectivity of the related HITS authority graph. We propose a modification, Exponentiated Input to HITS, to the adjacency matrix input to the HITS algorithm. We prove that Exponentiated Input to HITS returns a unique ranking, provided that the network is weakly connected. Our examples also show that SALSA can give inconsistent hub and authority weights, due to nonuniqueness. We also mention a small modification to the SALSA initialization which makes the hub and authority weights consistent.
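The dependence of HITS on its initialization can be seen on a very small disconnected network. The sketch below is an illustrative assumption, not an example taken from the paper: the function name, the four-page network, and the two starting vectors are all made up for the demonstration.

```python
import math


def hits_authority(adj, init, iters=50):
    """Power iteration for HITS authority weights from a given start vector."""
    n = len(adj)
    auth = [float(x) for x in init]
    for _ in range(iters):
        hub = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        auth = [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in auth))
        if norm == 0:
            break
        auth = [x / norm for x in auth]
    return auth


# Two unconnected links: page 0 -> page 1 and page 2 -> page 3.
two_pairs = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
uniform_start = hits_authority(two_pairs, [1, 1, 1, 1])
skewed_start = hits_authority(two_pairs, [1, 2, 1, 1])
```

Here the authority matrix has a repeated dominant eigenvalue, so the iteration keeps whatever ratio between pages 1 and 3 the starting vector had. The uniform start ranks them equally, while the skewed start ranks page 1 above page 3 even though the two pages are structurally identical.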
meeting of the association for computational linguistics | 2003
Ayman Farahat; Francine Chen; Thorsten Brants
Link detection has been regarded as a core technology for the Topic Detection and Tracking tasks of new event detection. In this paper we formulate story link detection and new event detection as information retrieval tasks and hypothesize on the impact of precision and recall on both systems. Motivated by these arguments, we introduce a number of new performance-enhancing techniques, including part of speech tagging, new similarity measures, and expanded stop lists. Experimental results validate our hypothesis.
north american chapter of the association for computational linguistics | 2003
Francine Chen; Ayman Farahat; Thorsten Brants
Story link detection has been regarded as a core technology for other Topic Detection and Tracking tasks such as new event detection. In this paper we analyze story link detection and new event detection in a retrieval framework and examine the effect of a number of techniques, including part of speech tagging, new similarity measures, and an expanded stop list, on the performance of the two detection tasks. We present experimental results that show that the utility of the techniques on the two tasks differs, as is consistent with our analysis.
workshop on privacy in the electronic society | 2004
Philippe Golle; Ayman Farahat
We define message privacy against a profiling adversary, whose goal is to classify a population of users into categories according to the messages they exchange. This adversary models the most common privacy threat against email communication. We propose a protocol that protects senders and receivers of email messages from profiling attacks.
international world wide web conferences | 2005
Aleksandra Korolova; Ayman Farahat; Philippe Golle
A profiling adversary is an adversary whose goal is to classify a population of users into categories according to messages they exchange. This adversary models the most common privacy threat against web-based communication. We propose a new encryption scheme, called stealth encryption, that protects users from profiling attacks by concealing the semantic content of plaintext while preserving its grammatical structure and other non-semantic linguistic features, such as word frequency distribution. Given English plaintext, stealth encryption produces ciphertext that cannot efficiently be distinguished from normal English text (our techniques apply to other languages as well).
Archive | 2003
Hermann Calabria; Francine Chen; Ayman Farahat; Daniel H. Greene