Is this you? Create Your Porfile

Foto N. Afrati

National Technical University of Athens

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Foto N. Afrati is active.

Explore More

Publication

Featured researches published by Foto N. Afrati.

foundations of computer science | 1999

Approximation schemes for minimizing average weighted completion time with release dates

Foto N. Afrati; Evripidis Bampis; Chandra Chekuri; David R. Karger; Claire Kenyon; Sanjeev Khanna; Ioannis Milis; Maurice Queyranne; Martin Skutella; Clifford Stein; Maxim Sviridenko

We consider the problem of scheduling n jobs with release dates on m machines so as to minimize their average weighted completion time. We present the first known polynomial time approximation schemes for several variants of this problem. Our results include PTASs for the case of identical parallel machines and a constant number of unrelated machines with and without preemption allowed. Our schemes are efficient: for all variants the running time for /spl alpha/(1+/spl epsiv/) approximation is of the form f(1//spl epsiv/, m)poly(n).

knowledge discovery and data mining | 2004

Approximating a collection of frequent sets

Foto N. Afrati; Aristides Gionis; Heikki Mannila

One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem.In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.

IEEE Transactions on Knowledge and Data Engineering | 2011

Optimizing Multiway Joins in a Map-Reduce Environment

Foto N. Afrati; Jeffrey D. Ullman

Implementations of map-reduce are being used to perform many operations on very large data. We examine strategies for joining several relations in the map-reduce environment. Our new approach begins by identifying the “map-key,” the set of attributes that identify the Reduce process to which a Map process must send a particular tuple. Each attribute of the map-key gets a “share,” which is the number of buckets into which its values are hashed, to form a component of the identifier of a Reduce process. Relations have their tuples replicated in limited fashion, the degree of replication depending on the shares for those map-key attributes that are missing from their schema. We study the problem of optimizing the shares, given a fixed number of Reduce processes. An algorithm for detecting and fixing problems where a variable is mistakenly included in the map-key is given. Then, we consider two important special cases: chain joins and star joins. In each case, we are able to determine the map-key and determine the shares that yield the least replication. While the method we propose is not always superior to the conventional way of using map-reduce to implement joins, there are some important cases involving large-scale data where our method wins, including: 1) analytic queries in which a very large fact table is joined with smaller dimension tables, and 2) queries involving paths through graphs with high out-degree, such as the Web or a social network.

extending database technology | 2011

Map-reduce extensions and recursive queries

Foto N. Afrati; Vinayak R. Borkar; Michael J. Carey; Neoklis Polyzotis; Jeffrey D. Ullman

We survey the recent wave of extensions to the popular map-reduce systems, including those that have begun to address the implementation of recursive queries using the same computing environment as map-reduce. A central problem is that recursive tasks cannot deliver their output only at the end, which makes recovery from failures much more complicated than in map-reduce and its nonrecursive extensions. We propose several algorithmic ideas for efficient implementation of recursions in the map-reduce environment and discuss several alternatives for supporting recovery from failures without restarting the entire job.

international conference on data engineering | 2012

Fuzzy Joins Using MapReduce

Foto N. Afrati; Anish Das Sarma; David Menestrina; Aditya G. Parameswaran; Jeffrey D. Ullman

Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs of elements from an input set that meet a similarity threshold. The computation model is a single MapReduce job. Because we allow only one MapReduce round, the Reduce function must be designed so a given output pair is produced by only one task, for many algorithms, satisfying this condition is one of the biggest challenges. We break the cost of an algorithm into three components: the execution cost of the mappers, the execution cost of the reducers, and the communication cost from the mappers to reducers. The algorithms are presented first in terms of Hamming distance, but extensions to edit distance and Jaccard distance are shown as well. We find that there are many different approaches to the similarity-join problem using MapReduce, and none dominates the others when both communication and reducer costs are considered. Our cost analyses enable applications to pick the optimal algorithm based on their communication, memory, and cluster requirements.

international conference on data engineering | 2013

Enumerating subgraph instances using map-reduce

Foto N. Afrati; Dimitris Fotakis; Jeffrey D. Ullman

The theme of this paper is how to find all instances of a given “sample” graph in a larger “data graph,” using a single round of map-reduce. For the simplest sample graph, the triangle, we improve upon the best known such algorithm. We then examine the general case, considering both the communication cost between mappers and reducers and the total computation cost at the reducers. To minimize communication cost, we exploit the techniques of [1] for computing multiway joins (evaluating conjunctive queries) in a single map-reduce round. Several methods are shown for translating sample graphs into a union of conjunctive queries with as few queries as possible. We also address the matter of optimizing computation cost. Many serial algorithms are shown to be “convertible,” in the sense that it is possible to partition the data graph, explore each partition in a separate reducer, and have the total computation cost at the reducers be of the same order as the computation cost of the serial algorithm.

international conference on management of data | 2001

Generating efficient plans for queries using views

Foto N. Afrati; Chen Li; Jeffrey D. Ullman

We study the problem or generating efficient, equivalent rewritings using views to compute the answer to a query. We take the closed-world assumption, in which views are materialized from base relations, rather than views describing sources in terms of abstract predicates, as is common when the open-world assumption is used. In the closed-world model, there can be an infinite number of different rewritings that compute the same answer, yet have quite different performance. Query optimizers take a logical plan (a rewriting of the query) as an input, and generate efficient physical plans to compute the answer. Thus our goal is to generate a small subset of the possible logical plans without missing an optimal physical plan. We first consider a cost model that counts the number of subgoals in a physical plan, and show a search space that is guaranteed to include an optimal rewriting, if the query has a rewriting in terms of the views. We also develop an efficient algorithm for finding rewritings with the minimum number of subgoals. We then consider a cost model that counts the sizes of intermediate relations of a physical plan, without dropping any attributes, and give a search space for finding optimal rewritings. Our final cost model allows attributes to be dropped in intermediate relations. We show that, by careful variable renaming, it is possible to do better than the standard “supplementary relation” approach, by dropping attributes that the latter approach would retain. Experiments show that our algorithm of generating optimal rewritings has good efficiency and scalability.

symposium on principles of database systems | 2008

Answering aggregate queries in data exchange

Foto N. Afrati; Phokion G. Kolaitis

Data exchange, also known as data translation, has been extensively investigated in recent years. One main direction of research has focused on the semantics and the complexity of answering first-order queries in the context of data exchange between relational schemas. In this paper, we initiate a systematic investigation of the semantics and the complexity of aggregate queries in data exchange, and make a number of conceptual and technical contributions. Data exchange is a context in which incomplete information arises, hence one has to cope with a set of possible worlds, instead of a single database. Three different sets of possible worlds have been explored in the study of the certain answers of first-order queries in data exchange: the set of possible worlds of all solutions, the set of possible worlds of all universal solutions, and a set of possible worlds derived from the CWA-solutions. We examine each of these sets and point out that none of them is suitable for aggregation in data exchange, as each gives rise to rather trivial semantics. Our analysis also reveals that, to have meaningful semantics for aggregation in data exchange, a strict closed world assumption has to be adopted in selecting the set of possible worlds. For this, we introduce and study the set of the endomorphic images of the canonical universal solution as a set of possible worlds for aggregation in data exchange. Our main technical result is that for schema mappings specified by source-to-target tgds, there are polynomial-time algorithms for computing the range semantics of every scalar aggregation query, where the range semantics of an aggregate query is the greatest lower bound and the least upper bound of the values that the query takes over the set of possible worlds. Among these algorithms, the more sophisticated one is the algorithm for the average operator, which makes use of concepts originally introduced in the study of the core of the universal solutions in data exchange. We also show that if, instead of range semantics, we consider possible answer semantics, then it is an NP-complete problem to tell if a number is a possible answer of a given scalar aggregation query with the average operator.

symposium on principles of database systems | 2002

Answering queries using views with arithmetic comparisons

Foto N. Afrati; Chen Li; Prasenjit Mitra

We consider the problem of answering queries using views, where queries and views are conjunctive queries with arithmetic comparisons (CQACs) over dense orders. Previous work only considered limited variants of this problem, without giving a complete solution. We have developed a novel algorithm to obtain maximally-contained rewritings (MCRs) for queries having left (or right) semi-interval-comparison predicates. For semi-interval queries, we show that the language of finite unions of CQAC rewritings is not sufficient to find a maximally-contained solution, and identify cases where datalog is sufficient. Finally, we show that it is decidable to obtain equivalent rewritings for CQAC queries.

symposium on principles of database systems | 1987

The parallel complexity of simple chain queries

Foto N. Afrati; Christos H. Papadimitriou

They are written m a notation called DATALOG, that IS, PROLOG without function symbols and other lmpurities (an orthogonal way of vlewmg DATALOG 1s as Relatlonal Calculus with the additional power of recurslon) For more on DATALOG see, for example, [UV] Both queries above define a view S m terms of the database relations a and b We are Interested m the parallel complexJtyof these and similar queries, that is, the degree to which such queries are amenable to rapid parallel evaluation by the cooperation of many processors Recently, m view of the projected avallablhty of multlprocessmg systems with a very large number of processors, there has been much Interest m such a classlficatlon of computational problems In particular, it has been proposed that a problem be considered satlsfactorlly solved m parallel if there IS an algorithm for it which can be rendered as a circuit with a polynomial number of gates, andpoiyiogarlthmlc depth (that IS, of depth O(logk n), where n IS the length of the input) The class of all problems thus solvable 1s called NC [Co] Obviously, NC 1s a subset of P, the class of all problems solvable m polynomial sequentd time (It IS perhaps amusing that, m response to a sequence of lmpresslve breakthroughs m computmg technology,

Explore More