Francesco M. Malvestuto
Sapienza University of Rome
Publication
Featured research published by Francesco M. Malvestuto.
ACM Transactions on Database Systems | 1993
Francesco M. Malvestuto
In many situations a statistical database contains multiple summary tables, which report summary statistics on the same summary variable for the same population of individuals or objects using different classification criteria («homogeneous» summary tables). Existing query languages consider only those queries which may aggregate data stored in a single summary table. When a statistical database contains homogeneous summary tables, such query languages do not allow an integrated view of data, whereas statisticians are inclined to view and query a collection of homogeneous summary tables as if they were actually a single higher-dimensional summary table. This legitimizes the search for a universal-scheme solution to the problem of data integration in such statistical databases.
systems man and cybernetics | 1991
Francesco M. Malvestuto
A heuristic procedure is presented for approximating an n-dimensional discrete probability distribution with a decomposable model of a given complexity. It is shown that, without loss of generality, the search space can be restricted to a suitable subclass of decomposable models, whose members are called elementary models. The selected elementary model is constructed in an incremental manner according to a local-optimality criterion that consists of minimizing a suitable cost function. It is shown by an example that the solution computed by the procedure is sometimes optimal.
international conference on management of data | 1988
Francesco M. Malvestuto
Given a statistical database consisting of two summary tables based on a common but not identical classification criterion (e.g., two geographical partitionings of a country), there are additional summary tables that are derivable in the sense that they are uniquely (i.e., with no uncertainty) determined by the tables given. Derivable tables encompass not only, of course, “less detailed” tables (that is, aggregated data) but also “more detailed” tables (that is, disaggregated data). Tables of the second type can be explicitly constructed by using a “procedure of data refinement” based on the graph representation of the correspondences between the categories of the two classification systems given. In some cases, that is, when such a graph representation meets the acyclicity condition, the underlying database is “equivalent” to a single table (called the representative table), and then a necessary and sufficient condition for a table to be derivable can be stated.
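As a rough illustration of why acyclicity matters here, the sketch below recovers a "more detailed" table from the totals of two partitionings by repeatedly peeling leaves off the correspondence graph. It is not the paper's refinement procedure; the function name `refine`, the category labels, and the totals are all hypothetical, and the sketch assumes the two sets of totals are consistent with each other.

```python
def refine(row_totals, col_totals, edges):
    """Fill in the cells of the cross table when the correspondence graph
    (one node per category, one edge per pair of overlapping categories)
    has no cycle. Hypothetical helper, not taken from the paper."""
    rows = dict(row_totals)
    cols = dict(col_totals)
    remaining = set(edges)
    cells = {}
    while remaining:
        # Count how many unresolved cells each category still takes part in.
        degree = {}
        for r, c in remaining:
            degree[("row", r)] = degree.get(("row", r), 0) + 1
            degree[("col", c)] = degree.get(("col", c), 0) + 1
        # A leaf category overlaps exactly one category of the other system,
        # so its whole remaining total must sit in that single cell.
        leaf_edge = next(
            ((r, c) for r, c in sorted(remaining)
             if degree[("row", r)] == 1 or degree[("col", c)] == 1),
            None,
        )
        if leaf_edge is None:
            raise ValueError("graph contains a cycle; leaf peeling does not apply")
        r, c = leaf_edge
        value = rows[r] if degree[("row", r)] == 1 else cols[c]
        cells[(r, c)] = value
        rows[r] -= value
        cols[c] -= value
        remaining.remove(leaf_edge)
    return cells


# Hypothetical data: two provinces and three districts whose overlaps form a path.
provinces = {"P1": 50, "P2": 70}
districts = {"D1": 30, "D2": 60, "D3": 30}
overlaps = {("P1", "D1"), ("P1", "D2"), ("P2", "D2"), ("P2", "D3")}
detailed = refine(provinces, districts, overlaps)
```

With these made-up totals, every province-by-district cell is forced by the data, which is the sense in which a disaggregated table can be "derivable".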
ACM Transactions on Information and System Security | 2006
Francesco M. Malvestuto; Mauro Mezzini; Marina Moscarini
In response to queries posed to a statistical database, the query system should avoid releasing summary statistics that could lead to the disclosure of confidential individual data. Attacks on the security of a statistical database may be direct or indirect and, in order to repel them, the query system should audit queries by controlling the amount of information released by their responses. This paper focuses on sum-queries with a response variable of nonnegative real type and proposes a compact representation of answered sum-queries, called an information model in “normal form,” which allows the query system to decide whether the value of a new sum-query can or cannot be safely answered. If it cannot, then the query system will issue the range of feasible values of the new sum-query consistent with previously answered sum-queries. Both the management of the information model and the answering procedure require solving linear-programming problems and, since standard linear-programming algorithms are not polynomially bounded (despite their good performance in practice), effective procedures that make a parsimonious use of them are stated for the general case. Moreover, in the special case that the information model is “graphical,” it is shown that the answering procedure can be implemented in polynomial time.
Discrete Mathematics | 1988
Francesco M. Malvestuto
Three or more probability distributions may be pairwise compatible but not collectively compatible, in the sense that they admit no common extensions. However, pairwise compatibility proves to be a necessary and sufficient condition for collective compatibility when the underlying system of distribution schemes is “acyclic”. If this is the case, then (and only then) do the distributions admit a product extension, whose expression can be computed by a simple algorithm.
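A small, made-up instance may help to see how pairwise compatibility can fail to give collective compatibility. The three tables below are hypothetical; their schemes {A,B}, {B,C}, {A,C} form a cycle, and the check for a common extension relies only on supports, so it can prove that no extension exists but could not prove that one does.

```python
from itertools import product

# Hypothetical pairwise distributions over binary variables A, B, C.
p_ab = {(0, 0): 0.5, (1, 1): 0.5}  # A and B always agree
p_bc = {(0, 0): 0.5, (1, 1): 0.5}  # B and C always agree
p_ac = {(0, 1): 0.5, (1, 0): 0.5}  # A and C always differ


def marginal(table, axis):
    """Sum a two-variable table down to the variable at position `axis`."""
    out = {}
    for key, prob in table.items():
        out[key[axis]] = out.get(key[axis], 0.0) + prob
    return out


# Pairwise compatibility: tables sharing a variable agree on its marginal.
pairwise_ok = (
    marginal(p_ab, 0) == marginal(p_ac, 0)      # shared variable A
    and marginal(p_ab, 1) == marginal(p_bc, 0)  # shared variable B
    and marginal(p_bc, 1) == marginal(p_ac, 1)  # shared variable C
)

# A common extension could place mass only on triples whose three projections
# all have positive probability. An empty list proves that no extension exists.
allowed_support = [
    (a, b, c)
    for a, b, c in product((0, 1), repeat=3)
    if p_ab.get((a, b), 0) > 0
    and p_bc.get((b, c), 0) > 0
    and p_ac.get((a, c), 0) > 0
]
```

Here `pairwise_ok` is true while `allowed_support` is empty: A = B and B = C force A = C, which contradicts the third table.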
Information Sciences | 1989
Francesco M. Malvestuto
A consistent categorical database can be viewed as a single contingency table by taking the maximum-entropy extension of its base tables. Such a view, here called the universal table model, is needed to answer a user who wishes “cross-classified” data, that is, categorical data resulting from the combination of information contained in two or more base tables. In order to implement a universal table interface, we make use of a query-evaluation procedure; this allows for an appropriate answer to be generated whether the requested data are stored in the database or are not and therefore have to be computed (i.e., estimated).
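For the simplest case of two base tables that share one classification variable, the maximum-entropy extension has the closed product form n(a,b,c) = n(a,b) n(b,c) / n(b). The sketch below applies that formula and then answers a cross-classified query from the result; the table contents and the helper names (`margin`, `max_entropy_extension`, `query`) are hypothetical and are not the paper's query-evaluation procedure.

```python
from fractions import Fraction

# Hypothetical base tables of counts: sex x region and region x job.
n_ab = {("f", "north"): 30, ("m", "north"): 10,
        ("f", "south"): 20, ("m", "south"): 40}
n_bc = {("north", "clerk"): 25, ("north", "farmer"): 15,
        ("south", "clerk"): 30, ("south", "farmer"): 30}


def margin(table, axis):
    """Totals of a two-way table over the category at position `axis`."""
    out = {}
    for key, count in table.items():
        out[key[axis]] = out.get(key[axis], 0) + count
    return out


def max_entropy_extension(n_ab, n_bc):
    """Three-way table n(a,b) * n(b,c) / n(b) for two tables sharing b."""
    n_b = margin(n_ab, 1)
    if n_b != margin(n_bc, 0):
        raise ValueError("base tables disagree on the shared variable")
    ext = {}
    for (a, b), x in n_ab.items():
        for (b2, c), y in n_bc.items():
            if b == b2 and n_b[b] > 0:
                ext[(a, b, c)] = Fraction(x * y, n_b[b])
    return ext


def query(universal, keep):
    """Aggregate the universal table onto the variable positions in `keep`."""
    out = {}
    for key, value in universal.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0) + value
    return out


universal = max_entropy_extension(n_ab, n_bc)
stored = query(universal, (0, 1))      # reproduces the stored sex x region table
sex_by_job = query(universal, (0, 2))  # not stored anywhere, hence estimated
```

A request for sex by region returns the stored counts exactly, whereas sex by job is an estimate that depends on the maximum-entropy assumption.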
symposium on principles of database systems | 1987
Francesco M. Malvestuto
A compatible categorical data base can be viewed as a single (contingency) table by taking the maximum-entropy extension of the component tables. Such a view, here called universal table model, is needed to answer a user who wishes “cross-classified” categorical data, that is, categorical data resulting from the combination of the information contents of two or more base tables. In order to implement a universal table interface we make use of a query-optimization procedure, which is able to generate an appropriate answer both in the case that the asked data are present in the data base and in the case that they are not and, then, have to be estimated.
Theoretical Computer Science | 2000
Francesco M. Malvestuto; Marina Moscarini
The notion of vertex separability by partial edges for a simple hypergraph is introduced and the related structural properties of the hypergraph are analyzed in terms of maximal (with respect to set-theoretic inclusion) compacts and of dividers, where a compact is a vertex set in which every two vertices are separated by no partial edge, and a divider is a partial edge X for which there exists a pair of vertices that are separated by X and by no proper subset of X. It is proven that, given a hypergraph H, the hypergraph (called the compaction of H) made up of maximal compacts of H is acyclic and coincides with H if and only if H is acyclic; furthermore, it has the same dividers as H, and can be characterized as being the unique acyclic hypergraph that has the same compacts as H. Polynomial algorithms for finding maximal compacts and dividers of a given hypergraph are provided. Finally, an application to the problem of computing the maximum-entropy extension of a system of marginals over a hypergraph is discussed.
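Acyclicity of a hypergraph can be tested with the classical Graham (GYO) reduction, sketched below. This is a standard technique swapped in for illustration and is not one of the algorithms of the paper; it assumes that "acyclic" is meant in the usual database-theory sense (alpha-acyclicity), and the function name `is_acyclic` is hypothetical.

```python
def is_acyclic(edges):
    """Graham (GYO) reduction: the hypergraph is alpha-acyclic exactly when
    the two rules below eventually delete every edge."""
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete every vertex that belongs to exactly one edge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e.difference_update(lonely)
                changed = True
        # Rule 2: delete one edge that is empty or contained in another edge.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


chain = [{"A", "B"}, {"B", "C"}]                 # acyclic
triangle = [{"A", "B"}, {"B", "C"}, {"A", "C"}]  # cyclic
covered = triangle + [{"A", "B", "C"}]           # acyclic once the big edge is added
```

The last example shows why this notion of acyclicity is not monotone: adding the edge {A, B, C} to the triangle makes the reduction succeed.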
statistical and scientific database management | 1998
Francesco M. Malvestuto; Marina Moscarini
An implementation of the auditing strategy is presented to avoid both exact and approximate disclosure. The key data structure is a query map, which is a graphical summary of answered queries. Since the size of a query map may be exponential in the number of answered queries, a query-restriction criterion is introduced to make every query map a graph. An auditing procedure on such a graph is presented and the computational issues connected with its implementation are discussed. All the computational tasks can be carried out efficiently but one, which is a provably intractable problem.
international conference on database theory | 2003
Francesco M. Malvestuto; Mauro Mezzini
In an on-line statistical database, the query system should leave unanswered queries asking for sums that could lead to the disclosure of confidential data. To check that, every sum query and previously answered sum queries should be audited. We show that, under a suitable query-overlap restriction, an auditing procedure can be efficiently worked out using flow-network computation.
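To make the auditing idea concrete, here is a minimal sketch of a much simpler, classical check than the flow-network method of the paper: a new sum query is refused if, together with the answered ones, it would let a user solve exactly for some individual value by linear combination. The sketch ignores any sign or range constraints on the data, so passing it does not guarantee safety in the paper's setting; all function names and the example queries are hypothetical.

```python
from fractions import Fraction


def reduce_against(basis, vec):
    """Subtract from vec its components along the pivot columns of basis."""
    vec = [Fraction(x) for x in vec]
    for pivot, row in basis:
        if vec[pivot] != 0:
            factor = vec[pivot]
            vec = [x - factor * y for x, y in zip(vec, row)]
    return vec


def build_basis(rows):
    """Reduced row-echelon basis, stored as (pivot column, row) pairs."""
    basis = []
    for row in rows:
        row = reduce_against(basis, row)
        pivot = next((i for i, x in enumerate(row) if x != 0), None)
        if pivot is None:
            continue  # row is a combination of earlier rows
        lead = row[pivot]
        row = [x / lead for x in row]
        # Clear the new pivot column from the rows already in the basis.
        basis = [(p, [x - b[pivot] * y for x, y in zip(b, row)]) for p, b in basis]
        basis.append((pivot, row))
    return basis


def discloses_individual(queries, n):
    """True if some unit vector (one individual's value) lies in the row space."""
    basis = build_basis(queries)
    for i in range(n):
        unit = [1 if j == i else 0 for j in range(n)]
        if all(x == 0 for x in reduce_against(basis, unit)):
            return True
    return False


def audit(answered, new_query, n):
    """Return True if answering new_query causes no exact linear disclosure."""
    return not discloses_individual(answered + [new_query], n)


# Four individuals; each query is the 0/1 indicator vector of its query set.
answered = [[1, 1, 1, 0]]
refused = audit(answered, [1, 1, 0, 0], 4)  # differencing would reveal x3
allowed = audit(answered, [0, 0, 1, 1], 4)
```

In this example the second query is refused because subtracting it from the first isolates the third individual, while the third query leaves every individual value undetermined.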