Dan A. Simovici

University of Massachusetts Boston

Publication

Featured research published by Dan A. Simovici.

knowledge discovery and data mining | 2004

Interestingness of frequent itemsets using Bayesian networks as background knowledge

Szymon Jaroszewicz; Dan A. Simovici

The paper presents a method for pruning frequent itemsets based on background knowledge represented by a Bayesian network. The interestingness of an itemset is defined as the absolute difference between its support estimated from data and from the Bayesian network. Efficient algorithms are presented for finding interestingness of a collection of frequent itemsets, and for finding all attribute sets with a given minimum interestingness. Practical usefulness of the algorithms and their efficiency have been verified experimentally.
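The definition of interestingness in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-node network A → B over binary attributes; the function names, the network parameters and the brute-force marginalization are illustrative choices and are not taken from the paper, whose algorithms are designed to be far more efficient.

```python
from itertools import product

def data_support(rows, itemset):
    """Fraction of rows in which every attribute=value pair of the itemset holds."""
    hits = sum(all(row[a] == v for a, v in itemset.items()) for row in rows)
    return hits / len(rows)

def network_support(p_a, p_b_given_a, itemset):
    """Probability of the itemset under a toy network A -> B (hypothetical).

    p_a is P(A=1); p_b_given_a[a] is P(B=1 | A=a).
    The joint distribution is enumerated and the matching entries are summed.
    """
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        assignment = {"A": a, "B": b}
        if all(assignment[k] == v for k, v in itemset.items()):
            pa = p_a if a == 1 else 1 - p_a
            pb = p_b_given_a[a] if b == 1 else 1 - p_b_given_a[a]
            total += pa * pb
    return total

def interestingness(rows, p_a, p_b_given_a, itemset):
    """Absolute difference between the data support and the network support."""
    return abs(data_support(rows, itemset)
               - network_support(p_a, p_b_given_a, itemset))

# Toy data in which A and B co-occur more often than the network predicts.
rows = [{"A": 1, "B": 1}] * 3 + [{"A": 0, "B": 0}]
score = interestingness(rows, 0.5, {0: 0.5, 1: 0.5}, {"A": 1, "B": 1})
```

An itemset with a high score is one whose observed frequency the background knowledge fails to explain.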

international conference on data mining | 2005

On feature selection through clustering

Richard Butterworth; Gregory Piatetsky-Shapiro; Dan A. Simovici

We study an algorithm for feature selection that clusters attributes using a special metric and then makes use of the dendrogram of the resulting cluster hierarchy to choose the most relevant attributes. The main interest of our technique resides in the improved understanding of the structure of the analyzed data and of the relative importance of the attributes for the selection process.
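The abstract does not name the metric used to cluster attributes. As a rough sketch, one plausible choice is an entropy-based distance between the partitions that two attributes induce on the rows, d(A, B) = 2H(A, B) - H(A) - H(B); this choice and the function names below are assumptions made for illustration. The resulting pairwise distances could then be passed to any hierarchical clustering routine to obtain a dendrogram.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of the partition induced by a column of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def attribute_distance(col_a, col_b):
    """Assumed distance between two attributes: H(A|B) + H(B|A).

    Computed as 2*H(A,B) - H(A) - H(B). It is zero when the two
    columns induce the same partition of the rows.
    """
    joint = entropy(list(zip(col_a, col_b)))
    return 2 * joint - entropy(col_a) - entropy(col_b)

a = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]    # same partition as a, different labels
unrelated = [0, 1, 0, 1]    # carries no information about a
d_same = attribute_distance(a, relabeled)
d_far = attribute_distance(a, unrelated)
```

Attributes at distance zero are redundant with each other, so a selection step could keep one representative per tight cluster.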

knowledge discovery and data mining | 2002

Pruning Redundant Association Rules Using Maximum Entropy Principle

Szymon Jaroszewicz; Dan A. Simovici

Data mining algorithms produce huge sets of rules, practically impossible to analyze manually. It is thus important to develop methods for removing redundant rules from those sets. We present a solution to the problem using the Maximum Entropy approach. The problem of efficiency of Maximum Entropy computations is addressed by using closed form solutions for the most frequent cases. Analytical and experimental evaluation of the proposed technique indicates that it efficiently produces small sets of interesting association rules.

european conference on principles of data mining and knowledge discovery | 2001

A General Measure of Rule Interestingness

Szymon Jaroszewicz; Dan A. Simovici

The paper presents a new general measure of rule interestingness. Many known measures such as chi-square, Gini gain or entropy gain can be obtained from this measure by setting some numerical parameters, including the amount of trust we have in the estimation of the probability distribution of the data. Moreover, we show that there is a continuum of measures having chi-square, Gini gain and entropy gain as boundary cases. Therefore our measure generalizes both conditional and unconditional classical measures of interestingness. Properties and experimental evaluation of the new measure are also presented.

international conference on data mining | 2002

Generating an informative cover for association rules

Laurentiu Cristofor; Dan A. Simovici

Mining association rules may generate a large number of rules, making the results hard to analyze manually. Pasquier et al. have discussed the generation of the Guigues-Duquenne-Luxenburger basis (GD-L basis). Using a similar approach, we introduce a new rule of inference and define the notion of association rules cover as a minimal set of rules that are non-redundant with respect to this new rule of inference. Our experimental results (obtained using both synthetic and real data sets) show that our covers are smaller than the GD-L basis and they are computed in time that is comparable to the classic Apriori algorithm for generating rules.

IEEE Transactions on Information Theory | 2002

An axiomatization of partition entropy

Dan A. Simovici; Szymon Jaroszewicz

The aim of this article is to present an axiomatization of a generalization of Shannon's entropy starting from partitions of finite sets. The proposed axiomatization defines a family of entropies depending on a real positive parameter that contains as a special case the Havrda-Charvat (1967) entropy, and thus provides axiomatizations for the Shannon entropy, the Gini index, and for other types of entropy used in classification and data mining.
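To see how one parameter can cover both the Shannon entropy and the Gini index, the sketch below evaluates the Havrda-Charvat entropy of a partition from its block sizes. The normalization 1 / (1 - 2^(1 - beta)) is the commonly cited form and is an assumption here, as is the function name; the abstract itself does not give the formula.

```python
import math

def partition_entropy(block_sizes, beta):
    """Havrda-Charvat entropy of a partition, given the sizes of its blocks.

    Assumed form: (1 - sum(p_i ** beta)) / (1 - 2 ** (1 - beta)),
    where p_i is the relative size of block i. As beta approaches 1
    this tends to the Shannon entropy in base 2, handled separately.
    """
    n = sum(block_sizes)
    probs = [s / n for s in block_sizes]
    if math.isclose(beta, 1.0):
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return (1 - sum(p ** beta for p in probs)) / (1 - 2 ** (1 - beta))

sizes = [1, 3]                                # a 4-element set split into two blocks
shannon = partition_entropy(sizes, 1.0)       # Shannon entropy
gini_like = partition_entropy(sizes, 2.0)     # twice the Gini index 1 - sum(p_i ** 2)
near_one = partition_entropy(sizes, 1.0001)   # close to the Shannon value
```

Under this normalization, beta = 2 yields the Gini index up to a constant factor of 2.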

Journal of Biomedical Informatics | 2004

A greedy algorithm for supervised discretization

Richard Butterworth; Dan A. Simovici; Gustavo S. Santos; Lucila Ohno-Machado

We present a greedy algorithm for supervised discretization using a metric defined on the space of partitions of a set of objects. This proposed technique is useful for preparing the data for classifiers that require nominal attributes. Experimental work on decision trees and naïve Bayes classifiers confirms the efficacy of the proposed algorithm.

Mathematical Structures in Computer Science | 1994

A categorical approach to database semantics

Kenneth Baclawski; Dan A. Simovici; William White

We propose a formalization of standard database management systems using topos theory. In this treatment, all constructions take place within an ambient topos, which thereby serves as the ‘universe of discourse’. A database schema is defined using objects and morphisms in the ambient topos. A database state for a given schema involves not only the ambient topos but also an internal category within the topos. This approach neatly separates the schema from the state data by placing them in distinct category structures. It is shown that database states can either be regarded syntactically as objects in an external topos or semantically as morphisms in an internal slice category. A number of operations are introduced that correspond to operations used in standard database systems. Extraction selects some of the tables, attributes and domains of a database state. The squeeze operation performs an ‘elimination of duplicates’, which can be combined with extraction to obtain an operation called ‘projection’ in standard relational database systems. A join operation is defined, which generalizes the relational join operation and can be used for the cartesian product and selection operations. Finally, ‘boolean’ operations of intersection, union and difference are introduced and related to the other operations.
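The relationship between extraction, squeeze and projection can be illustrated at the level of ordinary relational tables. The sketch below is not the categorical construction of the paper: tables are plain lists of dictionaries, and the function names are hypothetical.

```python
def extract(rows, attributes):
    """Keep only the chosen attributes of each row; duplicates may remain."""
    return [tuple(row[a] for a in attributes) for row in rows]

def squeeze(rows):
    """Eliminate duplicate rows while preserving first-seen order."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

def project(rows, attributes):
    """Relational projection, obtained as extraction followed by squeeze."""
    return squeeze(extract(rows, attributes))

employees = [
    {"name": "Ann", "dept": "R&D"},
    {"name": "Bob", "dept": "R&D"},
    {"name": "Eve", "dept": "Ops"},
]
extracted = extract(employees, ["dept"])
departments = project(employees, ["dept"])
```

Here extraction alone keeps the repeated department value, and it is the squeeze step that turns the result into a set-like relation.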

ACM Transactions on Multimedia Computing, Communications, and Applications | 2007

Detecting eye fixations by projection clustering

Thierry Urruty; Stanislas Lew; Nacim Ihadaddene; Dan A. Simovici

The identification of the components of eye movements (fixations and saccades) is an essential part in the analysis of visual behavior because these types of movements provide the basic elements used by further investigations of human vision. However, many of the algorithms that detect fixations present some problems (consistency, robustness, many input parameters). In this article we present a new eye fixation identification technique that is based on clustering of eye positions using projections and projection aggregation.

Data Mining and Knowledge Discovery | 2009

Scalable pattern mining with Bayesian networks as background knowledge

Szymon Jaroszewicz; Tobias Scheffer; Dan A. Simovici

We study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model. Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. We focus on the central step of this process: given a graphical model and a database, we address the problem of finding the most interesting attribute sets. We formalize the concept of interestingness of attribute sets as the divergence between their behavior as observed in the data, and the behavior that can be explained given the current model. We derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. We then consider the case of a very large network that renders exact inference unfeasible, and a very large database or data stream. We devise an algorithm that efficiently finds the most interesting attribute sets with prescribed approximation bound and confidence probability, even for very large networks and infinite streams. We study the scalability of the methods in controlled experiments; a case-study sheds light on the practical usefulness of the approach.
