Edward Omiecinski | Researchain

Publication

Featured researches published by Edward Omiecinski.

IEEE Transactions on Knowledge and Data Engineering | 2003

Alternative interest measures for mining associations in databases

Edward Omiecinski

Data mining is defined as the process of discovering significant and potentially useful patterns in large volumes of data. Discovering associations between items in a large database is one such data mining activity. In finding associations, support is used as an indicator as to whether an association is interesting. In this paper, we discuss three alternative interest measures for associations: any-confidence, all-confidence, and bond. We prove that the important downward closure property applies to both all-confidence and bond. We show that downward closure does not hold for any-confidence. We also prove that, if associations have a minimum all-confidence or minimum bond, then those associations will have a given lower bound on their minimum support and the rules produced from those associations will have a given lower bound on their minimum confidence as well. However, associations that have that minimum support (and likewise their rules that have minimum confidence) may not satisfy the minimum all-confidence or minimum bond constraint. We describe the algorithms that efficiently find all associations with a minimum all-confidence or minimum bond and present some experimental results.
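To make two of the measures named in this abstract concrete, here is a minimal sketch. The abstract does not define them, so the formulas below are assumptions based on how they are commonly stated: all-confidence as the itemset's support divided by the largest single-item support, and bond as the number of transactions containing every item divided by the number containing at least one. The function names and the toy transactions are hypothetical.

```python
def count_all(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def count_any(itemset, transactions):
    """Number of transactions that contain at least one item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items & t)


def all_confidence(itemset, transactions):
    # Assumed definition: supp(X) / max over items i in X of supp({i}).
    largest_item_count = max(count_all({i}, transactions) for i in itemset)
    if largest_item_count == 0:
        return 0.0
    return count_all(itemset, transactions) / largest_item_count


def bond(itemset, transactions):
    # Assumed definition: conjunctive support / disjunctive support.
    any_count = count_any(itemset, transactions)
    if any_count == 0:
        return 0.0
    return count_all(itemset, transactions) / any_count


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a"},
    {"b", "c"},
    {"a", "b"},
]

ab_all_conf = all_confidence({"a", "b"}, transactions)
abc_all_conf = all_confidence({"a", "b", "c"}, transactions)
ab_bond = bond({"a", "b"}, transactions)
abc_bond = bond({"a", "b", "c"}, transactions)
```

On this toy data the superset {a, b, c} scores no higher than {a, b} on either measure, which is the behavior the downward closure property describes.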

International Conference on Data Engineering | 1998

Mining for strong negative associations in a large database of customer transactions

Ashoka Savasere; Edward Omiecinski; Shamkant B. Navathe

Mining for association rules is considered an important data mining problem. Many different variations of this problem have been described in the literature. We introduce the problem of mining for negative associations. A naive approach to finding negative associations leads to a very large number of rules with low interest measures. We address this problem by combining previously discovered positive associations with domain knowledge to constrain the search space such that fewer but more interesting negative rules are mined. We describe an algorithm that efficiently finds all such negative associations and present the experimental results.

Proceedings IEEE Forum on Research and Technology Advances in Digital Libraries | 1999

Discovering association rules based on image content

Carlos Ordonez; Edward Omiecinski

This paper focuses on knowledge discovery in image databases. We present a data mining algorithm to find association rules in 2-dimensional color images. The algorithm has four major steps: feature extraction, object identification, auxiliary image creation and object mining. Our emphasis is on data mining of image content without the use of auxiliary domain knowledge. The purpose of our experiments is to explore the feasibility of this approach. A synthetic image set containing geometric shapes was generated to test our initial algorithm implementation. Our experimental results show that there is promise in image mining based on content. We compare these results against the rules obtained from manually identifying the shapes. We analyze the reasons for discrepancies. We also suggest directions for future work.

IEEE Transactions on Computers | 2002

Efficient data allocation over multiple channels at broadcast servers

Wai Gen Yee; Shamkant B. Navathe; Edward Omiecinski; Christopher Jermaine

Broadcast is a scalable way of disseminating data because broadcasting an item satisfies all outstanding client requests for it. However, because the transmission medium is shared, individual requests may have high response times. In this paper, we show how to minimize the average response time given multiple broadcast channels by optimally partitioning data among them. We also offer an approximation algorithm that is less complex than the optimal and show that its performance is near-optimal for a wide range of parameters. Finally, we briefly discuss the extensibility of our work with two simple, yet seldom researched extensions, namely, handling varying sized items and generating single channel schedules.
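The partitioning idea in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes equal-sized items, a flat cyclic schedule on each channel (so the expected wait for an item on a channel carrying n items is n/2 item slots), and it searches only partitions that are contiguous once items are sorted by access probability. The function name and the sample probabilities are hypothetical.

```python
def best_contiguous_partition(probs, k):
    """Split items into k channels, minimizing expected wait.

    Assumptions (not from the source): equal-sized items, flat cyclic
    broadcast per channel, and only contiguous groups of the items
    sorted by descending access probability are considered.
    Requires 1 <= k <= len(probs).
    """
    p = sorted(probs, reverse=True)
    n = len(p)

    # prefix[j] = total access probability of the first j sorted items
    prefix = [0.0]
    for x in p:
        prefix.append(prefix[-1] + x)

    def cost(i, j):
        # Items p[i:j] share one channel: each waits (j - i) / 2 slots
        # on average, weighted by the group's total access probability.
        return (j - i) / 2 * (prefix[j] - prefix[i])

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                value = best[c - 1][i] + cost(i, j)
                if value < best[c][j]:
                    best[c][j] = value
                    cut[c][j] = i

    # Walk the recorded cut points backwards to recover the groups.
    groups = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        groups.append(p[i:j])
        j = i
    groups.reverse()
    return best[k][n], groups


# Hypothetical access probabilities for four items.
one_channel_delay, _ = best_contiguous_partition([0.4, 0.3, 0.2, 0.1], 1)
two_channel_delay, two_channel_groups = best_contiguous_partition(
    [0.4, 0.3, 0.2, 0.1], 2
)
```

With these numbers, adding a second channel halves the expected wait under the stated assumptions, and the two most popular items end up sharing one channel.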

IEEE Transactions on Parallel and Distributed Systems | 1995

Inverted file partitioning schemes in multiple disk systems

Byeong-soo Jeong; Edward Omiecinski

Multiple-disk I/O systems (disk arrays) have been an attractive approach to meet high performance I/O demands in data intensive applications such as information retrieval systems. When we partition and distribute files across multiple disks to exploit the potential for I/O parallelism, a balanced I/O workload distribution becomes important for good performance. Naturally, the performance of a parallel information retrieval system using an inverted file structure is affected by the partitioning scheme of the inverted file. In this paper, we propose two different partitioning schemes for an inverted file system for a shared-everything multiprocessor machine with multiple disks. We study the performance of these schemes by simulation under a number of workloads where the term frequencies in the documents are varied, the term frequencies in the queries are varied, the number of disks is varied and the multiprogramming level is varied.

International Conference on Data Mining | 2001

Mining constrained association rules to predict heart disease

Carlos Ordonez; Edward Omiecinski; L. de Braal; Cesar A. Santana; Norberto F. Ezquerra; J.A. Taboada; D. Cooke; Elzbieta G. Krawczynska; Ernest V. Garcia

This work describes our experiences in discovering association rules in medical data to predict heart disease. We focus on two aspects of this work: mapping medical data to a transaction format suitable for mining association rules, and identifying useful constraints. Based on these aspects we introduce an improved algorithm to discover constrained association rules. We present an experimental section explaining several interesting discovered rules.

IEEE Transactions on Knowledge and Data Engineering | 2004

Efficient disk-based K-means clustering for relational databases

Carlos Ordonez; Edward Omiecinski

K-means is one of the most popular clustering algorithms. We introduce an efficient disk-based implementation of K-means. The proposed algorithm is designed to work inside a relational database management system. It can cluster large data sets having very high dimensionality. In general, it only requires three scans over the data set. It is optimized to perform heavy disk I/O and its memory requirements are low. Its parameters are easy to set. An extensive experimental section evaluates quality of results and performance. The proposed algorithm is compared against the Standard K-means algorithm as well as the Scalable K-means algorithm.

Conference on Information and Knowledge Management | 2002

FREM: fast and robust EM clustering for large data sets

Carlos Ordonez; Edward Omiecinski

Clustering is a fundamental data mining technique. This article presents an improved EM algorithm to cluster large data sets having high dimensionality, noise and zero variance problems. The algorithm incorporates improvements to increase the quality of solutions and speed. In general, the algorithm can find a good clustering solution in 3 scans over the data set. Alternatively, it can be run until it converges. The algorithm has a few parameters that are easy to set and have defaults for most cases. The proposed algorithm is compared against the standard EM algorithm and the On-Line EM algorithm.

Lecture Notes in Computer Science | 2005

Efficiency and security trade-off in supporting range queries on encrypted databases

Jun Li; Edward Omiecinski

The database-as-a-service (DAS) model is a newly emerging computing paradigm, where the DBMS functions are outsourced. It is desirable to store data on database servers in encrypted form to reduce security and privacy risks since the server may not be fully trusted. But this usually implies that one has to sacrifice functionality and efficiency for security. Several approaches have been proposed in recent literature for efficiently supporting queries on encrypted databases. These approaches differ from each other in how the index of attribute values is created. Random one-to-one mapping and order-preserving are two examples. In this paper we will adapt a prefix-preserving encryption scheme to create the index. Certainly, all these approaches look for a convenient trade-off between efficiency and security. In this paper we will discuss the security issues and efficiency of these approaches for supporting range queries on encrypted numeric data.
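One way to see why a prefix-preserving index can serve range queries: if two values that share their first k bits also share the first k bits of their index entries, then a numeric range can be rewritten as a small set of bit prefixes, and each prefix can be matched on the server. The sketch below shows only that plaintext-side decomposition step. It is an illustration under that assumption, not the scheme from the paper, and it performs no encryption. The function name is hypothetical.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the inclusive range [lo, hi] of width-bit integers with a
    minimal list of bit-string prefixes.

    Each returned prefix stands for every width-bit value that starts
    with those bits. Assumes 0 <= lo <= hi < 2 ** width.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo.
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        # A block of 2**s values fixes the top (width - s) bits.
        bits = width - (size.bit_length() - 1)
        if bits:
            prefixes.append(format(lo >> (width - bits), f"0{bits}b"))
        else:
            prefixes.append("")  # empty prefix: the whole domain
        lo += size
    return prefixes


# Range 3..12 over 4-bit values splits into {3}, 4..7, 8..11, {12}.
example = range_to_prefixes(3, 12, 4)
```

The number of prefixes grows with how poorly the range aligns to powers of two, which hints at the efficiency side of the trade-off the abstract mentions.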

Extending Database Technology | 1992

Adaptive and Automated Index Selection in RDBMS

Martin R. Frank; Edward Omiecinski; Shamkant B. Navathe

We present a novel approach for a tool that assists the database administrator in designing an index configuration for a relational database system. A new methodology for collecting usage statistics at run time is developed which lets the optimizer estimate query execution costs for alternative index configurations. Defining the workload specification required by existing index design tools may be very complex for a large integrated database system. Our tool automatically derives the workload statistics. These statistics are then used to efficiently compute an index configuration. Execution of a prototype of the tool against a sample database demonstrates that the proposed index configuration is reasonably close to the optimum for test query sets.
