Le Gruenwald | Researchain

Publication

Featured research published by Le Gruenwald.

SIGKDD Explorations | 1999

A survey of data mining and knowledge discovery software tools

Michael Goebel; Le Gruenwald

Knowledge discovery in databases is a rapidly growing field whose development is driven by strong research interests as well as urgent practical, social, and economic needs. While in the last few years knowledge discovery tools have been used mainly in research environments, sophisticated software products are now rapidly emerging. In this paper, we provide an overview of common knowledge discovery tasks and approaches to solving them. We propose a feature classification scheme that can be used to study knowledge discovery and data mining software. This scheme is based on the software's general characteristics, database connectivity, and data mining characteristics. We then apply our feature classification scheme to investigate 43 software products, which are either research prototypes or commercially available. Finally, we specify features that we consider important for knowledge discovery software to possess in order to serve its users effectively, as well as issues that are either not yet addressed or insufficiently solved.

International Conference on Management of Data | 2006

Research issues in data stream association rule mining

Nan Jiang; Le Gruenwald

There are emerging applications of data streams that require association rule mining, such as network traffic monitoring and web click-stream analysis. Unlike data in traditional static databases, data streams typically arrive continuously, at high speed, in huge volumes, and with changing data distributions. This raises new issues that must be considered when developing association rule mining techniques for stream data. This paper discusses those issues and how the existing literature addresses them.
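
The abstract does not give an algorithm. As a point of reference for the issues it raises, here is a minimal sketch of the simplest stream baseline: exact support counting over a sliding window of recent transactions, restricted to single-item rules. The class name `SlidingWindowPairMiner` and its parameters are hypothetical; this is not a technique taken from the paper, and exact per-transaction counting like this is precisely what high-speed, high-volume streams make expensive.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowPairMiner:
    """Exact support counts for single items and item pairs over the last
    `window` transactions of a stream. Hypothetical illustration only."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque()
        self.item_counts = Counter()
        self.pair_counts = Counter()

    @staticmethod
    def _pairs(items):
        # All unordered 2-item combinations, in a canonical (sorted) order.
        return combinations(sorted(items), 2)

    def add(self, transaction):
        items = frozenset(transaction)
        self.buffer.append(items)
        self.item_counts.update(items)
        self.pair_counts.update(self._pairs(items))
        # Expire the oldest transaction so counts describe only the current window.
        if len(self.buffer) > self.window:
            old = self.buffer.popleft()
            self.item_counts.subtract(old)
            self.pair_counts.subtract(self._pairs(old))

    def rules(self, min_support, min_confidence):
        """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
        n = len(self.buffer)
        found = []
        for (a, b), count in self.pair_counts.items():
            if count <= 0 or count / n < min_support:
                continue
            for lhs, rhs in ((a, b), (b, a)):
                confidence = count / self.item_counts[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, rhs, count / n, confidence))
        return found


if __name__ == "__main__":
    miner = SlidingWindowPairMiner(window=3)
    for t in [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}]:
        miner.add(t)
    print(miner.rules(min_support=0.5, min_confidence=0.9))
```

With a window of 3, the first transaction has expired by the time the rules are read, so only the rules b to a and c to a (support 2/3, confidence 1.0) survive the thresholds.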

Very Large Data Bases | 2008

A survey of data replication techniques for mobile ad hoc network databases

Prasanna Padmanabhan; Le Gruenwald; Anita Vallur; Mohammed Atiquzzaman

A mobile ad hoc network (MANET) is a network that allows mobile servers and clients to communicate in the absence of a fixed infrastructure. MANETs are a fast-growing area of research because they find use in a variety of applications. To facilitate efficient data access and update, databases are deployed on MANETs; such databases are referred to as MANET databases. Since data availability in MANETs is affected by the mobility and power constraints of the servers and clients, data in MANETs are replicated, and a number of data replication techniques have been proposed for MANET databases. This paper identifies the issues involved in MANET data replication and attempts to classify existing MANET data replication techniques based on the issues they address. The attributes of the replication techniques are also tabulated to facilitate a feature comparison of existing MANET data replication work. Parameters and performance metrics for measuring the performance of MANET replication techniques are also presented. In addition, the paper proposes criteria for selecting appropriate data replication techniques for various application requirements. Finally, it concludes with a discussion of future research directions.

International Conference on Management of Data | 2003

Research issues for data communication in mobile ad-hoc network database systems

Leslie D. Fife; Le Gruenwald

Mobile ad hoc networks (MANETs) are an emerging area of research, and most current work centers on routing issues. This paper discusses the issues associated with data communication in MANET database systems. While data push and data pull methods have previously been addressed for mobile networks, the proposed methods do not handle the unique requirements of MANETs. Unlike traditional mobile networks, all nodes within a MANET are mobile and battery powered. Existing wireless algorithms and protocols are insufficient primarily because they do not consider the mobility and power requirements of both clients and servers. This paper presents some of the critical tasks facing this research.

International Journal of Production Research | 2004

Real-time due-date promising by build-to-order environments

Scott A. Moses; Hank Grant; Le Gruenwald; S. Pulat

A vast amount of literature exists on scheduling to meet due dates, but very little work considers how to set these due dates before scheduling the orders. A method is described for real-time promising of order due dates that is applicable to discrete build-to-order environments facing dynamic order arrivals. When computing a due date, the method considers: (1) dynamic time-phased availability of resources required for each operation of the order, (2) individual order-specific characteristics and (3) existing commitments to orders that arrived previously. Performance of the method surpasses that of due-date assignment methods previously examined in the literature and also those commonly used in practice. The median and standard deviation of absolute flow-time estimation error and of absolute lateness are chosen as the primary performance criteria because they capture both positive and negative error in flow-time estimation of each individual order. Computational results from large-scale simulation studies of realistic systems with 20 resources and up to 100 000 orders also indicate the method is highly scalable.
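
As an illustration of how the three inputs listed above can combine into a quoted completion time, here is a much-simplified sketch, assuming discrete time buckets, a fixed capacity per resource per bucket, and strictly sequential operations. The `CapacityCalendar` class and its parameters are hypothetical and this is not the authors' method; it only shows time-phased availability (per-bucket capacity), order-specific routing (the operation list), and prior commitments (bookings that persist between orders) working together.

```python
class CapacityCalendar:
    """Remaining capacity of each resource in each time bucket (hypothetical model).

    Bookings persist across calls, so each new order is quoted against capacity
    already committed to earlier orders.
    """

    def __init__(self, horizon, capacity_per_bucket):
        self.horizon = horizon
        self.remaining = {r: [cap] * horizon for r, cap in capacity_per_bucket.items()}

    def promise(self, operations, release=0):
        """Forward-load `operations`, a list of (resource, work_units) in routing
        order, from bucket `release`; commit the bookings and return the bucket
        in which the last operation completes."""
        trial = {r: list(buckets) for r, buckets in self.remaining.items()}
        t = release
        for resource, units in operations:
            need = units
            while True:
                if t >= self.horizon:
                    raise ValueError("order cannot be completed within the horizon")
                take = min(need, trial[resource][t])
                trial[resource][t] -= take
                need -= take
                if need == 0:
                    break
                t += 1  # this resource is fully booked in bucket t; move on
            # Simplification: the next operation may start in the bucket where
            # this one finished.
        self.remaining = trial  # commit only if the whole order fits
        return t


if __name__ == "__main__":
    cal = CapacityCalendar(horizon=10, capacity_per_bucket={"mill": 4, "lathe": 2})
    print(cal.promise([("mill", 6), ("lathe", 3)]))
    print(cal.promise([("mill", 6), ("lathe", 3)]))
```

In the usage example the first order completes in bucket 2, while an identical second order is quoted bucket 3 because the first order's bookings have already consumed part of the mill and lathe capacity.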

Mobile Networks and Applications | 2000

A pre-serialization transaction management technique for mobile multidatabases

Ravi A. Dirckze; Le Gruenwald

Rapid advances in hardware and wireless communication technology have made the concept of mobile computing a reality. Thus, evolving database technology needs to address the requirements of the future mobile user. The frequent disconnection and migration of the mobile user violate underlying presumptions about connectivity that exist in wired database systems and introduce new issues that affect transaction management. In this paper, we present the Pre-Serialization (PS) transaction management technique for the mobile multidatabase environment. This technique addresses disconnection and migration and enforces a range of atomicity and isolation criteria. We also develop an analytical model to compare the performance of the PS technique to that of the Kangaroo model.

International Conference on Data Engineering | 2015

Large-scale spatial join query processing in Cloud

Simin You; Jianting Zhang; Le Gruenwald

The rapidly increasing amount of location data available in many applications has made it desirable to process large-scale spatial queries in the Cloud for performance and scalability. We report the designs and implementations of two prototype systems that are ready for Cloud deployment: SpatialSpark, based on Apache Spark, and ISP-MC, based on Cloudera Impala. Both systems support indexed spatial joins based on point-in-polygon tests and point-to-polyline distance computation. Experiments on the pickup locations of ~170 million taxi trips in New York City and ~10 million global species occurrence records have demonstrated both efficiency and scalability on Amazon EC2 clusters.
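
The point-in-polygon test mentioned above is commonly implemented with ray casting. The sketch below pairs it with a bounding-box filter in a naive nested-loop join; it is a generic illustration, not SpatialSpark or ISP-MC code. The function names are hypothetical, and the inner loop over polygons is where a real system would consult a spatial index instead of scanning every polygon.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a horizontal ray from (x, y) toward +x.
    `polygon` is a list of (x, y) vertices; an odd crossing count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the ray's y coordinate.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(polygon):
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_polygon_join(points, polygons):
    """Filter-and-refine join: cheap bounding-box check first, exact test second.
    `points` maps id -> (x, y); `polygons` maps id -> vertex list."""
    boxes = {pid: bounding_box(poly) for pid, poly in polygons.items()}
    pairs = []
    for point_id, (x, y) in points.items():
        for pid, (xmin, ymin, xmax, ymax) in boxes.items():
            if xmin <= x <= xmax and ymin <= y <= ymax:
                if point_in_polygon(x, y, polygons[pid]):
                    pairs.append((point_id, pid))
    return pairs


if __name__ == "__main__":
    polygons = {
        "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
        "B": [(5, 0), (9, 0), (7, 4)],
    }
    points = {"p1": (1, 1), "p2": (7, 1), "p3": (4.5, 2), "p4": (6, 3)}
    print(point_polygon_join(points, polygons))
```

Here the join returns p1 in A and p2 in B. Point p4 lies inside the triangle's bounding box but fails the exact test, which is why the refine step is needed after the filter.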

International Conference on Data Mining | 2007

Using Data Mining to Estimate Missing Sensor Data

Le Gruenwald; Hamed Chok; Mazen Aboukhamis

Estimating missing sensor values is an inherent problem in sensor network applications; however, existing data estimation approaches do not apply well to datastreams, a major characteristic of sensornet applications. Additionally, they fail to account for relationships among sensors while also incorporating the time factor, which would make the estimation process computationally aware of the relative relevance of each data round in the datastream. To address this gap, we propose a data estimation technique, FARM, which uses association rule mining to discover intrinsic relationships among sensors and incorporates them into the data estimation while taking data freshness into consideration. FARM was tested with data from two real sensornet applications, namely climate sensing and traffic monitoring. Simulations show that FARM outperformed existing techniques in estimation accuracy at only marginally higher space and time overheads, while scaling well with network size, thus assuring quality of service for real-time applications.
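
To make the combination of sensor relationships and data freshness concrete, here is a small sketch that estimates a missing reading from past rounds in which related sensors were in the same state as now, weighting recent rounds more heavily. It is a hypothetical simplification and not FARM itself: the set of related sensors is passed in rather than mined with association rules, and the exponential weight `decay ** age` is an assumed freshness function.

```python
def estimate_missing(history, current, target, related, decay=0.8):
    """Freshness-weighted estimate of `target`'s missing reading (hypothetical sketch).

    history: list of past rounds, oldest first; each maps sensor id -> reading.
    current: readings received in the current round (without `target`).
    related: sensors assumed to be associated with `target`; their readings are
             compared by equality, so they should already be discretized.
    decay:   weight multiplier per round of age, 0 < decay <= 1 (assumed form).
    Returns None when no past round matches.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    n = len(history)
    for idx, past in enumerate(history):
        if target not in past:
            continue
        age = n - idx  # 1 for the most recent round
        # How many related sensors were in the same state then as they are now.
        matches = sum(1 for r in related if r in current and past.get(r) == current[r])
        if matches == 0:
            continue
        weight = matches * decay ** age
        weighted_sum += weight * past[target]
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


if __name__ == "__main__":
    history = [
        {"s1": 20, "s2": 1, "s3": 1},
        {"s1": 30, "s2": 2, "s3": 2},
        {"s1": 22, "s2": 1, "s3": 1},
    ]
    print(estimate_missing(history, {"s2": 1, "s3": 1}, "s1", ["s2", "s3"], decay=0.5))
```

In the example, the round in which s2 and s3 were in state 2 is ignored, and the most recent matching round (s1 = 22) weighs four times as much as the oldest (s1 = 20), giving an estimate of about 21.6.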

International Conference on Management of Data | 2005

Research issues in automatic database clustering

Sylvain Guinepain; Le Gruenwald

While much work has been published on clustering data on storage media, little has been done to automate this process. This area is important because, with data proliferation, human attention has become a precious and expensive resource. Our goal is to develop an automatic and dynamic database clustering technique that re-clusters a database with little intervention from a database administrator (DBA) while maintaining an acceptable query response time at all times. In this paper we describe the issues that must be solved when developing such a technique.

Data and Knowledge Engineering | 2005

Microarray gene expression data association rules mining based on BSC-tree and FIS-tree

Xiang-Rong Jiang; Le Gruenwald

In this paper we propose using association rules to mine the relationships among different genes under the same experimental conditions. Such relationships may also hold across many experiments with different experimental conditions. We propose a new approach, called FIS-tree mining, for mining microarray data. The approach uses two new data structures, the BSC-tree and the FIS-tree, together with a data partition format for gene expression level data. These two data structures make it possible to mine association rules efficiently from a gene expression database. The algorithm was tested on two real-life gene expression databases, available from Stanford University and Harvard Medical School, and performed better than two existing algorithms, Apriori and FP-Growth.
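
One common way to speed up support counting on data like this is a vertical bit-vector layout: each (gene, state) item maps to a bitmask of the experiments in which it holds, and the support of an itemset is the popcount of the AND of its masks. The sketch below shows that general idea with hypothetical up/down thresholds of 1.0 and -1.0; it is not the paper's BSC-tree or FIS-tree, whose internals the abstract does not describe.

```python
def to_bitmasks(expression, up=1.0, down=-1.0):
    """Map each (gene, state) item to a bitmask of experiments in which it holds.
    `expression` is a list of experiments, each mapping gene -> expression level;
    levels strictly between `down` and `up` are treated as unchanged (no item)."""
    masks = {}
    for exp_idx, row in enumerate(expression):
        for gene, level in row.items():
            if level >= up:
                state = "up"
            elif level <= down:
                state = "down"
            else:
                continue
            masks[(gene, state)] = masks.get((gene, state), 0) | (1 << exp_idx)
    return masks


def support(masks, itemset, n_experiments):
    """Fraction of experiments containing every item: popcount of the ANDed masks."""
    combined = (1 << n_experiments) - 1
    for item in itemset:
        combined &= masks.get(item, 0)
    return bin(combined).count("1") / n_experiments


def confidence(masks, lhs, rhs, n_experiments):
    """support(lhs + rhs) / support(lhs) for item lists lhs and rhs."""
    base = support(masks, lhs, n_experiments)
    return support(masks, lhs + rhs, n_experiments) / base if base else 0.0


if __name__ == "__main__":
    expression = [
        {"g1": 1.5, "g2": 2.0, "g3": 0.1},
        {"g1": 1.2, "g2": 1.1, "g3": -1.4},
        {"g1": -1.3, "g2": 0.4, "g3": -1.2},
        {"g1": 1.8, "g2": 1.6, "g3": 0.0},
    ]
    masks = to_bitmasks(expression)
    print(support(masks, [("g1", "up"), ("g2", "up")], len(expression)))
    print(confidence(masks, [("g1", "up")], [("g2", "up")], len(expression)))
```

Because each itemset's support is a handful of integer ANDs, the cost of a candidate check does not grow with a scan over every experiment row, which is the motivation for bit-level layouts in general.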
