Jussi Myllymaki
University of Wisconsin-Madison
Publication
Featured research published by Jussi Myllymaki.
measurement and modeling of computer systems | 1995
Jussi Myllymaki; Miron Livny
Today large amounts of data are stored on tertiary storage media such as magnetic tapes and optical disks. DBMSs typically operate only on magnetic disks since they know how to maneuver disks and how to optimize accesses on them. Tertiary devices present a problem for DBMSs since these devices have dismountable media and have very different operational characteristics compared to magnetic disks. For instance, most tape drives offer very high capacity at low cost but are accessed sequentially, involve lengthy latencies, and deliver lower bandwidth. Typically, the scope of a DBMS's query optimizer does not include tertiary devices, and the DBMS might not even know how to control and operate upon tertiary-resident data. In a three-level hierarchy of storage devices (main memory, disk, tape), the typical solution is to elevate tape-resident data to disk devices, thus bringing such data under the DBMS's control, and then to perform the required operations on disk. This requires additional space on disk and may not give the lowest response time possible. With this challenge in mind, we studied the trade-offs between memory and disk requirements and the execution time of a join with the help of two well-known join methods. The conventional, disk-based Nested Block Join and Hybrid Hash Join were modified to operate directly on tapes. An experimental implementation of the modified algorithms gave us more insight into how the algorithms perform in practice. Our performance analysis shows that a DBMS desiring to operate on tertiary storage will benefit from special algorithms that operate directly on tape-resident data and that take into account and exploit the mismatch in disk and tape characteristics.
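For orientation, the conventional Nested Block Join mentioned in the abstract can be sketched as below. This is a minimal in-memory illustration of the textbook algorithm, not the tape-modified variant the paper describes. The names `nested_block_join`, `inner_scan`, and `block_size` are hypothetical, and `inner_scan` is assumed to be a callable that restarts a sequential pass over the inner relation (roughly what rewinding a tape would provide).

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # a relation is modeled as a sequence of tuples


def nested_block_join(
    outer: List[Row],
    inner_scan: Callable[[], Iterable[Row]],  # hypothetical: restarts a sequential scan
    outer_key: Callable[[Row], object],
    inner_key: Callable[[Row], object],
    block_size: int,
) -> List[Row]:
    """Join outer and inner, holding one block of outer rows in memory at a time."""
    result: List[Row] = []
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]
        # Index the in-memory block by join key.
        lookup = {}
        for row in block:
            lookup.setdefault(outer_key(row), []).append(row)
        # One full sequential pass over the inner relation per outer block.
        for row in inner_scan():
            for match in lookup.get(inner_key(row), []):
                result.append(match + row)
    return result
```

The number of passes over the inner relation grows as the memory block shrinks, which is one way to see the memory versus execution-time trade-off the abstract refers to.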
parallel computing | 1997
Karen L. Karavanic; Jussi Myllymaki; Miron Livny; Barton P. Miller
Performance tuning a parallel application involves integrating performance data from many components of the system, including the message passing library, performance monitoring tool, resource manager, operating system, and the application itself. The current practice of visualizing these data streams using a separate, customized tool for each source is inconvenient from a usability perspective, and there is no easy way to visualize the data in an integrated fashion. We demonstrate a solution to this problem using Devise, a generic visualization tool which is designed to allow an arbitrary number of different but related data streams to be integrated and explored visually in a flexible manner. We display data emanating from a variety of sources side by side in three case studies. First we interface the Paradyn parallel performance tool and Devise, using two simple data export modules and Paradyn's simple visualization interface. We show several Devise/Paradyn visualizations which are useful for performance tuning parallel codes, and which incorporate data from Unix utilities and application output. Next we describe the visualization of trace data from a parallel application running in a Condor cluster of workstations. Finally we demonstrate the utility of Devise visualizations in a study of Condor cluster activity.
international conference on data engineering | 1997
Jussi Myllymaki; Miron Livny
Despite the steady decrease in secondary storage prices, the data storage requirements of many organizations cannot be met economically using secondary storage alone. Tertiary storage offers a lower-cost alternative but is viewed as a second-class citizen in many systems. For instance, the typical solution in bringing tertiary-resident data under the control of a DBMS is to use operating system facilities to copy the data to secondary storage, and then to perform query optimization and execution as if the data had been in secondary storage all along. This approach fails to recognize the opportunities for saving execution time and storage space if the data were accessed directly on tertiary devices and in parallel with other I/Os. We explore how to join two DBMS relations stored on magnetic tapes. Both relations are assumed to be larger than available disk space. We show how Grace Hash Join can be modified to handle a range of tape relation sizes. The modified algorithms access data directly on tapes and exploit parallelism between disk and tape I/Os. We also provide performance results of an experimental implementation of the algorithms.
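As background for the abstract above, the classical Grace Hash Join can be sketched as follows. This is a minimal in-memory illustration of the standard two-phase algorithm under simplifying assumptions; it does not reproduce the paper's tape-specific modifications or its overlap of disk and tape I/O. The function names and the `n_partitions` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # a relation is modeled as a sequence of tuples


def partition(rows: Iterable[Row], key: Callable[[Row], object], n_partitions: int) -> List[List[Row]]:
    """Phase 1: scatter rows into buckets by the hash of their join key."""
    buckets: List[List[Row]] = [[] for _ in range(n_partitions)]
    for row in rows:
        buckets[hash(key(row)) % n_partitions].append(row)
    return buckets


def grace_hash_join(
    r: Iterable[Row],
    s: Iterable[Row],
    r_key: Callable[[Row], object],
    s_key: Callable[[Row], object],
    n_partitions: int = 4,
) -> List[Row]:
    """Phase 2: join each pair of matching buckets with an in-memory hash table."""
    result: List[Row] = []
    r_parts = partition(r, r_key, n_partitions)
    s_parts = partition(s, s_key, n_partitions)
    for r_part, s_part in zip(r_parts, s_parts):
        table = defaultdict(list)
        for row in r_part:              # build side
            table[r_key(row)].append(row)
        for row in s_part:              # probe side
            for match in table.get(s_key(row), []):
                result.append(match + row)
    return result
```

In a real system the buckets produced in phase 1 would be written to storage and read back one pair at a time; here they stay in Python lists purely to keep the sketch short.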
Performance Evaluation | 1996
Jussi Myllymaki; Miron Livny
Tertiary storage is becoming increasingly important for many organizations involved in large-scale data analysis and data mining activities. Yet database management systems (DBMS) and other data-intensive systems do not incorporate tertiary storage as a first-class citizen in the storage hierarchy. For instance, the typical solution for bringing tertiary-resident data under the control of a DBMS is to use operating system facilities to copy the data to secondary storage, and then to perform query optimization and execution as if the data had been in secondary storage all along. This approach fails to recognize the opportunities for saving execution time and storage space if the data were accessed on tertiary devices directly and in parallel with other I/Os. In this paper we examine issues in accessing secondary and tertiary storage in parallel and suggest buffering mechanisms for increasing the throughput of applications with concurrent, intensive I/O requirements. We first identify several factors that determine the parallel I/O performance of secondary and tertiary storage devices. We discuss the performance characteristics of magnetic disks and magnetic tapes when used alone and when used concurrently, sharing the same I/O bus. We then describe alternative buffering schemes for parallel I/O and analyze their efficiency via an experimental implementation.
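One generic way to overlap reads from one device with writes to another is a small pool of shared buffers between a reader thread and a writer. The sketch below illustrates that general idea only; it is not one of the buffering schemes evaluated in the paper. The names `copy_with_buffers`, `read_blocks`, and `write_block` are hypothetical stand-ins for a tape reader and a disk writer.

```python
import queue
import threading
from typing import Callable, Iterable


def copy_with_buffers(
    read_blocks: Iterable[bytes],          # hypothetical source, e.g. blocks read from tape
    write_block: Callable[[bytes], None],  # hypothetical sink, e.g. a write to disk
    n_buffers: int = 2,
) -> int:
    """Copy blocks from source to sink, letting reads run ahead by up to n_buffers blocks."""
    buffers: "queue.Queue[object]" = queue.Queue(maxsize=n_buffers)
    done = object()  # sentinel marking the end of the input

    def reader() -> None:
        for block in read_blocks:
            buffers.put(block)  # blocks here when every buffer is full
        buffers.put(done)

    thread = threading.Thread(target=reader)
    thread.start()

    copied = 0
    while True:
        block = buffers.get()   # blocks here when every buffer is empty
        if block is done:
            break
        write_block(block)
        copied += 1
    thread.join()
    return copied
```

With `n_buffers=2` this behaves like classic double buffering: the reader fills one buffer while the writer drains the other. The sketch does not handle errors raised inside the reader thread.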
international conference on data engineering | 1996
Daniel Alexander Ford; Jussi Myllymaki
We present the design of a log-structured tertiary storage (LTS) system. The advantage of this approach is that it allows the system to hide the details of jukebox robotics and media characteristics behind a uniform, random-access, block-oriented interface. It also allows the system to avoid media mount operations for writes, giving a write performance similar to that of secondary storage.
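The log-structured idea can be illustrated with a toy model: every write is appended to the tail of a log, and a block map translates logical block numbers to physical locations. This is a sketch under stated assumptions, not the LTS design itself. The class `LogStructuredStore`, its fields, and the mount counter are all hypothetical, and the model assumes that the medium holding the log tail is the one that receives new writes.

```python
class LogStructuredStore:
    """Toy log-structured block store spread over several removable media."""

    def __init__(self, media_capacity: int) -> None:
        self.media_capacity = media_capacity  # blocks per medium (assumed fixed)
        self.media = [[]]     # each medium is an append-only list of blocks
        self.mounted = 0      # index of the medium currently in the drive
        self.block_map = {}   # logical block number -> (medium, offset)
        self.mounts = 0       # number of media switches performed

    def _mount(self, medium: int) -> None:
        if medium != self.mounted:
            self.mounted = medium
            self.mounts += 1

    def write(self, block_no: int, data: bytes) -> None:
        # Writes never go to the block's old location; they append to the log tail.
        tail = len(self.media) - 1
        if len(self.media[tail]) >= self.media_capacity:
            self.media.append([])
            tail += 1
        self._mount(tail)
        self.media[tail].append(data)
        self.block_map[block_no] = (tail, len(self.media[tail]) - 1)

    def read(self, block_no: int) -> bytes:
        # Reads go through the block map and may need a different medium.
        medium, offset = self.block_map[block_no]
        self._mount(medium)
        return self.media[medium][offset]
```

In this model, overwriting a logical block whose old copy sits on another medium costs no media switch, because the new copy lands at the log tail. A read of an old block may still require a switch. Space reclamation for superseded blocks is left out.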
international conference on management of data | 1997
Miron Livny; Raghu Ramakrishnan; Kevin S. Beyer; Guangshun Chen; Donko Donjerkovic; Shilpa Lawande; Jussi Myllymaki; Kent Wenger
DEVise is a data exploration system that allows users to easily develop, browse, and share visual presentations of large tabular datasets (possibly containing or referencing multimedia objects) from several sources. The DEVise framework, implemented in a tool that has already been successfully applied to a variety of real applications by a number of user groups, makes several contributions. In particular, it combines support for extended relational queries with powerful data visualization features. Datasets much larger than available main memory can be handled (DEVise is currently being used to visualize datasets well in excess of 100MB), and data can be interactively examined at several levels of detail: all the way from meta-data summarizing the entire dataset, to large subsets of the actual data, to individual data records. Combining querying (in general, data processing) with visualizations gives us a very versatile tool, and presents several novel challenges. Our emphasis is on developing an intuitive yet powerful set of querying and visualization primitives that can be easily combined to develop a rich set of visual presentations that integrate data from a wide range of application domains. In this demo, we will present a number of examples of the use of the DEVise tool for visualizing and interactively exploring very large datasets, and report on our experience in applying it to several real applications.
electronic imaging | 1996
Miron Livny; Raghu Ramakrishnan; Jussi Myllymaki
DEVise is a data visualization and exploration system capable of handling large data sets using off-the-shelf hardware with minimal memory requirements. Data can be large in volume, complex in structure (multi-dimensional and/or hierarchical), and may be imported from different sources such as database servers, external programs, and World Wide Web resources. Commercial and scientific databases can also be linked to DEVise to allow the user to visualize and analyze related information from heterogeneous sources. Associations between data sources are developed interactively as the user gains more knowledge of the data being explored. To assist in handling large data sets, DEVise allows a user to logically split the data into more manageable units at different levels. The user selects a data source, a data stream within a data source (e.g. a time series), attributes of a stream, and a mapping of attributes to graphical objects. At each step, the selections made by the user reduce the data volume. DEVise takes advantage of this form of data compression to optimize its caching strategies and to minimize the accesses needed to fetch data from tertiary storage, for example. DEVise supports users with different expertise levels by automating most tasks performed by a novice user and by also providing a programming interface that allows new data sources to be defined, new graphical objects to be used, and custom storage policies to be employed.
international conference on management of data | 1997
Miron Livny; Raghu Ramakrishnan; Kevin S. Beyer; Guangshun Chen; Donko Donjerkovic; Shilpa Lawande; Jussi Myllymaki; R. Kent Wenger
Archive | 1997
Karen L. Karavanic; Jussi Myllymaki; Miron Livny; Barton P. Miller