Kiem-Phong Vo
AT&T Labs
Publications
Featured research published by Kiem-Phong Vo.
ACM Transactions on Software Engineering and Methodology | 1998
James J. Hunt; Kiem-Phong Vo; Walter F. Tichy
Delta algorithms compress data by encoding one file in terms of another. This type of compression is useful in a number of situations: storing multiple versions of data, displaying differences, merging changes, distributing updates, storing backups, transmitting video sequences, and others. This article studies the performance parameters of several delta algorithms, using a benchmark of over 1,300 pairs of files taken from two successive releases of GNU software. Results indicate that modern delta compression algorithms based on Ziv-Lempel techniques significantly outperform diff, a popular but older delta compressor, in terms of compression ratio. The modern compressors also correlate better with the actual difference between files without sacrificing performance.
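The copy/add instruction stream at the heart of such delta compressors is easy to sketch. Below is a minimal, illustrative C program (a naive quadratic matcher, not one of the algorithms benchmarked in the paper): the target file is emitted as COPY references into the source plus literal ADD bytes.

    /* Minimal copy/add delta encoder: illustrative only. */
    #include <stdio.h>
    #include <string.h>

    /* Longest match of tgt+pos anywhere in src; returns length, sets *off. */
    static size_t longest_match(const char *src, size_t ns, const char *tgt,
                                size_t nt, size_t pos, size_t *off)
    {
        size_t best = 0, i, k;
        for (i = 0; i < ns; i++) {
            for (k = 0; i + k < ns && pos + k < nt && src[i+k] == tgt[pos+k]; k++)
                ;
            if (k > best) { best = k; *off = i; }
        }
        return best;
    }

    static void delta_encode(const char *src, const char *tgt)
    {
        size_t ns = strlen(src), nt = strlen(tgt), pos = 0, off = 0, len;
        while (pos < nt) {
            len = longest_match(src, ns, tgt, nt, pos, &off);
            if (len >= 3) {  /* short copies cost more than literals */
                printf("COPY off=%zu len=%zu\n", off, len);
                pos += len;
            } else {
                printf("ADD '%c'\n", tgt[pos]);
                pos++;
            }
        }
    }

    int main(void)
    {
        delta_encode("the quick brown fox", "the quick red fox");
        return 0;
    }

A real compressor would replace the quadratic search with hashed string matching in the Ziv-Lempel style and encode the resulting instruction stream compactly.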
data compression conference | 2004
Binh D. Vo; Kiem-Phong Vo
Large amounts of business data are kept in tables of fixed-length records. Columns in such a table may be functionally dependent on one another, resulting in low overall information content. This paper shows how to exploit this source of information redundancy to compress table data. Experiments with a wide variety of massive tables including telecom data and stock quotes show that this technique compresses table data well, up to 48:1 or even 100:1 reduction in some cases.
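As a concrete illustration of why fixed-length records invite such techniques, the sketch below (hypothetical data, not the paper's code) transposes a small row-major table so that each low-entropy column becomes a contiguous run that a downstream compressor such as gzip can exploit.

    /* Transpose fixed-length records so columns become contiguous. */
    #include <stdio.h>

    #define NROWS  4
    #define RECLEN 8  /* record: 3-byte id, 2-byte state, 3-byte amount */

    int main(void)
    {
        /* four records, row-major as stored on disk (no NUL terminators) */
        static const char rows[NROWS][RECLEN] = {
            "001NJ100", "002NJ105", "003NJ100", "004NY100"
        };
        char transposed[NROWS * RECLEN];
        int r, c;

        /* byte c of every record becomes one contiguous run */
        for (c = 0; c < RECLEN; c++)
            for (r = 0; r < NROWS; r++)
                transposed[c * NROWS + r] = rows[r][c];

        fwrite(transposed, 1, sizeof(transposed), stdout); /* pipe to gzip */
        putchar('\n');
        return 0;
    }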
ACM Journal of Experimental Algorithms | 2003
Adam L. Buchsbaum; Glenn S. Fowler; Balachander Krishnamurthy; Kiem-Phong Vo; Jia Wang
Longest Prefix Matching (LPM) is the problem of finding which string from a given set is the longest prefix of another given string. LPM is a core problem in many applications, including IP routing, network data clustering, and telephone network management. These applications typically require very fast matching of bounded strings, i.e., strings that are short and based on small alphabets. We note a simple correspondence between bounded strings and natural numbers that maps prefixes to nested intervals, so that computing the longest prefix matching a string is equivalent to finding the shortest interval containing its corresponding integer value. We then present retries, a fast and compact data structure for LPM on general alphabets. Performance results show that retries often outperform previously published data structures for IP lookup. By extending LPM to general alphabets, retries admit new applications that could not exploit prior LPM solutions designed for IP lookups.
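The interval correspondence is easy to demonstrate for IPv4, where a prefix covers the integer interval [base, base | ~mask] and the longest matching prefix of an address is the shortest such interval containing it. The C sketch below uses a linear scan purely for clarity; it stands in for the retrie structure that the paper builds to answer the same query quickly.

    /* Prefixes as nested intervals; linear scan instead of a retrie. */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct { uint32_t base; int len; } Prefix;

    /* inclusive integer interval covered by a prefix */
    static void interval(Prefix p, uint32_t *lo, uint32_t *hi)
    {
        uint32_t mask = p.len ? 0xFFFFFFFFu << (32 - p.len) : 0;
        *lo = p.base & mask;
        *hi = (p.base & mask) | ~mask;
    }

    int main(void)
    {
        Prefix table[] = {
            { 0x0A000000, 8  },  /* 10.0.0.0/8  */
            { 0x0A010000, 16 },  /* 10.1.0.0/16 */
            { 0x0A010200, 24 },  /* 10.1.2.0/24 */
        };
        uint32_t addr = 0x0A010203;  /* 10.1.2.3 */
        uint32_t lo, hi, bestwidth = 0xFFFFFFFFu;
        int i, best = -1;

        for (i = 0; i < 3; i++) {
            interval(table[i], &lo, &hi);
            if (addr >= lo && addr <= hi && hi - lo < bestwidth) {
                best = i;  /* shortest containing interval so far */
                bestwidth = hi - lo;
            }
        }
        if (best >= 0)
            printf("longest prefix: entry %d (/%d)\n", best, table[best].len);
        return 0;
    }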
Theoretical Computer Science | 2007
Binh D. Vo; Kiem-Phong Vo
Tables are two-dimensional arrays given in row-major order. Such data have unique features that could be exploited for effective compression. For example, tables often represent database files with rows as records so certain columns or fields in a table may have few distinct values. This means that simply transposing the data can make it compress better. Further, a large source of information redundancy in a table is the correlation among columns representing related types of data. This paper formalizes the notion of column dependency as a way to capture this information redundancy across columns and discusses how to automatically compute and use it to substantially improve table compression.
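A minimal illustration of the idea (invented data, not the paper's algorithm): when column B is almost functionally dependent on column A, B can be replaced by a small A-to-B map plus a short list of exceptions.

    /* Replace a dependent column by a predictor map plus exceptions. */
    #include <stdio.h>
    #include <string.h>

    #define NROWS 6

    int main(void)
    {
        /* column A: area code; column B: state, nearly determined by A */
        const char *a[NROWS] = { "201", "973", "201", "212", "973", "201" };
        const char *b[NROWS] = { "NJ",  "NJ",  "NJ",  "NY",  "NJ",  "CT"  };

        /* first value of B seen for each distinct A is the predictor */
        const char *keys[NROWS], *pred[NROWS];
        int i, j, nkeys = 0;

        for (i = 0; i < NROWS; i++) {
            for (j = 0; j < nkeys && strcmp(keys[j], a[i]) != 0; j++)
                ;
            if (j == nkeys) { keys[nkeys] = a[i]; pred[nkeys++] = b[i]; }
            if (strcmp(pred[j], b[i]) != 0)
                printf("exception at row %d: %s -> %s\n", i, a[i], b[i]);
        }
        printf("stored %d map entries plus exceptions for %d values\n",
               nkeys, NROWS);
        return 0;
    }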
Software - Practice and Experience | 1997
Kiem-Phong Vo
Cdt is a container data type library that provides a uniform set of operations to manage dictionaries based on the common storage methods: list, stack, queue, ordered set/multiset, and unordered set/multiset. Both object description and storage method in a dictionary can be dynamically changed so that abstract operations can be exactly matched with run-time requirements for operational flexibility and performance. A study comparing Cdt and other popular container packages shows that Cdt performs best in both computing time and space usage.
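The calling pattern looks roughly like the C sketch below, assuming the cdt.h interface shipped with the AST distribution (the Dtdisc_t field order and exact macros may differ by release): a discipline describes the stored objects, a method selects the storage scheme, and dtmethod() switches schemes at run time.

    /* Cdt sketch: discipline + method, with a run-time method switch. */
    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>
    #include <cdt.h>

    typedef struct {
        Dtlink_t link;    /* holder the library threads through */
        char     name[32];
    } Obj;

    static int cmp(Dt_t *dt, void *k1, void *k2, Dtdisc_t *disc)
    {
        return strcmp((char*)k1, (char*)k2);
    }

    int main(void)
    {
        /* key: string at offset of name; link holder at offset of link */
        Dtdisc_t disc = { offsetof(Obj, name), 0, offsetof(Obj, link),
                          0, 0, cmp, 0, 0, 0 };
        Dt_t *dt = dtopen(&disc, Dtoset);  /* start as an ordered set */
        Obj a = { {0}, "alpha" }, b = { {0}, "beta" };

        dtinsert(dt, &a);
        dtinsert(dt, &b);
        dtmethod(dt, Dtset);               /* switch to hashing on the fly */
        printf("found: %s\n", ((Obj*)dtmatch(dt, "beta"))->name);
        dtclose(dt);
        return 0;
    }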
Software - Practice and Experience | 2000
Kiem-Phong Vo
Over the past few years, my colleagues and I have written a number of software libraries for fundamental computing tasks, including I/O, memory allocation, container data types and sorting. These libraries have proved to be good software building blocks, and are used widely by programmers around the world. This success is due in part to a library architecture that employs two main interface mechanisms: disciplines to define resource requirements, and methods to parameterize resource management. Libraries built this way are called discipline and method libraries.
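The split can be illustrated with a toy library (hypothetical names, not any of the libraries above): the discipline is a caller-supplied structure of resource callbacks, while the method is a strategy object chosen from those the library provides, so either side can vary independently of the other.

    /* Toy discipline-and-method library: hypothetical, for illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {                    /* discipline: caller's resources */
        void *(*memoryf)(size_t);       /* how to obtain memory */
        void  (*eventf)(const char*);   /* how to report events */
    } Disc;

    typedef struct {                    /* method: management strategy */
        const char *name;
        void (*insertf)(void *store, int value);
    } Method;

    typedef struct {
        Disc   *disc;
        Method *meth;
        void   *store;
    } Lib;

    static void list_insert(void *s, int v) { (void)s; printf("list insert %d\n", v); }
    static void tree_insert(void *s, int v) { (void)s; printf("tree insert %d\n", v); }

    static Method List = { "list", list_insert };
    static Method Tree = { "tree", tree_insert };

    static Lib *libopen(Disc *disc, Method *meth)
    {
        Lib *lib = disc->memoryf(sizeof(Lib)); /* resources via discipline */
        lib->disc = disc; lib->meth = meth; lib->store = 0;
        disc->eventf("opened");
        return lib;
    }

    static void report(const char *msg) { fprintf(stderr, "lib: %s\n", msg); }

    int main(void)
    {
        Disc disc = { malloc, report };
        Lib *lib = libopen(&disc, &List);
        lib->meth->insertf(lib->store, 42); /* dispatched to list method */
        lib->meth = &Tree;                  /* strategy swapped at run time */
        lib->meth->insertf(lib->store, 42); /* same call, tree method now */
        free(lib);
        return 0;
    }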
international conference on software reuse | 1998
Kiem-Phong Vo
My colleagues and I have written and distributed a number of general purpose libraries covering a wide range of computing areas such as I/O, memory allocation, container data types, and sorting. Published studies showed that these libraries are more general, flexible and efficient than comparable packages as application construction tools. Our libraries are based on an architecture in which two main interfaces are made explicit: disciplines to define resource requirements, and methods to define resource management. This paper discusses the discipline and method library architecture and a resource-oriented analysis approach for analyzing and designing libraries based on this architecture.
data compression conference | 2007
Kiem-Phong Vo
Conventional compression techniques exploit general redundancy features in data. For example, Huffman or Lempel-Ziv techniques compress data by statistical modeling or string matching, while the Burrows-Wheeler Transform simply sorts data by context to improve compressibility. On the other hand, data can often be compressed better by exploiting their specific features. For example, columns or fields in a database table tend to be sparse, but not rows. Techniques have been developed to either group related table columns or compute dependency among them to transform data and enhance compressibility. The Vcodex data transformation platform provides a framework to develop and use such data transforms. That is, it treats compression techniques as invertible data transforms that can be composed together for specific tasks. In this way, data transformation remains general and can include techniques for encryption and others.
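The composition idea can be sketched generically (hypothetical types, not the Vcodex API): each transform is an encode/decode pair; encoders run forward and decoders run in reverse, so any pipeline of invertible transforms is itself invertible.

    /* Composable invertible transforms: encode forward, decode in reverse. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        const char *name;
        void (*encode)(char *buf, size_t n);
        void (*decode)(char *buf, size_t n);
    } Transform;

    /* toy transforms, each its own inverse */
    static void xor_t(char *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++) b[i] ^= 0x55;
    }
    static void rev_t(char *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n / 2; i++) { char t = b[i]; b[i] = b[n-1-i]; b[n-1-i] = t; }
    }

    static Transform Xor = { "xor",     xor_t, xor_t };
    static Transform Rev = { "reverse", rev_t, rev_t };

    int main(void)
    {
        Transform *chain[] = { &Xor, &Rev };
        char buf[] = "table data";
        size_t n = strlen(buf);
        int i;

        for (i = 0; i < 2; i++)   chain[i]->encode(buf, n); /* forward */
        for (i = 1; i >= 0; i--)  chain[i]->decode(buf, n); /* inverse */
        printf("round trip: %s\n", buf); /* original data restored */
        return 0;
    }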
Archive | 2007
Kiem-Phong Vo
Vcodex is a software platform for constructing data compressors. It introduces the notion of data transforms as software components to encapsulate data transformation and compression techniques. The platform provides a variety of compression transforms, ranging from general-purpose compressors such as Huffman or Lempel-Ziv to structure-related ones such as reordering fields and columns in relational data tables. Transform composition enables construction of compressors that are either general purpose or customized to data semantics. The software and data architecture of Vcodex will be presented, with examples and experimental results showing how the approach achieves compression performance far beyond traditional approaches.
world conference on www and internet | 2000
Yih-Farn Chen; Fred Douglis; Huale Huang; Kiem-Phong Vo