Kiem-Phong Vo
AT&T Labs
Publications
Featured research published by Kiem-Phong Vo.
ACM Transactions on Software Engineering and Methodology | 1998
James J. Hunt; Kiem-Phong Vo; Walter F. Tichy
Delta algorithms compress data by encoding one file in terms of another. This type of compression is useful in a number of situations: storing multiple versions of data, displaying differences, merging changes, distributing updates, storing backups, transmitting video sequences, and others. This article studies the performance parameters of several delta algorithms, using a benchmark of over 1,300 pairs of files taken from two successive releases of GNU software. Results indicate that modern delta compression algorithms based on Ziv-Lempel techniques significantly outperform diff, a popular but older delta compressor, in terms of compression ratio. The modern compressors also correlate better with the actual difference between files without sacrificing performance.
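The copy/add instruction stream at the heart of such delta compressors is easy to sketch. Below is a minimal, illustrative C program (a naive quadratic matcher, not one of the algorithms benchmarked in the paper): the target file is emitted as COPY references into the source plus literal ADD bytes.

    /* Minimal copy/add delta encoder: illustrative only. */
    #include <stdio.h>
    #include <string.h>

    /* Longest match of tgt+pos anywhere in src; returns length, sets *off. */
    static size_t longest_match(const char *src, size_t ns, const char *tgt,
                                size_t nt, size_t pos, size_t *off)
    {
        size_t best = 0, i, k;
        for (i = 0; i < ns; i++) {
            for (k = 0; i + k < ns && pos + k < nt && src[i+k] == tgt[pos+k]; k++)
                ;
            if (k > best) { best = k; *off = i; }
        }
        return best;
    }

    static void delta_encode(const char *src, const char *tgt)
    {
        size_t ns = strlen(src), nt = strlen(tgt), pos = 0, off = 0, len;
        while (pos < nt) {
            len = longest_match(src, ns, tgt, nt, pos, &off);
            if (len >= 3) {  /* short copies cost more than literals */
                printf("COPY off=%zu len=%zu\n", off, len);
                pos += len;
            } else {
                printf("ADD '%c'\n", tgt[pos]);
                pos++;
            }
        }
    }

    int main(void)
    {
        delta_encode("the quick brown fox", "the quick red fox");
        return 0;
    }

A real compressor would replace the quadratic search with hashed string matching in the Ziv-Lempel style and encode the resulting instruction stream compactly.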
data compression conference | 2004
Binh D. Vo; Kiem-Phong Vo
Large amounts of business data are kept in tables of fixed-length records. Columns in such a table may be functionally dependent on one another, resulting in low overall information content. This paper shows how to exploit this source of information redundancy to compress table data. Experiments with a wide variety of massive tables including telecom data and stock quotes show that this technique compresses table data well, up to 48:1 or even 100:1 reduction in some cases.
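As a concrete illustration of why fixed-length records invite such techniques, the sketch below (hypothetical data, not the paper's code) transposes a small row-major table so that each low-entropy column becomes a contiguous run that a downstream compressor such as gzip can exploit.

    /* Transpose fixed-length records so columns become contiguous. */
    #include <stdio.h>

    #define NROWS  4
    #define RECLEN 8  /* record: 3-byte id, 2-byte state, 3-byte amount */

    int main(void)
    {
        /* four records, row-major as stored on disk (no NUL terminators) */
        static const char rows[NROWS][RECLEN] = {
            "001NJ100", "002NJ105", "003NJ100", "004NY100"
        };
        char transposed[NROWS * RECLEN];
        int r, c;

        /* byte c of every record becomes one contiguous run */
        for (c = 0; c < RECLEN; c++)
            for (r = 0; r < NROWS; r++)
                transposed[c * NROWS + r] = rows[r][c];

        fwrite(transposed, 1, sizeof(transposed), stdout); /* pipe to gzip */
        putchar('\n');
        return 0;
    }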
ACM Journal of Experimental Algorithms | 2003
Adam L. Buchsbaum; Glenn S. Fowler; Balachander Krishnamurthy; Kiem-Phong Vo; Jia Wang
Longest Prefix Matching (LPM) is the problem of finding which string from a given set is the longest prefix of another given string. LPM is a core problem in many applications, including IP routing, network data clustering, and telephone network management. These applications typically require very fast matching of bounded strings, i.e., strings that are short and based on small alphabets. We note a simple correspondence between bounded strings and natural numbers that maps prefixes to nested intervals, so that computing the longest prefix matching a string is equivalent to finding the shortest interval containing its corresponding integer value. We then present retries, a fast and compact data structure for LPM on general alphabets. Performance results show that retries often outperform previously published data structures for IP lookup. By extending LPM to general alphabets, retries admit new applications that could not exploit prior LPM solutions designed for IP lookups.
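The interval correspondence is easy to demonstrate for IPv4, where a prefix covers the integer interval [base, base | ~mask] and the longest matching prefix of an address is the shortest such interval containing it. The C sketch below uses a linear scan purely for clarity; it stands in for the retrie structure that the paper builds to answer the same query quickly.

    /* Prefixes as nested intervals; linear scan instead of a retrie. */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct { uint32_t base; int len; } Prefix;

    /* inclusive integer interval covered by a prefix */
    static void interval(Prefix p, uint32_t *lo, uint32_t *hi)
    {
        uint32_t mask = p.len ? 0xFFFFFFFFu << (32 - p.len) : 0;
        *lo = p.base & mask;
        *hi = (p.base & mask) | ~mask;
    }

    int main(void)
    {
        Prefix table[] = {
            { 0x0A000000, 8  },  /* 10.0.0.0/8  */
            { 0x0A010000, 16 },  /* 10.1.0.0/16 */
            { 0x0A010200, 24 },  /* 10.1.2.0/24 */
        };
        uint32_t addr = 0x0A010203;  /* 10.1.2.3 */
        uint32_t lo, hi, bestwidth = 0xFFFFFFFFu;
        int i, best = -1;

        for (i = 0; i < 3; i++) {
            interval(table[i], &lo, &hi);
            if (addr >= lo && addr <= hi && hi - lo < bestwidth) {
                best = i;  /* shortest containing interval so far */
                bestwidth = hi - lo;
            }
        }
        if (best >= 0)
            printf("longest prefix: entry %d (/%d)\n", best, table[best].len);
        return 0;
    }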
Theoretical Computer Science | 2007
Binh D. Vo; Kiem-Phong Vo
Tables are two-dimensional arrays given in row-major order. Such data have unique features that could be exploited for effective compression. For example, tables often represent database files with rows as records so certain columns or fields in a table may have few distinct values. This means that simply transposing the data can make it compress better. Further, a large source of information redundancy in a table is the correlation among columns representing related types of data. This paper formalizes the notion of column dependency as a way to capture this information redundancy across columns and discusses how to automatically compute and use it to substantially improve table compression.
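A minimal illustration of the idea (invented data, not the paper's algorithm): when column B is almost functionally dependent on column A, B can be replaced by a small A-to-B map plus a short list of exceptions.

    /* Replace a dependent column by a predictor map plus exceptions. */
    #include <stdio.h>
    #include <string.h>

    #define NROWS 6

    int main(void)
    {
        /* column A: area code; column B: state, nearly determined by A */
        const char *a[NROWS] = { "201", "973", "201", "212", "973", "201" };
        const char *b[NROWS] = { "NJ",  "NJ",  "NJ",  "NY",  "NJ",  "CT"  };

        /* first value of B seen for each distinct A is the predictor */
        const char *keys[NROWS], *pred[NROWS];
        int i, j, nkeys = 0;

        for (i = 0; i < NROWS; i++) {
            for (j = 0; j < nkeys && strcmp(keys[j], a[i]) != 0; j++)
                ;
            if (j == nkeys) { keys[nkeys] = a[i]; pred[nkeys++] = b[i]; }
            if (strcmp(pred[j], b[i]) != 0)
                printf("exception at row %d: %s -> %s\n", i, a[i], b[i]);
        }
        printf("stored %d map entries plus exceptions for %d values\n",
               nkeys, NROWS);
        return 0;
    }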
Software - Practice and Experience | 1997
Kiem-Phong Vo
Cdt is a container data type library that provides a uniform set of operations to manage dictionaries based on the common storage methods: list, stack, queue, ordered set/multiset, and unordered set/multiset. Both object description and storage method in a dictionary can be dynamically changed so that abstract operations can be exactly matched with run-time requirements for operational flexibility and performance. A study comparing Cdt and other popular container packages shows that Cdt performs best in both computing time and space usage.
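The calling pattern looks roughly like the C sketch below, assuming the cdt.h interface shipped with the AST distribution (the Dtdisc_t field order and exact macros may differ by release): a discipline describes the stored objects, a method selects the storage scheme, and dtmethod() switches schemes at run time.

    /* Cdt sketch: discipline + method, with a run-time method switch. */
    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>
    #include <cdt.h>

    typedef struct {
        Dtlink_t link;    /* holder the library threads through */
        char     name[32];
    } Obj;

    static int cmp(Dt_t *dt, void *k1, void *k2, Dtdisc_t *disc)
    {
        return strcmp((char*)k1, (char*)k2);
    }

    int main(void)
    {
        /* key: string at offset of name; link holder at offset of link */
        Dtdisc_t disc = { offsetof(Obj, name), 0, offsetof(Obj, link),
                          0, 0, cmp, 0, 0, 0 };
        Dt_t *dt = dtopen(&disc, Dtoset);  /* start as an ordered set */
        Obj a = { {0}, "alpha" }, b = { {0}, "beta" };

        dtinsert(dt, &a);
        dtinsert(dt, &b);
        dtmethod(dt, Dtset);               /* switch to hashing on the fly */
        printf("found: %s\n", ((Obj*)dtmatch(dt, "beta"))->name);
        dtclose(dt);
        return 0;
    }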
Software - Practice and Experience | 2000
Kiem-Phong Vo
Over the past few years, my colleagues and I have written a number of software libraries for fundamental computing tasks, including I/O, memory allocation, container data types and sorting. These libraries have proved to be good software building blocks, and are used widely by programmers around the world. This success is due in part to a library architecture that employs two main interface mechanisms: disciplines to define resource requirements, and methods to parameterize resource management. Libraries built this way are called discipline and method libraries.
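The split can be illustrated with a toy library (hypothetical names, not any of the libraries above): the discipline is a caller-supplied structure of resource callbacks, while the method is a strategy object chosen from those the library provides, so either side can vary independently of the other.

    /* Toy discipline-and-method library: hypothetical, for illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {                    /* discipline: caller's resources */
        void *(*memoryf)(size_t);       /* how to obtain memory */
        void  (*eventf)(const char*);   /* how to report events */
    } Disc;

    typedef struct {                    /* method: management strategy */
        const char *name;
        void (*insertf)(void *store, int value);
    } Method;

    typedef struct {
        Disc   *disc;
        Method *meth;
        void   *store;
    } Lib;

    static void list_insert(void *s, int v) { (void)s; printf("list insert %d\n", v); }
    static void tree_insert(void *s, int v) { (void)s; printf("tree insert %d\n", v); }

    static Method List = { "list", list_insert };
    static Method Tree = { "tree", tree_insert };

    static Lib *libopen(Disc *disc, Method *meth)
    {
        Lib *lib = disc->memoryf(sizeof(Lib)); /* resources via discipline */
        lib->disc = disc; lib->meth = meth; lib->store = 0;
        disc->eventf("opened");
        return lib;
    }

    static void report(const char *msg) { fprintf(stderr, "lib: %s\n", msg); }

    int main(void)
    {
        Disc disc = { malloc, report };
        Lib *lib = libopen(&disc, &List);
        lib->meth->insertf(lib->store, 42); /* dispatched to list method */
        lib->meth = &Tree;                  /* strategy swapped at run time */
        lib->meth->insertf(lib->store, 42); /* same call, tree method now */
        free(lib);
        return 0;
    }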
international conference on software reuse | 1998
Kiem-Phong Vo
My colleagues and I have written and distributed a number of general purpose libraries covering a wide range of computing areas such as I/O, memory allocation, container data types, and sorting. Published studies showed that these libraries are more general, flexible and efficient than comparable packages as application construction tools. Our libraries are based on an architecture in which two main interfaces are made explicit: disciplines to define resource requirements, and methods to define resource management. This paper discusses the discipline and method library architecture and a resource-oriented analysis approach for analyzing and designing libraries based on this architecture.
data compression conference | 2007
Kiem-Phong Vo
Conventional compression techniques exploit general redundancy features in data. For example, Huffman or Lempel-Ziv techniques compress data by statistical modeling or string matching, while the Burrows-Wheeler Transform simply sorts data by context to improve compressibility. On the other hand, data can often be compressed better by exploiting their specific features. For example, columns or fields in a database table tend to be sparse, but not rows. Techniques have been developed to either group related table columns or compute dependency among them to transform data and enhance compressibility. The Vcodex data transformation platform provides a framework to develop and use such data transforms. That is, it treats compression techniques as invertible data transforms that can be composed together for specific tasks. In this way, data transformation remains general and can include techniques for encryption and others.
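The composition idea can be sketched generically (hypothetical types, not the Vcodex API): each transform is an encode/decode pair; encoders run forward and decoders run in reverse, so any pipeline of invertible transforms is itself invertible.

    /* Composable invertible transforms: encode forward, decode in reverse. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        const char *name;
        void (*encode)(char *buf, size_t n);
        void (*decode)(char *buf, size_t n);
    } Transform;

    /* toy transforms, each its own inverse */
    static void xor_t(char *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++) b[i] ^= 0x55;
    }
    static void rev_t(char *b, size_t n)
    {
        size_t i;
        for (i = 0; i < n / 2; i++) { char t = b[i]; b[i] = b[n-1-i]; b[n-1-i] = t; }
    }

    static Transform Xor = { "xor",     xor_t, xor_t };
    static Transform Rev = { "reverse", rev_t, rev_t };

    int main(void)
    {
        Transform *chain[] = { &Xor, &Rev };
        char buf[] = "table data";
        size_t n = strlen(buf);
        int i;

        for (i = 0; i < 2; i++)   chain[i]->encode(buf, n); /* forward */
        for (i = 1; i >= 0; i--)  chain[i]->decode(buf, n); /* inverse */
        printf("round trip: %s\n", buf); /* original data restored */
        return 0;
    }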
Archive | 2007
Kiem-Phong Vo
Vcodex is a software platform for constructing data compressors. It introduces the notion of data transforms as software components to encapsulate data transformation and compression techniques. The platform provides a variety of compression transforms, ranging from general-purpose compressors such as Huffman or Lempel-Ziv to structure-related ones such as reordering fields and columns in relational data tables. Transform composition enables construction of compressors that are either general purpose or customized to data semantics. The software and data architecture of Vcodex will be presented, with examples and experimental results showing how the approach achieves compression performance far beyond traditional approaches.
world conference on www and internet | 2000
Yih-Farn Chen; Fred Douglis; Huale Huang; Kiem-Phong Vo