Publication


Featured research published by Jeffrey Scott Vitter.


ACM Transactions on Mathematical Software | 1985

Random sampling with a reservoir

Jeffrey Scott Vitter

We introduce fast algorithms for selecting a random sample of n records without replacement from a pool of N records, where the value of N is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in O(n(1 + log(N/n))) expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.
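To make the one-pass setting in this abstract concrete, here is a minimal Python sketch of the simple baseline usually called Algorithm R. It is not Algorithm Z: it examines every record and so takes time proportional to N, whereas Algorithm Z reaches the stated expected time by skipping over records. The function name `reservoir_sample` and its parameters are illustrative assumptions, not taken from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    records: Iterable[T], n: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Pick n records uniformly without replacement from a stream of unknown length.

    Hypothetical helper: a sketch of the basic one-pass method (Algorithm R),
    not the faster Algorithm Z described in the abstract.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for t, record in enumerate(records):
        if t < n:
            # The first n records fill the reservoir directly.
            reservoir.append(record)
        else:
            # Record number t+1 is kept with probability n / (t + 1).
            j = rng.randint(0, t)  # uniform over 0..t, both ends inclusive
            if j < n:
                reservoir[j] = record
    return reservoir
```

If the stream holds fewer than n records, the sketch simply returns all of them.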


ACM Computing Surveys | 2001

External memory algorithms and data structures: dealing with massive data

Jeffrey Scott Vitter

Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically.
Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
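The merging paradigm mentioned in this abstract can be illustrated with a small external merge sort. The Python sketch below is a simplification under stated assumptions: it sorts integers, stores each sorted run in a temporary text file, and merges all runs in a single pass, ignoring block sizes and multiple disks. The names `external_sort` and `memory_limit` are hypothetical, and the code is unrelated to the TPIE system.

```python
import heapq
import tempfile
from typing import IO, Iterable, Iterator, List


def _write_run(sorted_items: List[int]) -> IO[str]:
    """Spill one sorted run to a temporary file and rewind it for reading."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(f"{x}\n" for x in sorted_items)
    f.seek(0)
    return f


def _read_run(f: IO[str]) -> Iterator[int]:
    """Stream a run back from disk one item at a time."""
    for line in f:
        yield int(line)


def external_sort(items: Iterable[int], memory_limit: int) -> Iterator[int]:
    """Sort a stream while holding at most memory_limit items in a buffer.

    Hypothetical sketch: run formation followed by one k-way merge.
    """
    runs: List[IO[str]] = []
    buffer: List[int] = []
    for x in items:
        buffer.append(x)
        if len(buffer) >= memory_limit:
            buffer.sort()
            runs.append(_write_run(buffer))
            buffer = []
    if buffer:
        buffer.sort()
        runs.append(_write_run(buffer))
    try:
        # heapq.merge lazily merges already-sorted inputs.
        yield from heapq.merge(*(_read_run(f) for f in runs))
    finally:
        for f in runs:
            f.close()
```

For example, `list(external_sort([5, 3, 9, 1], 2))` forms two runs of two items each and merges them into sorted order.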


SIAM Journal on Computing | 2005

Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching

Roberto Grossi; Jeffrey Scott Vitter

The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T …


International Conference on Management of Data | 1998

Wavelet-based histograms for selectivity estimation

Yossi Matias; Jeffrey Scott Vitter; Min Wang


International Conference on Management of Data | 1999

Approximate computation of multidimensional aggregates of sparse data using wavelets

Jeffrey Scott Vitter; Min Wang


Journal of the ACM | 1987

Design and analysis of dynamic Huffman codes

Jeffrey Scott Vitter


Very Large Data Bases | 2004

Efficient indexing methods for probabilistic threshold queries over uncertain data

Reynold Cheng; Yuni Xia; Sunil Prabhakar; Rahul Shah; Jeffrey Scott Vitter


Conference on Information and Knowledge Management | 1998

Data cube approximation and histograms via wavelets

Jeffrey Scott Vitter; Min Wang; Balakrishna R. Iyer


International Conference on Management of Data | 1993

Practical prefetching via data compression

Kenneth M. Curewitz; P. Krishnan; Jeffrey Scott Vitter


Proceedings of the IEEE | 1994

Arithmetic coding for data compression

Paul G. Howard; Jeffrey Scott Vitter

Collaboration


Dive into Jeffrey Scott Vitter's collaborations.

Top Co-Authors

Rahul Shah

Louisiana State University

Wing-Kai Hon

National Tsing Hua University

Sharma V. Thankachan

Georgia Institute of Technology
