Publications


Featured research published by Tim Bell.


IEEE Transactions on Information Theory | 1991

The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression

Ian H. Witten; Tim Bell

Approaches to the zero-frequency problem in adaptive text compression are discussed. This problem concerns estimating the likelihood of a novel event occurring. Although several methods have been used, their suitability has rested on empirical evaluation rather than on a well-founded model. The authors propose the application of a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods, and offers a small improvement in the coding efficiency of text compression over the best method previously known.
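The estimators being compared are easy to state in code. Below is a minimal sketch, not taken from the paper, of common escape-probability estimates for an adaptive model; the labels follow the usual PPM naming conventions, and the singleton ratio is only a Good-Turing-style stand-in for the paper's more elaborate Poisson-process estimator:

```python
from collections import Counter

def escape_probabilities(tokens):
    """Estimate the probability that the next token is novel."""
    counts = Counter(tokens)
    n = sum(counts.values())                        # total tokens seen
    t = len(counts)                                 # distinct tokens seen
    t1 = sum(1 for c in counts.values() if c == 1)  # tokens seen exactly once
    return {
        "method A": 1 / (n + 1),       # one extra count reserved for novelty
        "method C": t / (n + t),       # one extra count per distinct token
        "singletons": t1 / n if n else 1.0,  # Good-Turing-style stand-in for
                                             # the Poisson-process estimate
    }

print(escape_probabilities("abracadabra"))
# {'method A': 0.0833..., 'method C': 0.3125, 'singletons': 0.1818...}
```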


ACM Computing Surveys | 1989

Modeling for text compression

Tim Bell; Ian H. Witten; John G. Cleary

The best schemes for text compression use large models to help them predict which characters will come next. The actual next characters are coded with respect to the prediction, resulting in compression of information. Models are best formed adaptively, based on the text seen so far. This paper surveys successful strategies for adaptive modeling that are suitable for use in practical text compression systems. The strategies fall into three main classes: finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one; finite-state modeling, in which the distribution is conditioned by the current state (and which subsumes finite-context modeling as an important special case); and dictionary modeling, in which strings of characters are replaced by pointers into an evolving dictionary. A comparison of different methods on the same sample texts is included, along with an analysis of future research directions.
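As an illustration of the first class, here is a minimal sketch (assuming order-2 contexts and plain maximum-likelihood counts, with no escape mechanism) of finite-context modeling, where the last few characters condition the distribution for the next one:

```python
from collections import Counter, defaultdict

def build_model(text, order=2):
    """Count, for each context of `order` characters, what follows it."""
    model = defaultdict(Counter)
    for i in range(order, len(text)):
        model[text[i - order:i]][text[i]] += 1
    return model

def predict(model, context):
    """Return the empirical distribution over the next character."""
    counts = model.get(context, Counter())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

m = build_model("the theory of the thing")
print(predict(m, "th"))  # {'e': 0.75, 'i': 0.25}
```

A real compressor would pair such predictions with an arithmetic coder and blend several context orders, but the conditioning idea is the same.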


Data Compression Conference | 1997

A corpus for the evaluation of lossless compression algorithms

Ross Arnold; Tim Bell

A number of authors have used the Calgary corpus of texts to provide empirical results for lossless compression algorithms. This corpus was collected in 1987, although it was not published until 1990. Recent advances in compression algorithms have achieved relatively small improvements in compression as measured on the Calgary corpus. There is a concern that algorithms are being fine-tuned to this corpus, and that small improvements measured in this way may not apply to other files. Furthermore, the corpus is almost ten years old, and over this period there have been changes in the kinds of files that are compressed, particularly with the development of the Internet and the rapid growth of high-capacity secondary storage for personal computers. We explore the issues raised above, and develop a principled technique for collecting a corpus of test data for compression methods. A corpus, called the Canterbury corpus, is developed using this technique, and we report the performance of a collection of compression methods using the new corpus.
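The evaluation style described here is straightforward to reproduce. A minimal sketch using Python's standard-library compressors as stand-ins for the methods reported in the paper (the file names are examples of Canterbury corpus files; the corpus itself is distributed by the University of Canterbury):

```python
import bz2
import lzma
import zlib
from pathlib import Path

COMPRESSORS = {
    "zlib (deflate)": zlib.compress,
    "bzip2": bz2.compress,
    "lzma": lzma.compress,
}

def evaluate(paths):
    """Print each compressor's output size as a fraction of the original."""
    for path in paths:
        data = Path(path).read_bytes()
        for name, compress in COMPRESSORS.items():
            ratio = len(compress(data)) / len(data)
            print(f"{path:>16} {name:>15}: {ratio:.3f}")

evaluate(["alice29.txt", "kennedy.xls"])  # sample Canterbury corpus files
```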


Book | 2008

The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching

Donald A. Adjeroh; Tim Bell; Amar Mukherjee

The Burrows-Wheeler Transform is a text transformation scheme that has found applications in different aspects of the data explosion problem, from data compression to index structures and search. The BWT belongs to a new class of compression algorithms, distinguished by its ability to perform compression by sorted contexts. More recently, the BWT has also found various applications in addition to text data compression, such as in lossless and lossy image compression, tree-source identification, bioinformatics, machine translation, shape matching, and test data compression. This book will serve as a reference for seasoned professionals and researchers in the area, while providing a gentle introduction that makes it accessible for senior undergraduate students or first-year graduate students embarking upon research in compression, pattern matching, full-text retrieval, compressed index structures, or other areas related to the BWT.

Key features:
- A comprehensive resource on the Burrows-Wheeler Transform, including a gentle introduction to the BWT, the history of its development, detailed theoretical analysis of algorithmic issues and performance limits, searching on BWT-compressed data, and hardware architectures for the BWT.
- Explores non-traditional applications of the BWT in areas such as bioinformatics, joint source-channel coding, modern information retrieval, machine translation, and test data compression for systems-on-chip.
- Teaching materials ideal for classroom use in courses on data compression and source coding, modern information retrieval, information science, and digital libraries.
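For readers new to the transform, here is a minimal rotation-sorting sketch of the forward and inverse BWT, using '$' as a unique end-of-string sentinel; practical implementations build a suffix array instead of sorting all rotations:

```python
def bwt(s: str) -> str:
    """Last column of the sorted rotations of s plus a sentinel."""
    s += "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def inverse_bwt(last: str) -> str:
    """Rebuild the rotation table by repeatedly prepending and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    return next(row for row in table if row.endswith("$"))[:-1]

assert bwt("banana") == "annb$aa"
assert inverse_bwt("annb$aa") == "banana"
```

The transform groups together characters that share a sorted context, which is what makes its output so compressible.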


Computers and the Humanities | 2001

The Challenge of Optical Music Recognition

David Bainbridge; Tim Bell

This article describes the challenges posed by optical music recognition, a topic in computer science that aims to convert scanned pages of music into an on-line format. First, the problem is described; then a generalised framework for software is presented that emphasises key stages that must be solved: staff line identification, musical object location, musical feature classification, and musical semantics. Next, significant research projects in the area are reviewed, showing how each fits the generalised framework. The article concludes by discussing perhaps the most open question in the field: how to compare the accuracy and success of rival systems, highlighting certain steps that help ease the task.


IEEE Transactions on Communications | 1986

Better OPM/L Text Compression

Tim Bell

An OPM/L data compression scheme suggested by Ziv and Lempel, LZ77, is applied to text compression. A slightly modified version suggested by Storer and Szymanski, LZSS, is found to achieve compression ratios as good as most existing schemes for a wide range of texts. LZSS decoding is very fast, and comparatively little memory is required for encoding and decoding. Although the time complexity of LZ77 and LZSS encoding is O(M) for a text of M characters, straightforward implementations are very slow. The time consuming step of these algorithms is a search for the longest string match. Here a binary search tree is used to find the longest string match, and experiments show that this results in a dramatic increase in encoding speed. The binary tree algorithm can be used to speed up other OPM/L schemes, and other applications where a longest string match is required. Although the LZSS scheme imposes a limit on the length of a match, the binary tree algorithm will work without any limit.
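A minimal LZSS sketch, assuming arbitrary window and match-length limits, with the longest-match search written as the linear scan that the paper's binary search tree replaces:

```python
WINDOW, MIN_MATCH, MAX_MATCH = 4096, 3, 18

def lzss_encode(data: bytes):
    """Return a token list of literal bytes and (offset, length) pointers."""
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        # The time-consuming step: search the window for the longest match.
        # The paper replaces this linear scan with a binary search tree.
        for j in range(max(0, i - WINDOW), i):
            l = 0
            while (l < MAX_MATCH and i + l < len(data)
                   and data[j + l] == data[i + l]):
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        if best_len >= MIN_MATCH:
            out.append((best_off, best_len))  # pointer into the window
            i += best_len
        else:
            out.append(data[i])               # literal byte
            i += 1
    return out

def lzss_decode(tokens) -> bytes:
    buf = bytearray()
    for t in tokens:
        if isinstance(t, tuple):              # copy an earlier occurrence,
            off, length = t                   # byte by byte, so overlapping
            for _ in range(length):           # matches work correctly
                buf.append(buf[-off])
        else:
            buf.append(t)
    return bytes(buf)

data = b"abracadabra abracadabra abracadabra"
assert lzss_decode(lzss_encode(data)) == data
```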


Journal of the Association for Information Science and Technology | 1993

Data compression in full-text retrieval systems

Tim Bell; Alistair Moffat; Craig G. Nevill-Manning; Ian H. Witten; Justin Zobel

When data compression is applied to full-text retrieval systems, intricate relationships emerge between the amount of compression, access speed, and computing resources required. We propose compression methods, and explore corresponding tradeoffs, for all components of static full-text systems such as text databases on CD-ROM. These components include lexical indexes, inverted files, bitmaps, signature files, and the main text itself. Results are reported on the application of the methods to several substantial full-text databases, and show that a large, unindexed text can be stored, along with indexes that facilitate fast searching, in less than half its original size—at some appreciable cost in primary memory requirements.
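One core technique for compressing inverted files is to store the gaps between ascending document numbers with a variable-length code. A minimal sketch, assuming Elias gamma codes over a toy postings list (the paper evaluates a range of such index compression methods):

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code for n >= 1: unary length prefix, then binary value."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def gamma_decode(bits: str):
    """Decode a concatenation of gamma codes back into integers."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i + zeros] == "0":
            zeros += 1
        out.append(int(bits[i + zeros:i + 2 * zeros + 1], 2))
        i += 2 * zeros + 1
    return out

postings = [3, 8, 12, 20, 21, 40]                # ascending document numbers
gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
bits = "".join(gamma_encode(g) for g in gaps)    # gaps: [3, 5, 4, 8, 1, 19]
assert gamma_decode(bits) == gaps
```

Small gaps (frequent terms) get short codes, which is where most of the index compression comes from.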


Proceedings of the IEEE | 1994

Textual image compression: two-stage lossy/lossless encoding of textual images

Ian H. Witten; Tim Bell; Hugh Emberson; Stuart J. Inglis; Alistair Moffat

A two-stage method for compressing bilevel images is described that is particularly effective for images containing repeated subimages, notably text. In the first stage, connected groups of pixels, corresponding approximately to individual characters, are extracted from the image. These are matched against an adaptively constructed library of patterns seen so far, and the resulting sequence of symbol identification numbers is coded and transmitted. From this information, along with the library itself and the offset from one mark to the next, an approximate image can be reconstructed. The result is a lossy method of compression that outperforms other schemes. The second stage employs the reconstructed image as an aid for encoding the original image using a statistical context-based compression technique. This yields a total bandwidth for exact transmission appreciably undercutting that required by other lossless binary image compression methods. Taken together, the lossy and lossless methods provide an effective two-stage progressive transmission capability for textual images which has application for legal, medical, and historical purposes, and to archiving in general.
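The first stage's library matching can be sketched simply. A minimal, hypothetical version (bitmaps as lists of 0/1 rows; the threshold and the plain pixel-difference test are stand-ins for the more careful matching a real system needs):

```python
def pixel_diff(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def match_or_add(library, mark, threshold=2):
    """Return a library id for a mark, adding it as a new pattern
    if nothing already in the library is close enough."""
    for sid, pattern in enumerate(library):
        if (len(pattern) == len(mark)
                and len(pattern[0]) == len(mark[0])
                and pixel_diff(pattern, mark) <= threshold):
            return sid
    library.append(mark)
    return len(library) - 1

library = []
mark_a = [[1, 1], [1, 0]]
mark_b = [[1, 1], [1, 1]]                  # one pixel away from mark_a
assert match_or_add(library, mark_a) == 0  # new pattern, id 0
assert match_or_add(library, mark_b) == 0  # matched existing, not added
```

The coded output is then just the sequence of ids plus mark offsets, with the library itself transmitted once.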


Computational Systems Bioinformatics | 2002

DNA sequence compression using the Burrows-Wheeler Transform

Donald A. Adjeroh; Yong Zhang; Amar Mukherjee; Matt Powell; Tim Bell

We investigate off-line dictionary oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed based on the relationship between the BWT and important pattern matching data structures, such as the suffix tree and suffix array. We discuss how the proposed approach can be incorporated in the BWT compression pipeline.
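The link between the suffix array and repetition structure can be seen in a few lines. A minimal sketch (naive quadratic sorting, fine only for short strings): adjacent suffixes in sorted order share their longest common prefixes, so the longest repeated substring falls out of a single scan:

```python
def longest_repeat(s: str) -> str:
    """Longest substring occurring at least twice, via the suffix array."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # suffix array (naive)
    best = ""
    for a, b in zip(sa, sa[1:]):
        l = 0                                  # LCP of adjacent suffixes
        while a + l < len(s) and b + l < len(s) and s[a + l] == s[b + l]:
            l += 1
        if l > len(best):
            best = s[a:a + l]
    return best

assert longest_repeat("GATTACAGATTACA") == "GATTACA"
```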


IEEE International Conference on Cloud Computing Technology and Science | 2010

Cloud Types and Services

Hai Jin; Shadi Ibrahim; Tim Bell; Wei Gao; Dachuan Huang; Song Wu

The increasing popularity of Internet services such as Amazon Web Services, Google App Engine and Microsoft Azure has drawn a lot of attention to the Cloud Computing paradigm. Although the term “Cloud Computing” is new, the technology is an extension of the remarkable achievements of grid, virtualization, Web 2.0 and Service Oriented Architecture (SOA) technologies, and of the convergence of these technologies. Moreover, interest in Cloud Computing has been motivated by many factors, such as the prevalence of multi-core processors and the low cost of system hardware, as well as the increasing cost of the energy needed to operate them. As a result, Cloud Computing has, in just three years, risen to become one of the most revolutionary IT technologies, and has been announced as the top technology to watch in 2010.

Collaboration


Dive into Tim Bell's collaboration.

Top Co-Authors

Amar Mukherjee, University of Central Florida

Andy Cockburn, University of Canterbury

Caitlin Duncan, University of Canterbury

Niki Davis, University of Canterbury