Vo Ngoc Anh | Researchain

Publication

Featured research published by Vo Ngoc Anh.

Information Retrieval | 2005

Inverted Index Compression Using Word-Aligned Binary Codes

Vo Ngoc Anh; Alistair Moffat

We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
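
As a rough illustration of what a word-aligned binary code can look like, the sketch below packs small integers (such as document gaps) into 32-bit words, each made of a 4-bit selector and 28 data bits split into equal-width fields. The layout table, the greedy selection rule, and the names `LAYOUTS`, `encode`, and `decode` are assumptions made for this example; the abstract does not spell out the exact scheme, so this should not be read as the authors' implementation.

```python
# Hypothetical (count, bits) layouts for the 28 data bits of a 32-bit word.
# The position of a layout in this list is the 4-bit selector value.
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
           (4, 7), (3, 9), (2, 14), (1, 28)]


def encode(values):
    """Greedily pack non-negative integers into 32-bit words."""
    words = []
    i = 0
    while i < len(values):
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[i:i + count]
            # Use the first layout whose fields hold the next `count` values.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for k, v in enumerate(chunk):
                    word |= v << (k * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value does not fit in 28 bits")
    return words


def decode(words):
    """Unpack words produced by encode(): one selector lookup, then shifts and masks."""
    out = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for k in range(count):
            out.append((word >> (k * bits)) & mask)
    return out


gaps = [1, 1, 2, 1, 3, 1, 1, 5, 300, 2, 1]
assert decode(encode(gaps)) == gaps
```

Decoding needs only a table lookup, shifts, and masks per word, which is the kind of property that makes such codes fast to decode.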

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Vector-space ranking with effective early termination

Vo Ngoc Anh; Owen de Kretser; Alistair Moffat

Considerable research effort has been invested in improving the effectiveness of information retrieval systems. Techniques such as relevance feedback, thesaural expansion, and pivoting all provide better quality responses to queries when tested in standard evaluation frameworks. But such enhancements can add to the cost of evaluating queries. In this paper we consider the pragmatic issue of how to improve the cost-effectiveness of searching. We describe a new inverted file structure using quantized weights that provides superior retrieval effectiveness compared to conventional inverted file structures when early termination heuristics are employed. That is, we are able to reach similar effectiveness levels with less computational cost, and so provide a better cost/performance compromise than previous inverted file organisations.
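
The general idea of quantized weights combined with early termination can be sketched as follows. This is a simplified illustration, not the structure described in the paper: the uniform bucketing in `quantize`, the toy `index` layout, and the posting `budget` used as the termination rule are all assumptions chosen for the example.

```python
from collections import defaultdict


def quantize(weight, max_weight, levels=8):
    """Map a real weight in (0, max_weight] to an integer in 1..levels (uniform buckets)."""
    q = int(weight / max_weight * levels) + 1
    return min(q, levels)


def rank(query_terms, index, budget):
    """Score documents using at most `budget` postings, highest weights first.

    index[term] is a list of (quantized_weight, doc_id) pairs (hypothetical layout).
    """
    postings = []
    for term in query_terms:
        postings.extend(index.get(term, []))
    # Visit the highest quantized weights first so that stopping early
    # discards only the least important contributions.
    postings.sort(key=lambda p: -p[0])
    accumulators = defaultdict(int)
    for weight, doc_id in postings[:budget]:
        accumulators[doc_id] += weight  # integer arithmetic only
    return sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))


index = {"a": [(8, 1), (3, 2)], "b": [(5, 2), (1, 3)]}
full = rank(["a", "b"], index, budget=4)     # all postings processed
pruned = rank(["a", "b"], index, budget=2)   # stops after two postings
```

With the full budget, documents 1 and 2 tie on score; with the reduced budget, the top-ranked document is unchanged but less work is done, which is the cost/effectiveness trade-off the abstract refers to.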

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Simplified similarity scoring using term ranks

Vo Ngoc Anh; Alistair Moffat

We propose a method for document ranking that combines a simple document-centric view of text, and fast evaluation strategies that have been developed in connection with the vector space model. The new method defines the importance of a term within a document qualitatively rather than quantitatively, and in doing so reduces the need for tuning parameters. In addition, the method supports very fast query processing, with most of the computation carried out on small integers, and dynamic pruning an effective option. Experiments on a wide range of TREC data show that the new method provides retrieval effectiveness as good as or better than the Okapi BM25 formulation, and variants of language models.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

Impact transformation: effective and efficient web retrieval

Vo Ngoc Anh; Alistair Moffat

We extend the applicability of impact transformation, which is a technique for adjusting the term weights assigned to documents so as to boost the effectiveness of retrieval when short queries are applied to large document collections. In conjunction with techniques called quantization and thresholding, impact transformation allows improved query execution rates compared to traditional vector-space similarity computations, as the number of arithmetic operations can be reduced. The transformation also facilitates a new dynamic query pruning heuristic. We give results based upon the TREC web data that show the combination of these various techniques to yield highly competitive retrieval, in terms of both effectiveness and efficiency, for both short and long queries.

IEEE Transactions on Knowledge and Data Engineering | 2006

Improved word-aligned binary compression for text indexing

Vo Ngoc Anh; Alistair Moffat

We present an improved compression mechanism for handling the compressed inverted indexes used in text retrieval systems, extending the word-aligned binary coding carry method. Experiments using two typical document collections show that the new method obtains superior compression to previous static codes, without penalty in terms of execution speed.

Bioinformatics | 2012

Transformations for the compression of FASTQ quality scores of next-generation sequencing data

Raymond Wan; Vo Ngoc Anh; Kiyoshi Asai

MOTIVATION: The growth of next-generation sequencing means that more effective and efficient archiving methods are needed to store the generated data for public dissemination and in anticipation of more mature analytical methods later. This article examines methods for compressing the quality score component of the data to partly address this problem. RESULTS: We compare several compression policies for quality scores, in terms of both compression effectiveness and overall efficiency. The policies employ lossy and lossless transformations with one of several coding schemes. Experiments show that both lossy and lossless transformations are useful, and that simple coding methods, which consume fewer computing resources, are highly competitive, especially when random access to reads is needed. AVAILABILITY AND IMPLEMENTATION: Our C++ implementation, released under the Lesser General Public License, is available for download at http://www.cb.k.u-tokyo.ac.jp/asailab/members/rwan. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
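
To make the distinction between lossy and lossless transformations concrete, here is a small sketch operating on a FASTQ quality line. The two transformations shown (uniform binning as the lossy step, differencing of adjacent scores as the lossless step), the function names, and the Phred+33 offset are assumptions picked for illustration; they are not necessarily the transformations evaluated in the article, and the authors' own implementation is in C++.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to integer scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in quality_line]


def uniform_bin(scores, width=4):
    """Lossy: replace each score by the lowest score in its bin of `width` values."""
    return [(s // width) * width for s in scores]


def delta_encode(scores):
    """Lossless: keep the first score, then store differences between neighbours."""
    return scores[:1] + [b - a for a, b in zip(scores, scores[1:])]


def delta_decode(deltas):
    """Invert delta_encode() by accumulating the differences."""
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out


scores = phred_scores("II5+")
assert delta_decode(delta_encode(scores)) == scores  # lossless round trip
binned = uniform_bin(scores)                          # fewer distinct symbols
```

Either transformed sequence would then be handed to a coding scheme; binning reduces the number of distinct symbols at the cost of precision, while differencing preserves the scores exactly.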

String Processing and Information Retrieval | 2006

Structured index organizations for high-throughput text querying

Vo Ngoc Anh; Alistair Moffat

Inverted indexes are the preferred mechanism for supporting content-based queries in text retrieval systems, with the various data items usually stored compressed in some way. But different query modalities require that different information be held in the index. For example, phrase querying requires that word offsets be held as well as document numbers. In this study we describe an inverted index organization that provides efficient support for all of conjunctive Boolean queries, ranked queries, and phrase queries. Experimental results on a 426 GB document collection show that the methods we describe provide fast evaluation of all three querying modes.

Australasian Database Conference | 2002

Improved retrieval effectiveness through impact transformation

Vo Ngoc Anh; Alistair Moffat

Users of web search engines are notoriously parsimonious in their use of search terms, and search effectiveness has tended to be relatively poor on the resulting short queries, especially when compared against the good performance attained by recent systems when working with long TREC-like queries. In this paper we examine the reasons why short queries are hard to deal with, and propose a modification to the vector-space paradigm, which we call impact transformation. Experiments with a collection of web data and web queries show that impact transformation significantly boosts retrieval effectiveness. Moreover, the use of quantised values in the transformation allows extremely fast query processing.

Data Compression Conference | 2005

Binary codes for non-uniform sources

Alistair Moffat; Vo Ngoc Anh

In many applications of compression, decoding speed is at least as important as compression effectiveness. For example, the large inverted indexes associated with text retrieval mechanisms are best stored compressed, but a working system must also process queries at high speed. Here we present two coding methods that make use of fixed binary representations. They have all of the consequent benefits in terms of decoding performance, but are also sensitive to localized variations in the source data, and in practice give excellent compression. The methods are validated by applying them to various test data, including the index of an 18 GB document collection.

Conference on Information and Knowledge Management | 2006

Pruning strategies for mixed-mode querying

Vo Ngoc Anh; Alistair Moffat

Web information retrieval systems face a range of unique challenges, not the least of which is the sheer scale of the data that must be handled. Also specific to web retrieval is that queries may be a mix of Boolean and ranked features, and documents may have static score components that must also be factored into the ranking process. In this paper we consider a range of query semantics used in web retrieval systems, and show that impact-sorted indexes provide support for dynamic pruning mechanisms and in doing so allow fast document-at-a-time resolution of typical mixed-mode queries, even on relatively large volumes of data. Our techniques also extend to more complex query semantics, including the use of phrase, proximity, and structural constraints.
