Henk Ernst Blok | Researchain

Publication

Featured research published by Henk Ernst Blok.

Information Retrieval | 2005

TIJAH: Embracing IR Methods in XML Databases

Vojkan Mihajlovic; Johan A. List; Georgina Ramirez; Arjen P. de Vries; Djoerd Hiemstra; Henk Ernst Blok

This paper discusses our participation in INEX (the Initiative for the Evaluation of XML Retrieval) using the TIJAH XML-IR system. TIJAH’s system design follows a ‘standard’ layered database architecture, carefully separating the conceptual, logical and physical levels. At the conceptual level, we classify the INEX XPath-based query expressions into three different query patterns. For each pattern, we present its mapping into a query execution strategy. The logical layer exploits score region algebra (SRA) as the basis for query processing. We discuss the region operators used to select and manipulate XML document components. The logical algebra expressions are mapped into efficient relational algebra expressions over a physical representation of the XML document collection using the ‘pre-post numbering scheme’. The paper concludes with an analysis of experiments performed with the INEX test collection.
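The pre-post numbering scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not TIJAH's implementation: the tuple-based tree layout, the function names `pre_post` and `is_ancestor`, and the assumption that element tags are unique are all hypothetical choices made for illustration. Each element receives its preorder and postorder rank, and an element `a` is an ancestor of `d` exactly when `pre(a) < pre(d)` and `post(a) > post(d)`, so containment checks reduce to comparisons that fit a relational table.

```python
def pre_post(tree):
    """Assign (pre, post) ranks to every node of a (tag, children) tree.

    Assumption for this sketch: tags are unique, so they can serve as keys.
    """
    table = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        tag, children = node
        pre = counters["pre"]          # rank in document (preorder) order
        counters["pre"] += 1
        for child in children:
            visit(child)
        table[tag] = (pre, counters["post"])  # postorder rank, set after children
        counters["post"] += 1

    visit(tree)
    return table


def is_ancestor(table, a, d):
    """True if element a properly contains element d."""
    pre_a, post_a = table[a]
    pre_d, post_d = table[d]
    return pre_a < pre_d and post_a > post_d


# A tiny hypothetical document: article -> (title, sec -> (p1, p2))
doc = ("article", [("title", []), ("sec", [("p1", []), ("p2", [])])])
table = pre_post(doc)
descendants_of_sec = sorted(t for t in table if is_ancestor(table, "sec", t))
```

With this encoding, selecting all descendants of an element is a range condition on two integer columns, which is what makes the mapping to relational algebra straightforward.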

conference on information and knowledge management | 2005

Score region algebra: building a transparent XML-R database

Vojkan Mihajlovic; Henk Ernst Blok; Djoerd Hiemstra; Peter M.G. Apers

A unified database framework that will enable better comprehension of ranked XML retrieval is still a challenge in the XML database field. We propose a logical algebra, named score region algebra, that enables transparent specification of information retrieval (IR) models for XML databases. Transparency is achieved by allowing various retrieval models to be instantiated through abstract score functions within the algebra operators, while the logical query plan and operator definitions remain unchanged. Our algebra operators model three important aspects of XML retrieval: element relevance score computation, element score propagation, and element score combination. To illustrate the usefulness of our algebra we instantiate four different, well-known IR scoring models, and combine them with different score propagation and combination functions. We implemented the algebra operators in a prototype system on top of a low-level database kernel. The evaluation of the system is performed on a collection of IEEE articles in XML format provided by INEX. We argue that state-of-the-art XML IR models can be transparently implemented using our score region algebra framework on top of any low-level physical database engine or existing RDBMS, allowing a more systematic investigation of retrieval model behavior.

INEX'04 Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval | 2004

TIJAH at INEX 2004: modeling phrases and relevance feedback

Vojkan Mihajlovic; Georgina Ramirez; Arjen P. de Vries; Djoerd Hiemstra; Henk Ernst Blok

This paper discusses our participation in INEX using the TIJAH XML-IR system. We have enriched the TIJAH system, which follows a standard layered database architecture, with several new features. An extensible conceptual level processing unit has been added to the system. The algebra on the logical level and the implementation on the physical level have been extended to support phrase search and structural relevance feedback. The conceptual processing unit is capable of rewriting NEXI content-only and content-and-structure queries into the internal form, based on the retrieval model parameter specification, that is either predefined or based on relevance feedback. Relevance feedback parameters are produced based on the data fusion of result element score values and sizes, and relevance assessments. The introduction of new operators supporting phrase search in score region algebra on the logical level is discussed in the paper, as well as their implementation on the physical level using the pre-post numbering scheme. The framework for structural relevance feedback is also explained in the paper. We conclude with a preliminary analysis of the system performance based on INEX 2004 runs.

INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval | 2005

TIJAH scratches INEX 2005: vague element selection, image search, overlap, and relevance feedback

Vojkan Mihajlovic; Georgina Ramirez; Thijs Westerveld; Djoerd Hiemstra; Henk Ernst Blok; Arjen P. de Vries

Retrieving information from heterogeneous data sources in a flexible manner and within a single (database) framework is still a challenge. In this paper we present several extensions of our prototype database system TIJAH developed for structured retrieval. The extensions are aimed at modeling vague selection of XML elements and image retrieval. All three levels (conceptual, logical, and physical) of the TIJAH system are enhanced to support the extensions. Additionally, we analyze different ways of removing overlap and explain how structural information can be used for relevance feedback.

international conference on data engineering | 2002

Content-based video indexing for the support of digital library search

Milan Petkovic; R. van Zwol; Henk Ernst Blok; Willem Jonker; Peter M. G. Apers; M.A. Windhouwer; Martin L. Kersten

Presents a digital library search engine that combines efforts of the AMIS and DMW research projects, each covering significant parts of the problem of finding the required information in an enormous mass of data. The most important contributions of our work are the following: (1) We demonstrate a flexible solution for the extraction and querying of meta-data from multimedia documents in general. (2) Scalability and efficiency support are illustrated for full-text indexing and retrieval. (3) We show how, for a more limited domain, like an intranet, conceptual modelling can offer additional and more powerful query facilities. (4) In the limited domain case, we demonstrate how domain knowledge can be used to interpret low-level features into semantic content. In this short description, we focus on the first and fourth items.

conference on information and knowledge management | 2001

Predicting the cost-quality trade-off for information retrieval queries: facilitating database design and query optimization

Henk Ernst Blok; Djoerd Hiemstra; Sunil Choenni; Franciska de Jong; Henk M. Blanken; Peter M.G. Apers

Efficient, flexible, and scalable integration of full text information retrieval (IR) in a DBMS is not a trivial case. This holds in particular for query optimization in such a context. To facilitate the bulk-oriented behavior of database query processing, a priori knowledge of how to limit the data efficiently prior to query evaluation is very valuable at optimization time. The usually imprecise nature of IR querying provides an extra opportunity to limit the data by a trade-off with the quality of the answer. In this paper we present a mathematically derived model to predict the quality implications of neglecting information before query execution. In particular we investigate the possibility of predicting the retrieval quality for a document collection for which no training information is available, which is usually the case in practice. Instead, we construct a model that can be trained on other document collections for which the necessary quality information is available, or can be obtained quite easily. We validate our model for several document collections and present the experimental results. These results show that our model performs quite well, even for the case where we did not train it on the test collection itself.

database systems for advanced applications | 2006

Handling uncertainty and ignorance in databases: a rule to combine dependent data

Sunil Choenni; Henk Ernst Blok; Erik Leertouwer

In many applications, uncertainty and ignorance go hand in hand. Therefore, to deliver database support for effective decision making, an integrated view of uncertainty and ignorance should be taken. So far, most efforts have attempted to capture uncertainty and ignorance with probability theory. In this paper, we discuss the weakness of probability theory in capturing ignorance, and propose an approach inspired by the Dempster-Shafer theory to capture uncertainty and ignorance. Then, we present a rule to combine dependent data that are represented in different relations. Such a rule is required to perform joins in a consistent way. We illustrate that our rule is able to solve the so-called problem of information loss, which has so far been considered an open problem.
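As background for the Dempster-Shafer theory named in the abstract, the sketch below shows the classical Dempster rule of combination. It is not the rule proposed in the paper: the classical rule assumes the two sources of evidence are independent, whereas the paper targets dependent data. The function name `combine` and the example mass functions are hypothetical. Mass assigned to a whole set such as `{x, y}` expresses ignorance about which element holds, something a single probability distribution cannot state directly.

```python
from itertools import product


def combine(m1, m2):
    """Classical Dempster combination of two mass functions.

    Mass functions map frozensets of hypotheses to masses summing to 1.
    Assumes independent evidence (not the paper's dependent-data rule).
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb        # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


X, Y = frozenset({"x"}), frozenset({"y"})
XY = X | Y                             # mass here encodes ignorance
m1 = {X: 0.6, XY: 0.4}
m2 = {Y: 0.5, XY: 0.5}
result = combine(m1, m2)
```

In this example the conflicting mass is 0.3, so the surviving masses are rescaled by 1 / 0.7.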

IEEE Transactions on Knowledge and Data Engineering | 2004

A selectivity model for fragmented relations: applied in information retrieval

Henk Ernst Blok; Sunil Choenni; Henk M. Blanken; Peter M.G. Apers

New application domains cause today's database sizes to grow rapidly, posing great demands on technology. Data fragmentation facilitates techniques (like distribution, parallelization, and main-memory computing) meeting these demands. Also, fragmentation might help to improve efficient processing of query types such as top N. Database design and query optimization require a good notion of the costs resulting from a certain fragmentation. Our mathematically derived selectivity model facilitates this. Once its two parameters have been computed based on the fragmentation, after each (though usually infrequent) update, our model can forget the data distribution, resulting in fast and quite good selectivity estimation. We show experimental verification for Zipfian-distributed IR databases.

database and expert systems applications | 2003

Moa and the Multi-model Architecture: A New Perspective on NF2

M. van Keulen; Jochem Vonk; A.P. de Vries; Jan Flokstra; Henk Ernst Blok

Advanced non-traditional application domains such as geographic information systems and digital library systems demand advanced data management support. In an effort to cope with this demand, we present the concept of a novel multi-model DBMS architecture which provides evaluation of queries on complexly structured data without sacrificing efficiency. A vital role in this architecture is played by the Moa language featuring a nested relational data model based on XNF2, in which we placed renewed interest. Furthermore, extensibility in Moa avoids optimization obstacles due to black-box treatment of ADTs. The combination of a mapping of queries on complexly structured data to an efficient physical algebra expression via a nested relational algebra, extensibility open to optimization, and the consequently better integration of domain-specific algorithms enables the Moa system to handle complex queries from non-traditional application domains efficiently and effectively.

Lecture Notes in Computer Science | 2006