Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval | 2021

REGIS: A Test Collection for Geoscientific Documents in Portuguese


Abstract


Experimental validation is key to the development of Information Retrieval (IR) systems. The standard evaluation paradigm requires a test collection consisting of documents, queries, and relevance judgments. Creating a test collection demands significant human effort, mainly to produce the relevance judgments. As a result, many domains and languages still lack a proper evaluation testbed. Portuguese is an example of a major world language that has been overlooked in IR research: the only test collection available is composed of news articles from 1994 and a hundred queries. To bridge this gap, we developed REGIS (Retrieval Evaluation for Geoscientific Information Systems), a test collection for the geoscientific domain in Portuguese. REGIS contains 20K documents and 34 query topics along with relevance assessments. We describe the procedures for document collection, topic creation, and relevance assessment. In addition, we report the results of standard IR techniques on REGIS so that they can serve as baselines for future research.
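
As an illustration of the evaluation paradigm the abstract describes, the following minimal sketch scores a ranked result list against relevance judgments. It is not taken from the paper: the abstract does not say which metrics were used, so precision at k and average precision are assumed here as common choices, and the topic ID, document IDs, and relevance grades are hypothetical toy values rather than REGIS data.

```python
from typing import Dict, List

# Hypothetical toy judgments: topic id -> {document id -> relevance grade}.
# These ids and grades are invented for illustration; they are not from REGIS.
qrels: Dict[str, Dict[str, int]] = {
    "Q1": {"d1": 1, "d3": 1, "d4": 0},
}

# Hypothetical ranked output of some retrieval system for the same topic.
run: Dict[str, List[str]] = {
    "Q1": ["d1", "d2", "d3", "d4"],
}


def precision_at_k(ranking: List[str], judgments: Dict[str, int], k: int) -> float:
    """Fraction of the top-k documents judged relevant (grade > 0)."""
    hits = sum(1 for doc in ranking[:k] if judgments.get(doc, 0) > 0)
    return hits / k


def average_precision(ranking: List[str], judgments: Dict[str, int]) -> float:
    """Mean of the precision values at the rank of each relevant document."""
    n_relevant = sum(1 for grade in judgments.values() if grade > 0)
    if n_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        # Unjudged documents (such as "d2" above) count as non-relevant.
        if judgments.get(doc, 0) > 0:
            hits += 1
            total += hits / rank
    return total / n_relevant


def mean_average_precision(run: Dict[str, List[str]],
                           qrels: Dict[str, Dict[str, int]]) -> float:
    """Average of the per-topic average precision over all judged topics."""
    scores = [average_precision(run.get(topic, []), judgments)
              for topic, judgments in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(precision_at_k(run["Q1"], qrels["Q1"], k=2))
    print(mean_average_precision(run, qrels))
```

Treating unjudged documents as non-relevant is a simplifying assumption of this sketch. Established evaluation tools handle such cases in more than one way.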

DOI 10.1145/3404835.3463256
Language English
Journal Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Full Text