I.H.E. Hendrickx | Researchain

Publication

Featured research published by I.H.E. Hendrickx.

social informatics | 2013

Documenting Social Unrest: Detecting Strikes in Historical Daily Newspapers

Kalliopi Zervanou; Marten During; I.H.E. Hendrickx; Antal van den Bosch

The identification of relevant historical sources, such as newspapers and letters, and the extraction of information from them are an essential part of historical research. In this work, our aim is the detection of relevant primary sources, with the goal of supporting researchers working on a specific historical event. We focus on the historical daily Dutch newspaper archive of the National Library of the Netherlands and on strike events that happened in the Netherlands during the 1980s. Using a manually compiled database of strikes in the Netherlands, we first attempt to find reports on those strikes in historical daily newspapers by automatically associating database records with the daily press of the time covering the same strike. Then, we generalise our methodology to detect strike events in the press that are not currently covered by the strikes database, and in this way support the extension of secondary historical resources. Our methods are evaluated against the manually constructed database of strikes.

language resources and evaluation | 2018

Creating a reference data set for the summarization of discussion forum threads

Suzan Verberne; Emiel Krahmer; I.H.E. Hendrickx; Sander Wubben; Antal van den Bosch

In this paper we address extractive summarization of long threads in online discussion fora. We present an elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set. We showed long threads to ten different raters and asked them to create a summary by selecting the posts that they considered to be the most important for the thread. We study the agreement between human raters on the summarization task, and we show how multiple reference summaries can be combined to develop a successful model for automatic summarization. We found that although the inter-rater agreement for the summarization task was slight to fair, the automatic summarizer obtained reasonable results in terms of precision, recall, and ROUGE. Moreover, when human raters were asked to choose between the summary created by another human and the summary created by our model in a blind side-by-side comparison, they judged the model’s summary equal to or better than the human summary in over half of the cases. This shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries. In addition, we investigated the potential for personalized summarization. However, the results for the three raters involved in this experiment were inconclusive. We release the reference summaries as a publicly available dataset.
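The idea of combining several raters' post selections into one reference summary, and of scoring an automatic summary against it, can be illustrated with a small sketch. This is not the authors' procedure: the voting threshold `min_votes`, the function names, and the toy post ids are assumptions made for illustration only.

```python
from collections import Counter


def vote_counts(rater_selections):
    """Count, for each post id, how many raters selected it."""
    counts = Counter()
    for selected in rater_selections:
        # A rater counts at most once per post.
        counts.update(set(selected))
    return counts


def reference_summary(rater_selections, min_votes):
    """Keep posts chosen by at least `min_votes` raters (hypothetical rule)."""
    counts = vote_counts(rater_selections)
    return {post for post, n in counts.items() if n >= min_votes}


def precision_recall(predicted, reference):
    """Precision and recall of a predicted post set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Toy thread: three raters each select the posts they find most important.
selections = [[1, 2, 3], [2, 3], [3, 5]]
reference = reference_summary(selections, min_votes=2)
scores = precision_recall([3, 4], reference)
```

With low inter-rater agreement, the choice of threshold matters: a high `min_votes` yields a short, high-consensus reference, while a low one admits posts that only a few raters valued.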

Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage | 2017

A Memory-Based Lemmatizer for Ancient Greek

Corien Bary; Peter Berck; I.H.E. Hendrickx

In this paper we present the lemmatizer that we developed for Ancient Greek: GLEM. As far as we know, GLEM is the first publicly available lemmatizer for Ancient Greek that uses POS information to disambiguate and that also assigns output to unseen words, i.e. words that are not yet in the lexicon. As the basis for the lemmatizer we used an existing memory-based learning tool, Frog, that was originally developed for Dutch and that we converted to work for Ancient Greek. As the results of Frog on Ancient Greek were rather modest, we used Frog to create a smarter lemmatizer, GLEM, that uses a lexicon lookup in addition to the memory-based tool Frog. We evaluate and compare the performance of GLEM against the Frog lemmatizer and the already existing CLTK lemmatizer and observe that GLEM achieves the highest accuracy, 93%, on an unseen test corpus sample. GLEM's lookup component overcomes the difficulty of a relatively small training set in combination with a morphologically rich language, while the memory-based learning component enables GLEM to handle unknown words.
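The combination described above, a lexicon lookup disambiguated by POS with a learned component as fallback for unseen words, might be sketched as follows. This is a minimal illustration, not GLEM's actual code or Frog's API: the lexicon layout (word form mapped to a dictionary of POS tag to lemma), the function `lemmatize`, and the stand-in `fallback` callable are all assumptions.

```python
def lemmatize(token, pos, lexicon, fallback):
    """Return a lemma for `token`, preferring the lexicon over the fallback.

    lexicon:  dict mapping word form -> {pos_tag: lemma} (hypothetical layout)
    fallback: callable (token, pos) -> lemma, standing in for a trained
              memory-based lemmatizer
    """
    candidates = lexicon.get(token, {})
    if pos in candidates:
        # The POS tag picks one lemma among the lexicon candidates.
        return candidates[pos]
    if len(candidates) == 1:
        # Only one lemma is known for this form, so no disambiguation is needed.
        return next(iter(candidates.values()))
    # Unseen form, or ambiguous form with no matching POS: defer to the learner.
    return fallback(token, pos)


# Toy lexicon with a single entry; the fallback simply echoes the token.
toy_lexicon = {"λόγου": {"noun": "λόγος"}}
known = lemmatize("λόγου", "noun", toy_lexicon, lambda t, p: t)
unseen = lemmatize("ἀνθρώπου", "noun", toy_lexicon, lambda t, p: t)
```

The ordering reflects the trade-off named in the abstract: the lexicon gives reliable answers for attested forms despite a small training set, and the learned component is consulted only when the lexicon cannot decide.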

cross language evaluation forum | 2016

Overview of the SBS 2016 Mining Track

Toine Bogers; I.H.E. Hendrickx; Marijn Koolen; Suzan Verberne

Archive | 2016

TraMOOC (Translation for Massive Open Online Courses): Providing Reliable MT for MOOCs

Valia Kordoni; Lexi Birch; Ioana Buliga; Kostadin Cholakov; Markus Egg; Federico Gaspari; Yota Georgakopoulou; Maria Gialama; I.H.E. Hendrickx; Mitja Jermol; Katia Lida Kermanidis; Joss Moorkens; Davor Orlic; Michael Papadopoulos; Maja Popović; Rico Sennrich; Vilelmini Sosoni; Dimitrios Tsoumakos; Antal van den Bosch; Menno van Zaanen; Andy Way

language resources and evaluation | 2014