Eva D'hondt
Radboud University Nijmegen
Publications
Featured research published by Eva D'hondt.
Computational Linguistics | 2013
Eva D'hondt; Suzan Verberne; Cornelis H. A. Koster; Lou Boves
With the increasing rate of patent application filings, automated patent classification is of rising economic importance. This article investigates how patent classification can be improved by using different representations of the patent documents. Using the Linguistic Classification System (LCS), we compare the impact of adding statistical phrases (in the form of bigrams) and linguistic phrases (in two different dependency formats) to the standard bag-of-words text representation on a subset of 532,264 English abstracts from the CLEF-IP 2010 corpus. In contrast to previous findings on classification with phrases in the Reuters-21578 data set, for patent classification the addition of phrases results in significant improvements over the unigram baseline. The best results were achieved by combining all four representations, and the second best by combining unigrams and lemmatized bigrams. This article includes extensive analyses of the class models (a.k.a. class profiles) created by the classifiers in the LCS framework, to examine which types of phrases are most informative for patent classification. It appears that bigrams contribute most to improvements in classification accuracy. Similar experiments were performed on subsets of French and German abstracts to investigate the generalizability of these findings.
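The core representational idea above, augmenting the bag-of-words with statistical phrases (bigrams), can be sketched as follows. This is an illustrative sketch only: the paper's Linguistic Classification System (LCS) additionally uses lemmatization and dependency-based linguistic phrases, which are not reproduced here.

```python
from collections import Counter

def unigram_bigram_features(text):
    """Build a combined unigram + bigram bag-of-words representation.

    Simplified sketch: real patent-classification pipelines would also
    lemmatize and add dependency-derived phrases, as in the LCS.
    """
    tokens = text.lower().split()
    # Statistical phrases: adjacent token pairs joined into one feature.
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

features = unigram_bigram_features(
    "automated patent classification of patent documents"
)
```

Each document becomes a sparse count vector over both unigram and bigram features, which a linear classifier can then weight per IPC class.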
patent information retrieval | 2010
Nelleke Oostdijk; Eva D'hondt; Hans van Halteren; Suzan Verberne
In this paper we investigate the variation in language use within the very broad patent domain. We find that language use (represented by syntactic phrases) not only differs from one patent class to the next, but is also a characteristic that sets apart the four sections of a patent (viz. Title, Abstract, Description and Claims). This lends support to the claim that these sections can be viewed as different text genres. For the development of a syntactic parser that is trained on patent texts, we quantify the domain and genre differences in terms of the amounts of text needed to train domain-dependent versions of the parser. Our quantified and exemplified findings on the domain variation in patent data are of interest for the patent retrieval and analysis communities.
cross language evaluation forum | 2009
Suzan Verberne; Eva D'hondt
In this paper we describe our participation in the 2009 CLEF-IP task, which was targeted at prior-art search for topic patent documents. We opted for a baseline approach to get a feeling for the specifics of the task and the documents used. Our system retrieved patent documents based on a standard bag-of-words approach for both the Main Task and the English Task. In both runs, we extracted the claim sections from all English patents in the corpus and saved them in the Lemur index format with the patent IDs as DOCIDs. These claims were then indexed using Lemur's BuildIndex function. In the topic documents we also focused exclusively on the claims sections. These were extracted and converted to queries by removing stopwords and punctuation. We did not perform any term selection or query expansion. We retrieved 100 patents per topic using Lemur's RetEval function with the TF-IDF retrieval model. Compared to the other runs submitted to the track, we obtained good results in terms of nDCG (0.46) and moderate results in terms of MAP (0.054).
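The bag-of-words TF-IDF ranking described above can be sketched in a few lines. This is a minimal illustration of the scoring idea only; the actual runs used the Lemur toolkit's TF-IDF model over indexed claim sections, not this code.

```python
import math
from collections import Counter

def tfidf_scores(query_tokens, docs):
    """Score each tokenized document against a query with simple TF-IDF.

    Sketch of the bag-of-words retrieval idea: term frequency in the
    document times inverse document frequency across the collection.
    """
    n = len(docs)
    doc_tfs = [Counter(d) for d in docs]
    df = Counter()
    for tf in doc_tfs:
        df.update(tf.keys())  # document frequency per term
    return [
        sum(tf[t] * math.log(n / df[t]) for t in query_tokens if t in tf)
        for tf in doc_tfs
    ]

docs = [
    ["patent", "claim", "device"],
    ["method", "claim"],
    ["device", "apparatus"],
]
scores = tfidf_scores(["device", "claim"], docs)
```

Ranking the scores in descending order and keeping the top 100 documents per topic mirrors the retrieval setup described in the abstract.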
Information Retrieval | 2014
Eva D'hondt; Suzan Verberne; Nelleke Oostdijk; Jean Beney; Cornelius Koster; Lou Boves
In this paper, we quantify the existence of concept drift in patent data, and examine its impact on classification accuracy. When developing algorithms for classifying incoming patent applications with respect to their category in the International Patent Classification (IPC) hierarchy, a temporal mismatch between training data and incoming documents may deteriorate classification results. We measure the effect of this temporal mismatch and aim to tackle it by optimal selection of training data. To illustrate the various aspects of concept drift on IPC class level, we first perform quantitative analyses on a subset of English abstracts extracted from patent documents in the CLEF-IP 2011 patent corpus. In a series of classification experiments, we then show the impact of temporal variation on the classification accuracy of incoming applications. We further examine which training data selection method, combined with our classification approach, yields the best classifier, and how combining different text representations may improve patent classification. We found that using the most recent data is a better strategy than static sampling, but that extending a set of recent training data with older documents does not harm classification performance. In addition, we confirm previous findings that using 2-skip-2-grams on top of the bag of unigrams structurally improves patent classification. Our work is an important contribution to the research into concept drift for text classification, and to the practice of classifying incoming patent applications.
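The 2-skip-2-gram representation mentioned above pairs words that are at most two tokens apart, so it captures ordinary bigrams plus slightly longer-range word associations. The sketch below illustrates the feature type; the paper's exact extraction pipeline is not reproduced here.

```python
def two_skip_bigrams(tokens):
    """Generate 2-skip-2-grams: ordered word pairs with at most two
    intervening tokens skipped (ordinary bigrams are the 0-skip case).
    """
    return [
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        # j ranges over the next three positions: skip 0, 1, or 2 tokens.
        for j in range(i + 1, min(i + 4, len(tokens)))
    ]

pairs = two_skip_bigrams(["rotary", "drill", "bit", "assembly"])
```

These pairs are added as extra features on top of the bag of unigrams, giving the classifier some word-order and co-occurrence information without full parsing.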
patent information retrieval | 2010
Suzan Verberne; Eva D'hondt; Nelleke Oostdijk; Cornelis H. A. Koster
Language Learning | 2010
Suzan Verberne; Merijn Vogel; Eva D'hondt
CLEF (Notebook Papers/Labs/Workshop) | 2011
Eva D'hondt; Suzan Verberne; Wouter Alink; Roberto Cornacchia
computational linguistics in the netherlands | 2012
Eva D'hondt; N. Weber; Suzan Verberne; Cornelis H. A. Koster; Lou Boves
Proceedings of the Dutch-Belgian Information Retrieval workshop 2010 (DIR-2010) | 2010
Eva D'hondt; Suzan Verberne; Nelleke Oostdijk; Lou Boves
CLEF (Notebook Papers/Labs/Workshop) | 2011
Suzan Verberne; Eva D'hondt