J.E.J.M. Odijk
Utrecht University
Publication
Featured research published by J.E.J.M. Odijk.
Archive | 2013
Peter Spyns; J.E.J.M. Odijk
The book provides an overview of more than a decade of joint R&D efforts in the Low Countries on HLT for Dutch. It not only presents the state of the art of HLT for Dutch in the areas covered but, even more importantly, also describes the resources (data and tools) for Dutch that have been created and are now available to both academia and industry worldwide. The contributions cover many areas of human language technology (for Dutch): corpus collection (including IPR issues) and building (in particular one corpus aiming at a collection of 500M word tokens), lexicology, anaphora resolution, a semantic network, parsing technology, speech recognition, machine translation, text (summaries) generation, web mining, information extraction, and text-to-speech, to name the most important ones. The book also shows how a medium-sized language community (spanning two territories) can create a digital language infrastructure (resources, tools, etc.) as a basis for subsequent R&D. At the same time, it bundles contributions of almost all the HLT research groups in Flanders and the Netherlands, and hence offers a view of their recent research activities. Targeted readers are mainly researchers in human language technology, in particular those focusing on Dutch. These include researchers active in larger networks such as CLARIN, META-NET and FLaReNet, and those participating in conferences such as ACL, EACL, NAACL, COLING, RANLP, CICLing, LREC, CLIN and DIR (both in the Low Countries), InterSpeech, ASRU, ICASSP, ISCA, EUSIPCO, CLEF, TREC, etc. In addition, some chapters are of interest to human language technology policy makers and even to science policy makers in general.
language resources and evaluation | 2005
Bente Maegaard; Khalid Choukri; Nicoletta Calzolari; J.E.J.M. Odijk
The European Language Resources Association (ELRA) was founded in 1995 with the mission of providing language resources (LR) to European research institutions and companies. In this paper we describe the background, the mission and the major activities since then.
Essential Speech and Language Technology for Dutch. Results by the STEVIN-programme | 2013
J.E.J.M. Odijk
The central problems that this paper addresses are (i) the lack of large and rich formalised lexicons for multi-word expressions (MWEs) for use in Natural Language Processing (NLP), and (ii) the lack of proper methods and tools to extend the lexicon of an NLP system for multi-word expressions, given a text corpus, in a maximally automated manner. The paper describes innovative methods and tools for the automatic identification and lexical representation of multi-word expressions. In addition, it describes a 5,000-entry corpus-based multi-word expression lexical database for Dutch developed using these methods. The database has been externally validated, and its usability has been evaluated in NLP systems for Dutch. The MWE database developed fills a gap in existing lexical resources for Dutch. The generic methods and tools for MWE identification and lexical representation focus on Dutch, but they are largely language-independent and can also be used for other languages, new domains, and beyond this project. The research results and data described in this paper contribute directly to strengthening the digital infrastructure for Dutch.
language resources and evaluation | 2014
Claudia Soria; Nicoletta Calzolari; Monica Monachini; Valeria Quochi; Núria Bel; Khalid Choukri; Joseph Mariani; J.E.J.M. Odijk; Stelios Piperidis
The main purpose of this paper is to serve as a landmark for future research and in particular for future strategic, infrastructural and coordination initiatives. It presents a preliminary plan for actions and infrastructures that could become the basis for future initiatives in the sector of Language Resources and Technologies (LRTs). The FLaReNet Language Resource Strategic Agenda presents a set of recommendations for the development and progress of LRT in Europe, resulting from a three-year consultation carried out by the FLaReNet European project. Recommendations cover a broad range of topics and activities, spanning production and use of language resources, licensing, maintenance and preservation issues, infrastructures for language resources, resource identification and sharing, evaluation and validation, interoperability and policy issues. The intended recipients belong to a large set of players and stakeholders in LRT, ranging from individuals to research and education institutions, policy-makers, funding agencies, SMEs and large companies, and service and media providers. The main goal of these recommendations is to serve as an instrument to support stakeholders in planning for and addressing the urgencies of the LRT of the future.
Essential Speech and Language Technology for Dutch. Results by the STEVIN-programme | 2013
J.E.J.M. Odijk
In this chapter I will briefly and very globally describe the impact of the STEVIN programme as a whole on human language technology (HLT) for the Dutch language in the Low Countries. I will identify a number of research data and topics that are, despite the STEVIN programme, still insufficiently covered but needed. Here I will also take into account international developments in the field of HLT that are relevant in this context. I identify recent international trends with regard to HLT programmes, and assess the position of the Dutch language and the prospects for funding of programmes and projects that are natural successors to the STEVIN programme. I also make some suggestions to government administrations for future policy actions.
text speech and dialogue | 2004
J.E.J.M. Odijk
I will first sketch some background on the company ScanSoft. Next, I will discuss ScanSoft’s products and technologies, which include digital imaging and OCR technology, automatic speech recognition (ASR) technology, text-to-speech (TTS) technology, dialogue technology (including multimodal dialogues), dictation technology and audiomining technology. I will sketch the basic functionality of these technologies, give a global sketch of the components they are composed of, demonstrate some of them, and illustrate the platform types on which they can be used.
CLARIN in the Low Countries | 2017
J.E.J.M. Odijk
In this chapter I will describe what the CLARIN infrastructure is and how it can be used, with a focus on the Low Countries (and especially the Netherlands) part of the CLARIN infrastructure. I aim to explain how a Humanities researcher can use the CLARIN infrastructure. I describe the basic functionality that CLARIN aims to offer, including searching for data and software, applying software to data, and storing data and software resulting from research.
CLARIN in the Low Countries | 2017
J.E.J.M. Odijk
Given its origins in linguistics and language technology, it should come as no surprise that CLARIN-LC created many infrastructural facilities for linguistics. These will be discussed in this part of the book, with the exception of infrastructural facilities for syntax, to which a separate part of this book is dedicated (Part III). The chapters in this part only partially cover the work done in CLARIN-LC to support linguistic research. I will first provide a brief overall overview of the relevant data and software that resulted from CLARIN-LC (section 9.2), and then summarise the topics of the chapters of this part (section 9.3).
Archive | 2000
Kees van Deemter; J.E.J.M. Odijk
Context-dependent interpretation has taken center stage in the theatre of language interpretation. The interpretation of personal pronouns, for example, is known to depend on the linguistic as well as the nonlinguistic environment in which they appear. Moreover, it has become clear that very similar kinds of dependence on context apply to many other phenomena including, among other things, the contextually restricted interpretation of a full Noun Phrase, the determination of the ‘comparison set’ relevant for the interpretation of a semantically vague predicate, and the determination of so-called ‘implicit arguments’ of words like local and contemporary. (For references to the literature, see [van Deemter and Odijk, 1997].) Inspired by this growing body of work, dependence on linguistic context has become the cornerstone of the so-called dynamic theories of meaning (e.g. [Kamp and Reyle, 1994]). These theories characterize the meaning of a sentence as its potential to change one ‘information state’ into another, and it is this dynamic perspective on which current natural-language interpreting systems are beginning to be based.
language resources and evaluation | 2014
Nicoletta Calzolari; Khalid Choukri; Thierry Declerck; Hrafn Loftsson; Bente Maegaard; Joseph Mariani; Asunción Moreno; J.E.J.M. Odijk; Stelios Piperidis