Publication


Featured research published by Franciska M.G. de Jong.


Semantics and Digital Media Technologies | 2007

Annotation of heterogeneous multimedia content using automatic speech recognition

Marijn Huijbregts; Roeland Ordelman; Franciska M.G. de Jong

This paper reports on the setup and evaluation of robust speech recognition system parts, geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed for generating speech transcripts for the NIST/TRECVID-2007 test collection, part of a Dutch real-life archive of news-related genres. Performance figures for this type of content are compared to figures for broadcast news test data.


European Conference on Research and Advanced Technology for Digital Libraries | 1999

Disambiguation Strategies for Cross-Language Information Retrieval

Djoerd Hiemstra; Franciska M.G. de Jong

This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.


EURASIP Journal on Advances in Signal Processing | 2003

A probabilistic multimedia retrieval model and its evaluation

Thijs Westerveld; Arjen P. de Vries; Alex van Ballegooij; Franciska M.G. de Jong; Djoerd Hiemstra

We present a probabilistic model for the retrieval of multimodal documents. The model is based on Bayesian decision theory and combines models for text-based search with models for visual search. The textual model is based on the language modelling approach to text retrieval, and the visual information is modelled as a mixture of Gaussian densities. Both models have proved successful on various standard retrieval tasks. We evaluate the multimodal model on the search task of TREC's video track. We found that the disclosure of video material based on visual information only is still too difficult. Even with purely visual information needs, text-based retrieval still outperforms visual approaches. The probabilistic model is useful for text, visual, and multimedia retrieval. Unfortunately, simplifying assumptions that reduce its computational complexity degrade retrieval effectiveness. Regarding the question whether the model can effectively combine information from different modalities, we conclude that whenever both modalities yield reasonable scores, a combined run outperforms the individual runs.
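
One way to picture the combination of modalities described in this abstract is the minimal sketch below. It assumes, purely for illustration, that the textual and visual query parts are conditionally independent given the document, so that their log-likelihoods can be added. The function name, shot identifiers, and scores are hypothetical and are not taken from the paper.

```python
def combined_ranking(text_loglik, visual_loglik):
    """Rank documents by the sum of textual and visual log-likelihoods.

    Assumption (illustrative only): the two modalities are conditionally
    independent given the document, so log P(q_text, q_visual | d) is the
    sum of the per-modality log-likelihoods.
    """
    # Only documents scored by both modalities can be combined.
    docs = text_loglik.keys() & visual_loglik.keys()
    scores = {d: text_loglik[d] + visual_loglik[d] for d in docs}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up log-likelihoods for three hypothetical video shots.
text_loglik = {"shot1": -4.0, "shot2": -2.5, "shot3": -6.0}
visual_loglik = {"shot1": -1.0, "shot2": -3.5, "shot3": -2.0}

ranking = combined_ranking(text_loglik, visual_loglik)
print(ranking)
```

With these made-up numbers the text-only run would rank shot2 first, while the combined run prefers shot1, which scores reasonably in both modalities.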


International Journal on Digital Libraries | 2005

Accessing the spoken word

Jerry Goldman; Steve Renals; Steven Bird; Franciska M.G. de Jong; Marcello Federico; Carl Fleischhauer; Mark Kornbluh; Lori Lamel; Douglas W. Oard; Claire Stewart; Richard Wright

Spoken-word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access, and preservation of such data is stimulated by political, economic, cultural, and educational needs. This paper outlines the major issues in the field, reviews the current state of technology, examines the rapidly changing policy issues relating to privacy and copyright, and presents issues relating to the collection and preservation of spoken audio content.


Semantics and Digital Media Technologies | 2006

Automated speech and audio analysis for semantic access to multimedia

Franciska M.G. de Jong; Roeland Ordelman; Marijn Huijbregts

The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large vocabulary speech recognition, keyword spotting and speaker classification. The applicability of techniques will be discussed from a media crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

The influence of basic tokenization on biomedical document retrieval

Dolf Trieschnigg; Wessel Kraaij; Franciska M.G. de Jong

Tokenization is a fundamental preprocessing step in Information Retrieval systems in which text is turned into index terms. This paper quantifies and compares the influence of various simple tokenization techniques on document retrieval effectiveness in two domains: biomedicine and news. As expected, biomedical retrieval is more sensitive to small changes in the tokenization method. The tokenization strategy can make the difference between a mediocre and a well-performing IR system, especially in the biomedical domain.
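
As a rough illustration of why biomedical text can be sensitive to tokenization, the sketch below applies two simple tokenizers to a made-up string containing gene-style names. Both functions and the sample text are assumptions for illustration; they are not the tokenizers evaluated in the paper.

```python
import re


def tokenize_whitespace(text):
    """Lowercase the text and split on whitespace only."""
    return text.lower().split()


def tokenize_alnum(text):
    """Lowercase the text and split on every non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


# Hypothetical sample text with hyphenated, bracketed biomedical names.
sample = "IL-2 receptor (NF-kappaB) binding"

ws_tokens = tokenize_whitespace(sample)
an_tokens = tokenize_alnum(sample)

print(ws_tokens)
print(an_tokens)
```

The first tokenizer keeps "il-2" and "(nf-kappab)" as single index terms, punctuation included, while the second breaks them into "il", "2", "nf" and "kappab". A query for a term such as "IL2" or "NF-kappaB" would match different index terms under each scheme.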


Data-Centric Systems and Applications | 2007

Generative Probabilistic Models

Thijs Westerveld; Arjen P. de Vries; Franciska M.G. de Jong

Many content-based multimedia retrieval tasks can be seen as decision theory problems. Clearly, this is the case for classification tasks, like face detection, face recognition, or indoor/outdoor classification. In all these cases a system has to decide whether an image (or video) belongs to one class or another (respectively face or no face; face A, B, or C; and indoor or outdoor). Even the ad hoc retrieval tasks, where the goal is to find relevant documents given a description of an information need, can be seen as a decision theory problem: documents can be classified into relevant and non-relevant classes, or we can treat each of the documents in the collection as a separate class, and classify a query as belonging to one of these. In all these settings, a probabilistic approach seems natural: an image is assigned to the class with the highest probability. If some misclassifications are more severe than others, a decision theoretic approach should be taken, and images should be assigned to the class with lowest risk.
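
The two decision rules mentioned in this abstract can be contrasted with a minimal sketch. The class names, posterior probabilities, and loss values below are made up for illustration, and the helper functions are hypothetical rather than taken from the chapter.

```python
def max_probability_class(posteriors):
    """Pick the class with the highest posterior probability."""
    return max(posteriors, key=posteriors.get)


def min_risk_class(posteriors, loss):
    """Pick the class with the lowest expected loss (risk).

    loss[decided][true] is the cost of deciding `decided` when the
    true class is `true`.
    """
    def risk(decided):
        return sum(loss[decided][true] * p for true, p in posteriors.items())

    return min(posteriors, key=risk)


# Made-up posteriors for a face detection decision.
posteriors = {"face": 0.4, "no_face": 0.6}

# Assumed asymmetric losses: missing a face costs five times as much
# as a false alarm.
loss = {
    "face": {"face": 0.0, "no_face": 1.0},
    "no_face": {"face": 5.0, "no_face": 0.0},
}

print(max_probability_class(posteriors))
print(min_risk_class(posteriors, loss))
```

Under these assumed numbers the highest-probability rule chooses "no_face", whereas the risk for deciding "face" is 0.6 against 2.0 for "no_face", so the minimum-risk rule chooses "face".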


Text, Speech and Dialogue | 2001

Speech Recognition Issues for Dutch Spoken Document Retrieval

Roeland Ordelman; Adrianus J. van Hessen; Franciska M.G. de Jong

In this paper, ongoing work on the development of the speech recognition modules of an MMIR environment for Dutch is described. The work on the generation of acoustic models and language models along with their current performance is presented. Some characteristics of the Dutch language and of the target video archives that require special treatment are discussed.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Workshop on the evaluation of multimedia retrieval

Thijs Westerveld; Arjen P. de Vries; Franciska M.G. de Jong

The evaluation of multimedia retrieval is a subject that has gained momentum in the last couple of years. CWI, the National Research Institute for Mathematics and Computer Science in the Netherlands, organised a workshop on the subject on 24 November 2004. The main aim of the workshop was to bring together researchers and practitioners in the field of multimedia retrieval to discuss the area of multimedia in general and methodology for evaluation within this area in particular. The workshop, organised by Franciska de Jong (Utwente/TNO, NL), Arjen de Vries (CWI, NL) and Thijs Westerveld (CWI, NL), was an informal half-day meeting without papers or proceedings. Because the workshop was co-located with Thijs Westerveld's PhD defence, we were able to invite Alex Hauptmann (CMU, PA, USA) to give a talk. In total, six speakers were invited to present their work related to the evaluation of multimedia retrieval. The workshop started with a talk presenting multimedia retrieval in practice. Then, three talks discussed experiments in the laboratory context of the TRECVID video retrieval benchmark. The afternoon ended with a presentation of a study of interactive experiments and a talk discussing metrics for measuring multimedia retrieval effectiveness.


Conference of the International Speech Communication Association | 2003

Compound Decomposition in Dutch Large Vocabulary Speech Recognition

Roeland Ordelman; Adrianus J. van Hessen; Franciska M.G. de Jong

Collaboration


Top Co-Authors

Wessel Kraaij

Radboud University Nijmegen

Marijn Huijbregts

Radboud University Nijmegen

Arjen P. de Vries

Radboud University Nijmegen
