Publication


Featured research published by Silja Huttunen.


Journal of Biomedical Informatics | 2002

Information extraction for enhanced access to disease outbreak reports

Ralph Grishman; Silja Huttunen; Roman Yangarber

Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
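As a rough illustration of the idea in the abstract above (patterns plus word classes filling a table whose rows link back to documents), here is a minimal sketch. It is not the Proteus-BIO implementation: the word class `DISEASES`, the single regular-expression pattern, the `OutbreakRow` fields, and the sample reports are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical word class: a small set of disease names.
DISEASES = ["cholera", "ebola", "measles"]

# Hypothetical extraction pattern: "<count> cases of <disease> in <Location>".
PATTERN = re.compile(
    r"(?P<count>\d+) cases of (?P<disease>" + "|".join(DISEASES) + r")"
    r" in (?P<location>[A-Z][a-z]+)"
)


@dataclass
class OutbreakRow:
    disease: str
    location: str
    count: int
    doc_id: str  # link back to the source document


def extract_rows(doc_id: str, text: str) -> List[OutbreakRow]:
    """Apply the pattern to one document and return one row per match."""
    return [
        OutbreakRow(m["disease"], m["location"], int(m["count"]), doc_id)
        for m in PATTERN.finditer(text)
    ]


def build_table(docs: Dict[str, str]) -> List[OutbreakRow]:
    """Build the outbreak table for a whole collection of documents."""
    table: List[OutbreakRow] = []
    for doc_id, text in docs.items():
        table.extend(extract_rows(doc_id, text))
    return table


if __name__ == "__main__":
    # Invented sample reports, for illustration only.
    docs = {
        "d1": "Officials reported 12 cases of cholera in Dhaka last week.",
        "d2": "Health workers confirmed 40 cases of measles in Lagos.",
    }
    table = build_table(docs)
    # Database-style operations: sort by case count, select by disease.
    for row in sorted(table, key=lambda r: r.count, reverse=True):
        print(row)
    print([r.doc_id for r in table if r.disease == "cholera"])
```

Because every row carries a `doc_id`, selection and sorting on the table lead directly back to the reports that describe each outbreak.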


Empirical Methods in Natural Language Processing | 2005

Extracting Information about Outbreaks of Infectious Epidemics

Roman Yangarber; Lauri Jokipii; Antti Rauramo; Silja Huttunen

This work demonstrates the ProMED-PLUS Epidemiological Fact Base. The facts are automatically extracted from plain-text reports about outbreaks of infectious epidemics around the world. The system collects new reports, extracts new facts, and updates the database, in real time. The extracted database is available on-line through a Web server.


International Conference on Computational Linguistics | 2002

Complexity of event structure in IE scenarios

Silja Huttunen; Roman Yangarber; Ralph Grishman

This paper presents new Information Extraction scenarios which are linguistically and structurally more challenging than the traditional MUC scenarios. Traditional views on event structure and template design are not adequate for the more complex scenarios. The focus of this paper is to show the complexity of the scenarios and to propose a way to recover the structure of the event. First we identify two structural factors that contribute to the complexity of scenarios: the scattering of events in text, and inclusion relationships between events. These factors cause difficulty in representing the facts in an unambiguous way. Then we propose a modular, hierarchical representation where the information is split into atomic units represented by templates, and where the inclusion relationships between the units are indicated by links. Lastly, we discuss how we may recover this representation from text, with the help of linguistic cues linking the events.
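The modular, hierarchical representation described in this abstract could be sketched as follows. This is only an assumed data layout, not the paper's actual template design: the `Template` class, its `parent_id` link field, the slot names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Template:
    """One atomic unit of information about an event."""
    template_id: str
    slots: Dict[str, str]
    # Inclusion link: id of the template whose event includes this one.
    parent_id: Optional[str] = None


def children_of(templates: List[Template], parent_id: str) -> List[Template]:
    """Return the templates directly included in the given template."""
    return [t for t in templates if t.parent_id == parent_id]


if __name__ == "__main__":
    # Invented example: a country-level report that includes a district-level one.
    templates = [
        Template("t1", {"disease": "cholera", "location": "country", "cases": "50"}),
        Template("t2", {"disease": "cholera", "location": "district", "cases": "10"},
                 parent_id="t1"),
    ]
    for child in children_of(templates, "t1"):
        print(child.template_id, child.slots)
```

Keeping each fact in its own template and recording inclusion as a link avoids double counting when a partial figure is part of a larger total.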


Multi-source, Multilingual Information Extraction and Summarization | 2013

Predicting Relevance of Event Extraction for the End User

Silja Huttunen; Arto Vihavainen; Mian Du; Roman Yangarber

We present work on estimating the relevance of the results of an Event Extraction system to the end-user’s needs. Our aim is to develop user-oriented measures of utility of the extracted events, i.e., how useful is the factual information found in the document for the end user. We introduce discourse and lexical features, and build classifiers that learn from the users’ ratings of the relevance of the extraction results. Traditional criteria for evaluating the performance of Information Extraction (IE) focus on the correctness of the extracted information, e.g., in terms of recall, precision, F-measure, etc. We rather focus on subjective criteria for evaluating the quality of the extracted information: utility of results to the end-user. To measure utility, we use methods from text mining and linguistic analysis to identify features that are good predictors of the relevance of an event or a document. We report on experiments in two real-world event extraction domains: corporate activities reported in business news, and health threats in news about infectious epidemics.
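For reference, the correctness-oriented measures this abstract contrasts with utility (recall, precision, F-measure) can be computed as in the sketch below. The function name and the representation of extracted facts as hashable tuples are assumptions made for illustration; the balanced F1 variant of the F-measure is shown.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    extracted: Iterable[Hashable], gold: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Compare extracted facts against a gold standard.

    precision = correct / extracted, recall = correct / gold,
    F1 = harmonic mean of precision and recall.
    """
    extracted_set, gold_set = set(extracted), set(gold)
    correct = len(extracted_set & gold_set)
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Invented facts, for illustration only.
    extracted = [("cholera", "Dhaka"), ("measles", "Lagos"), ("ebola", "Gulu")]
    gold = [("cholera", "Dhaka"), ("measles", "Lagos"), ("polio", "Kano")]
    print(precision_recall_f1(extracted, gold))
```

These scores say how correct the extracted facts are, not how useful they are to a reader, which is the gap the abstract addresses.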


Knowledge Discovery and Data Mining | 2010

Real-time text mining in multilingual news for the creation of a pre-frontier intelligence picture

Jakub Piskorski; Martin Atkinson; Jenya Belyaeva; Vanni Zavarella; Silja Huttunen; Roman Yangarber

This paper presents an effort to construct a real-time event extraction system for border security-related intelligence gathering from online news. First, the background and motivation behind the presented work are given. Next, the paper describes the event extraction processing chain; the specifics of the domain, i.e., illegal migration and related cross-border crime; and the event moderation and visualisation aspects of the system.


Intelligence and Security Informatics | 2010

News mining for border security intelligence

Martin Atkinson; Jenya Belyaeva; Vanni Zavarella; Jakub Piskorski; Silja Huttunen; Arto Vihavainen; Roman Yangarber

This presentation gives an overview of an effort to construct OSINT (Open-Source Intelligence) tools for Frontex, the European Agency for the Management of Operational Cooperation at the External Borders of the Member States of the European Union, to facilitate automating the process of extracting structured knowledge from on-line news articles on border-security related events at the EU borders and in related third countries. A particular focus is on incidents and developments which are of relevance in the context of illegal migration. This includes: (a) illegal migration incidents (e.g., illegal border crossing attempts), (b) related cross-border crime (e.g., human/arms/drug trafficking), (c) related crisis events (e.g., terrorist attacks, outbreaks of infectious disease).


User Centric Media | 2009

A Proposal for a Multilingual Epidemic Surveillance System

Gaël Lejeune; Mohamed Hatmi; Antoine Doucet; Silja Huttunen; Nadine Lucas

In epidemic surveillance, monitoring numerous languages is an important issue. In this paper we present a system designed to work on French, Spanish and English. The originality of our system is that we use only a few resources to perform our information extraction tasks. Instead of using ontologies, we use structural patterns of newspaper articles. The results on these three languages are encouraging at this preliminary stage, and we present a few examples of interesting experiments in other languages.


International Conference on Computational Linguistics | 2000

Automatic acquisition of domain knowledge for Information Extraction

Roman Yangarber; Ralph Grishman; Pasi Tapanainen; Silja Huttunen


International Conference on Human Language Technology Research | 2002

Real-time event extraction for infectious disease outbreaks

Ralph Grishman; Silja Huttunen; Roman Yangarber


Language Resources and Evaluation | 2002

Diversity of Scenarios in Information Extraction

Silja Huttunen; Roman Yangarber; Ralph Grishman

Collaboration


Dive into Silja Huttunen's collaborations.

Top Co-Authors

Mian Du

University of Helsinki

Antoine Doucet

University of La Rochelle
