Małgorzata Marciniak
Polish Academy of Sciences
Publication
Featured research published by Małgorzata Marciniak.
Journal of Biomedical Informatics | 2009
Agnieszka Mykowiecka; Małgorzata Marciniak; Anna Kupść
The paper describes a rule-based information extraction (IE) system developed for Polish medical texts. We present two applications designed to select data from medical documentation in Polish: mammography reports and hospital records of diabetic patients. First, we have designed a special ontology that subsequently had its concepts translated into two separate models, represented as typed feature structure (TFS) hierarchies, complying with the format required by the IE platform we adopted. Then, we used dedicated IE grammars to process documents and fill in templates provided by the models. In particular, in the grammars, we addressed such linguistic issues as: ambiguous keywords, negation, coordination or anaphoric expressions. Resolving some of these problems has been deferred to a post-processing phase where the extracted information is further grouped and structured into more complex templates. To this end, we defined special heuristic algorithms on the basis of sample data. The evaluation of the implemented procedures shows their usability for clinical data extraction tasks. For most of the evaluated templates, precision and recall well above 80% were obtained.
intelligent information systems | 2004
Jakub Piskorski; Peter Homola; Małgorzata Marciniak; Agnieszka Mykowiecka; Adam Przepiórkowski; Marcin Woliński
The aim of this article is to present the initial results of adapting SProUT, a multi-lingual Natural Language Processing platform developed at DFKI, Germany, to the processing of Polish. The article describes some of the problems posed by the integration of Morfeusz, an external morphological analyzer for Polish, and various solutions to the problem of the lack of extensive gazetteers for Polish. The main sections of the article report on some initial experiments in applying this adapted system to the Information Extraction task of identifying various classes of Named Entities in financial and medical texts, perhaps the first such Information Extraction effort for Polish.
Archive | 2009
Małgorzata Marciniak; Agnieszka Mykowiecka
intelligent information systems | 2005
Agnieszka Mykowiecka; Anna Kupść; Małgorzata Marciniak
We present the final version of the system for automatic content extraction from Polish medical data. The system combines general IE techniques with an external post-processing. The obtained data is normalized and linked to a simplified ontology. Then, it is automatically grouped to form more complex structures representing medical reports.
language resources and evaluation | 2003
Małgorzata Marciniak; Agnieszka Mykowiecka; Adam Przepiórkowski; Anna Kupść
The paper presents both conceptual and technical issues related to the construction of an HPSG test-suite for Polish. The test-suite consists of sentences of written Polish, both grammatical and ungrammatical. Each sentence is annotated with a list of the linguistic phenomena it illustrates. Additionally, grammatical sentences are encoded in HPSG-style AVM structures. We also describe the technical organization of the database, as well as the operations it supports.
Journal of Biomedical Semantics | 2014
Małgorzata Marciniak; Agnieszka Mykowiecka
Background: Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Results: Using a combination of linguistic and statistical methods for processing over 1200 children hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Conclusions: Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist.
The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts.
Intelligent Tools for Building a Scientific Information Platform | 2013
Małgorzata Marciniak; Agnieszka Mykowiecka
The paper presents a method of extracting terminology from Polish texts which consists of two steps. The first one identifies candidates for terms, and is supported by linguistic knowledge (a shallow grammar used for the extracted phrases is given). The second step is based on statistics, consisting in ranking and filtering candidates for domain terms with the help of a C-value method, and phrases extracted from general Polish texts. The presented approach is sensitive to finding terminology also expressed as subphrases. We applied the method to economics texts, and describe the results of the experiment. The paper closes with an evaluation and a discussion of the results.
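The ranking step mentioned in this abstract can be illustrated with a minimal sketch of the general C-value idea: a candidate is rewarded for its length and frequency, and penalized when it mostly occurs nested inside longer candidates. This is not the authors' implementation. The function names, the toy frequencies, and the `log2(len + 1)` length weight (used here so that single-word candidates do not score zero) are assumptions made for illustration.

```python
import math

def contains(longer, shorter):
    """True if `shorter` is a contiguous word subsequence of the strictly longer phrase."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def c_values(freq):
    """Score candidates; `freq` maps a phrase (tuple of words) to its corpus frequency."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of all longer candidates in which `a` is nested.
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        # Assumption: +1 inside the log so that one-word terms keep a non-zero weight.
        weight = math.log2(len(a) + 1)
        if nesting:
            # Subtract the average frequency of the phrases that contain `a`.
            adjusted = f_a - sum(nesting) / len(nesting)
        else:
            adjusted = f_a
        scores[a] = weight * adjusted
    return scores

# Hypothetical candidate frequencies, not data from the paper.
freq = {
    ("interest", "rate"): 10,
    ("real", "interest", "rate"): 4,
    ("rate",): 12,
}
scores = c_values(freq)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, `("rate",)` is the most frequent candidate but ranks last, because most of its occurrences are explained by the longer phrases that contain it.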
language and technology conference | 2009
Agnieszka Mykowiecka; Krzysztof Marasek; Małgorzata Marciniak; Joanna Rabiega-Wiśniewska; Ryszard Gubrynowicz
The paper presents a corpus of Polish spoken dialogues being a result of the LUNA (spoken Language UNderstanding in multilinguAl communication systems) project. We describe the process of collecting the corpus and its annotation on several levels, from transcription of dialogues and their morphosyntactic analysis, to semantic annotation on concepts and predicates. Annotation on the morphosyntactic and semantic levels was done automatically and then manually corrected. At the concept level, the annotation scheme comprises about 200 concepts from an ontology designed specially for the project. The set of frames for predicate level annotation was defined as a FrameNet-like resource.
intelligent information systems | 2006
Agnieszka Mykowiecka; Małgorzata Marciniak
The paper presents a program for automatic spelling correction of texts from a very specific domain, which has been applied to mammography reports. We describe different types of errors and present the program of correction based on the Levenshtein distance and probability of bigrams.
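As a rough illustration of how the two ingredients named in this abstract can work together, the sketch below proposes in-vocabulary candidates within a small Levenshtein distance and uses a smoothed bigram probability to break ties. The abstract does not say how the authors combine the two signals, so the ranking rule (distance first, then bigram probability), the add-one smoothing, the `max_dist` threshold, and the English toy corpus are all assumptions.

```python
from collections import Counter

def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def correct(tokens, vocab, bigrams, unigrams, max_dist=2):
    """Replace out-of-vocabulary tokens with the best in-vocabulary candidate."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
            continue
        prev_word = out[-1] if out else None
        best, best_key = tok, None  # keep the token if no candidate is close enough
        for cand in sorted(vocab):  # sorted for deterministic results
            d = levenshtein(tok, cand)
            if d > max_dist:
                continue
            # Add-one smoothed estimate of P(cand | prev_word).
            p = (bigrams[(prev_word, cand)] + 1) / (unigrams[prev_word] + len(vocab))
            key = (d, -p)  # assumption: smaller distance first, then higher probability
            if best_key is None or key < best_key:
                best, best_key = cand, key
        out.append(best)
    return out

# Hypothetical toy corpus standing in for domain reports.
corpus = "the left breast shows no mass . the right breast shows no lesion .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = set(unigrams)

fixed = correct(["the", "left", "brest", "shows", "no", "mas"], vocab, bigrams, unigrams)
```

A narrow domain helps here: the vocabulary is small, so few candidates fall within the distance threshold, and the bigram counts are concentrated on a limited set of recurring phrases.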
Archive | 2013
Mieczyslaw A. Klopotek; Jacek Koronacki; Małgorzata Marciniak; Agnieszka Mykowiecka; Slawomir T. Wierzchon
Toponym extraction and disambiguation are key topics recently addressed by the fields of Information Extraction and Geographical Information Retrieval. Toponym extraction and disambiguation are highly dependent processes. Not only does toponym extraction effectiveness affect disambiguation, but disambiguation results may also help improve extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). The HMM is used for extraction with high recall and low precision. The SVM is then used to find false positives based on informativeness features and coherence features derived from the disambiguation results. Experiments conducted on a set of descriptions of holiday homes, with the aim of extracting and disambiguating toponyms, showed that the proposed approach outperforms state-of-the-art extraction methods and is robust. Robustness is shown in three respects: language independence, high and low HMM threshold settings, and limited training data.