Hans Moen
Norwegian University of Science and Technology
Publication
Featured research published by Hans Moen.
Journal of Biomedical Semantics | 2014
Aron Henriksson; Hans Moen; Maria Skeppstedt; Vidas Daudaravicius; Martin Duneld
Background: Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs.
Results: A combination of two distributional models – Random Indexing and Random Permutation – employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora – a corpus of clinical text and a corpus of medical journal articles – further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms.
Conclusions: This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models – with different model parameters – and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks.
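The combination strategy described in this abstract, summing the cosine similarity scores of candidate terms across several semantic spaces and scoring by recall in a list of ten candidates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary-based representation of a semantic space, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query, spaces, k=10):
    """Rank candidate terms by the SUM of their cosine similarities to the
    query term over all semantic spaces (hypothetical helper).

    Each space is assumed to be a dict mapping a term to its vector.
    """
    scores = {}
    for space in spaces:
        if query not in space:
            continue  # this space has no vector for the query term
        query_vec = space[query]
        for term, vec in space.items():
            if term == query:
                continue
            scores[term] = scores.get(term, 0.0) + cosine(query_vec, vec)
    # Highest summed score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def recall_at_k(ranked, gold):
    """Fraction of the gold-standard terms that appear in the ranked list."""
    if not gold:
        return 0.0
    return len(set(ranked) & set(gold)) / len(gold)


# Toy, made-up vectors standing in for a clinical and a journal corpus space.
clinical_space = {"mi": [1.0, 0.0], "infarction": [0.9, 0.1], "fracture": [0.0, 1.0]}
journal_space = {"mi": [0.0, 1.0], "infarction": [0.2, 0.8], "fracture": [1.0, 0.0]}

top = rank_candidates("mi", [clinical_space, journal_space], k=1)
print(top, recall_at_k(top, ["infarction"]))
```

In this toy example "fracture" is as far as possible from the query in both spaces, so the summed score favours "infarction". With real spaces the candidate set would be the whole vocabulary, and the post-processing filters mentioned in the abstract would be applied to the ranked list.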
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) | 2014
Hans Moen; Erwin Marsi; Filip Ginter; Laura-Maria Murtola; Tapio Salakoski; Sanna Salanterä
The documentation of a care episode consists of clinical notes concerning patient care, concluded with a discharge summary. Care episodes are stored electronically and used throughout the health care sector by patients, administrators and professionals from different areas, primarily for clinical purposes, but also for secondary purposes such as decision support and research. A common use case is, given a – possibly unfinished – care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and a semantic neural network model called word2vec. A novel method is introduced that utilizes the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on an experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance by exploiting (1) ICD-10 codes of care episodes and (2) semantic similarity between their discharge summaries. Results suggest that several of the methods proposed outperform a state-of-the-art search engine (Lucene) on the retrieval task.
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing | 2013
Hans Moen; Erwin Marsi
Cross-lingual information retrieval aims at retrieving relevant documents from a document collection in a language different from the query language. A novel method is proposed which avoids direct translation of queries by implicitly encoding translations in a bilingual vector space model (VSM). Both queries and documents are represented as vectors using an extension of random indexing (RI). As work on RI for information retrieval is limited, it is first evaluated for monolingual retrieval. Two variants are tested: (1) a direct RI model that approximates a standard VSM; (2) an indirect RI model intended to capture latent semantic relations among terms with a sliding window procedure. Next, cross-lingual extensions of these models are presented and evaluated for cross-lingual document retrieval.
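As a rough illustration of random indexing with a sliding window, the sketch below gives every term a sparse random index vector and builds each term's context vector by adding the index vectors of its neighbours within the window. The abstract does not specify the dimensionality, sparsity, window size or weighting used in the paper, so all parameter values and function names here are assumptions.

```python
import random


def make_index_vector(dim, nonzero, rng):
    """Sparse ternary index vector: `nonzero` random positions, half +1 and half -1."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzero)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def train_random_indexing(sentences, dim=64, nonzero=4, window=2, seed=0):
    """Build context vectors with a symmetric sliding window (hypothetical helper).

    `sentences` is a list of token lists. Returns (index_vectors, context_vectors),
    both dicts keyed by term.
    """
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = {}
    # Pass 1: assign each distinct term a fixed random index vector.
    for sentence in sentences:
        for token in sentence:
            if token not in index_vectors:
                index_vectors[token] = make_index_vector(dim, nonzero, rng)
                context_vectors[token] = [0] * dim
    # Pass 2: add the index vectors of neighbouring terms to each term's context vector.
    for sentence in sentences:
        for i, token in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            target = context_vectors[token]
            for j in range(start, end):
                if j == i:
                    continue  # a term is not its own context
                neighbour = index_vectors[sentence[j]]
                for d in range(dim):
                    target[d] += neighbour[d]
    return index_vectors, context_vectors


# Toy corpus, made up for illustration.
corpus = [["patient", "reports", "chest", "pain"], ["patient", "denies", "chest", "pain"]]
index_vectors, context_vectors = train_random_indexing(corpus, dim=32, nonzero=4, window=1)
print(context_vectors["chest"])
```

A document or query vector could then be formed by summing the context vectors of its terms and compared with cosine similarity. The cross-lingual extension described in the abstract is not covered by this sketch.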
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing | 2010
Pinar Öztürk; Rajendra Prasath; Hans Moen
Case-based reasoning (CBR), an artificial intelligence technique, solves new problems by reusing solutions of previously solved similar cases. In conventional CBR, cases are represented in terms of structured attribute-value pairs. Acquisition of cases, either from domain experts or through manually crafting attribute-value pairs from incident reports, constitutes the main reason why CBR systems have not been more common in industry. Manual case generation is a laborious, costly and time-consuming task. Textual CBR (TCBR) is an emerging line of research that aims to apply CBR techniques to cases represented as textual descriptions. Similarity of cases is based on the similarity between their constituent features. Conventional CBR benefits from employing domain-specific knowledge for similarity assessment. Correspondingly, TCBR needs to involve higher-order relationships between features, and hence domain-specific knowledge. In addition, term order has also been argued to influence the similarity assessment. This paper presents an account where features and cases are represented using a distributed representation paradigm that captures higher-order relations among features as well as term order information.
BioNLP 2017 | 2017
Hans Moen; Kai Hakala; Farrokh Mehryary; Laura-Maria Peltonen; Tapio Salakoski; Filip Ginter; Sanna Salanterä
We study and compare two different approaches to the task of automatic assignment of predefined classes to clinical free-text narratives. In the first approach this is treated as a traditional mention-level named-entity recognition task, while the second approach treats it as a sentence-level multi-label classification task. Performance comparison across these two approaches is conducted in the form of sentence-level evaluation and state-of-the-art methods for both approaches are evaluated. The experiments are done on two data sets consisting of Finnish clinical text, manually annotated with respect to the topics pain and acute confusion. Our results suggest that the mention-level named-entity recognition approach outperforms sentence-level classification overall, but the latter approach still manages to achieve the best prediction scores on several annotation classes.
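One way to compare a mention-level tagger with a sentence-level classifier on equal terms is to project the predicted mentions onto per-sentence label sets and score both outputs the same way. The sketch below assumes that mentions are given as (sentence index, class label) pairs and that micro-averaged F1 is the score. Neither the data format nor the metric is stated in the abstract, and the class names are hypothetical.

```python
def mentions_to_sentence_labels(mentions, num_sentences):
    """Collapse mention-level predictions into one label set per sentence.

    `mentions` is an iterable of (sentence_index, label) pairs (assumed format).
    """
    labels = [set() for _ in range(num_sentences)]
    for sentence_index, label in mentions:
        labels[sentence_index].add(label)
    return labels


def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-sentence label sets."""
    tp = fp = fn = 0
    for gold_set, pred_set in zip(gold, predicted):
        tp += len(gold_set & pred_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotation for three sentences.
gold = [{"pain"}, set(), {"confusion"}]

# Output of a mention-level tagger: two mentions in sentence 0, one in sentence 1.
ner_mentions = [(0, "pain"), (0, "pain"), (1, "pain")]
ner_as_sentences = mentions_to_sentence_labels(ner_mentions, num_sentences=3)

# Output of a sentence-level multi-label classifier, already in set form.
classifier_output = [{"pain"}, set(), set()]

print(micro_f1(gold, ner_as_sentences), micro_f1(gold, classifier_output))
```

Collapsing repeated mentions into a set means a sentence counts once per class, however many mentions the tagger found in it.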
International Conference on Well-Being in the Information Society | 2016
Antti Vikström; Sanaz Rahimi Moosavi; Hans Moen; Tapio Salakoski; Sanna Salanterä
The purpose of this paper is to explore secondary use of Finnish electronic patient record (EPR) data in the context of clinical research and product development. Further, procedures and technologies that enhance EPR availability are analysed. The sensitive nature of patient data restricts the use and availability of EPR data for secondary purposes. A case study of secondary users of EPR data was conducted in Southwest Finland. Semi-structured interviews were used to evaluate the effectiveness of procedures and technologies implemented to protect EPR data. In total, 9 experts were interviewed from the fields of academic research, product development, and health management. The results show that the three main factors affecting the availability of EPR data in secondary use are data management, privacy preservation, and secondary users. Data management challenges concerned the effect of demanding data request procedures and external information system service providers. Two privacy-preserving approaches were identified: the use of altered data and a protected EPR processing environment. These approaches provide higher availability or more valuable content, both affecting possible secondary users and use cases.
In: Proceedings of LBM 2013; 2013. p. 39-44. | 2013
Sampo Pyysalo; Filip Ginter; Hans Moen; Tapio Salakoski; Sophia Ananiadou
Semantic Mining in Biomedicine | 2012
Aron Henriksson; Hans Moen; Maria Skeppstedt; Ann-Marie Eklund; Vidas Daudaravicius; Martin Hassel
BMC Medical Informatics and Decision Making | 2015
Hans Moen; Filip Ginter; Erwin Marsi; Laura-Maria Peltonen; Tapio Salakoski; Sanna Salanterä
Artificial Intelligence in Medicine | 2016
Hans Moen; Laura-Maria Peltonen; Juho Heimonen; Antti Airola; Tapio Pahikkala; Tapio Salakoski; Sanna Salanterä