Lars Asker | Researchain

Publication

Featured researches published by Lars Asker.

International Journal of Medical Informatics | 2002

Protein names and how to find them

Kristofer Franzén; Gunnar Eriksson; Fredrik Olsson; Lars Asker; Per Lidén; Joakim Cöster

A prerequisite for all higher level information extraction tasks is the identification of unknown names in text. Today, when large corpora can consist of billions of words, it is of utmost importance to develop accurate techniques for the automatic detection, extraction and categorization of named entities in these corpora. Although named entity recognition might be regarded as a solved problem in some domains, it still poses a significant challenge in others. In this work we focus on one of the more difficult tasks, the identification of protein names in text. This task presents several interesting difficulties because of the named entities' variant structural characteristics, their sometimes unclear status as names, the lack of common standards and fixed nomenclatures, and the specifics of the texts in the molecular biology domain in which they appear. We describe how we approached these and other difficulties in the implementation of Yapex, a system for the automatic identification of protein names in text. We also evaluate Yapex under four different notions of correctness and compare its performance to that of another publicly available system for protein name recognition.

International Conference on Computational Linguistics | 2001

Automatic Keyword Extraction Using Domain Knowledge

Anette Hulth; Jussi Karlgren; Anna Jonsson; Henrik Boström; Lars Asker

Documents can be assigned keywords by frequency analysis of the terms found in the document text, which arguably is the primary source of knowledge about the document itself. By including a hierarchically organised domain specific thesaurus as a second knowledge source the quality of such keywords was improved considerably, as measured by match to previously manually assigned keywords. In the presented experiment, the combination of the evidence from frequency analysis and the hierarchically organised thesaurus was done using inductive logic programming.

International Conference on Computational Linguistics | 2002

Notions of correctness when evaluating protein name taggers

Fredrik Olsson; Gunnar Eriksson; Kristofer Franzén; Lars Asker; Per Lidén

This paper introduces four different notions of correctness to be used when measuring the performance of protein name taggers, each of which reflects certain characteristics of the tagger under evaluation. The discussion regarding the different notions is centered around the evaluation of two protein name taggers: Yapex, developed by the authors, and KeX, developed by Fukuda et al. (1998). For the purpose of illustrating the difference between the ways of evaluation, both taggers are applied to a test corpus of 101 MEDLINE abstracts in which all occurrences of protein names have been marked up by domain experts.

Artificial Intelligence in Medicine in Europe | 2013

Predicting Adverse Drug Events by Analyzing Electronic Patient Records

Isak Karlsson; Jing Zhao; Lars Asker; Henrik Boström

Diagnosis codes for adverse drug events (ADEs) are sometimes missing from electronic patient records (EPRs). In the worst case, this may affect not only patient safety but also the number of reported ADEs, resulting in incorrect risk estimates of prescribed drugs. Large databases of EPRs are potentially valuable sources of information to support the identification of ADEs. This study investigates the use of machine learning for predicting one specific ADE based on information extracted from EPRs, including age, gender, diagnoses and drugs. Several predictive models are developed and evaluated using different learning algorithms and feature sets. The highest observed AUC is 0.87, obtained by the random forest algorithm. The resulting model can be used for screening EPRs that are not, but possibly should be, assigned a diagnosis code for the ADE under consideration. Preliminary results from using the model are presented.

Bioinformatics and Biomedicine | 2014

Detecting adverse drug events with multiple representations of clinical measurements

Jing Zhao; Aron Henriksson; Lars Asker; Henrik Boström

Adverse drug events (ADEs) are grossly under-reported in electronic health records (EHRs). This could be mitigated by methods that are able to detect ADEs in EHRs, thereby allowing for missing ADE-specific diagnosis codes to be identified and added. A crucial aspect of constructing such systems is to find proper representations of the data in order to allow the predictive modeling to be as accurate as possible. One category of EHR data that can be used as an indicator of ADEs is clinical measurements. However, using clinical measurements as features is not unproblematic, because they have a high rate of missing values and can be repeated a variable number of times in each patient health record. In this study, five basic representations of clinical measurements are proposed and evaluated to handle these two problems. An empirical investigation using random forest on 27 datasets from a real EHR database with different ADE targets is presented, demonstrating that the predictive performance, in terms of accuracy and area under ROC curve, is higher when representing clinical measurements crudely as whether they were taken or how many times they were taken by a patient. Furthermore, a sixth alternative, combining all five basic representations, significantly outperforms using any of the basic representations except for one. A subsequent analysis of variable importance is also conducted with this fused feature set, showing that when clinical measurements have a high missing rate, the number of times they were taken by one patient is ranked as more informative than looking at their actual values. The observation from random forest is also confirmed empirically using other commonly employed classifiers. This study demonstrates that the way in which clinical measurements from EHRs are represented has a high impact on ADE detection, and that using multiple representations outperforms using a basic representation.
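
As a rough illustration of the two crude representations named in the abstract (whether a measurement was taken, and how many times), the sketch below turns a variable-length list of readings into fixed features. The abstract does not list all five basic representations, so the `mean` feature, the `represent` helper and the patient data are assumptions made for illustration only.

```python
from statistics import mean
from typing import Optional


def represent(values: list[float]) -> dict[str, Optional[float]]:
    """Summarise one patient's repeated readings of a single measurement."""
    return {
        # Presence: was the measurement taken at all?
        "taken": 1.0 if values else 0.0,
        # Count: how many times it was taken.
        "count": float(len(values)),
        # Assumed value-based representation; None marks a missing value.
        "mean": mean(values) if values else None,
    }


# Hypothetical records: patient id -> measurement name -> readings.
patients = {
    "p1": {"creatinine": [80.0, 95.0, 110.0]},
    "p2": {"creatinine": []},  # never measured for this patient
}

features = {pid: represent(rec["creatinine"]) for pid, rec in patients.items()}
```

The presence and count features are always defined, even when no reading exists, which is one plausible reason they cope well with a high missing rate.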

Proceedings of the First Workshop on Language Technologies for African Languages | 2009

Methods for Amharic Part-of-Speech Tagging

Björn Gambäck; Fredrik Olsson; Atelach Alemu Argaw; Lars Asker

The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy approach, while HMM-based and SVM-based taggers got comparable results.

Meeting of the Association for Computational Linguistics | 2007

An Amharic Stemmer: Reducing Words to their Citation Forms

Atelach Alemu Argaw; Lars Asker

Stemming is an important analysis step in a number of areas such as natural language processing (NLP), information retrieval (IR), machine translation (MT) and text classification. In this paper we present the development of a stemmer for Amharic that reduces words to their citation forms. Amharic is a Semitic language with rich and complex morphology. The application of such a stemmer is in dictionary-based cross-language IR, where the translation step requires looking up terms in a machine readable dictionary (MRD). We apply a rule-based approach supplemented by occurrence statistics of words in an MRD and in a 3.1M-word news corpus. The main purpose of the statistical supplements is to resolve ambiguity between alternative segmentations. The stemmer is evaluated on Amharic text from two domains, news articles and a classic fiction text. It is shown to have an accuracy of 60% for the old-fashioned fiction text and 75% for the news articles.

Inductive Logic Programming | 1999

Combining Divide-and-Conquer and Separate-and-Conquer for Efficient and Effective Rule Induction

Henrik Boström; Lars Asker

Divide-and-Conquer (DAC) and Separate-and-Conquer (SAC) are two strategies for rule induction that have been used extensively. When searching for rules, DAC is maximally conservative w.r.t. decisions made during search for previous rules. This results in a very efficient strategy, which however suffers from difficulties in effectively inducing disjunctive concepts due to the replication problem. SAC on the other hand is maximally liberal in the same respect. This allows for a larger hypothesis space to be searched, which in many cases avoids the replication problem but at the cost of lower efficiency. We present a hybrid strategy called Reconsider-and-Conquer (RAC), which handles the replication problem more effectively than DAC by reconsidering some of the earlier decisions and allows for more efficient induction than SAC by holding on to some of the decisions. We present experimental results from propositional, numerical and relational domains demonstrating that RAC significantly reduces the replication problem from which DAC suffers and is several times (up to an order of magnitude) faster than SAC.

Data Mining and Knowledge Discovery | 2014

A peek into the black box: exploring classifiers by randomization

Andreas Henelius; Kai Puolamäki; Henrik Boström; Lars Asker; Panagiotis Papapetrou

Classifiers are often opaque and cannot easily be inspected to gain understanding of which factors are of importance. We propose an efficient iterative algorithm to find the attributes and dependencies used by any classifier when making predictions. The performance and utility of the algorithm are demonstrated on two synthetic and 26 real-world datasets, using 15 commonly used learning algorithms to generate the classifiers. The empirical investigation shows that the novel algorithm is indeed able to find groupings of interacting attributes exploited by the different classifiers. These groupings allow for finding similarities among classifiers for a single dataset as well as for determining the extent to which different classifiers exploit such interactions in general.

Cross Language Evaluation Forum | 2004

Dictionary-based Amharic-English information retrieval

Atelach Alemu Argaw; Lars Asker; Rickard Cöster; Jussi Karlgren

We present two approaches to the Amharic-English bilingual track in CLEF 2004. Both experiments use a dictionary-based approach to translate the Amharic queries into English bags of words, but while one approach removes non-content-bearing words from the Amharic queries based on their IDF value, the other uses a list of English stop words to perform the same task. The resulting translated (English) terms are then submitted to a retrieval engine that supports the Boolean and vector-space models. In our experiments, the second approach (based on a list of English stop words) performs slightly better than the one based on IDF values for the Amharic terms.
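
The two filtering strategies contrasted in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the IDF threshold, and the toy English tokens (standing in for Amharic source terms) are all assumptions, and IDF is computed here with the common log(N / df) form.

```python
import math


def idf(term: str, documents: list[set[str]]) -> float:
    """Inverse document frequency, log(N / df); unseen terms get infinity."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else float("inf")


def filter_by_idf(query: list[str], documents: list[set[str]],
                  threshold: float) -> list[str]:
    """Approach 1: drop source-language terms whose IDF is below a threshold."""
    return [t for t in query if idf(t, documents) >= threshold]


def filter_by_stop_list(terms: list[str], stop_words: set[str]) -> list[str]:
    """Approach 2: drop translated terms that appear in a stop-word list."""
    return [t for t in terms if t not in stop_words]


# Toy collection and query; the threshold 0.5 is an arbitrary assumption.
docs = [{"the", "drug"}, {"the", "protein"}, {"the", "rain"},
        {"the", "drug", "trial"}]
query = ["the", "drug", "trial"]

kept_by_idf = filter_by_idf(query, docs, threshold=0.5)
kept_by_list = filter_by_stop_list(query, stop_words={"the", "a", "of"})
```

In this toy case both filters keep the same terms; on real queries they can differ, since IDF depends on the collection while a stop list is fixed in advance.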
