Magnus Sahlgren

Swedish Institute of Computer Science

Publication

Featured research published by Magnus Sahlgren.

International Conference on Computational Linguistics | 2004

Using bag-of-concepts to improve the performance of support vector machines in text categorization

Magnus Sahlgren; Rickard Cöster

This paper investigates the use of concept-based representations for text categorization. We introduce a new approach to create concept-based text representations, and apply it to a standard text categorization collection. The representations are used as input to a Support Vector Machine classifier, and the results show that there are certain categories for which concept-based representations constitute a viable supplement to word-based ones. We also demonstrate how the performance of the Support Vector Machine can be improved by combining representations.

Natural Language Engineering | 2005

Automatic bilingual lexicon acquisition using random indexing of parallel corpora

Magnus Sahlgren; Jussi Karlgren

This paper presents a very simple and effective approach to using parallel corpora for automatic bilingual lexicon acquisition. The approach, which uses the Random Indexing vector space methodology, is based on finding correlations between terms based on their distributional characteristics. The approach requires a minimum of preprocessing and linguistic knowledge, and is efficient, fast and scalable. In this paper, we explain how our approach differs from traditional cooccurrence-based word alignment algorithms, and we demonstrate how to extract bilingual lexica using the Random Indexing approach applied to aligned parallel data. The acquired lexica are evaluated by comparing them to manually compiled gold standards, and we report overlap of around 60%. We also discuss methodological problems with evaluating lexical resources of this kind.
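As a rough illustration of the idea in this abstract, the sketch below assigns each aligned segment pair a sparse random index vector, sums those vectors into context vectors for the words on both sides, and picks the target word whose context vector is most similar to the source word's. The function names, the dimensionality, the number of nonzero elements, and the toy Swedish-English segments are all assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np
from collections import defaultdict

def index_vector(rng, dim, nonzeros):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    v = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v

def context_vectors(segments, index_vectors, dim):
    """Sum, for each word, the index vectors of the segments it occurs in."""
    vectors = defaultdict(lambda: np.zeros(dim))
    for segment, iv in zip(segments, index_vectors):
        for word in segment.lower().split():
            vectors[word] += iv
    return vectors

def best_translation(word, source_vectors, target_vectors):
    """Return the target word with the highest cosine similarity to `word`."""
    v = source_vectors[word]
    def cosine(u):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(target_vectors, key=lambda t: cosine(target_vectors[t]))

# Toy aligned data (hypothetical, for illustration only).
swedish = ["en katt", "en hund", "katt sover", "hund springer"]
english = ["a cat", "a dog", "cat sleeps", "dog runs"]

DIM, NONZEROS = 512, 8  # assumed values, not taken from the paper
rng = np.random.default_rng(0)
# One shared index vector per aligned segment pair.
shared = [index_vector(rng, DIM, NONZEROS) for _ in swedish]

sv = context_vectors(swedish, shared, DIM)
en = context_vectors(english, shared, DIM)
print(best_translation("katt", sv, en))
```

Because both languages accumulate the same index vectors, words that occur in the same aligned segments end up with similar context vectors without any explicit alignment step.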

Conference on Information and Knowledge Management | 2009

Terminology mining in social media

Magnus Sahlgren; Jussi Karlgren

The highly variable and dynamic word usage in social media presents serious challenges for both research and those commercial applications that are geared towards blogs or other user-generated non-editorial texts. This paper discusses and exemplifies a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology, and of modeling similarity and relatedness of terms from observing realistic amounts of data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining.

European Conference on Information Retrieval | 2008

Filaments of meaning in word space

Jussi Karlgren; Anders Holst; Magnus Sahlgren

Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dimensionality of typical vector space models leads to unintuitive effects on modeling likeness of meaning and that the local structure of word spaces is where interesting semantic relations reside. We show that the local structure of word spaces has substantially different dimensionality and character than the global space and that this structure shows potential to be exploited for further semantic analysis using methods for local analysis of vector space structure rather than globally scoped methods typically in use today such as singular value decomposition or principal component analysis.

Cross Language Evaluation Forum | 2002

SICS at CLEF 2002: Automatic Query Expansion Using Random Indexing

Magnus Sahlgren; Jussi Karlgren; Rickard Cöster; Timo Järvinen

Vector space techniques can be used for extracting semantically similar words from the co-occurrence statistics of words in large text data collections. We have used a technique called Random Indexing to accumulate context vectors for Swedish, French and Italian. We have then used the context vectors to perform automatic query expansion. In this paper, we report on our CLEF 2002 experiments on Swedish, French and Italian monolingual query expansion.

European Conference on Information Retrieval | 2012

Usefulness of sentiment analysis

Jussi Karlgren; Magnus Sahlgren; Fredrik Olsson; Fredrik Espinoza; Ola Hamfors

What can text sentiment analysis technology be used for, and does a more usage-informed view on sentiment analysis pose new requirements on technology development?

Cross Language Evaluation Forum | 2001

Vector-Based Semantic Analysis Using Random Indexing for Cross-Lingual Query Expansion

Magnus Sahlgren; Jussi Karlgren

Random Indexing is a vector-based technique for extracting semantically similar words from the co-occurrence statistics of words in large text data. We have applied the technique on aligned bilingual corpora, producing French-English and Swedish-English thesauri that we have used for cross-lingual query expansion. In this paper, we report on our CLEF 2001 experiments on French-to-English and Swedish-to-English query expansion.

Empirical Methods in Natural Language Processing | 2015

Navigating the Semantic Horizon using Relative Neighborhood Graphs

Amaru Cuba Gyllensten; Magnus Sahlgren

This paper is concerned with nearest neighbor search in distributional semantic models. A normal nearest neighbor search only returns a ranked list of neighbors, with no information about the structure or topology of the local neighborhood. This is a potentially serious shortcoming of the mode of querying a distributional semantic model, since a ranked list of neighbors may conflate several different senses. We argue that the topology of neighborhoods in semantic space provides important information about the different senses of terms, and that such topological structures can be used for word-sense induction. We also argue that the topology of the neighborhoods in semantic space can be used to determine the semantic horizon of a point, which we define as the set of neighbors that have a direct connection to the point. We introduce relative neighborhood graphs as a method to uncover the topological properties of neighborhoods in semantic models. We also provide examples of relative neighborhood graphs for three well-known semantic models: the PMI model, the GloVe model, and the skipgram model.
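To make the graph construction concrete, here is a minimal brute-force sketch of a relative neighborhood graph under the standard definition: two points are connected unless some third point is closer to both of them than they are to each other. The "semantic horizon" of a point is then read off as its direct neighbors in that graph, following the definition in the abstract. Euclidean distance, the function names, and the toy points are assumptions for this example; the abstract does not state which distance measure or search procedure the paper uses.

```python
import numpy as np

def relative_neighborhood_graph(points):
    """Return the set of undirected edges (i, j), i < j, of the RNG."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances (assumed metric).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # The edge is blocked if a third point k is strictly closer
            # to both i and j than i and j are to each other.
            blocked = any(
                max(dist[i, k], dist[j, k]) < dist[i, j]
                for k in range(n)
                if k != i and k != j
            )
            if not blocked:
                edges.add((i, j))
    return edges

def semantic_horizon(edges, point):
    """Neighbors with a direct connection to `point` in the graph."""
    return {b if a == point else a for a, b in edges if point in (a, b)}

# Toy 2-D points standing in for word vectors (hypothetical data).
toy_points = [[0, 0], [1, 0], [2, 0], [0, 1]]
toy_edges = relative_neighborhood_graph(toy_points)
print(sorted(toy_edges))
print(sorted(semantic_horizon(toy_edges, 0)))
```

In the toy example, point 2 is not in the horizon of point 0 because point 1 lies between them, which is the kind of structure a plain ranked neighbor list does not expose.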

Computational Intelligence and Neuroscience | 2015

Encoding sequential information in semantic space models: comparing holographic reduced representation and random permutation

Gabriel Recchia; Magnus Sahlgren; Pentti Kanerva; Michael N. Jones

Circular convolution and random permutation have each been proposed as neurally plausible binding operators capable of encoding sequential information in semantic memory. We perform several controlled comparisons of circular convolution and random permutation as means of encoding paired associates as well as encoding sequential information. Random permutations outperformed convolution with respect to the number of paired associates that can be reliably stored in a single memory trace. Performance was equal on semantic tasks when using a small corpus, but random permutations were ultimately capable of achieving superior performance due to their higher scalability to large corpora. Finally, “noisy” permutations in which units are mapped to other units arbitrarily (no one-to-one mapping) perform nearly as well as true permutations. These findings increase the neurological plausibility of random permutations and highlight their utility in vector space models of semantics.
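The two binding operators compared in this abstract can be sketched as follows. Circular convolution is computed via the FFT and decoded approximately with circular correlation, while a random permutation is decoded exactly by applying its inverse. The dimensionality, the Gaussian vector initialization, and the function names are assumptions made for this example; this is not the experimental setup of the paper.

```python
import numpy as np

DIM = 1024  # assumed dimensionality
rng = np.random.default_rng(0)

def random_vector():
    """Gaussian vector with expected unit length (a common HRR choice)."""
    return rng.normal(0.0, 1.0 / np.sqrt(DIM), DIM)

def circular_convolution(a, b):
    """Bind a and b by circular convolution, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=DIM)

def circular_correlation(cue, trace):
    """Approximate inverse of circular convolution: a noisy estimate of the partner."""
    return np.fft.irfft(np.conj(np.fft.rfft(cue)) * np.fft.rfft(trace), n=DIM)

# A fixed random permutation and its inverse.
perm = rng.permutation(DIM)
inv_perm = np.argsort(perm)

def permute(v):
    return v[perm]

def unpermute(v):
    return v[inv_perm]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a, b = random_vector(), random_vector()

# Convolution: decoding the bound pair recovers b only approximately.
decoded_b = circular_correlation(a, circular_convolution(a, b))
print("convolution decode similarity:", cosine(decoded_b, b))

# Permutation: the permuted vector looks unrelated to the original,
# but the inverse permutation restores it exactly.
print("permuted vs original similarity:", cosine(permute(a), a))
print("exact recovery:", np.array_equal(unpermute(permute(a)), a))
```

Both operators map their inputs to vectors that are nearly orthogonal to the originals, which is what allows order or pairing information to be superimposed in a single trace.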

Cross Language Evaluation Forum | 2003

Selective Compound Splitting of Swedish Queries for Boolean Combinations of Truncated Terms

Rickard Cöster; Magnus Sahlgren; Jussi Karlgren

In languages that use compound words, such as Swedish, it is often necessary to split compound words when indexing documents or queries. One of the problems is that it is difficult to find constituents that express a concept similar to that expressed by the compound. The approach taken here is to expand a query with the leading constituents of the compound words. Every query term is truncated so as to increase recall by hopefully finding other compounds with the leading constituent as prefix. This approach increases recall in a rather uncontrolled way, so we use a Boolean quorum-level search method to rank documents according both to a tf-idf factor and to the number of matching Boolean combinations.
