Niklas Lavesson | Researchain

Publication

Featured research published by Niklas Lavesson.

ACS/IEEE International Conference on Computer Systems and Applications | 2011

CudaRF: A CUDA-based implementation of Random Forests

Håkan Grahn; Niklas Lavesson; Mikael Hellborg Lapajne; Daniel Slat

Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified device architecture (CUDA). An experimental comparison between the CUDA-based algorithm (CudaRF) and state-of-the-art Random Forests algorithms (FastRF and LibRF) shows that CudaRF outperforms both FastRF and LibRF for the studied classification task.

Expert Systems With Applications | 2013

Open Data for Anomaly Detection in Maritime Surveillance

Samira Kazemi; Shahrooz Abghari; Niklas Lavesson; Henric Johnson; Peter Ryman

Maritime surveillance has received increased attention from a civilian perspective in recent years. Anomaly detection is one of many techniques available for improving the safety and security in this domain. Maritime authorities use confidential data sources for monitoring the maritime activities; however, a paradigm shift on the Internet has created new open sources of data. We investigate the potential of using open data as a complementary resource for anomaly detection in maritime surveillance. We present and evaluate a decision support system based on open data and expert rules for this purpose. We conduct a case study in which experts from the Swedish coastguard participate to conduct a real-world validation of the system. We conclude that the exploitation of open data as a complementary resource is feasible since our results indicate improvements in the efficiency and effectiveness of the existing surveillance systems by increasing the accuracy and covering unseen aspects of maritime activities.

Knowledge and Information Systems | 2011

Learning to detect spyware using end user license agreements

Niklas Lavesson; Martin Boldt; Paul Davidsson; Andreas Jacobsson

The amount of software that hosts spyware has increased dramatically. To avoid legal repercussions, the vendors need to inform users about inclusion of spyware via end user license agreements (EULAs) during the installation of an application. However, this information is intentionally written in a way that is hard for users to comprehend. We investigate how to automatically discriminate between legitimate software and spyware-associated software by mining EULAs. For this purpose, we compile a data set consisting of 996 EULAs, of which 9.6% are associated with spyware. We compare the performance of 17 learning algorithms with that of a baseline algorithm on two data sets based on a bag-of-words and a meta data model. The majority of learning algorithms significantly outperform the baseline regardless of which data representation is used. However, a non-parametric test indicates that bag-of-words is more suitable than the meta model. Our conclusion is that automatic EULA classification can be applied to assist users in making informed decisions about whether to install an application without having read the EULA. We therefore outline the design of a spyware prevention tool and suggest how to select suitable learning algorithms for the tool by using a multi-criteria evaluation approach.
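
As an illustration of the bag-of-words representation mentioned above, the following is a minimal sketch and not the authors' pipeline. The function name `bag_of_words` and the lowercase, letters-only tokenisation are assumptions made for this example; the abstract does not describe the actual preprocessing.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Map a document to word counts (hypothetical helper).

    Assumption: tokens are maximal runs of ASCII letters, lowercased.
    Word order is discarded, which is the defining property of the model.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Toy sentence standing in for a EULA; not taken from the studied data set.
bow = bag_of_words("We may collect data. Data may be shared.")
print(bow.most_common(2))
```

Each EULA would become one such count vector, which a supervised learning algorithm can then consume as features.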

Availability, Reliability and Security | 2010

Detection of Spyware by Mining Executable Files

Raja Khurram Shahzad; Syed Imran Haider; Niklas Lavesson

Spyware represents a serious threat to confidentiality since it may result in loss of control over private data for computer users. This type of software might collect the data and send it to a third party without informed user consent. Traditionally, two approaches have been presented for the purpose of spyware detection: signature-based detection and heuristic-based detection. These approaches perform well against known spyware but have not been proven to be successful at detecting new spyware. This paper presents a spyware detection approach that uses data mining (DM) technologies. Our approach is inspired by DM-based malicious code detectors, which are known to work well for detecting viruses and similar software. However, this type of detector has not been investigated in terms of how well it is able to detect spyware. We extract binary features, called n-grams, from both spyware and legitimate software and apply five different supervised learning algorithms to train classifiers that are able to classify unknown binaries by analyzing extracted n-grams. The experimental results suggest that our method is successful even when the training data is scarce.
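
The n-gram extraction step described above can be sketched as follows. This is an illustrative sketch only: the function name `byte_ngrams`, the default n-gram size, and the hex-string encoding are assumptions, and the abstract does not state which sizes or encodings the authors used.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count overlapping byte n-grams, rendered as hex strings.

    Hypothetical helper: a sliding window of width n moves one byte at a
    time over the file contents. Inputs shorter than n yield no n-grams.
    """
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))

# Six made-up bytes standing in for the contents of a binary file.
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0xEC])
counts = byte_ngrams(sample, n=3)
print(counts)
```

The resulting counts (or their presence/absence) would serve as features for the supervised learning algorithms.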

Expert Systems With Applications | 2014

Detecting serial residential burglaries using clustering

Anton Borg; Martin Boldt; Niklas Lavesson; Ulf Melander; Veselka Boeva

According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies, consequently, are required to detect series of crimes, or linked crimes. Comparison of crime reports today is difficult as no systematic or structured way of reporting crimes exists, and no ability to search multiple crime reports exists. This study presents a systematic data collection method for residential burglaries. A decision support system for comparing and analysing residential burglaries is also presented. The decision support system consists of an advanced search tool and a plugin-based analytical framework. In order to find similar crimes, law enforcement officers have to review a large number of crimes. The potential use of the cut-clustering algorithm to group crimes, and thereby reduce the number of crimes to review in residential burglary analysis, is investigated based on crime characteristics. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, and temporal similarity. Clustering quality is measured using the modularity index and accuracy is measured using the Rand index. The clustering solutions with the best quality performance scores were those based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of which characteristic to use when grouping crimes can positively affect the end result. The results suggest that a high quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases. While the approach might miss some connections, it is also capable of suggesting new connections. The results also suggest that while crime series clustering is feasible, further investigation is needed.
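
The Rand index used above as the accuracy measure can be computed as the fraction of item pairs on which two groupings agree. The sketch below uses the standard definition; the function name `rand_index` and the toy labels are assumptions for illustration and are not taken from the study.

```python
from itertools import combinations

def rand_index(labels_a, labels_b) -> float:
    """Fraction of item pairs treated the same way by both groupings.

    A pair counts as an agreement when both groupings put the two items
    together, or both keep them apart. Requires at least two items.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical example: known crime series versus cluster assignments.
known_series = [0, 0, 1, 1]
clusters = [1, 1, 0, 0]  # same grouping, different label names
print(rand_index(known_series, clusters))
```

Because only pair membership is compared, the measure does not depend on how the clusters happen to be numbered.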

International Journal of Intelligent Information and Database Systems | 2007

Evaluating learning algorithms and classifiers

Niklas Lavesson; Paul Davidsson

We analyse 18 evaluation methods for learning algorithms and classifiers, and show how to categorise these methods with the help of an evaluation method taxonomy based on several criteria. We also define a formal framework that makes it possible to describe all methods using the same terminology, and apply it in a review of the state-of-the-art in learning algorithm and classifier evaluation. The framework enables comparison and deeper understanding of evaluation methods from different fields of research. Moreover, we argue that the framework and taxonomy support the process of finding candidate evaluation methods for a particular problem.

Information Security for South Africa | 2011

Detecting scareware by mining variable length instruction sequences

Raja Khurram Shahzad; Niklas Lavesson

This paper presents a scareware detection method that is based on performing data mining on extracted variable length opcode sequences derived from instruction sequences of binary files. Our experi ...

Availability, Reliability and Security | 2012

Veto-based Malware Detection

Raja Khurram Shahzad; Niklas Lavesson

Malicious software (malware) represents a threat to the security and privacy of computer users. Traditional signature-based and heuristic-based methods are unsuccessful in detecting some forms of malware. This paper presents a malware detection approach based on supervised learning. The main contributions of the paper are an ensemble learning algorithm, two pre-processing techniques, and an empirical evaluation of the proposed algorithm. Sequences of operational codes are extracted as features from malware and benign files. These sequences are used to produce three different data sets with different configurations. A set of learning algorithms is evaluated on the data sets and the predictions are combined by the ensemble algorithm. The predicted output is decided on the basis of veto voting. The experimental results show that the approach can accurately detect both novel and known malware instances with higher recall in comparison to majority voting.
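
To make the contrast between veto voting and majority voting concrete, here is a simplified sketch. It assumes one particular reading of "veto": a single classifier that predicts malware overrides all benign votes. The paper's actual algorithm may be more elaborate, and the function names `majority_vote` and `veto_vote` are hypothetical.

```python
def majority_vote(preds):
    """Return 1 (malware) only if more than half of the classifiers say so."""
    return int(sum(preds) * 2 > len(preds))

def veto_vote(preds):
    """Return 1 (malware) if any classifier says so.

    Assumption: one malware vote vetoes a benign verdict. This is a
    simplification and not necessarily the rule used in the paper.
    """
    return int(any(preds))

# Hypothetical predictions from three classifiers (1 = malware, 0 = benign).
preds = [1, 0, 0]
print(majority_vote(preds), veto_vote(preds))
```

Under this reading, veto voting can only flag as many or more files as majority voting does, which is consistent with the higher recall reported above and would typically come at the cost of more false alarms.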

Knowledge and Information Systems | 2012

Similarity assessment for removal of noisy end user license agreements

Niklas Lavesson; Stefan Axelsson

In previous work, we have shown the possibility to automatically discriminate between legitimate software and spyware-associated software by performing supervised learning of end user license agreements (EULAs). However, the number of false positives (spyware classified as legitimate software) was too large for practical use. In this study, the false positives problem is addressed by removing noisy EULAs, which are identified by performing similarity analysis of the previously studied EULAs. Two candidate similarity analysis methods for this purpose are experimentally compared: cosine similarity assessment in conjunction with latent semantic analysis (LSA) and normalized compression distance (NCD). The results show that the number of false positives can be reduced significantly by removing noise identified by either method. However, the experimental results also indicate subtle performance differences between LSA and NCD. To improve the performance even further and to decrease the large number of attributes, the categorical proportional difference (CPD) feature selection algorithm was applied. CPD managed to greatly reduce the number of attributes while at the same time increasing classification performance on the original data set, as well as on the LSA- and NCD-based data sets.
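
The normalized compression distance mentioned above has the standard form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed length. The sketch below uses zlib as the compressor, which is an assumption; the abstract does not say which compressor the authors used, and the sample inputs are made up.

```python
import zlib

def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance; smaller values mean more similar."""
    cx, cy = compressed_len(x), compressed_len(y)
    return (compressed_len(x + y) - min(cx, cy)) / max(cx, cy)

# Toy inputs: repeated licence-like text versus unrelated byte content.
eula_like = b"This software may display advertisements. " * 20
unrelated = bytes(range(256)) * 4
print(ncd(eula_like, eula_like), ncd(eula_like, unrelated))
```

A document compared with itself compresses almost as well as one copy alone, so its distance is close to zero, while unrelated content yields a distance close to one.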

Availability, Reliability and Security | 2011

Accurate Adware Detection Using Opcode Sequence Extraction

Raja Khurram Shahzad; Niklas Lavesson; Henric Johnson

Adware represents a possible threat to the security and privacy of computer users. Traditional signature-based and heuristic-based methods have not been proven to be successful at detecting this type of software. This paper presents an adware detection approach based on the application of data mining on disassembled code. The main contributions of the paper are a large publicly available adware data set, an accurate adware detection algorithm, and an extensive empirical evaluation of several candidate machine learning techniques that can be used in conjunction with the algorithm. We have extracted sequences of opcodes from adware and benign software and we have then applied feature selection, using different configurations, to obtain 63 data sets. Six data mining algorithms have been evaluated on these data sets in order to find an efficient and accurate detector. Our experimental results show that the proposed approach can be used to accurately detect both novel and known adware instances even though the binary difference between adware and legitimate software is usually small.
