
Publication


Featured research published by Andrzej Janusz.


Information Sciences | 2014

Implementing algorithms of rough set theory and fuzzy rough set theory in the R package “RoughSets”

Lala Septem Riza; Andrzej Janusz; Christoph Bergmeir; Chris Cornelis; Francisco Herrera; Dominik Ślęzak; José Manuel Benítez

The package RoughSets, written mainly in the R language, provides implementations of methods from rough set theory (RST) and fuzzy rough set theory (FRST) for data modeling and analysis. It covers not only fundamental concepts (e.g., indiscernibility relations and lower/upper approximations), but also their applications in many tasks: discretization, feature selection, instance selection, rule induction, and nearest neighbor-based classification. The package architecture and examples are presented in order to introduce it to researchers and practitioners. Researchers can build new models by defining custom functions as parameters, and practitioners can analyze and predict their data using the available algorithms. Additionally, we provide a review and comparison of well-known software packages. Overall, our package should be considered an alternative software library for analyzing data based on RST and FRST.
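
The fundamental RST notions mentioned above can be illustrated with a short, self-contained sketch. The following Python fragment is only a conceptual illustration of indiscernibility classes and lower/upper approximations on an invented toy decision table; it does not use or mirror the API of the RoughSets R package.

```python
from collections import defaultdict

# Toy decision table (invented for illustration): each object maps to
# (conditional attribute values, decision).
objects = {
    "x1": ({"Outlook": "sunny", "Wind": "weak"},   "no"),
    "x2": ({"Outlook": "sunny", "Wind": "strong"}, "no"),
    "x3": ({"Outlook": "rainy", "Wind": "weak"},   "yes"),
    "x4": ({"Outlook": "rainy", "Wind": "weak"},   "no"),
}

def indiscernibility_classes(attrs):
    """Group objects that share identical values on the given attributes."""
    classes = defaultdict(set)
    for name, (values, _) in objects.items():
        classes[tuple(values[a] for a in attrs)].add(name)
    return list(classes.values())

def approximations(attrs, decision_value):
    """Lower/upper approximation of the concept {x : d(x) = decision_value}."""
    concept = {n for n, (_, d) in objects.items() if d == decision_value}
    lower, upper = set(), set()
    for eq_class in indiscernibility_classes(attrs):
        if eq_class <= concept:   # class fully contained in the concept
            lower |= eq_class
        if eq_class & concept:    # class overlaps the concept
            upper |= eq_class
    return lower, upper

lo, up = approximations(["Outlook", "Wind"], "yes")
print("lower:", lo)   # set(): x3 and x4 are indiscernible yet differ in decision
print("upper:", up)   # {'x3', 'x4'}
```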


Applied Artificial Intelligence | 2014

Rough Set Methods for Attribute Clustering and Selection

Andrzej Janusz; Dominik Ślęzak

In this study we investigate methods for attribute clustering and their possible applications to the task of computing decision reducts from information systems. We focus on high-dimensional datasets, in particular microarray data. For this type of data, traditional reduct construction techniques can either be extremely computationally intensive or yield poor performance in terms of the size of the resulting reducts. We propose two reduct computation heuristics that combine greedy search with a diverse selection of candidate attributes. Our experiments confirm that by properly grouping similar, in some sense interchangeable, attributes, it is possible to significantly decrease computation time as well as to increase the quality of the obtained reducts (i.e., to decrease their average size). We examine several criteria for attribute clustering, and we also identify so-called garbage clusters, which contain attributes that can be regarded as irrelevant.
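
A minimal sketch of the general idea of combining greedy reduct search with a grouping of interchangeable attributes. The toy table, the grouping criterion (identical discernibility behaviour), and the greedy quality measure are simplifying assumptions made for illustration; they are not the specific heuristics evaluated in the paper.

```python
from itertools import combinations

# Invented toy data: attribute value vectors plus a decision label.
data = {
    "x1": ([0, 0, 1, 0], "A"),
    "x2": ([0, 1, 1, 0], "A"),
    "x3": ([1, 1, 0, 1], "B"),
    "x4": ([1, 0, 0, 1], "B"),
}
N_ATTRS = 4

def discerned_pairs(attr):
    """Pairs of objects with different decisions that `attr` tells apart."""
    pairs = set()
    for (u, (vu, du)), (v, (vv, dv)) in combinations(data.items(), 2):
        if du != dv and vu[attr] != vv[attr]:
            pairs.add((u, v))
    return pairs

# Step 1: group attributes whose discernibility behaviour is identical,
# treating the members of one group as interchangeable.
groups = []
for a in range(N_ATTRS):
    for g in groups:
        if discerned_pairs(a) == discerned_pairs(g[0]):
            g.append(a)
            break
    else:
        groups.append([a])

# Step 2: greedy reduct construction over one representative per group.
to_cover = set().union(*(discerned_pairs(a) for a in range(N_ATTRS)))
reduct, covered = [], set()
while covered != to_cover:
    best = max((g[0] for g in groups if g[0] not in reduct),
               key=lambda a: len(discerned_pairs(a) - covered))
    reduct.append(best)
    covered |= discerned_pairs(best)

print("attribute groups:", groups)   # [[0, 2, 3], [1]] on this toy table
print("reduct (indices):", reduct)   # [0]
```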


International Conference on Rough Sets and Current Trends in Computing | 2012

Interactive Document Indexing Method Based on Explicit Semantic Analysis

Andrzej Janusz; Wojciech Świeboda; Adam Krasuski; Hung Son Nguyen

In this article we propose a general framework incorporating semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on feedback from the users in order to improve the quality of the tags that it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. In this algorithm, an external knowledge base is used to measure relatedness between words and concepts, and those assessments are utilized to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to show the feasibility of our approach.
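
A hedged sketch of the ESA-style tagging step described above: a document's bag of words is pushed through a word-to-concept weight table and the highest-scoring concepts become its tags. The weights and concept names below are invented; in ESA they would be derived from an external knowledge base such as an encyclopedia corpus.

```python
from collections import Counter

# Hypothetical word-to-concept relatedness weights (in ESA these would be
# derived from a knowledge base, e.g. tf-idf scores of words in its articles).
word_concept_weights = {
    "protein": {"Biochemistry": 0.9, "Nutrition": 0.4},
    "folding": {"Biochemistry": 0.8, "Origami": 0.7},
    "enzyme":  {"Biochemistry": 0.9},
    "paper":   {"Origami": 0.6, "Publishing": 0.5},
}

def esa_tags(text, top_k=2):
    """Score concepts as a weighted sum over the document's word counts."""
    counts = Counter(text.lower().split())
    scores = Counter()
    for word, freq in counts.items():
        for concept, weight in word_concept_weights.get(word, {}).items():
            scores[concept] += freq * weight
    return scores.most_common(top_k)

print(esa_tags("protein folding and enzyme kinetics"))
# 'Biochemistry' scores highest for this toy input
```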


Rough Sets and Knowledge Technology | 2011

Clustering of rough set related documents with use of knowledge from DBpedia

Marcin S. Szczuka; Andrzej Janusz; Kamil Herba

A case study of semantic clustering of scientific articles related to rough sets is presented. The proposed method groups the documents on the basis of their content, with the assistance of the DBpedia knowledge base. The text corpus is first processed with Natural Language Processing tools in order to produce vector representations of the content, and then matched against a collection of concepts retrieved from DBpedia. As a result, a new representation is constructed that better reflects the semantics of the texts. With this new representation, the documents are hierarchically clustered in order to form a partition of papers that share semantic relatedness. The steps of textual data preparation, utilization of DBpedia, and clustering are explained and illustrated with results of experiments performed on a corpus of scientific documents about rough sets.
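
A minimal sketch of the final clustering step on the concept-based representation. The vectors are invented, and SciPy's agglomerative clustering is used here only as one possible implementation, not necessarily the toolchain from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical documents represented by weights of DBpedia-like concepts
# (rows: documents, columns: concepts); values are invented for illustration.
doc_concept = np.array([
    [0.9, 0.1, 0.0],   # mostly "Rough set"
    [0.8, 0.2, 0.1],   # "Rough set" with a bit of "Fuzzy logic"
    [0.0, 0.1, 0.9],   # mostly "Text mining"
    [0.1, 0.0, 0.8],   # mostly "Text mining"
])

# Cosine distance between concept vectors, then average-link agglomerative
# clustering; cutting the dendrogram yields a partition of the documents.
distances = pdist(doc_concept, metric="cosine")
tree = linkage(distances, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 2 2]: the two topical groups are recovered
```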


USAB'11 Proceedings of the 7th Conference on Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society: Information Quality in e-Health | 2011

Semantic analytics of PubMed content

Dominik Ślęzak; Andrzej Janusz; Wojciech Świeboda; Hung Son Nguyen; Jan G. Bazan; Andrzej Skowron

We present an architecture aimed at semantic search and synthesis of information acquired from document repositories. The proposed framework is expected to provide domain knowledge interfaces enabling the internally implemented algorithms to identify relationships between documents, researchers, and institutions, as well as concepts extracted from various types of knowledge bases. The framework should be scalable with respect to data volumes, the diversity of analytic processes, and the speed of search. In this paper, we investigate these requirements for the case of medical publications gathered in PubMed.


Fundamenta Informaticae | 2012

Unsupervised Similarity Learning from Textual Data

Andrzej Janusz; Dominik Ślęzak; Hung Son Nguyen

This paper presents research on the construction of a new unsupervised model for learning a semantic similarity measure from text corpora. The two main components of the model are a semantic interpreter of texts and a similarity function whose properties are derived from data. The first associates particular documents with concepts defined in a knowledge base, corresponding to the topics covered by the corpus. It shifts the representation of the meaning of texts from words, which can be ambiguous, to concepts with predefined semantics. With this new representation, the similarity function is derived from data using a modification of the dynamic rule-based similarity model, adjusted to the unsupervised case. The adjustment is based on a novel notion of an information bireduct, which has its origin in the theory of rough sets. This extension of classical information reducts is used to find diverse sets of reference documents, described by diverse sets of reference concepts, that determine different aspects of the similarity. The paper explains the general idea of the approach and gives some implementation guidelines. Additionally, results of preliminary experiments are presented in order to demonstrate the usefulness of the proposed model.
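
A heavily simplified, hypothetical illustration of the idea of aggregating several "aspects" of similarity, each judged on a different subset of reference concepts. It does not reproduce the dynamic rule-based similarity model or the computation of information bireducts; the documents, concepts, and hand-picked aspect sets below are invented.

```python
from typing import Dict, List, Set

# Hypothetical concept-based representation of documents (tags produced by a
# semantic interpreter such as ESA); names are invented for illustration.
docs: Dict[str, Set[str]] = {
    "d1": {"Rough set", "Feature selection", "Classification"},
    "d2": {"Rough set", "Feature selection", "Bioinformatics"},
    "d3": {"Text mining", "Ontology", "Classification"},
}

# Each "aspect" of similarity is judged on a different subset of reference
# concepts (standing in, very loosely, for the concept parts of information
# bireducts); here they are simply fixed by hand.
aspects: List[Set[str]] = [
    {"Rough set", "Feature selection"},
    {"Classification", "Text mining", "Ontology"},
]

def aspect_similarity(a: str, b: str, concepts: Set[str]) -> float:
    """Jaccard overlap of the two documents restricted to one aspect."""
    x, y = docs[a] & concepts, docs[b] & concepts
    return len(x & y) / len(x | y) if x | y else 0.0

def similarity(a: str, b: str) -> float:
    """Average the per-aspect judgements into a single similarity score."""
    return sum(aspect_similarity(a, b, c) for c in aspects) / len(aspects)

print(similarity("d1", "d2"))   # 0.5: they agree fully on the rough-set aspect
print(similarity("d1", "d3"))   # lower: they only share 'Classification'
```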


8th International Conference on Rough Sets and Current Trends in Computing | 2012

JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers

Andrzej Janusz; Hung Son Nguyen; Dominik Ślęzak; Sebastian Stawicki; Adam Krasuski

We summarize the JRS’2012 Data Mining Competition on “Topical Classification of Biomedical Research Papers”, held between January 2, 2012 and March 30, 2012 as an interactive online contest hosted on the TunedIT platform (http://tunedit.org). We present the scope and background of the challenge task, the evaluation procedure, the progress, and the results. We also present a scalable method for generating the contest data from biomedical research papers.


International Journal of Approximate Reasoning | 2017

Decision bireducts and decision reducts – a comparison

Sebastian Stawicki; Dominik Ślęzak; Andrzej Janusz; Sebastian Widz

In this paper we revise the notion of decision bireducts. We show new interpretations and prove several important and practically useful facts regarding this notion. We also explain how some of the well-known algorithms for the computation of decision reducts can be modified for the purpose of computing decision bireducts. For the sake of completeness, we extend our investigation to the relations between decision bireducts and so-called approximate decision reducts. We compare different formulations of these two approaches and draw analogies between them. We also report new results related to the NP-hardness of searching for optimal decision bireducts and approximate decision reducts in data. Finally, we present new results of empirical tests which demonstrate the usefulness of decision bireducts in the construction of efficient yet simple ensembles of classification models.
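
For readers unfamiliar with the notion, the sketch below checks a commonly used formulation of a decision bireduct: a pair (B, X) of attributes and objects such that B discerns all pairs of objects in X with different decisions, B is irreducible, and X cannot be extended. The toy table is invented, and the brute-force checks are for illustration only, not an algorithm from the paper.

```python
from itertools import combinations

# Toy decision table (invented): attribute values per object plus a decision.
rows = {
    "u1": ([0, 1], "A"),
    "u2": ([0, 0], "A"),
    "u3": ([1, 1], "B"),
    "u4": ([0, 1], "B"),
}

def discerns(attrs, objects):
    """True iff `attrs` distinguish every pair in `objects` with different decisions."""
    for a, b in combinations(objects, 2):
        va, da = rows[a]
        vb, db = rows[b]
        if da != db and all(va[i] == vb[i] for i in attrs):
            return False
    return True

def is_decision_bireduct(attrs, objects):
    """Check (attrs, objects) against the three bireduct conditions:
    consistency, irreducibility of attrs, non-extendability of objects."""
    attrs, objects = set(attrs), set(objects)
    if not discerns(attrs, objects):
        return False
    if any(discerns(attrs - {a}, objects) for a in attrs):
        return False          # some attribute is redundant
    if any(discerns(attrs, objects | {u}) for u in set(rows) - objects):
        return False          # some object could still be added
    return True

# u4 conflicts with u1 on every attribute, so it must stay outside:
print(is_decision_bireduct({0}, {"u1", "u2", "u3"}))     # True
print(is_decision_bireduct({0, 1}, {"u1", "u2", "u3"}))  # False: attribute 1 is redundant
```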


Granular Computing | 2015

Mining Data from Coal Mines: IJCRS’15 Data Challenge

Andrzej Janusz; Marek Sikora; Łukasz Wróbel; Sebastian Stawicki; Marek Grzegorowski; Piotr Wojtas; Dominik Ślęzak

We summarize the data mining competition associated with the IJCRS'15 conference, the IJCRS'15 Data Challenge: Mining Data from Coal Mines, organized on the Knowledge Pit web platform. The topic of this competition was related to the problem of active safety monitoring in underground corridors. In particular, the task was to design an efficient method for predicting dangerous concentrations of methane in longwalls of a Polish coal mine. We describe the scope and motivation of the competition. We also report the course of the contest and briefly discuss a few of the most interesting solutions submitted by participants. Finally, we outline our plans for future research on this important subject.


International Symposium on Methodologies for Intelligent Systems | 2015

Computation of Approximate Reducts with Dynamically Adjusted Approximation Threshold

Andrzej Janusz; Dominik Ślęzak

We continue our research on dynamically adjusted approximate reducts (DAAR). We modify the DAAR computation algorithm to take into account dependencies between attribute values in data. We discuss the motivation for this improvement and analyze its performance impact. We also revisit a filtering technique that utilizes approximate reducts to create a ranking of attributes according to their relevance. As an illustration, we study a data set from the AAIA'14 Data Mining Competition.
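
A minimal sketch of a plain greedy approximate reduct heuristic with a fixed approximation threshold epsilon, shown only to ground the terminology; the DAAR algorithm additionally adjusts this threshold dynamically during the search, which the sketch omits. The toy table is invented.

```python
from itertools import combinations

# Invented toy decision table: attribute vectors plus a decision label.
table = {
    "o1": ([0, 1, 0], "yes"),
    "o2": ([0, 0, 1], "yes"),
    "o3": ([1, 1, 0], "no"),
    "o4": ([1, 0, 1], "no"),
    "o5": ([0, 1, 1], "no"),
}
ALL_ATTRS = range(3)

def covered(attrs):
    """Fraction of decision-relevant object pairs discerned by `attrs`."""
    relevant = discerned = 0
    for (a, (va, da)), (b, (vb, db)) in combinations(table.items(), 2):
        if da != db:
            relevant += 1
            if any(va[i] != vb[i] for i in attrs):
                discerned += 1
    return discerned / relevant

def greedy_approximate_reduct(epsilon=0.2):
    """Add attributes greedily until at least (1 - epsilon) of the relevant
    pairs are discerned; a fixed epsilon stands in for the adjusted one."""
    reduct = []
    while covered(reduct) < 1.0 - epsilon:
        best = max((a for a in ALL_ATTRS if a not in reduct),
                   key=lambda a: covered(reduct + [a]))
        reduct.append(best)
    return reduct

print(greedy_approximate_reduct(epsilon=0.2))
# [0, 1]: two attributes already discern over 80% of the relevant pairs
```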

Collaboration


Dive into Andrzej Janusz's collaborations.

Top Co-Authors

Marek Sikora

Silesian University of Technology
