Jakub M. Tomczak
Wrocław University of Technology
Publications
Featured research published by Jakub M. Tomczak.
Applied Soft Computing | 2014
Maciej Zięba; Jakub M. Tomczak; Marek Lubicz; Jerzy Świątek
In this paper, we present a boosted SVM dedicated to solving imbalanced data problems. The proposed solution combines the benefits of using ensemble classifiers for uneven data with cost-sensitive support vector machines. Further, we present an oracle-based approach for extracting decision rules from the boosted SVM. Next, we examine the quality of the proposed method by comparing its performance with other algorithms that deal with imbalanced data. Finally, the boosted SVM is used in a medical application: predicting post-operative life expectancy in lung cancer patients.
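As a rough illustration of the cost-sensitive boosting idea described above, the sketch below up-weights the minority class when initializing AdaBoost weights. Decision stumps stand in for the paper's SVM base learners, and all names and parameters (`cost_pos`, `rounds`) are assumptions, not the authors' implementation:

```python
import numpy as np

def stump_fit(X, y, w):
    """Find the best single-feature threshold stump under sample weights w."""
    best = (1.0, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, t, pol)
    return best

def cost_sensitive_boost(X, y, cost_pos=5.0, rounds=10):
    """AdaBoost whose initial weights up-weight the minority (+1) class,
    echoing the cost-sensitive boosting idea of the abstract."""
    w = np.where(y == 1, cost_pos, 1.0)
    w = w / w.sum()
    ensemble = []
    for _ in range(rounds):
        err, f, t, pol = stump_fit(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)  # boost misclassified examples
        w = w / w.sum()
        ensemble.append((alpha, f, t, pol))
    return ensemble

def boost_predict(ensemble, X):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1)
                for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

On separable toy data the ensemble recovers the correct labels; in the paper the base learners are cost-sensitive SVMs rather than stumps.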
Expert Systems With Applications | 2016
Maciej Zięba; Sebastian Klaudiusz Tomczak; Jakub M. Tomczak
We propose a novel ensemble model for bankruptcy prediction. We use Extreme Gradient Boosting as an ensemble of decision trees. We propose a new approach for generating synthetic features to improve prediction. The presented method is evaluated on real-life data of Polish companies. Bankruptcy prediction has been a subject of interest for almost a century, and it still ranks high among the hottest topics in economics. The aim of predicting financial distress is to develop a predictive model that combines various econometric measures and allows one to foresee the financial condition of a firm. In this domain, various methods have been proposed, based on statistical hypothesis testing, statistical modeling (e.g., generalized linear models), and, recently, artificial intelligence (e.g., neural networks, Support Vector Machines, decision trees). In this paper, we propose a novel approach for bankruptcy prediction that utilizes Extreme Gradient Boosting for learning an ensemble of decision trees. Additionally, in order to reflect higher-order statistics in the data and impose prior knowledge about the data representation, we introduce a new concept that we refer to as synthetic features. A synthetic feature is a combination of econometric measures using arithmetic operations (addition, subtraction, multiplication, division). Each synthetic feature can be seen as a single regression model that is developed in an evolutionary manner. We evaluate our solution using collected data about Polish companies in five tasks corresponding to bankruptcy prediction in the 1st, 2nd, 3rd, 4th, and 5th year. We compare our approach with reference methods.
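A minimal sketch of the synthetic-feature idea: new columns are arithmetic combinations of pairs of original indicators. The paper evolves these combinations in an evolutionary manner; here they are sampled at random purely for illustration, and the function name and signature are assumptions:

```python
import numpy as np

def synthetic_features(X, n_new, rng):
    """Augment X with 'synthetic features': random arithmetic combinations
    (+, -, *, /) of pairs of the original econometric indicators."""
    ops = [np.add, np.subtract, np.multiply,
           lambda a, b: a / (b + 1e-8)]  # guarded division
    cols = []
    for _ in range(n_new):
        i, j = rng.integers(0, X.shape[1], size=2)  # pick a feature pair
        op = ops[rng.integers(0, len(ops))]         # pick an operation
        cols.append(op(X[:, i], X[:, j]))
    return np.column_stack([X] + cols)
```

The augmented matrix would then be fed to a gradient-boosted tree ensemble such as XGBoost, as the abstract describes.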
Knowledge and Information Systems | 2013
Jakub M. Tomczak; Adam Gonczarek
Knowledge extraction is an important element of an e-Health system. In this paper, we introduce a new method for decision rule extraction, called the Graph-based Rules Inducer, to support the medical interview in diabetes treatment. The emphasis is put on the capability of tracking hidden context changes. The context is understood as the set of all factors affecting the patient's condition. In order to follow context changes, a forgetting mechanism with a forgetting factor is implemented in the proposed algorithm. Moreover, to aggregate data, a graph representation is used, and a limitation of the search space is proposed to protect against overfitting. We demonstrate the advantages of our approach in comparison with other methods through an empirical study on the Electricity benchmark data set in the classification task. Subsequently, our method is applied in diabetes treatment as a tool supporting medical interviews.
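The forgetting mechanism mentioned above can be sketched as geometric discounting of aggregated counts, so that old evidence fades and estimates can track a drifting hidden context. The function and parameter names are illustrative, not taken from the paper:

```python
def forgetting_update(counts, observation_counts, forgetting=0.9):
    """One step of count aggregation with a forgetting factor: past
    evidence decays geometrically before new observations are added,
    letting the model track a drifting (hidden) context."""
    keys = set(counts) | set(observation_counts)
    return {k: forgetting * counts.get(k, 0.0) + observation_counts.get(k, 0.0)
            for k in keys}
```

With `forgetting=1.0` this reduces to plain accumulation; smaller values react faster to context change at the cost of noisier estimates.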
Expert Systems With Applications | 2015
Jakub M. Tomczak; Maciej Zięba
We propose a comprehensible model for credit risk assessment using a scoring table. We use a Restricted Boltzmann Machine to determine scoring points in a scoring table. We deal with imbalanced data by applying the geometric mean criterion. The quality of the presented method is evaluated on four credit scoring datasets. Credit scoring is the assessment of the risk associated with a consumer (an organization or an individual) that applies for credit. Therefore, the problem of credit scoring can be stated as discriminating between those applicants whom the lender is confident will repay the credit and those applicants whom the lender considers insufficiently reliable. In this work, we propose a novel method for constructing a comprehensible scoring model by applying Classification Restricted Boltzmann Machines (ClassRBM). In the first step, we train the ClassRBM as a standalone classifier that has the ability to predict credit status but does not contain an interpretable structure. In order to obtain a comprehensible model, we first evaluate the relevancy of each binary feature using the ClassRBM and then use these values to create the scoring table (scorecard). Additionally, we deal with the imbalanced data issue by proposing a procedure for determining the cut-off point using the geometric mean of specificity and sensitivity. We evaluate our approach by comparing its performance with results obtained by other methods on four datasets from the credit scoring domain.
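The geometric-mean cut-off criterion mentioned above is straightforward to sketch: scan candidate thresholds and keep the one maximizing the geometric mean of sensitivity and specificity. The function name is an assumption:

```python
import numpy as np

def gmean_cutoff(scores, labels):
    """Pick the decision threshold that maximizes the geometric mean of
    sensitivity (TPR) and specificity (TNR), the imbalance-aware
    criterion described in the abstract."""
    best_t, best_g = 0.0, -1.0
    pos = labels == 1
    neg = ~pos
    for t in np.unique(scores):
        pred = scores >= t
        sens = np.mean(pred[pos]) if pos.any() else 0.0   # TPR
        spec = np.mean(~pred[neg]) if neg.any() else 0.0  # TNR
        g = np.sqrt(sens * spec)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g
```

Unlike accuracy, this criterion cannot be maximized by always predicting the majority class, which makes it suitable for imbalanced credit data.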
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2011
Piotr Rygielski; Jakub M. Tomczak
In this paper, the problem of detecting major changes in a stream of service requests is formulated. The composition of the stream varies over time and depends on, e.g., the time of day. The underlying cause of a change is called a context. Hence, at each moment there exists a probability distribution determining the probability of requesting a system service, conditioned on the context. The aim is to find the moments at which this distribution changes. To solve this problem, dissimilarity measures between two probability distributions are given. Nevertheless, not every change is of interest, but only long-lasting ones, because of the cost of reallocating service system resources. Therefore, the proposed algorithm addresses the issue of sensitivity to the detection of temporary changes.
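One way to sketch the idea: compare a sliding-window empirical distribution of requests against a reference distribution, and confirm a change only after the dissimilarity persists, so short-lived fluctuations are ignored. This is an illustration with assumed parameter names (`window`, `threshold`, `patience`), not the paper's algorithm:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def detect_lasting_change(stream, n_symbols, window, threshold, patience):
    """Flag a context change only after the sliding-window request
    distribution stays far (in TV distance) from the reference for
    `patience` consecutive steps, ignoring temporary changes."""
    stream = np.asarray(stream)
    ref = np.bincount(stream[:window], minlength=n_symbols) / window
    run = 0
    for t in range(window, len(stream) - window + 1):
        cur = np.bincount(stream[t:t + window], minlength=n_symbols) / window
        run = run + 1 if tv_distance(ref, cur) > threshold else 0
        if run >= patience:
            return t  # change confirmed at this step
    return None
```

Any other dissimilarity measure between distributions (e.g., Kullback-Leibler divergence) could be substituted for total variation.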
Asian Conference on Intelligent Information and Database Systems | 2010
Janusz Sobecki; Jakub M. Tomczak
In the paper we present recommendation of student courses using Ant Colony Optimization (ACO). ACO has proved to be effective in solving many optimization problems; here we show that ACO is also able to deliver good solutions to the problem of predicting the final grades students receive on completing university courses. To apply ACO in any recommender system, we need a special problem representation in the form of a graph, where each node represents a decision in the problem domain.
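The graph representation mentioned above can be sketched as a layered decision graph: each layer is a slot to fill, each node an option, and pheromone guides the ants toward high-scoring paths. This is an illustrative elitist-ACO sketch with assumed names and parameters, not the paper's system:

```python
import numpy as np

def aco_recommend(n_steps, n_options, score, n_ants=20, n_iter=30,
                  evap=0.5, seed=0):
    """Minimal ant-colony search over a layered decision graph: each of
    n_steps layers holds n_options nodes (decisions); score(path) >= 0
    rewards good paths, and pheromone reinforces the best path found."""
    rng = np.random.default_rng(seed)
    pher = np.ones((n_steps, n_options))
    best_path, best_score = None, -np.inf
    for _ in range(n_iter):
        probs = pher / pher.sum(axis=1, keepdims=True)
        for _ in range(n_ants):
            # each ant samples one node per layer, biased by pheromone
            path = [rng.choice(n_options, p=probs[i]) for i in range(n_steps)]
            s = score(path)
            if s > best_score:
                best_path, best_score = path, s
        pher *= evap                        # evaporation
        for i, node in enumerate(best_path):
            pher[i, node] += best_score     # reinforce the best-so-far path
    return best_path, best_score
```

In a course-recommendation setting, `score` would encode predicted grades for the candidate course choices.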
Machine Learning | 2015
Jakub M. Tomczak; Maciej Zięba
Application of machine learning to medical diagnosis entails facing two major issues, namely, the necessity of learning comprehensible models and the need to cope with the imbalanced data phenomenon. The first corresponds to the problem of implementing interpretable models, e.g., classification rules or decision trees. The second represents a situation in which the number of examples from one class (e.g., healthy patients) is significantly higher than the number of examples from the other class (e.g., ill patients). Learning algorithms that are prone to data imbalance return models biased towards the majority class. In this paper, we propose a probabilistic combination of soft rules, which can be seen as a probabilistic version of classification rules, by introducing a new latent random variable called a conjunctive feature. Conjunctive features represent conjunctions of values of attribute variables (features), and we assume that, given a conjunctive feature, the object and its label (class) become independent random variables. In order to deal with the between-class imbalance problem, we present a new estimator that incorporates knowledge about the data imbalance into the hyperparameters of the initial probability of objects with fixed class labels. Additionally, we propose a method for aggregating the sufficient statistics needed to estimate probabilities in a graph-based structure to speed up computations. Finally, we carry out two experiments: (1) using benchmark datasets, and (2) using medical datasets. The results are discussed and conclusions are drawn.
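The idea of encoding imbalance in the hyperparameters can be loosely sketched with a Dirichlet-smoothed class-prior estimator whose pseudo-counts grow for rare classes. The `strength` knob and the exact inverse-frequency rule are assumptions for illustration, not the paper's estimator:

```python
import numpy as np

def imbalance_aware_priors(labels, n_classes, strength=10.0):
    """Class-prior estimator with pseudo-counts inversely proportional
    to observed class frequency, so minority classes are not swamped by
    the majority class (a loose sketch of imbalance-aware hyperparameters)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    # rare classes receive larger pseudo-counts; guard against zero counts
    pseudo = strength * counts.sum() / (n_classes * np.maximum(counts, 1.0))
    post = counts + pseudo
    return post / post.sum()
```

Compared with the raw empirical frequencies, the minority class receives a noticeably larger prior, which counteracts the bias towards the majority class.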
Computer Networks and ISDN Systems | 2012
Jakub M. Tomczak; Katarzyna Cieślińska; Michal Pleszkun
In this paper, the problem of ICT service mapping in the service composition process is highlighted. In many cases, especially for telecommunication operators, it is important to allow service providers to compose services that support data transfer and processing. In this work, a general service composition process is outlined and the ICT service mapping task is described. Next, a solution is proposed. Finally, a case study is presented and conclusions are drawn. The main contribution of this paper is the proposal of decision tables as a tool for ICT service mapping.
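A decision table for service mapping can be sketched as an ordered list of (conditions, action) rows: the first row whose conditions all match the requested service requirements yields the mapped ICT service. The table contents and names below are hypothetical:

```python
def decision_table_lookup(table, requirements):
    """Evaluate a decision table: rows are (condition_dict, action) pairs,
    checked in order; an empty condition dict acts as a catch-all default."""
    for conditions, action in table:
        if all(requirements.get(k) == v for k, v in conditions.items()):
            return action
    return None
```

For example, a row mapping high-transfer requests to a dedicated service, with a catch-all default as the last row.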
Neural Processing Letters | 2017
Jakub M. Tomczak; Adam Gonczarek
The subspace restricted Boltzmann machine (subspaceRBM) is a third-order Boltzmann machine with multiplicative interactions between one visible and two hidden units. There are two kinds of hidden units, namely, gate units and subspace units. The subspace units reflect variations of a pattern in the data, and the gate unit is responsible for activating the subspace units. Additionally, the gate unit can be seen as a pooling feature. We evaluate the behavior of the subspaceRBM through experiments on the MNIST digit recognition task and the Caltech 101 Silhouettes image corpus, measuring cross-entropy reconstruction error and classification error.
Machine Vision and Applications | 2016
Adam Gonczarek; Jakub M. Tomczak
In this paper, we investigate articulated human motion tracking from video sequences using a Bayesian approach. We derive a generic particle-based filtering procedure with a low-dimensional manifold. The manifold can be treated as a regularizer that enforces the distribution over poses during the tracking process to be concentrated around the low-dimensional embedding. We refer to our method as the manifold regularized particle filter. We present a particular implementation of our method based on the back-constrained Gaussian process latent variable model and Gaussian diffusion. The proposed approach is evaluated using the real-life benchmark dataset HumanEva. We show empirically that the presented sampling scheme outperforms sampling-importance resampling and annealed particle filter procedures.
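For reference, the sampling-importance resampling baseline mentioned above can be sketched in a few lines: propagate particles through a transition model, reweight by the observation likelihood, and resample. Gaussian diffusion and a Gaussian likelihood are assumed here purely for illustration; the noise parameters are not from the paper:

```python
import numpy as np

def sir_step(particles, weights, obs, rng, noise=0.1, obs_std=0.5):
    """One sampling-importance-resampling step of a plain particle filter:
    Gaussian diffusion as the transition model, Gaussian likelihood
    around the observation, then resampling to avoid weight degeneracy."""
    particles = particles + rng.normal(0.0, noise, size=particles.shape)
    weights = weights * np.exp(-0.5 * ((particles - obs) / obs_std) ** 2)
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

The paper's method additionally concentrates the pose distribution around a learned low-dimensional manifold, which this plain baseline lacks.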