Anna Bartkowiak | Researchain

Publication

Featured research published by Anna Bartkowiak.

Computer Information Systems and Industrial Management Applications | 2010

Anomaly, novelty, one-class classification: A short introduction

Anna Bartkowiak

In data analysis and decision making we frequently need to judge whether the observed data items are normal or abnormal. This happens in banking, credit card use, diagnosing a patient's health state, and fault detection in an engine or a device such as an off-shore oil platform or a gearbox in an airplane motor. Sometimes the normal cases are boring and only the abnormal cases are of interest (anomaly hunting). In practice, it happens quite frequently that the normal state is well represented while the abnormal cases are rare and the abnormal class is ill-defined; we then have to judge abnormality using information from the normal class only. This problem is named ‘one-class classification’ (OCC). The paper gives a survey of methods for performing OCC. There is also an example: how to detect a masquerader (non-legitimate user) in a computer system by observing a sequence of commands several thousand long.
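To make the one-class idea concrete, here is a minimal sketch that learns only from normal examples and flags anything far from them. It is not one of the methods surveyed in the paper: the standardized-distance score, the 95% quantile threshold, the synthetic data, and all function names are assumptions chosen for illustration.

```python
import math
import random

def score(model, row):
    """Standardized Euclidean distance from the centre of the normal class."""
    return math.sqrt(sum(((x - m) / s) ** 2
                         for x, m, s in zip(row, model["means"], model["stds"])))

def fit_one_class(normal_rows, quantile=0.95):
    """Fit per-feature mean/std on normal data only and choose a distance threshold."""
    n, d = len(normal_rows), len(normal_rows[0])
    means = [sum(r[j] for r in normal_rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in normal_rows) / (n - 1)) or 1.0
        for j in range(d)
    ]
    model = {"means": means, "stds": stds}
    # Threshold: the chosen quantile of the scores of the normal training data.
    scores = sorted(score(model, r) for r in normal_rows)
    model["threshold"] = scores[min(n - 1, int(quantile * n))]
    return model

def is_abnormal(model, row):
    return score(model, row) > model["threshold"]

# Hypothetical "normal" data: two features with different scales.
random.seed(0)
normal = [[random.gauss(0, 1), random.gauss(5, 2)] for _ in range(500)]
model = fit_one_class(normal)
```

No abnormal example is needed for training; the price is that the threshold fixes the false-alarm rate on normal data (about 5% here) rather than being tuned against real anomalies.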

Anti-Cancer Drugs | 2014

Killing multiple myeloma cells with the small molecule 3-bromopyruvate: Implications for therapy

Grażyna Majkowska-Skrobek; Daria Augustyniak; Paweł Lis; Anna Bartkowiak; Mykhailo Gonchar; Young Hee Ko; Peter L. Pedersen; André Goffeau; Stanisław Ułaszewski

The small molecule 3-bromopyruvate (3-BP), which has emerged recently as the first member of a new class of potent anticancer agents, was tested for its capacity to kill multiple myeloma (MM) cancer cells. Human MM cells (RPMI 8226) begin to lose viability significantly within 8 h of incubation in the presence of 3-BP. The Km (0.3 mmol/l) for intracellular accumulation of 3-BP in MM cells is 24 times lower than that in control cells (7.2 mmol/l). Therefore, the uptake of 3-BP by MM cells is significantly higher than that by peripheral blood mononuclear cells. Further, the IC50 values for human MM cells and control peripheral blood mononuclear cells are 24 and 58 µmol/l, respectively. Therefore, the specificity and selectivity of 3-BP toward MM cancer cells are evident on the basis of the above. In MM cells the transcription level of the gene encoding the monocarboxylate transporter MCT1 is significantly amplified compared with control cells. The level of intracellular ATP in MM cells decreases by over 90% within 1 h after addition of 100 µmol/l 3-BP. The cytotoxicity of 3-BP, exemplified by a marked decrease in viability of MM cells, is potentiated by buthionine sulfoximine, an inhibitor of glutathione synthesis. In addition, the lack of mutagenicity of 3-BP and its superior capacity relative to Glivec to kill MM cancer cells are presented in this study.

International Journal of Biometrics | 2010

Outliers in biometrical data: What's old, What's new

Anna Bartkowiak

Nowadays a huge amount of data is gathered in the biometric area, e.g., sequences of DNA code, graphical images for recognition or authorisation of subjects, video monitoring, clinical trials or health care. Outliers are observations which are discordant with the model describing the data. The appearance of an outlier may be caused by a gross error; alternatively, an outlier (or a group of them) may represent observations which are caused by phenomena not accounted for in the assumed model. The paper gives a subjective survey of some methods for detecting outliers or anomalies in multivariate data. The methods are viewed from a historical perspective.

Intelligent Data Engineering and Automated Learning | 2015

NMF and PCA as Applied to Gearbox Fault Data

Anna Bartkowiak; Radoslaw Zimroz

Both non-negative matrix factorization (NMF) and principal component analysis (PCA) are data reduction methods. Both act as approximation methods that represent the data by lower rank matrices. The two methods differ in the criteria used to obtain the approximation. We show that the main assumption of PCA, which demands orthogonal principal components, leads to a higher rank approximation than that established by NMF, which works without that assumption. This can be seen when analyzing a data matrix obtained from vibration signals emitted by a healthy and a faulty gearbox. To our knowledge this fact has not been clearly stated so far, and no real example supporting our observation has been shown explicitly.

International Journal of Biometrics | 2010

Outliers in some Faces and non-Faces data

Anna Bartkowiak

We look for outliers in graphical data containing n = 6977 face or non-face images from Seung's collection. Our concern is what kind of outliers may be found in such graphical data. To obtain the global geometrical characteristics, the Pseudo Grand Tour and Kohonen's self-organising maps are applied. We define as outliers those images which are reproduced badly from K principal components, with K denoting the intrinsic dimension. The concepts of mild and severe outliers, and of own and alien principal components, are also introduced.
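The outlier definition above, items that are reproduced badly from K principal components, can be sketched as a reconstruction-error score. The example below is a simplification and not the paper's procedure: it fixes K = 1, finds the leading component by power iteration, and uses synthetic three-dimensional points with one planted outlier. All names and data are hypothetical.

```python
import math
import random

def center(rows):
    """Subtract the column means; returns (mean, centered rows)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return mean, [[r[j] - mean[j] for j in range(d)] for r in rows]

def first_component(centered, iters=200):
    """Leading principal direction via power iteration on X^T X."""
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]           # X v
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reconstruction_errors(rows):
    """Distance between each centered row and its projection onto K = 1 component."""
    _, centered = center(rows)
    v = first_component(centered)
    errors = []
    for r in centered:
        t = sum(a * b for a, b in zip(r, v))  # coordinate along the component
        errors.append(math.sqrt(sum((a - t * b) ** 2 for a, b in zip(r, v))))
    return errors

# Synthetic data lying near the line spanned by (1, 2, -1), plus one planted outlier.
random.seed(1)
rows = []
for _ in range(200):
    t = random.gauss(0.0, 1.0)
    rows.append([t + random.gauss(0, 0.05),
                 2 * t + random.gauss(0, 0.05),
                 -t + random.gauss(0, 0.05)])
rows.append([0.0, 0.0, 3.0])  # planted outlier, far from the line
errors = reconstruction_errors(rows)
worst = max(range(len(rows)), key=errors.__getitem__)
```

Here `worst` is the index of the item with the largest reconstruction error. For image data such as the faces collection, K would be set to the estimated intrinsic dimension and a full PCA routine would replace the power iteration.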

International Journal of Biometrics | 2009

Are amino acids counts in yeast ORFs negative binomial

Anna Bartkowiak; Adam Szustalewicz

The genetic code of living organisms is inscribed into so-called Open Reading Frames (ORFs) positioned in chromosomes. The code uses 20 amino acids as building blocks for the inscribed information. We show that the number of appearances of a given amino acid in ORFs on a yeast chromosome may be described in a highly satisfactory manner by the Negative Binomial (NB) distribution. The fit is surprisingly good. We show the results for ORFs found in four yeast chromosomes, namely nos. 4, 7, 11 and 13. The negative binomial fit is assessed (1) graphically; (2) by the Kolmogorov statistic; (3) by a chi-square test; and (4) using simulated samples. The applicability of the Kolmogorov test to the analysed data is discussed.
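As an illustration of this kind of check, the sketch below fits a negative binomial to count data by the method of moments and computes the Kolmogorov distance between the empirical and fitted distribution functions. The estimator, the simulated counts, and the function names are assumptions made for this example; the paper's own fitting procedure is not given in the abstract. Because counts are discrete, the distance is used here only descriptively, not as a test with the classical critical values.

```python
import math
import random
from collections import Counter

def nb_pmf(k, r, p):
    """P(X = k): probability of k failures before the r-th success."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def fit_nb_moments(counts):
    """Method-of-moments estimates (r, p); requires variance > mean."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion: moment estimates undefined")
    return mean * mean / (var - mean), mean / var

def kolmogorov_distance(counts, r, p):
    """Largest gap between the empirical CDF and the fitted NB CDF."""
    n = len(counts)
    freq = Counter(counts)
    emp_cdf = model_cdf = worst = 0.0
    for k in range(max(counts) + 1):
        emp_cdf += freq.get(k, 0) / n
        model_cdf += nb_pmf(k, r, p)
        worst = max(worst, abs(emp_cdf - model_cdf))
    return worst

def sample_nb(r, p, rng):
    """Draw NB(r, p) for integer r as a sum of r geometric failure counts."""
    return sum(int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
               for _ in range(r))

# Simulated counts standing in for amino-acid counts per ORF (hypothetical values).
rng = random.Random(42)
counts = [sample_nb(5, 0.2, rng) for _ in range(2000)]
r_hat, p_hat = fit_nb_moments(counts)
dist = kolmogorov_distance(counts, r_hat, p_hat)
```

Repeating the simulation step with the fitted parameters gives a reference distribution for `dist`, which corresponds in spirit to item (4) of the abstract.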

Computer Information Systems and Industrial Management Applications | 2008

Intrinsic Dimensionality of Data and of their Representatives. A Case Study of Amino-Acid Distribution in ORFs

Anna Bartkowiak; Adam Szustalewicz

By the intrinsic dimensionality of a data set we mean the smallest number of base vectors which permit reconstructing the considered set. Nowadays we obtain very large data sets, which are computationally demanding. Therefore we look for some representative data vectors (prototypes) which might yield an insight into the data and be used for a (preliminary) data analysis. Let D of size n x d denote the observed data set, and D1 of size M x d the set of representatives of the data. Denote by r the number of base vectors spanning D, and by r1 the number of base vectors spanning the data set D1. Our questions: (1) Are r and r1 equal? (2) Say we want to choose k and k1 base vectors approximating the sets D and D1, respectively, with a given accuracy. Are k and k1 equal? We answer these questions by considering the data set amino 569, containing the frequency distribution of the twenty amino acids composing the ORFs in the 7th yeast chromosome. The answer is: twice NO.
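Question (1) can be illustrated on a toy example in which the number of base vectors is taken to be the plain matrix rank. This is a sketch under that assumption; the paper may work with centered data or another definition. The matrices `D` and `D1` below are invented, and `matrix_rank` is a hypothetical helper based on Gaussian elimination.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    n_cols = len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Pick the row (at or below `rank`) with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            f = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= f * m[rank][j]
        rank += 1
    return rank

# Toy data set D (n = 4, d = 3) and a set of representatives D1 (M = 3).
D = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
D1 = D[:3]  # representatives that happen to miss the third direction
r = matrix_rank(D)
r1 = matrix_rank(D1)
```

In this toy case `r` is 3 and `r1` is 2, so a set of representatives can span fewer directions than the full data, which is the kind of discrepancy the paper reports.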

Archive | 2007

Visualization of Five Erosion Risk Classes using Kernel Discriminants

Anna Bartkowiak; Niki Evelpidou; Andreas Vasilopoulos

Kernel discriminants are greatly appreciated because 1) they permit establishing nonlinear boundaries between classes and 2) they offer the possibility of graphically visualizing the data vectors belonging to different classes. One such method, called Generalized Discriminant Analysis (GDA), was proposed by Baudat and Anouar (2000). GDA operates on a kernel matrix of size N x N (N denotes the sample size) and is prohibitive for large N. Our aim was to find out how this method works in a real situation, when dealing with relatively large data. We considered a set of predictors of erosion risk on the island of Kefallinia, categorized into five classes of erosion risk (N=3422 data items in total). Direct evaluation of the discriminants, using the entire data, was computationally demanding. Therefore, we sought a representative sample. We found it by a kind of sieve algorithm. It turned out that, using the representative sample, we could greatly speed up the evaluations and obtain discriminative functions with good generalization properties. We worked with Gaussian kernels, which need one declared parameter SIGMA, called the kernel width. We found that for a large range of parameters the GDA algorithm gave a visualization with a good separation of the considered risk classes.
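The cost of the N x N kernel matrix can be made tangible with a small sketch. The kernel convention exp(-||x - y||^2 / (2 sigma^2)) is an assumption (other scalings of the width are in use, and the abstract does not say which one the authors adopted); the toy points, the function names, and the 8 bytes per entry (double precision) are likewise assumptions.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel, assuming the exp(-||x - y||^2 / (2 sigma^2)) convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(rows, sigma):
    """Full N x N kernel matrix; time and memory grow as N^2."""
    return [[gaussian_kernel(x, y, sigma) for y in rows] for x in rows]

def kernel_matrix_bytes(n, bytes_per_entry=8):
    """Storage needed for a dense N x N matrix of doubles."""
    return n * n * bytes_per_entry

toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(toy, sigma=1.0)
full_size = kernel_matrix_bytes(3422)  # the full erosion data set
```

For N = 3422 the dense matrix has 11,710,084 entries, about 94 MB in double precision, before any eigen-decomposition is attempted. A representative sample of size M shrinks this by a factor of (N/M)^2.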

Symbolic and Numeric Algorithms for Scientific Computing | 2005

Remarks on evaluation of correlation dimension for 5 French stock data

Anna Bartkowiak; Piotr Lipinski

The fractal correlation dimension (D_2) was introduced by Grassberger and Procaccia (1983) when considering some deterministic models originating from differential equations. Since then D_2 has been applied in many situations; it has also become a tool in the statistical analysis of data burdened with noise. The paper attempts to emphasize the difficulties and uncertainties met when calculating the correlation dimension for real time data observed as time series. In such a situation the correlation dimension method usually works with an embedding space of dimension m. We see here two serious problems: 1) for given data, how to choose the embedding space; and 2) what is the accuracy of the characteristic D_2 obtained from the embedded data. The problems are illustrated by considering time series of 5 French stock data, for which, despite all the restraints, we obtain reasonable results permitting us to distinguish between the stocks' differentiated dynamics.
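A minimal sketch of the Grassberger-Procaccia idea follows: embed the series in m dimensions with a delay, compute the correlation sum C(r) (the fraction of point pairs closer than r), and read D_2 off the slope of log C(r) against log r. The sine test series, the choices m = 2 and delay 2, the two radii, and the two-point slope are all assumptions for illustration. They also show the abstract's point: the estimate depends on such choices.

```python
import math

def delay_embed(series, m, tau):
    """Map a scalar series to m-dimensional delay vectors with lag tau."""
    n = len(series) - (m - 1) * tau
    return [[series[i + j * tau] for j in range(m)] for i in range(n)]

def pair_distances(vectors):
    """Euclidean distances between all distinct pairs (O(n^2) memory and time)."""
    return [math.dist(vectors[i], vectors[j])
            for i in range(len(vectors)) for j in range(i + 1, len(vectors))]

def correlation_sum(distances, radius):
    """C(r): fraction of distinct pairs closer than `radius`."""
    return sum(d < radius for d in distances) / len(distances)

def slope_estimate(distances, r_small, r_large):
    """Two-point slope of log C(r) versus log r, a crude estimate of D_2."""
    c_small = correlation_sum(distances, r_small)
    c_large = correlation_sum(distances, r_large)
    return ((math.log(c_large) - math.log(c_small))
            / (math.log(r_large) - math.log(r_small)))

# Test series: a sine wave, whose delay embedding is a closed curve (dimension 1).
series = [math.sin(t) for t in range(502)]
vectors = delay_embed(series, m=2, tau=2)
distances = pair_distances(vectors)
d2 = slope_estimate(distances, 0.05, 0.2)
```

For this noise-free curve the slope comes out near 1. With noisy financial series the scaling region is much harder to identify, and a regression over many radii, repeated for several values of m, would normally replace the two-point slope.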

Przegląd Elektrotechniczny | 2017

Classic and convex non-negative matrix visualization in clustering two benchmark data

Anna Bartkowiak

Both the classic and the convex NMF (Nonnegative Matrix Factorization) yield a parsimonious, lower rank representation of the data. They may also yield an indication of a soft clustering of the data vectors. We analyze two sets of diagnostic data, wine and sonar, for which the classic and convex NMF behave differently when indicating group membership of the data vectors. The data are given as m x n matrices, with columns denoting objects and rows their attributes. We assess the clustering by multivariate graphical visualization methods. Summary. For the selected data sets 'wine' and 'sonar' we find, using NMF (nonnegative matrix factorization), the hidden structure of these matrices and indications regarding the clustering of the objects represented in the data columns. We confirm the obtained clustering by three methods of multivariate visualization of the data vectors. (Clustering using classic and convex nonnegative matrix factorization, illustrated on two data sets)
