Andrej Kastrin | Researchain


Publication

Featured research published by Andrej Kastrin.

Journal of Human Genetics | 2007

Role of genetic polymorphisms in ACE and TNF-α gene in sarcoidosis: a meta-analysis

Igor Medica; Andrej Kastrin; Aleš Maver; Borut Peterlin

A great number of association studies have been performed to identify the genes involved in the etiology and prognosis of sarcoidosis. We performed a systematic review of case-control studies through the PubMed database and evaluated them for possible inclusion in a meta-analysis in order to assess whether the reported genetic polymorphisms are risk factors for sarcoidosis. Case-control studies with clear diagnostic criteria and interventions were included. Only investigations of a single polymorphism/gene involvement in sarcoidosis reported more than five times were selected. Aggregating data from 12 studies on ID/ACE polymorphisms, the odds ratio (OR) for sarcoidosis, if the polymorphism was considered under the dominant genetic model, was not significantly increased: 1.19 (95% CI 0.98-1.43); the OR under the recessive model was 1.20 (95% CI 0.98-1.46). In seven case-control studies on −308/TNF-α polymorphism, the OR for sarcoidosis, if the polymorphism was considered under the dominant genetic model, was significantly increased at 1.47 (95% CI 1.03-2.08); the OR under the recessive model was 1.39 (95% CI 0.67-2.90). In conclusion, the results showed that the TNF-α genotype could be a significant risk factor for sarcoidosis, whereas the risk of sarcoidosis due to the ACE genotype was not substantially elevated.
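To make the dominant and recessive comparisons concrete, here is a minimal Python sketch of an odds ratio with a 95% confidence interval for a single case-control study. It is not the pooling procedure used in the paper (the abstract does not state it); the interval uses the standard log-odds (Woolf) approximation, and the function names and genotype counts are hypothetical.

```python
import math

def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's log method) for a 2x2 table."""
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se_log = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def collapse(genotypes, model):
    """Collapse (non-risk homozygote, heterozygote, risk homozygote) counts
    into (exposed, unexposed) under the chosen genetic model."""
    non_risk, het, risk = genotypes
    if model == "dominant":      # carrying at least one risk allele counts as exposed
        return het + risk, non_risk
    if model == "recessive":     # only risk-allele homozygotes count as exposed
        return risk, non_risk + het
    raise ValueError("model must be 'dominant' or 'recessive'")

# Hypothetical genotype counts for one study (illustration only).
cases = (40, 45, 15)
controls = (50, 40, 10)

for model in ("dominant", "recessive"):
    result = odds_ratio_ci(*collapse(cases, model), *collapse(controls, model))
    print(model, [round(x, 2) for x in result])
```

An interval that includes 1, as for the ACE estimates reported above, means the odds ratio is not significantly different from 1 at the 5% level.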

Reproductive Biomedicine Online | 2009

Association between genetic polymorphisms in cytokine genes and recurrent miscarriage – a meta-analysis

Igor Medica; Saša Ostojić; Nina Pereza; Andrej Kastrin; Borut Peterlin

A meta-analysis of association studies was performed to assess whether the reported genetic polymorphisms in cytokine genes are risk factors for recurrent miscarriage (RM). The electronic PubMed database was searched for case-control studies on immunity-related genes in RM. Investigations of a single polymorphism/gene involvement in RM reported more than five times were selected. Aggregating data from seven case-control studies on -308/tumour necrosis factor-alpha polymorphism, the odds ratio (OR) for RM was 1.1 (0.87-1.39) if the polymorphism was considered under a dominant genetic model. In six studies on -1082/interleukin-10 (IL-10) polymorphism, the OR under a dominant model was 0.76 (0.58-0.99), and under a recessive model the OR was 0.90 (0.71-1.15). In five case-control studies on -174/IL-6 polymorphism, the OR for RM under a recessive model was 1.29 (0.69-2.40). The results show a statistically significant association with RM for the -1082/IL-10 genotype.

Movement Disorders | 2009

Gene expression changes in blood as a putative biomarker for Huntington's disease

Luca Lovrečić; Andrej Kastrin; Jan Kobal; Zvezdan Pirtošek; Dimitri Krainc; Borut Peterlin

Several studies have demonstrated alterations of gene expression in blood in various neurological disorders, including Huntington's disease (HD). Using microarray technology, a recent study identified a large number of significantly altered mRNAs in HD blood, from which a 12‐gene set was selected as a classifier for discriminating controls and HD patients. The aim of our study was to validate expression changes of these 12 genes in an independent cohort of HD patients and evaluate their sensitivity and specificity. Four different subject groups were included: patients with HD, Parkinson's disease (PD), acute ischemic stroke (AS) and healthy controls. Although the previous results were successfully validated, gene expression changes in HD blood partly overlapped with those observed in blood from PD and AS patients. The predictive value of the selected biomarker set for the HD group was 78%, with 82% sensitivity and 53% specificity. Further gene expression analyses in longitudinal studies are needed to validate and refine possible transcriptomic blood biomarkers in HD.
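For readers unfamiliar with the reported metrics, the following Python sketch shows how sensitivity, specificity and positive predictive value follow from a confusion matrix. The counts are hypothetical and are not the study's data; the abstract also does not say exactly how its "predictive value" was defined, so positive predictive value is shown only as one common reading.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)   # share of true patients flagged as patients
    specificity = tn / (tn + fp)   # share of non-patients correctly cleared
    ppv = tp / (tp + fp)           # share of positive calls that are correct
    return sensitivity, specificity, ppv

# Hypothetical counts (illustration only).
sens, spec, ppv = classifier_metrics(tp=8, fn=2, tn=5, fp=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f}")
```

A low specificity alongside a high sensitivity, as in the abstract, is what one would expect when the expression signature also appears in other conditions such as PD and AS.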

BMC Bioinformatics | 2015

Biomedical question answering using semantic relations

Dimitar Hristovski; Dejan Dinevski; Andrej Kastrin; Thomas C. Rindflesch

BACKGROUND The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature. RESULTS We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application, called SemBT (available at http://sembt.mf.uni-lj.si). We conducted an extensive evaluation of the proposed methodology in order to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%). CONCLUSIONS In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.
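The idea of answering a question by searching a database of semantic relations can be sketched in a few lines of Python with the standard `sqlite3` module. This is only an illustration: the table layout, column names and rows below are invented for the example and do not reflect SemBT's actual schema or contents.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table of subject-predicate-object relation instances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relation (subject TEXT, predicate TEXT, object TEXT, sentence_id INTEGER)"
    )
    rows = [  # toy rows, invented for illustration
        ("Metformin", "TREATS", "Type 2 Diabetes", 1),
        ("Metformin", "TREATS", "Type 2 Diabetes", 2),
        ("Insulin", "TREATS", "Type 2 Diabetes", 3),
        ("Aspirin", "TREATS", "Headache", 4),
    ]
    conn.executemany("INSERT INTO relation VALUES (?, ?, ?, ?)", rows)
    return conn

def answer(conn, predicate, obj):
    """Answer 'what <predicate> <obj>?' by ranking subjects by number of supporting instances."""
    cursor = conn.execute(
        "SELECT subject, COUNT(*) AS n FROM relation "
        "WHERE predicate = ? AND object = ? "
        "GROUP BY subject ORDER BY n DESC, subject",
        (predicate, obj),
    )
    return cursor.fetchall()

conn = build_demo_db()
print(answer(conn, "TREATS", "Type 2 Diabetes"))
```

Ranking answers by the number of supporting sentences is an assumption made here for illustration; it also gives the user a direct path back to the evidence behind each answer.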

Intelligent Systems in Molecular Biology | 2009

Combining semantic relations and DNA microarray data for novel hypotheses generation

Dimitar Hristovski; Andrej Kastrin; Borut Peterlin; Thomas C. Rindflesch

Although microarray experiments have great potential to support progress in biomedical research, results are not easy to interpret. Information about the functions and relations of relevant genes needs to be extracted from the vast biomedical literature. A potential solution is to use computerized text analysis methods. Our proposal enhances these methods with semantic relations. We describe an application that integrates such relations with microarray results and discuss its benefits in supporting enhanced access to the relevant literature for interpretation of results and novel hypotheses generation. The application is available at http://sembt.mf.uni-lj.si

Expert Systems With Applications | 2010

Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data

Andrej Kastrin; Borut Peterlin

Class prediction is an important application of microarray gene expression data analysis. The high dimensionality of microarray data, where the number of genes (variables) is very large compared to the number of samples (observations), makes the application of many prediction techniques (e.g., logistic regression, discriminant analysis) difficult. An efficient way to solve this problem is to use statistical dimension reduction techniques. Increasingly used in psychology-related applications, the Rasch model (RM) provides an appealing framework for handling high-dimensional microarray data. In this paper, we study the potential of RM-based modeling in dimensionality reduction with binarized microarray gene expression data and investigate its prediction accuracy in the context of class prediction using linear discriminant analysis. Two different publicly available microarray data sets are used to illustrate a general framework of the approach. Performance of the proposed method is assessed by a re-randomization scheme using principal component analysis (PCA) as a benchmark method. Our results show that RM-based dimension reduction is as effective as PCA-based dimension reduction. The method is general and can be applied to other high-dimensional data problems.

PLOS ONE | 2014

Large-Scale Structure of a Network of Co-Occurring MeSH Terms: Statistical Analysis of Macroscopic Properties

Andrej Kastrin; Thomas C. Rindflesch; Dimitar Hristovski

Concept associations can be represented by a network that consists of a set of nodes representing concepts and a set of edges representing their relationships. Complex networks exhibit some common topological features including small diameter, high degree of clustering, power-law degree distribution, and modularity. We investigated the topological properties of a network constructed from co-occurrences between MeSH descriptors in the MEDLINE database. We conducted the analysis on two networks, one constructed from all MeSH descriptors and another using only major descriptors. Network reduction was performed using Pearson's chi-square test for independence. To characterize topological properties of the network we adopted some specific measures, including diameter, average path length, clustering coefficient, and degree distribution. For the full MeSH network the average path length was 1.95 with a diameter of three edges and clustering coefficient of 0.26. The Kolmogorov-Smirnov test rejected the power law as a plausible model for degree distribution. For the major MeSH network the average path length was 2.63 edges with a diameter of seven edges and clustering coefficient of 0.15. The Kolmogorov-Smirnov test failed to reject the power law as a plausible model. The power-law exponent was 5.07. In both networks it was evident that nodes with a lower degree exhibit higher clustering than those with a higher degree. After a simulated attack, where we removed 10% of nodes with the highest degrees, the giant component of each of the two networks contained about 90% of all nodes. Because of its small average path length and high degree of clustering, the MeSH network is small-world. A power-law distribution is not a plausible model for the degree distribution. The network is highly modular, highly resistant to targeted and random attack, and shows minimal disassortativity.
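The topological measures named above can be illustrated on a toy graph with plain Python. This is a sketch under assumptions: the graph is undirected and connected, and "clustering coefficient" is taken to be the average of local clustering coefficients, which may differ from the exact definition used in the paper. The four-node graph is invented for the example.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj):
    """Average shortest-path length and diameter of a connected graph."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# Toy graph: a triangle A-B-C with a pendant node D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(path_stats(toy), average_clustering(toy))
```

In this toy graph the low-degree nodes A and B have local clustering 1 while the higher-degree node C has 1/3, the same direction of effect the abstract reports for the MeSH networks.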

Methods of Information in Medicine | 2010

Chi-square-based Scoring Function for Categorization of MEDLINE Citations

Andrej Kastrin; Borut Peterlin; Dimitar Hristovski

OBJECTIVES Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation contains a genetically relevant topic. METHODS Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between the two corpora by applying the chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. RESULTS Validation was done on a set of 734 manually annotated MEDLINE citations. It achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine-learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, results showed that our chi-square scoring performs as well as the compared machine-learning algorithms. CONCLUSIONS We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.
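The procedure described above can be approximated in a short Python sketch. The abstract does not give the exact scoring formula, so the choice made here (a citation's score is the sum of signed chi-square statistics of its descriptors) is an assumption, as are the function names and the tiny toy corpora.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def descriptor_weights(genetic_docs, nongenetic_docs):
    """Signed chi-square weight per descriptor: positive if relatively more
    frequent in the genetic corpus, negative otherwise (assumed scoring)."""
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    vocabulary = set().union(*genetic_docs, *nongenetic_docs)
    weights = {}
    for term in vocabulary:
        a = sum(term in doc for doc in genetic_docs)      # genetic docs with term
        c = sum(term in doc for doc in nongenetic_docs)   # nongenetic docs with term
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        sign = 1 if a / n_gen > c / n_non else -1
        weights[term] = sign * chi2
    return weights

def score(citation, weights):
    """Score a citation (a set of descriptors); unseen descriptors count as 0."""
    return sum(weights.get(term, 0.0) for term in citation)

# Toy corpora: each document is the set of descriptors assigned to it.
genetic = [{"Mutation", "Humans"}, {"Genotype", "Humans"}, {"Mutation", "Genotype"}]
nongenetic = [{"Surgery", "Humans"}, {"Nursing", "Humans"}, {"Surgery"}]

w = descriptor_weights(genetic, nongenetic)
print(score({"Mutation", "Humans"}, w), score({"Surgery", "Nursing"}, w))
```

Descriptors such as "Humans", which occur equally often in both corpora, get a weight of zero, so only descriptors that discriminate between the two domains move a citation's score.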

Methods of Information in Medicine | 2016

Link Prediction on a Network of Co-occurring MeSH Terms: Towards Literature-based Discovery

Andrej Kastrin; T. C. Rindflesch; Dimitar Hristovski

OBJECTIVES Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We frame the process of LBD as a classification problem on a graph of MeSH terms. We employ unsupervised and supervised link prediction methods for predicting previously unknown connections between biomedical concepts. METHODS We evaluate the effectiveness of link prediction through a series of experiments using a MeSH network that contains the history of link formation between biomedical concepts. We performed link prediction using proximity measures, such as common neighbor (CN), Jaccard coefficient (JC), Adamic/Adar index (AA) and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future. RESULTS Applying an unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In a supervised approach, we evaluate whether proximity measures can be combined to define a model of link formation across all four predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC = 0.87). CONCLUSIONS The link prediction approach proved to be effective for LBD processing. Supervised statistical learning approaches clearly outperform an unsupervised approach to link prediction.
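The four proximity measures named above have standard textbook definitions, sketched here in plain Python for an undirected graph stored as a dictionary of neighbor sets. The toy graph and function names are invented for the example; this is not the authors' implementation.

```python
import math

def common_neighbors(adj, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """JC: shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """AA: shared neighbors weighted by 1 / log(degree), so rare hubs count more."""
    return sum(1 / math.log(len(adj[w])) for w in adj[u] & adj[v] if len(adj[w]) > 1)

def preferential_attachment(adj, u, v):
    """PA: product of the two node degrees."""
    return len(adj[u]) * len(adj[v])

# Toy undirected graph (illustration only).
toy = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

for pair in (("a", "b"), ("a", "e")):
    print(pair, common_neighbors(toy, *pair), jaccard(toy, *pair),
          round(adamic_adar(toy, *pair), 3), preferential_attachment(toy, *pair))
```

In the unsupervised setting each measure ranks unlinked node pairs on its own; in the supervised setting the four values for a pair would serve as the feature vector passed to a classifier.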

Discovery Science | 2014

Link Prediction on the Semantic MEDLINE Network

Andrej Kastrin; Thomas C. Rindflesch; Dimitar Hristovski

Retrieving and linking different segments of scientific information into understandable and interpretable knowledge is a challenging task. Literature-based discovery (LBD) is a methodology for automatically generating hypotheses for scientific research by uncovering hidden, previously unknown relationships from existing knowledge (published literature). Semantic MEDLINE is a database consisting of semantic predications extracted from MEDLINE citations. The predications provide a normalized form of the meaning of the text. The associations between the concepts in these predications can be described in terms of a network, consisting of nodes and directed arcs, where the nodes represent biomedical concepts and the arcs represent their semantic relationships. In this paper we propose and evaluate a methodology for link prediction of implicit relationships in the Semantic MEDLINE network. Link prediction was performed using different similarity measures including common neighbors, Jaccard index, and preferential attachment. The proposed approach is complementary to, and may augment, existing LBD approaches. The analyzed network consisted of 231,589 nodes and 10,061,747 directed arcs. The results showed high prediction performance, with the common neighbors method providing the best area under the ROC curve of 0.96.
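The area under the ROC curve used to report prediction performance can be computed directly from similarity scores, since it equals the probability that a randomly chosen true link scores higher than a randomly chosen non-link (ties counted as one half). The Python sketch below uses that pairwise definition; the scores are hypothetical and the quadratic loop is meant for illustration, not for a network of this size.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical common-neighbor counts for node pairs that later became linked
# (positives) and pairs that did not (negatives).
linked = [3, 2, 2]
not_linked = [2, 1, 0]
print(round(auc(linked, not_linked), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of future links from non-links.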
