Mile Šikić
University of Zagreb
Publication
Featured research published by Mile Šikić.
PLOS Computational Biology | 2009
Mile Šikić; Sanja Tomić; Kristian Vlahoviček
Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
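The reported F-measures follow from the stated precision and recall, and the nine-residue sliding window can be illustrated in a few lines. The following is a minimal sketch, not the authors' implementation: the function names, the padding character "X" for sequence termini, and the example peptide are assumptions made for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sliding_windows(sequence: str, size: int = 9, pad: str = "X") -> list[str]:
    """Return one window of `size` residues centred on each residue.

    Termini are padded with `pad`; this padding scheme is an assumption,
    since the abstract does not say how sequence ends are handled.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so that it has a centre")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


# Sequence-only predictor: precision 84%, recall 26%
print(round(f_measure(0.84, 0.26), 2))
# Sequence plus structure: precision 76%, recall 38%
print(round(f_measure(0.76, 0.38), 2))
# Windows for a short, hypothetical peptide
print(sliding_windows("MTEYKLVVVG")[:2])
```

Each window would then be turned into a feature vector for the classifier; that step depends on the chosen sequence- and structure-derived parameters and is not shown here.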
Acta Crystallographica Section D-biological Crystallography | 2008
Ivan Dokmanić; Mile Šikić; Sanja Tomić
Metal ions are constituents of many metalloproteins, in which they have either catalytic (metalloenzymes) or structural functions. In this work, the characteristics of various metals were studied (Cu, Zn, Mg, Mn, Fe, Co, Ni, Cd and Ca in proteins with known crystal structure) as well as the specificity of their environments. The analysis was performed on two data sets: the set of protein structures in the Protein Data Bank (PDB) determined with resolution <1.5 Å and the set of nonredundant protein structures from the PDB. The former was used to determine the distances between each metal ion and its electron donors and the latter was used to assess the preferred coordination numbers and common combinations of amino-acid residues in the neighbourhood of each metal. Although the metal ions considered predominantly had a valence of two, their preferred coordination number and the type of amino-acid residues that participate in the coordination differed significantly from one metal ion to the next. This study concentrates on finding the specificities of a metal-ion environment, namely the distribution of coordination numbers and the amino-acid residue types that frequently take part in coordination. Furthermore, the correlation between the coordination number and the occurrence of certain amino-acid residues (quartets and triplets) in a metal-ion coordination sphere was analysed. The results obtained are of particular value for the identification and modelling of metal-binding sites in protein structures derived by homology modelling. Knowledge of the geometry and characteristics of the metal-binding sites in metalloproteins of known function can help to more closely determine the biological activity of proteins of unknown function and to aid in design of proteins with specific affinity for certain metals.
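As a rough illustration of what a coordination number and its distribution mean in practice, the sketch below counts donor atoms within a distance cutoff of a metal ion. It is only an assumption-laden example: the 3.0 Å cutoff, the coordinates, and the function names are hypothetical and are not taken from the paper, which derives metal-specific distances from high-resolution structures.

```python
import math
from collections import Counter

Point = tuple[float, float, float]


def coordination_number(metal: Point, donors: list[Point], cutoff: float = 3.0) -> int:
    """Count donor atoms lying within `cutoff` angstroms of the metal ion.

    The default cutoff is a placeholder; a real analysis would use a
    metal- and donor-specific distance.
    """
    return sum(1 for d in donors if math.dist(metal, d) <= cutoff)


def coordination_histogram(sites: list[tuple[Point, list[Point]]], cutoff: float = 3.0) -> Counter:
    """Tally how often each coordination number occurs over many sites."""
    return Counter(coordination_number(m, ds, cutoff) for m, ds in sites)


# Hypothetical site: three donors close to the ion, one far away
metal = (0.0, 0.0, 0.0)
donors = [(2.0, 0.0, 0.0), (0.0, 2.1, 0.0), (0.0, 0.0, -2.2), (5.0, 5.0, 5.0)]
print(coordination_number(metal, donors))
```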
BMC Structural Biology | 2008
Josip Mihel; Mile Šikić; Sanja Tomić; Branko Jeren; Kristian Vlahoviček
Background: PSAIA (Protein Structure and Interaction Analyzer) was developed to compute geometric parameters for large sets of protein structures in order to predict and investigate protein-protein interaction sites. Results: In addition to the most relevant established algorithms, PSAIA offers a new method, PIADA (Protein Interaction Atom Distance Algorithm), for the determination of residue interaction pairs. We found that PIADA produced more satisfactory results than comparable algorithms implemented in PSAIA. Particular advantages of PSAIA include its capacity to combine different methods to detect the locations and types of interactions between residues and its ability, without any further automation steps, to handle large numbers of protein structures and complexes. Generally, the integration of a variety of methods enables PSAIA to offer easier automation of analysis and greater reliability of results. PSAIA can be used either via a graphical user interface or from the command line. Results are generated in either tabular or XML format. Conclusion: In a straightforward fashion and for large sets of protein structures, PSAIA enables the calculation of protein geometric parameters and the determination of location and type for protein-protein interaction sites. XML-formatted output enables easy conversion of results to various formats suitable for statistical analysis. Results from smaller data sets demonstrated the influence of geometry on protein interaction sites. Comprehensive analysis of properties of large data sets led to new information useful in the prediction of protein-protein interaction sites.
Nature Communications | 2016
Ivan Sović; Mile Šikić; Andreas Wilm; Shannon Nicole Fenlon; Swaine L. Chen; Niranjan Nagarajan
Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10–80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.
Nature Protocols | 2016
Robert Vaser; Swarnaseetha Adusumalli; Sim Ngak Leng; Mile Šikić; Pauline C Ng
The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use has been previously published with Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), which is a faster version of SIFT that enables practical computations on reference genomes. Users can get predictions for single-nucleotide variants from their organism of interest using the SIFT 4G annotator with SIFT 4G's precomputed databases. The scope of genomic predictions is expanded, with predictions available for more than 200 organisms. Users can also run the SIFT 4G algorithm themselves. SIFT predictions can be retrieved for 6.7 million variants in 4 min once the database has been downloaded. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from http://sift-dna.org/sift4g.
Cell Reports | 2012
Kuljeet Singh Sandhu; Guoliang Li; Huay Mei Poh; Yu Ling Kelly Quek; Yee Yen Sia; Su Qin Peh; Fabianus Hendriyan Mulawadi; Joanne Lim; Mile Šikić; Francesca Menghi; Anbupalam Thalamuthu; Wing-Kin Sung; Xiaoan Ruan; Melissa J. Fullwood; Edison T. Liu; Péter Csermely; Yijun Ruan
SUMMARY Chromatin interactions play important roles in transcription regulation. To better understand the underlying evolutionary and functional constraints of these interactions, we implemented a systems approach to examine RNA polymerase-II-associated chromatin interactions in human cells. We found that 40% of the total genomic elements involved in chromatin interactions converged to a giant, scale-free-like, hierarchical network organized into chromatin communities. The communities were enriched in specific functions and were syntenic through evolution. Disease-associated SNPs from genome-wide association studies were enriched among the nodes with fewer interactions, implying their selection against deleterious interactions by limiting the total number of interactions, a model that we further reconciled using somatic and germline cancer mutation data. The hubs lacked disease-associated SNPs, constituted a nonrandomly interconnected core of key cellular functions, and exhibited lethality in mouse mutants, supporting an evolutionary selection that favored the nonrandom spatial clustering of the least-evolving key genomic domains against random genetic or transcriptional errors in the genome. Altogether, our analyses reveal a systems-level evolutionary framework that shapes functionally compartmentalized and error-tolerant transcriptional regulation of human genome in three dimensions.
Genome Research | 2017
Robert Vaser; Ivan Sović; Niranjan Nagarajan; Mile Šikić
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.
Bioinformatics | 2013
Matija Korpar; Mile Šikić
Summary: We propose SW#, a new CUDA graphics processing unit (GPU)-enabled and memory-efficient implementation of a dynamic programming algorithm for local alignment. It can be used as either a stand-alone application or a library. Although there are other GPU implementations of the Smith–Waterman algorithm, SW# is the only one publicly available that can produce sequence alignments on a genome-wide scale. For long sequences, it is at least a few hundred times faster than a CPU version of the same algorithm. Availability: Source code and installation instructions are freely available for download at http://complex.zesoi.fer.hr/SW.html. Contact: [email protected] Supplementary information: Supplementary results are available at Bioinformatics online.
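For readers unfamiliar with the underlying dynamic programming recurrence, the sketch below computes a Smith–Waterman local alignment score on the CPU while keeping only two rows of the matrix in memory. It is a plain illustration of the recurrence, not SW# itself: it uses a linear gap penalty, returns only the score (no traceback), and the scoring values are assumptions chosen for the example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `a` and `b`.

    Memory use is O(len(b)) because only the previous and current rows of
    the dynamic programming matrix are kept. Scoring parameters are
    illustrative defaults, not values from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Recovering the alignment itself in linear memory needs additional machinery (for example a divide-and-conquer traceback), which is omitted here.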
Physical Review Letters | 2015
Nino Antulov-Fantulin; Alen Lancic; Tomislav Šmuc; Hrvoje Stefancic; Mile Šikić
The detection of an epidemic source or the patient zero is an important practical problem that can help in developing the epidemic control strategies. In this paper, we study the statistical inference problem of detecting the source of epidemics from a snapshot of a contagion spreading process at some time on an arbitrary network structure. By using exact analytic calculations and Monte Carlo simulations, we demonstrate the detectability limits for the SIR model, which primarily depend on the spreading process characteristics. We introduce an efficient Bayesian Monte Carlo source probability estimator and compare its performance against state-of-the-art approaches. Finally, we demonstrate the applicability of the approach in a realistic setting of an epidemic spreading over an empirical temporal network of sexual interactions.
Introduction
The majority of biological, technological, social and information systems structures can be represented as a complex network [1, 2, 3]. The most prevalent type of dynamic processes of public interest characteristic for the real-life complex networks are contagion processes [4]. Different mathematical methods have been used to study the epidemic spreading on complex networks including the bond percolation method [7, 8], the mean-field approach [9, 10], the reaction-diffusion processes [11, 12], the pair and the master equation approximations [13] as well as models with the complex compartmental structure along with population mobility dynamics [14]. Epidemiologists detect the epidemic source or the patient zero either by analysing the temporal genetic evolution of virus strains [5] or by contact backtracking [6] from the available observed data. However, in cases where the information on the times of contact is unknown or incomplete, the backtracking method is no longer adequate. This becomes especially hard if the only data available is a static snapshot of an epidemic process at some time. Even in cases when we do have some information on the times of contact, the longer the recovery time and the subtler the symptoms, the harder it becomes to establish the proper ordering of the transmissions that have occurred.
Due to its practical aspects and theoretical importance, the epidemic source detection problem on contact networks has recently gained a lot of attention in the complex network science community. This has led to the development of many different source detection estimators for static networks, which vary in their assumptions on the network structure and the spreading process models [15, 16, 17, 18, 19, 20, 21, 22, 23].
For the source detection with the SI model the following interesting results have been obtained. Zaman et al. developed a rumor centrality measure, which is the maximum likelihood estimator for regular trees under the SI model [15]. Dong et al. also studied the problem of rooting the rumor source with the SI model and demonstrated the asymptotic source detection probability on regular tree-type networks [16]. Comin et al. compared the different centrality measures, e.g. the degree, the betweenness, the closeness and the eigenvector centrality, as the source detection estimators [22]. Wang et al. addressed the problem of source estimation from multiple observations under the SI model [17]. Pinto et al. used the SI model and assumed that the direction and the times of the infection are known exactly, and solved the diffusion tree problem using breadth-first search from sparsely placed observers [19].
In the case of the SIR model there are two different approaches. Zhu et al. adopted the SIR model and proposed a sample path counting approach for the source detection [18]. They proved that the source node on infinite trees minimizes the maximum distance (Jordan centrality) to the infected nodes.
Lokhov et al. used a dynamic message-passing algorithm (DMP) for the SIR model to estimate the probability that a given node produces the observed snapshot. They use a mean-field-like approximation (independence approximation) and an assumption of a tree-like contact network to compute the marginal probabilities [20]. The main contributions of the paper are the following: (i) given the non-uniqueness of finding a single epidemic source of the SIR realization on general networks, we turn the problem to finding a source probability distribution, which is a well-posed problem; (ii) we develop the analytic combinatoric and the direct Monte-Carlo approaches for determining theoretical source probability distribution and produce the benchmark solutions on the 4-connected lattice; (iii) we measure the source detectability by using the normalized Shannon entropy of the estimated source probability distribution for each of the source detection problems and observe the existence of the highly detectable and the highly undetectable regimes; (iv) using the above insights, we construct the Soft Margin epidemic source detection estimator for the arbitrary networks (static and temporal) and show that it is robust and more accurate than the state-of-the-art approaches and much faster than the analytic combinatoric or the direct Monte-Carlo approach; (v) by using the simulations of the sexually transmitted disease (STD) model on a realistic time interval of 200 days on an empirical temporal network of sexual contacts (see the network visualization in Figure 4, plot C) we demonstrate the robustness to the uncertainty in the epidemic starting time, the network interaction orderings and the incompleteness of observations. Although we use the SIR model of epidemic spreading, our algorithms are easily applicable to other compartmental models, e.g. SI and SEIR and all other compartmental models where the states cannot be recurrent.
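To make contributions (i) to (iii) more concrete, the sketch below estimates a source probability distribution by direct Monte Carlo simulation and then measures detectability with a normalized Shannon entropy. It is a toy illustration under stated assumptions, not the paper's Soft Margin estimator: it uses a discrete-time SIR process, treats the snapshot as the set of nodes that were ever infected, assumes a uniform prior over candidate sources, normalizes the entropy by the logarithm of the number of candidates, and all names, parameters, and the five-node path graph are hypothetical.

```python
import math
import random


def simulate_sir(adj, source, p, q, steps, rng):
    """Run a discrete-time SIR process and return the ever-infected nodes.

    Each step, an infected node infects each susceptible neighbour with
    probability p and then recovers with probability q.
    """
    infected, recovered = {source}, set()
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and v not in recovered and rng.random() < p:
                    newly.add(v)
        healed = {u for u in infected if rng.random() < q}
        infected = (infected | newly) - healed
        recovered |= healed
    return frozenset(infected | recovered)


def source_posterior(adj, observed, p, q, steps, n_sims=2000, seed=0):
    """Direct Monte Carlo estimate of P(source | snapshot), uniform prior.

    For every candidate source in the snapshot, count how many simulated
    outbreaks reproduce the observed snapshot exactly, then normalize.
    """
    rng = random.Random(seed)
    hits = {
        s: sum(simulate_sir(adj, s, p, q, steps, rng) == observed for _ in range(n_sims))
        for s in sorted(observed)
    }
    total = sum(hits.values())
    if total == 0:
        raise ValueError("no simulation matched the snapshot; increase n_sims")
    return {s: h / total for s, h in hits.items()}


def normalized_entropy(probs):
    """Shannon entropy divided by log(number of candidates), in [0, 1]."""
    if len(probs) <= 1:
        return 0.0
    h = -sum(x * math.log(x) for x in probs if x > 0)
    return h / math.log(len(probs))


# Hypothetical example: a path graph 0-1-2-3-4 with snapshot {1, 2, 3}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
posterior = source_posterior(path, frozenset({1, 2, 3}), p=0.5, q=0.3, steps=3)
print(posterior, normalized_entropy(list(posterior.values())))
```

An entropy near 0 means that one candidate dominates (a highly detectable regime), while a value near 1 means the candidates are almost indistinguishable. Exact matching of snapshots becomes infeasible on larger networks, which is the inefficiency the paper's Soft Margin estimator is said to address.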
1 Detectability limits
The main goals of this work are to better understand the nature of the epidemic source detection problem in networks, characterize its complexity, and develop efficient algorithms for estimating source probability distribution. Next, we introduce the terminology and formalize the problem. In a general case, the contact network during an epidemic process can be temporal
Bioinformatics | 2016
Ivan Sović; Krešimir Križanović; Karolj Skala; Mile Šikić
MOTIVATION The recent emergence of nanopore sequencing technology set a challenge for established assembly methods. In this work, we assessed how existing hybrid and non-hybrid de novo assembly methods perform on long and error-prone nanopore reads. RESULTS We benchmarked five non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers which use third-generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of Escherichia coli K-12, using several sequencing coverages of nanopore data (20×, 30×, 40× and 50×). We attempted to assess the assembly quality at each of these coverages, in order to estimate the requirements for closed bacterial genome assembly. For the purpose of the benchmark, an extensible genome assembly benchmarking framework was developed. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data, and perform relatively well on lower nanopore coverages. All non-hybrid methods correctly assemble the E. coli genome when coverage is above 40×, even the non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage compared to a method designed particularly for nanopore reads, its running time is significantly lower. AVAILABILITY AND IMPLEMENTATION https://github.com/kkrizanovic/NanoMark CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
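Sequencing coverage is the total number of sequenced bases divided by the genome length, so a data set can be reduced to a target coverage such as 20× by sampling reads until that total is reached. The sketch below shows one way this could be done; it is an assumption for illustration and not necessarily how the benchmark prepared its data sets. The function names and the toy read lengths are hypothetical.

```python
import random


def coverage(read_lengths: list[int], genome_length: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return sum(read_lengths) / genome_length


def subsample_to_coverage(read_lengths: list[int], genome_length: int,
                          target: float, seed: int = 0) -> list[int]:
    """Return indices of randomly chosen reads reaching the target coverage.

    Reads are visited in a shuffled order and added until the accumulated
    bases reach target * genome_length (or the reads run out).
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    chosen, total = [], 0
    for idx in order:
        if total >= target * genome_length:
            break
        chosen.append(idx)
        total += read_lengths[idx]
    return chosen


# Toy example: 100 reads of 1,000 bases over a 1,000-base "genome"
reads = [1000] * 100
picked = subsample_to_coverage(reads, genome_length=1000, target=20)
print(coverage(reads, 1000), coverage([reads[i] for i in picked], 1000))
```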