Konstantin Tretyakov | Researchain

Publication

Featured researches published by Konstantin Tretyakov.

Conference on Information and Knowledge Management | 2011

Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs

Konstantin Tretyakov; Abel Armas-Cervantes; Luciano García-Bañuelos; Jaak Vilo; Marlon Dumas

Computing the shortest path between a pair of vertices in a graph is a fundamental primitive in graph algorithmics. Classical exact methods for this problem do not scale up to contemporary, rapidly evolving social networks with hundreds of millions of users and billions of connections. A number of approximate methods have been proposed, including several landmark-based methods that have been shown to scale up to very large graphs with acceptable accuracy. This paper presents two improvements to existing landmark-based shortest path estimation methods. The first improvement relates to the use of shortest-path trees (SPTs). Together with appropriate short-cutting heuristics, the use of SPTs makes it possible to achieve higher accuracy with acceptable time and memory overhead. Furthermore, SPTs can be maintained incrementally under edge insertions and deletions, which allows for a fully dynamic algorithm. The second improvement is a new landmark selection strategy that seeks to maximize the coverage of all shortest paths by the selected landmarks. The improved method is evaluated on the DBLP, Orkut, Twitter and Skype social networks.
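As a rough illustration of the basic landmark idea that this abstract builds on, the sketch below precomputes breadth-first distances from a few landmark vertices and estimates the distance between two vertices as the smallest sum of their distances to a common landmark. This is only the plain baseline, not the paper's method: it has no shortest-path trees, no short-cutting heuristics, no dynamic updates, and no coverage-based landmark selection. The function names and the toy graph are assumptions made for the example.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency-list dict from (u, v) pairs (hypothetical helper)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph


def bfs_distances(graph, source):
    """Hop distances from source to every reachable vertex (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_index(graph, landmarks):
    """Precompute one distance table per landmark."""
    return {l: bfs_distances(graph, l) for l in landmarks}


def estimate_distance(index, s, t):
    """Upper bound on d(s, t): min over landmarks l of d(s, l) + d(l, t).

    Returns None if no landmark reaches both vertices.
    """
    if s == t:
        return 0
    best = None
    for dist in index.values():
        if s in dist and t in dist:
            candidate = dist[s] + dist[t]
            if best is None or candidate < best:
                best = candidate
    return best


# Toy graph (assumed for illustration); the true distance from a to e is 2.
graph = build_undirected(
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "e")]
)
print(estimate_distance(build_index(graph, ["c"]), "a", "e"))       # 4
print(estimate_distance(build_index(graph, ["b", "c"]), "a", "e"))  # 2
```

The estimate is exact only when some landmark lies on a shortest path between the two vertices, which is why the choice of landmarks matters for accuracy.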

Scientific Reports | 2015

Age-related profiling of DNA methylation in CD8+ T cells reveals changes in immune response and transcriptional regulator genes.

Liina Tserel; Maia Limbach; Konstantin Tretyakov; Silva Kasela; Kai Kisand; Mario Saare; Jaak Vilo; Andres Metspalu; Lili Milani; Pärt Peterson

Human ageing affects the immune system, resulting in an overall decline in immunocompetence. Although all immune cells are affected during ageing, the functional capacity of T cells is most influenced and is linked to decreased responsiveness to infections and impaired differentiation. We studied age-related changes in DNA methylation and gene expression in CD4+ and CD8+ T cells from younger and older individuals. We observed a marked difference between T cell subsets, with an increased number of methylation changes and higher methylome variation in CD8+ T cells with age. The majority of age-related hypermethylated sites were located at CpG islands of silent genes and enriched for repressive histone marks. Specifically, in the CD8+ T cell subset we identified a strong inverse correlation between methylation and expression levels in genes associated with T cell mediated immune response (LGALS1, IFNG, CCL5, GZMH, CCR7, CD27 and CD248) and differentiation (SATB1, TCF7, BCL11B and RUNX3). Our results thus suggest a link between age-related epigenetic changes and impaired T cell function.

Genome Biology | 2010

Comprehensive transcriptome analysis of mouse embryonic stem cell adipogenesis unravels new processes of adipocyte development

Nathalie Billon; Jüri Reimand; Miguel C. Monteiro; Meelis Kull; Hedi Peterson; Konstantin Tretyakov; Priit Adler; Brigitte Wdziekonski; Jaak Vilo; Christian Dani

Background: The current epidemic of obesity has caused a surge of interest in the study of adipose tissue formation. While major progress has been made in defining the molecular networks that control adipocyte terminal differentiation, the early steps of adipocyte development and the embryonic origin of this lineage remain largely unknown.

Results: Here we performed genome-wide analysis of gene expression during adipogenesis of mouse embryonic stem cells (ESCs). We then pursued comprehensive bioinformatic analyses, including de novo functional annotation and curation of the generated data within the context of biological pathways, to uncover novel biological functions associated with the early steps of adipocyte development. By combining in-depth gene regulation studies and in silico analysis of transcription factor binding site enrichment, we also provide insights into the transcriptional networks that might govern these early steps.

Conclusions: This study supports several biological findings: firstly, adipocyte development in mouse ESCs is coupled to blood vessel morphogenesis and neural development, just as it is during mouse development. Secondly, the early steps of adipocyte formation involve major changes in signaling and transcriptional networks. A large proportion of the transcription factors that we uncovered in mouse ESCs are also expressed in the mouse embryonic mesenchyme and in adipose tissues, demonstrating the power of our approach to probe for genes associated with early developmental processes on a genome-wide scale. Finally, we reveal a plethora of novel candidate genes for adipocyte development and present a unique resource that can be further explored in functional assays.

BMC Genomics | 2013

Fast probabilistic file fingerprinting for big data

Konstantin Tretyakov; Sven Laur; Geert Smant; Jaak Vilo; Pjotr Prins

Background: Biological data acquisition is raising new challenges, both in data analysis and handling. Not only is it proving hard to analyze the data at the rate it is generated today, but simply reading and transferring data files can be prohibitively slow due to their size. This primarily concerns logistics within and between data centers, but is also important for workstation users in the analysis phase. Common usage patterns, such as comparing and transferring files, are proving computationally expensive and are tying down shared resources.

Results: We present an efficient method for calculating file uniqueness for large scientific data files that takes less computational effort than existing techniques. This method, called Probabilistic Fast File Fingerprinting (PFFF), exploits the variation present in biological data and computes file fingerprints by sampling randomly from the file instead of reading it in full. Consequently, it has a flat performance characteristic, correlated with data variation rather than file size. We demonstrate that probabilistic fingerprinting can be as reliable as existing hashing techniques, with provably negligible risk of collisions. We measure the performance of the algorithm on a number of data storage and access technologies, identifying its strengths as well as limitations.

Conclusions: Probabilistic fingerprinting may significantly reduce the use of computational resources when comparing very large files. Utilisation of probabilistic fingerprinting techniques can increase the speed of common file-related workflows, both in the data center and for workbench analysis. The implementation of the algorithm is available as an open-source tool named pfff, both as a command-line tool and as a C library. The tool can be downloaded from http://biit.cs.ut.ee/pfff.
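The sampling idea described in this abstract can be sketched in a few lines. The example below is a simplified illustration, not the pfff tool's actual algorithm or interface: the function name, the use of SHA-256, and the `samples`, `block`, and `seed` parameters are all assumptions. It hashes the file size together with a fixed number of short blocks read at pseudo-random offsets, so the work done stays roughly constant as the file grows.

```python
import hashlib
import os
import random


def sampled_fingerprint(path, samples=64, block=16, seed=0):
    """Fingerprint a file from its size plus a fixed set of sampled blocks.

    Hypothetical sketch: offsets come from a seeded generator, so two files
    of equal size are always sampled at the same positions.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    # Mixing in the size means files of different length never share an input.
    digest.update(str(size).encode("ascii"))
    rng = random.Random(seed)
    with open(path, "rb") as handle:
        if size <= samples * block:
            # Small file: sampling would save nothing, so hash everything.
            digest.update(handle.read())
        else:
            offsets = sorted(
                rng.randrange(0, size - block + 1) for _ in range(samples)
            )
            for offset in offsets:
                handle.seek(offset)
                digest.update(handle.read(block))
    return digest.hexdigest()
```

Two identical files always yield the same fingerprint. Two different files of the same size collide only if they agree at every sampled position, so the reliability of such a scheme depends on how much the data varies, which matches the abstract's remark that performance correlates with data variation rather than file size.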

Advances in Social Networks Analysis and Mining | 2012

Fraud Detection: Methods of Analysis for Hypergraph Data

Anna Leontjeva; Konstantin Tretyakov; Jaak Vilo; Taavi Tamkivi

A hypergraph is a data structure that captures many-to-many relations. It comes up in various contexts, one of them being the task of detecting fraudulent users of an online system given known associations between the users and the types of activities they take part in. In this work we explore three approaches for applying general-purpose machine learning methods to such data. We evaluate the proposed approaches on a real-life dataset of customers and achieve promising results.

PLOS ONE | 2011

G = MAT: linking transcription factor expression and DNA binding data.

Konstantin Tretyakov; Sven Laur; Jaak Vilo

Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.

BMC Genomics | 2013

Summary of talks and papers at ISCB-Asia/SCCG 2012

Konstantin Tretyakov; Tatyana Goldberg; Victor X. Jin; Paul Horton

The second ISCB-Asia conference of the International Society for Computational Biology took place December 17-19, 2012, in Shenzhen, China. The conference was co-hosted by BGI as the first Shenzhen Conference on Computational Genomics (SCCG).

45 talks were presented at ISCB-Asia/SCCG 2012. The topics covered included software tools, reproducible computing, next-generation sequencing data analysis, transcription and mRNA regulation, protein structure and function, cancer genomics and personalized medicine. Nine of the proceedings track talks are included as full papers in this supplement.

In this report we first give a short overview of the conference by listing some statistics and visualizing the talk abstracts as word clouds. Then we group the talks by topic and briefly summarize each one, providing references to related publications whenever possible. Finally, we close with a few comments on the success of this conference.

Algorithms and Applications | 2010

An evolutionary model of DNA substring distribution

Meelis Kull; Konstantin Tretyakov; Jaak Vilo

DNA sequence analysis methods, such as motif discovery, gene detection or phylogeny reconstruction, can often provide important input for biological studies. Many such methods require a background model, representing the expected distribution of short substrings in a given DNA region. Most current techniques for modeling this distribution disregard the evolutionary processes underlying DNA formation. We propose a novel approach for modeling DNA k-mer distribution that is capable of taking the notions of evolution and natural selection into account. We derive a computationally tractable approximation for estimating k-mer probabilities at genetic equilibrium, given a description of evolutionary processes in terms of fitness and mutation probabilities. We assess the goodness of this approximation via numerical experiments. Besides providing a generative model for DNA sequences, our method has further applications in motif discovery.

Archive | 2004