Publication


Featured research published by Sergei Chumakov.


FEBS Journal | 2006

Human-blind probes and primers for dengue virus identification: Exhaustive analysis of subsequences present in the human and 83 dengue genome sequences

Catherine Putonti; Sergei Chumakov; Rahul Mitra; George E. Fox; Richard C. Willson; Yuriy Fofanov

Reliable detection and identification of pathogens in complex biological samples, in the presence of contaminating DNA from a variety of sources, is an important and challenging diagnostic problem for the development of field tests. The problem is compounded by the difficulty of finding a single, unique genomic sequence that is present simultaneously in all genomes of a species of closely related pathogens and absent in the genomes of the host or the organisms that contribute to the sample background. Here we describe ‘host‐blind probe design’, a novel strategy of designing probes based on highly frequent genomic signatures found in the pathogen genomes of interest but absent from the host genome. Upon hybridization, an array of such informative probes will produce a unique pattern that is a genetic fingerprint for each pathogen strain. This multiprobe approach was applied to 83 dengue virus genome sequences, available in public databases, to design and perform in silico microarray experiments. The resulting patterns allow one to unequivocally distinguish the four major serotypes, and within each serotype to identify the most similar strain among those that have been completely sequenced. In an environment where dengue is indigenous, this would allow investigators to determine if a particular isolate belongs to an ongoing outbreak or is a previously circulating version. Using our probe set, the probability that misdiagnosis at the serotype level would occur is ≈ 1 : 10^150.
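
The selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `min_fraction` threshold, and the toy sequences are assumptions made for illustration, and it ignores reverse complements and mismatch tolerance, which a real probe design would need to handle.

```python
def kmers(seq, k):
    """Return the set of all length-k subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def host_blind_probes(pathogen_genomes, host_genome, k, min_fraction=1.0):
    """Pick k-mers frequent among pathogen genomes and absent from the host.

    min_fraction is a hypothetical knob: the share of pathogen genomes
    that must contain a k-mer for it to count as 'highly frequent'.
    """
    host_kmers = kmers(host_genome, k)
    counts = {}
    for genome in pathogen_genomes:
        for km in kmers(genome, k):  # count each k-mer once per genome
            counts[km] = counts.get(km, 0) + 1
    needed = min_fraction * len(pathogen_genomes)
    return sorted(km for km, c in counts.items()
                  if c >= needed and km not in host_kmers)


def fingerprint(genome, probes, k):
    """Presence/absence pattern of each probe in a genome (in silico 'array')."""
    present = kmers(genome, k)
    return tuple(int(p in present) for p in probes)


# Toy data only; real inputs would be full genome sequences.
probes = host_blind_probes(["ACGTAC", "CGTACG"], "ACGTTT", k=3)
pattern = fingerprint("GTAAAA", probes, k=3)
```

Comparing the `fingerprint` tuples of different strains is the toy analogue of comparing hybridization patterns across the probe array.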


Journal of Physics A | 1999

On the spectrum of a Hamiltonian defined on su_q(2) and quantum optical models

Angel Ballesteros; Sergei Chumakov

Analytical expressions are given for the eigenvalues and eigenvectors of a Hamiltonian with su_q(2) dynamical symmetry. The relevance of such an operator in quantum optics is discussed. As an application, the ground-state energy in the Dicke model is studied through su_q(2) perturbation theory.


Bioinformatics | 2007

Effect of the mutation rate and background size on the quality of pathogen identification

Chris Reed; Viacheslav Y. Fofanov; Catherine Putonti; Sergei Chumakov; Tom Slezak; Yuriy Fofanov

Motivation: Genomic-based methods have significant potential for fast and accurate identification of organisms or even genes of interest in complex environmental samples (air, water, soil, food, etc.), especially when isolation of the target organism cannot be performed for a variety of reasons. Despite this potential, the presence of unknown, variable and usually large quantities of background DNA can cause interference resulting in false positive outcomes.
Results: In order to estimate how the genomic diversity of the background (total length of all of the different genomes present in the background), target length and target mutation rate affect the probability of misidentifications, we introduce a mathematical definition for the quality of an individual signature in the presence of a background, based on its length and the number of mismatches needed to transform the signature into the closest subsequence present in the background. This definition, in conjunction with a probabilistic framework, allows one to predict the minimal signature length required to identify the target in the presence of different sizes of backgrounds and the effect of the target's mutation rate on the quality of its identification. The model assumptions and predictions were validated using both Monte Carlo simulations and real genomic data examples. The proposed model can be used to determine appropriate signature lengths for various combinations of target and background genome sizes. It also predicts that no genomic signature will be able to identify a target whose mutation rate is >5%.
Supplementary information: Supplementary data are available at Bioinformatics online.
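
The mismatch count underlying this quality definition can be sketched as follows. This is a brute-force illustration under the assumption that "closest subsequence" means the background window with the smallest Hamming distance; the function name is hypothetical, and the probabilistic framework of the paper is not reproduced here.

```python
def min_mismatches(signature, background):
    """Smallest number of substitutions needed to turn `signature` into
    some equal-length subsequence of `background` (brute-force scan)."""
    n = len(signature)
    if len(background) < n:
        raise ValueError("background is shorter than the signature")
    best = n
    for i in range(len(background) - n + 1):
        window = background[i:i + n]
        # Hamming distance between the signature and this window
        distance = sum(a != b for a, b in zip(signature, window))
        best = min(best, distance)
    return best


# A signature that sits one substitution away from the background is weak;
# larger values mean the signature is harder to confuse with the background.
d = min_mismatches("ACGT", "TTACGATT")
```

For real backgrounds this scan is far too slow, so an indexed search would be needed; the sketch only pins down what is being measured.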


Medical physics : ... Mexican symposium. Mexican Symposium on Medical Physics | 2006

Using mutual information to discover temporal patterns in gene expression data

Sergei Chumakov; Efren Ballesteros; Jorge E. Rodriguez Sanchez; Arturo Chávez; Meizhuo Zhang; B. Montgomery Pettitt; Yuriy Fofanov

Finding relations among gene expressions requires a definition of the similarity between experimental data. The simplest similarity measure is the correlation coefficient. It can identify only linear dependences and, moreover, is sensitive to experimental errors. An alternative measure, the Shannon Mutual Information (MI), is free from these weaknesses. However, the calculation of MI for continuous variables from a finite number of experimental points, N, involves an ambiguity that arises when one divides the range of values of the continuous variable into boxes: the distribution of experimental points among the boxes (and, therefore, MI) depends on the box size. An algorithm for the calculation of MI for continuous variables is proposed. We find the optimum box sizes for a given N from the condition of minimum entropy variation with respect to the change of the box sizes. We have applied this technique to the gene expression dataset from Stanford, containing microarray data at 18 time points from yeast Saccharomyces cerevisiae cultures (Spellman et al. [3]). We calculated MI for all of the pairs of time points. The MI analysis allowed us to identify time patterns related to different biological processes in the cell.
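
The box-size ambiguity described here is easy to see in a small sketch. The code below estimates MI from equal-width boxes and takes the number of boxes as an explicit parameter; it does not implement the authors' optimum-box-size criterion, and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def box_index(value, lo, hi, boxes):
    """Map a value in [lo, hi] to one of `boxes` equal-width boxes."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * boxes)
    return min(i, boxes - 1)  # the maximum value falls in the last box


def mutual_information(xs, ys, boxes):
    """Plug-in MI estimate (in bits) from a 2D histogram of paired samples."""
    n = len(xs)
    bx = [box_index(x, min(xs), max(xs), boxes) for x in xs]
    by = [box_index(y, min(ys), max(ys), boxes) for y in ys]
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in pxy.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[i] * py[j]))
    return mi


dependent = mutual_information([0, 1, 2, 3], [0, 1, 2, 3], boxes=2)
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1], boxes=2)
```

Rerunning `mutual_information` on the same data with a different `boxes` value generally changes the estimate, which is the dependence on box size that the proposed algorithm is meant to resolve.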


BMC Genomics | 2016

The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome

Levent Albayrak; Kamil Khanipov; Maria Pimenova; George Golovko; Mark Rojas; Ioannis T. Pavlidis; Sergei Chumakov; Gerardo Aguilar; Arturo Chávez; William R. Widger; Yuriy Fofanov

Background: Low-abundance mutations in mitochondrial populations (mutations with minor allele frequency ≤ 1%) are associated with cancer, aging, and neurodegenerative disorders. While recent progress in high-throughput sequencing technology has significantly improved the heteroplasmy identification process, the ability of this technology to detect low-abundance mutations can be affected by the presence of similar sequences originating from nuclear DNA (nDNA). To determine to what extent nDNA can cause false positive low-abundance heteroplasmy calls, we have identified mitochondrial locations of all subsequences that are common or similar (one mismatch allowed) between nDNA and mitochondrial DNA (mtDNA).
Results: The analysis revealed up to a 25-fold variation in the lengths of the longest common and longest similar (one mismatch allowed) subsequences across the mitochondrial genome. The length of the longest subsequences shared between nDNA and mtDNA in several regions of the mitochondrial genome was found to be as low as 11 bases, which not only allows these regions to be used to design new, very specific PCR primers, but also supports the hypothesis of the non-random introduction of mtDNA into the human nuclear DNA.
Conclusion: Analysis of the mitochondrial locations of the subsequences shared between nDNA and mtDNA suggested that even very short (36 bases) single-end sequencing reads can be used to identify low-abundance variation in 20.4% of the mitochondrial genome. For longer (76 and 150 bases) reads, the proportion of the mitochondrial genome where nDNA presence will not interfere was found to be 44.5% and 67.9%, respectively, while low-abundance mutations at 100% of locations can be identified using single reads 417 bases long. This observation suggests that the analysis of low-abundance variations in mitochondrial populations can be extended to a variety of large data collections such as the NCBI Sequence Read Archive, European Nucleotide Archive, The Cancer Genome Atlas, and International Cancer Genome Consortium.
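
The per-position quantity discussed in this abstract, the longest subsequence shared between mtDNA and nDNA, can be sketched naively as below. This covers exact matches only (not the one-mismatch case), uses plain substring search rather than an index suitable for a whole nuclear genome, and the function name and toy sequences are assumptions for illustration.

```python
def longest_shared_at_each_position(mt, nuc):
    """For every start position in `mt`, return the length of the longest
    subsequence beginning there that also occurs somewhere in `nuc`."""
    lengths = []
    for i in range(len(mt)):
        length = 0
        # Extend the match one base at a time while it still occurs in nuc.
        while i + length < len(mt) and mt[i:i + length + 1] in nuc:
            length += 1
        lengths.append(length)
    return lengths


# Toy sequences; positions with short shared matches are the ones where
# short reads are least likely to be confused with nuclear DNA.
profile = longest_shared_at_each_position("ACGTAC", "TTCGTATT")
```

The spread between the smallest and largest values in such a profile is the kind of variation across the mitochondrial genome that the abstract reports.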


Bioinformatics and Biomedicine | 2015

CoCo: An application to store High-Throughput Sequencing data in compact text and binary file formats

Kamil Khanipov; Georgiy Golovko; Mark Rojas; Levent Albayrak; Otto Dobretsberger; Maria Pimenova; Nels Olson; Sergei Chumakov; Yuriy Fofanov

The storage, manipulation, and especially internet transfer of large amounts of data produced by High-Throughput Sequencing (HTS) instruments present major obstacles to utilizing the full potential of this promising technology. The current standard is based on storing all data, which are produced in text (FASTQ and FASTA) and often stored in binary (SRA and BAM) formats. To date, significant effort has been devoted to efficiently compressing these cumbersome sequencing data sets in their existing formats. However, given the substantial improvements in the quality of HTS data, we believe that if one can afford to exclude low quality data and read headers, new, much more compact data formats can be used to reduce the size of HTS data files by at least two orders of magnitude. Here we present several examples of file formats specifically designed to store only high quality sequencing reads in space-efficient text and binary form. The basic principles used to decrease file size include storage of only one copy of a sequence when reads are present in multiple copies; alphabetical sorting of all reads and storage of only the differences (suffixes) between consecutive reads; and optimization of the number of bits/bytes required to store the information in binary formats. While file size reduction depends on properties of the sequencing data, the size of the resulting files can be as low as 0.1%-5% of the original FASTQ, SRA, or BAM files. The greatest advantage of the proposed formats, however, is their time and memory efficiency. Converting reads from FASTQ/FASTA files into the proposed formats is up to 10 times faster than gzip and SRA. The conversion of files in the proposed formats back to FASTA is limited only by the time required to read the file from the hard drive.
We present the source code of the C++ object (class) implemented to store, sort, and perform I/O operations with equal length subsequences; and two executable Linux command line applications (CoCo and CoCo-Plus) able to work with all types of sequencing data including paired-end and flexible size reads. Source code, Linux executables, as well as the user manual can be downloaded from http://bgl.utmb.edu/publications/34cocoplus.
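
The first two principles named in this abstract (one stored copy per distinct read, then sorted reads stored as differences from the previous read) can be illustrated with a short sketch. This is not the CoCo file format: the record layout `(shared_prefix_length, suffix, copy_count)` and the function names are assumptions chosen for illustration, and the bit-level packing step is omitted.

```python
from collections import Counter


def shared_prefix_length(a, b):
    """Number of leading characters that a and b have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def encode(reads):
    """Collapse duplicate reads, sort them, and keep only each read's
    suffix that differs from the previous read."""
    counts = Counter(reads)
    records, previous = [], ""
    for read in sorted(counts):
        p = shared_prefix_length(previous, read)
        records.append((p, read[p:], counts[read]))
        previous = read
    return records


def decode(records):
    """Rebuild the sorted list of reads (with duplicates) from the records."""
    reads, previous = [], ""
    for p, suffix, count in records:
        read = previous[:p] + suffix
        reads.extend([read] * count)
        previous = read
    return reads


reads = ["ACGT", "ACGA", "ACGT", "TTTT"]
records = encode(reads)
restored = decode(records)
```

Note that the original read order is not preserved, only the multiset of reads, which is consistent with the abstract's premise that headers and low quality data are discarded.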


PLOS ONE | 2015

Secondary Analysis of the NCI-60 Whole Exome Sequencing Data Indicates Significant Presence of Propionibacterium acnes Genomic Material in Leukemia (RPMI-8226) and Central Nervous System (SF-295, SF-539, and SNB-19) Cell Lines.

Mark Rojas; Georgiy Golovko; Kamil Khanipov; Levent Albayrak; Sergei Chumakov; B. Montgomery Pettitt; Alex Y. Strongin; Yuriy Fofanov

The NCI-60 human tumor cell line panel has been used in a broad range of cancer research over the last two decades. A landmark 2013 whole exome sequencing study of this panel added an exceptional new resource for cancer biologists. The complementary analysis of the sequencing data produced by this study suggests the presence of Propionibacterium acnes genomic sequences in almost half of the datasets, with the highest abundance in the leukemia (RPMI-8226) and central nervous system (SF-295, SF-539, and SNB-19) cell lines. While the origin of these contaminating bacterial sequences remains to be determined, observed results suggest that computational control for the presence of microbial genomic material is a necessary step in the analysis of the high throughput sequencing (HTS) data.


MEDICAL PHYSICS: Ninth Mexican Symposium on Medical Physics | 2006

Statistical properties of short subsequences in microbial genomes and their link to pathogen identification and evolution

Meizhuo Zhang; Catherine Putonti; Sergei Chumakov; Adhish Gupta; George E. Fox; Dan Graur; Yuriy Fofanov

Numerous sequencing projects have unveiled partial and full microbial genomes. The data produced far exceeds one person’s analytical capabilities and thus requires the power of computing. A significant amount of work has focused on the diversity of statistical characteristics along microbial genomic sequences, e.g. codon bias, G+C content, the frequencies of short subsequences (n‐mers), etc. Based upon the results of these studies, two observations were made: (1) there exists a correlation between regions whose statistical properties, e.g. codon bias, differ from those of the rest of the genomic sequence and evolutionarily significant regions, e.g. regions of horizontal gene transfer; and (2) because no two microbial genomes look statistically identical, statistical properties can be used to distinguish between genomic sequences. Recently, we conducted extensive analysis on the presence/absence of n‐mers for many microbial genomes as well as several viral and eukaryotic genomes. This analysis reve...


bioRxiv | 2018

Using High Throughput DNA Sequencing to Evaluate the Accuracy of Serial Dilution Based Tests of Microbial Activities in Oil Pipelines

Kamil Khanipov; George Golovko; Mark Rojas; Maria Pimenova; Levent Albayrak; Sergei Chumakov; Renato Duarte; William R. Widger; Tom Pickthall; Yuriy Fofanov

Microbial activities have detrimental effects on industrial infrastructure. If not controlled, microbial presence can result in corrosion, biofilm formation, and product degradation. Serial dilution tests are routinely used for evaluating presence and abundance of microorganisms by diluting samples and culturing microbes in specific media designed to support microorganisms with particular properties, such as sulfate reduction. A high-throughput sequencing approach was used to evaluate changes in microbial composition during four standard serial dilution tests. Analysis of 159 isolates revealed significant differences in the microbial compositions of sequential serial dilution titers and identified several cases where: (a) bacteria known to have a detrimental metabolic function (such as acid production) were lost in the serial dilution medium designed to test for this function; (b) bacteria virtually absent in the original sample became dominant in the serial dilution medium. These observations raise concerns regarding the accuracy and overall usefulness of serial dilution tests.


MEDICAL PHYSICS: Ninth Mexican Symposium on Medical Physics | 2006

Regions of Unusual Statistical Properties as Tools in the Search for Horizontally Transferred Genes in Escherichia coli

Catherine Putonti; Sergei Chumakov; Arturo Chávez; Yi Luo; Dan Graur; George E. Fox; Yuriy Fofanov

The observed diversity of statistical characteristics along genomic sequences is the result of the influences of a variety of ongoing processes including horizontal gene transfer, gene loss, genome rearrangements, and evolution. The rate at which various processes affect the genome typically varies between different genomic regions. Thus, variations in statistical properties seen in different regions of a genome are often associated with its evolution and functional organization. Analysis of such properties is therefore relevant to many ongoing biomedical research efforts. Similarity Plot or S‐plot is a Windows‐based application for large‐scale comparisons and 2D visualization of similarities between genomic sequences. This application combines two approaches widely used in genomics: window analysis of statistical characteristics along genomes and dot‐plot visual representation. S‐plot is effective in detecting highly similar regions between two genomes. Within a single genome, S‐plot has the ability to i...

Collaboration


Top Co-Authors

Yuriy Fofanov

University of Texas Medical Branch

A. B. Klimov

University of Guadalajara

Kamil Khanipov

University of Texas Medical Branch

Levent Albayrak

University of Texas Medical Branch

Mark Rojas

University of Texas Medical Branch

Maria Pimenova

University of Texas Medical Branch

Arturo Chávez

University of Guadalajara
