Publication


Featured research published by Svetlana Karamycheva.


Bioinformatics | 2003

TIGR gene indices clustering tools (TGICL): a software system for fast clustering of large EST datasets

Geo Pertea; Xiaoqiu Huang; Feng Liang; Valentin Antonescu; Razvan Sultana; Svetlana Karamycheva; Yuandan Lee; Joseph White; Foo Cheung; Babak Parvizi; Jennifer Tsai; John Quackenbush

TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.
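The cluster-first step described in this abstract can be illustrated with a minimal sketch. This is not TGICL's code: the shared k-mer count is an assumed stand-in for a real pairwise similarity score, and the names `kmer_set`, `similar`, `cluster`, and the parameters `k` and `min_shared` are hypothetical. Sequences that are similar, directly or through a chain of similar sequences, end up in the same cluster, which could then be assembled on its own.

```python
from itertools import combinations


def kmer_set(seq, k=8):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similar(a, b, k=8, min_shared=3):
    """Assumed similarity test: the two sequences share enough k-mers."""
    return len(kmer_set(a, k) & kmer_set(b, k)) >= min_shared


def cluster(seqs, k=8, min_shared=3):
    """Group sequence indices transitively using a union-find structure."""
    parent = list(range(len(seqs)))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair; merge the clusters of any similar pair.
    for i, j in combinations(range(len(seqs)), 2):
        if similar(seqs[i], seqs[j], k, min_shared):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    reads = [
        "ACGTACGTAGCTAGCTAGGA",  # overlaps the next read
        "CGTAGCTAGCTAGGATTTCC",
        "TTTTTTTTGGGGGGGGCCCC",  # unrelated read
    ]
    print(cluster(reads))
```

The all-pairs comparison is quadratic in the number of sequences, so this sketch only suits small inputs; it shows the grouping logic, not how a production pipeline scales to large datasets.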


Nucleic Acids Research | 2001

The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species

John Quackenbush; Jennifer Cho; Daniel Lee; Feng Liang; Ingeborg Holt; Svetlana Karamycheva; Babak Parvizi; Geo Pertea; Razvan Sultana; Joseph White

While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.


Nature Genetics | 2000

Gene Index analysis of the human genome estimates approximately 120,000 genes

Feng Liang; Ingeborg Holt; Geo Pertea; Svetlana Karamycheva; John Quackenbush

Although sequencing of the human genome will soon be completed, gene identification and annotation remains a challenge. Early estimates suggested that there might be 60,000–100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes. The Chromosome 22 Sequencing Consortium estimated a minimum of 45,000 genes based on their annotation of the complete chromosome, although their data suggests there may be additional genes. The nearly 2,000,000 human ESTs in dbEST provide an important resource for gene identification and genome annotation, but these single-pass sequences must be carefully analysed to remove contaminating sequences, including those from genomic DNA, spurious transcription, and vector and bacterial sequences. We have developed a highly refined and rigorously tested protocol for cleaning, clustering and assembling EST sequences to produce high-fidelity consensus sequences for the represented genes (F.L. et al., manuscript submitted) and used this to create the TIGR Gene Indices—databases of expressed genes for human, mouse, rat and other species (http://www.tigr.org/tdb/tgi.html). Using highly refined and tested algorithms for EST analysis, we have arrived at two independent estimates indicating the human genome contains approximately 120,000 genes.


Nucleic Acids Research | 2004

The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes

Yuandan Lee; Jennifer Tsai; Sirisha Sunkara; Svetlana Karamycheva; Geo Pertea; Razvan Sultana; Valentin Antonescu; Agnes P. Chan; Foo Cheung; John Quackenbush

Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.


Plant Physiology | 2003

Comparative Analyses of Potato Expressed Sequence Tag Libraries

Catherine M. Ronning; Svetlana Stegalkina; Robert A. Ascenzi; Oleg Bougri; Amy L. Hart; Teresa R. Utterbach; Susan E. Vanaken; Steve B. Riedmuller; Joseph White; Jennifer Cho; Geo Pertea; Yuandan Lee; Svetlana Karamycheva; Razvan Sultana; Jennifer Tsai; John Quackenbush; H. M. Griffiths; Silvia Restrepo; Christine D. Smart; William E. Fry; Rutger Van der Hoeven; Steve Tanksley; Peifen Zhang; Hailing Jin; Miki L. Yamamoto; Barbara Baker; C. Robin Buell

The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans). Clustering and assembly of these ESTs resulted in a total of 19,892 unique sequences with 8,741 tentative consensus sequences and 11,151 singleton ESTs. We were able to identify a putative function for 43.7% of these sequences. A number of sequences (48) were expressed throughout the libraries sampled, representing constitutively expressed sequences. Other sequences (13,068, 21%) were uniquely expressed and were detected only in a single library. Using hierarchical and k-means clustering of the EST sequences, we were able to correlate changes in gene expression with major physiological events in potato biology. Using pair-wise comparisons of tuber-related tissues, we were able to associate genes with tuber initiation, dormancy, and sprouting. We also were able to identify a number of characterized as well as novel sequences that were unique to the incompatible interaction of late-blight pathogen, thereby providing a foundation for further understanding the mechanism of resistance.


Genome Biology | 2001

RESOURCERER: a database for annotating and linking microarray resources within and across species

Jennifer Tsai; Razvan Sultana; Yudan Lee; Geo Pertea; Svetlana Karamycheva; Valentin Antonescu; Jennifer Cho; Babak Parvizi; Foo Cheung; John Quackenbush

Microarray expression analysis is providing unprecedented data on gene expression in humans and mammalian model systems. Although such studies provide a tremendous resource for understanding human disease states, one of the significant challenges is cross-referencing the data derived from different species, across diverse expression analysis platforms, in order to properly derive inferences regarding gene expression and disease state. To address this problem, we have developed RESOURCERER, a microarray-resource annotation and cross-reference database built using the analysis of expressed sequence tags (ESTs) and gene sequences provided by the TIGR Gene Index (TGI) and TIGR Orthologous Gene Alignment (TOGA) databases [now called Eukaryotic Gene Orthologs (EGO)].


Cytogenetic and Genome Research | 2003

Sequence analysis of a rainbow trout cDNA library and creation of a gene index

C.E. Rexroad; Yuandan Lee; J. W. Keele; Svetlana Karamycheva; G. Brown; B. Koop; S.A. Gahr; Y. Palti; John Quackenbush

Expressed sequence tag (EST) projects have produced extremely valuable resources for identifying genes affecting phenotypes of interest. A large-scale EST sequencing project for rainbow trout was initiated to identify and functionally annotate as many unique transcripts as possible. Over 45,000 5′ ESTs were obtained by sequencing clones from a single normalized library constructed using mRNA from six tissues. The production of this sequence data and creation of a rainbow trout Gene Index eliminating redundancy and providing annotation for these sequences will facilitate research in this species.


Nucleic Acids Research | 2015

Araport: the Arabidopsis Information Portal

Vivek Krishnakumar; Matthew R. Hanlon; Sergio Contrino; Erik S. Ferlanti; Svetlana Karamycheva; Maria Kim; Benjamin D. Rosen; Chia Yi Cheng; Walter Moreira; Stephen A. Mock; Joe Stubbs; Julie Sullivan; Konstantinos Krampis; Jason R. Miller; Gos Micklem; Matthew W. Vaughn; Christopher D. Town

The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release ‘modules’ that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts ‘science apps,’ developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.


Nature Communications | 2016

Local admixture of amplified and diversified secreted pathogenesis determinants shapes mosaic Toxoplasma gondii genomes

Hernan Lorenzi; Asis Khan; Michael S. Behnke; Sivaranjani Namasivayam; Lakshmipuram S. Swapna; Michalis Hadjithomas; Svetlana Karamycheva; Deborah F. Pinney; Brian P. Brunk; James W. Ajioka; Daniel Ajzenberg; John C. Boothroyd; Jon P. Boyle; Marie Laure Dardé; Maria A. Diaz-Miranda; J. P. Dubey; Heather M. Fritz; Solange Maria Gennari; Brian D. Gregory; Kami Kim; Jeroen Saeij; C. Su; Michael W. White; Xing Quan Zhu; Daniel K. Howe; Benjamin M. Rosenthal; Michael E. Grigg; John Parkinson; Liang Liu; Jessica C. Kissinger

Toxoplasma gondii is among the most prevalent parasites worldwide, infecting many wild and domestic animals and causing zoonotic infections in humans. T. gondii differs substantially in its broad distribution from closely related parasites that typically have narrow, specialized host ranges. To elucidate the genetic basis for these differences, we compared the genomes of 62 globally distributed T. gondii isolates to several closely related coccidian parasites. Our findings reveal that tandem amplification and diversification of secretory pathogenesis determinants is the primary feature that distinguishes the closely related genomes of these biologically diverse parasites. We further show that the unusual population structure of T. gondii is characterized by clade-specific inheritance of large conserved haploblocks that are significantly enriched in tandemly clustered secretory pathogenesis determinants. The shared inheritance of these conserved haploblocks, which show a different ancestry than the genome as a whole, may thus influence transmission, host range and pathogenicity.


BMC Microbiology | 2009

Whole genome single nucleotide polymorphism based phylogeny of Francisella tularensis and its application to the development of a strain typing assay

Gagan A Pandya; Michael H. Holmes; Jeannine M. Petersen; Sonal Pradhan; Svetlana Karamycheva; Mark J. Wolcott; Claudia R. Molins; Marcus B. Jones; Martin E. Schriefer; Robert D. Fleischmann; Scott N. Peterson

Background: A low genetic diversity in Francisella tularensis has been documented. Current DNA based genotyping methods for typing F. tularensis offer a limited and varying degree of subspecies, clade and strain level discrimination power. Whole genome sequencing is the most accurate and reliable method to identify, type and determine phylogenetic relationships among strains of a species. However, lower cost typing schemes are necessary in order to enable typing of hundreds or even thousands of isolates.
Results: We have generated a high-resolution phylogenetic tree from 40 Francisella isolates, including 13 F. tularensis subspecies holarctica (type B) strains, 26 F. tularensis subsp. tularensis (type A) strains and a single F. novicida strain. The tree was generated from global multi-strain single nucleotide polymorphism (SNP) data collected using a set of six Affymetrix GeneChip® resequencing arrays with the non-repetitive portion of LVS (type B) as the reference sequence complemented with unique sequences of SCHU S4 (type A). Global SNP based phylogenetic clustering was able to resolve all non-related strains. The phylogenetic tree was used to guide the selection of informative SNPs specific to major nodes in the tree for development of a genotyping assay for identification of F. tularensis subspecies and clades. We designed and validated an assay that uses these SNPs to accurately genotype 39 additional F. tularensis strains as type A (A1, A2, A1a or A1b) or type B (B1 or B2).
Conclusion: Whole-genome SNP based clustering was shown to accurately identify SNPs for differentiation of F. tularensis subspecies and clades, emphasizing the potential power and utility of this methodology for selecting SNPs for typing of F. tularensis to the strain level. Additionally, whole genome sequence based SNP information gained from a representative population of strains may be used to perform evolutionary or phylogenetic comparisons of strains, or selection of unique strains for whole-genome sequencing projects.

Collaboration


Dive into Svetlana Karamycheva's collaborations.

Top Co-Authors

Geo Pertea

Johns Hopkins University

Feng Liang

J. Craig Venter Institute

Yuandan Lee

J. Craig Venter Institute

Foo Cheung

J. Craig Venter Institute

Ingeborg Holt

National Institutes of Health

Jennifer Cho

J. Craig Venter Institute

Jennifer Tsai

J. Craig Venter Institute
