With the development of new sequencing technologies, sequencing costs dropped dramatically between 2008 and 2012, making transcriptome assembly a practical choice for many research projects. Previously, the cost of sequencing kept many non-model organisms from receiving sufficient attention; the introduction of high-throughput sequencing (also called next-generation sequencing) changed this. These technologies have both reduced costs and improved throughput, extending research to a much wider range of non-model organisms. For example, transcriptomes have been assembled and analyzed for chickpeas, flatworms, and Hawaiian blue crabs, as well as for the brains of Nile crocodiles, corn snakes, bearded dragons, and red-eared sliders.
Examining non-model organisms can provide new insights into the mechanisms of the "fascinating morphological innovations" that allow life on Earth to flourish.
In the plant and animal kingdoms, many "innovations" such as mimicry, symbiosis, parasitism, and asexual reproduction cannot be studied in common model organisms. Because transcriptome assemblies are generally cheaper and simpler to produce than genome assemblies, they are often the best choice for studying non-model organisms. The transcriptomes of these organisms may reveal new proteins, and variant forms of them, that are associated with these unique biological phenomena.
Comparison of transcriptome and genome assemblies
An assembled set of transcripts is essential for initial gene expression studies. Before computational programs for transcriptome assembly were developed, transcriptome data were analyzed mainly by mapping reads to a reference genome. Although genome alignment is a robust way to characterize transcribed sequences, it cannot fully account for structural changes in mRNA transcripts, such as alternative splicing. Because the genome contains every intron and exon that may appear in a transcript, a splice variant that does not align as one continuous stretch may be dismissed rather than recognized as a real protein variant. Even when a reference genome is available, de novo assembly is still worthwhile, because it can recover transcripts from genome segments that are missing from the reference assembly.
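The alignment problem described above can be illustrated with a toy example. The sketch below uses short invented sequences (not real genes) and exact string matching in place of a real aligner; the function names `contiguous_hit` and `split_hit` are hypothetical and chosen only for this illustration.

```python
# Toy sequences invented for illustration; they are not real genes.
EXON1 = "ATGGCCATTC"
INTRON = "GTAAGTCCCTTTTCAG"
EXON2 = "TAATGGGCCGC"

genome = EXON1 + INTRON + EXON2   # the genome still contains the intron
transcript = EXON1 + EXON2        # the mature mRNA has the intron spliced out


def contiguous_hit(query: str, reference: str) -> bool:
    """Return True if the query occurs in the reference as one unbroken stretch."""
    return query in reference


def split_hit(query: str, reference: str) -> list[tuple[int, int]]:
    """Find ways to place the query on the reference as two exact blocks.

    Returns (split_position, gap_length) pairs, where the gap is the
    stretch of reference between the two blocks (a putative intron).
    Only the first occurrence of the left block is considered, which is
    enough for this toy case.
    """
    hits = []
    for split in range(1, len(query)):
        left, right = query[:split], query[split:]
        start = reference.find(left)
        if start == -1:
            continue
        left_end = start + len(left)
        right_start = reference.find(right, left_end)
        if right_start > left_end:
            hits.append((split, right_start - left_end))
    return hits


if __name__ == "__main__":
    print("aligns contiguously:", contiguous_hit(transcript, genome))
    print("split alignments (position, gap):", split_hit(transcript, genome))
```

Here the transcript has no contiguous match in the genome, yet it aligns cleanly once a gap the length of the intron is allowed. An aligner that expects continuous matches would miss it, which is the limitation the paragraph describes.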
Once RNA is extracted and purified from cells, it is sent to a high-throughput sequencing facility, where it is first reverse transcribed to create a cDNA library. The cDNA is then fragmented into lengths suited to the sequencing platform used. Platforms such as 454 sequencing, Illumina, and SOLiD use different technologies to sequence millions of short reads.
The cDNA sequence reads are then assembled into transcripts by a short-read transcript assembly program. Transcripts that are otherwise similar often differ by a few amino acids; such variation may reflect different protein variants, different genes in the same gene family, or even genes that share only a conserved domain. Although short-read assemblers have generally been successful at assembling genomes, transcriptome assembly poses unique challenges. In a genome, high sequence coverage may indicate repetitive sequence, whereas in a transcriptome it may instead indicate an abundant transcript. In addition, transcriptome sequencing can be strand-specific, because both sense and antisense transcripts may be present. Finally, reconstructing and teasing apart all splice variants can be difficult.
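To make the point about coverage concrete, the following minimal sketch computes mean read depth per contig as aligned read bases divided by contig length. The contig names, lengths, and read counts are made-up values, and `mean_depth` is a hypothetical helper rather than part of any assembler.

```python
from collections import defaultdict


def mean_depth(contig_lengths: dict[str, int],
               read_hits: list[tuple[str, int]]) -> dict[str, float]:
    """Mean per-base depth for each contig.

    contig_lengths maps contig name -> length in bases.
    read_hits lists (contig name, aligned read length) pairs.
    Depth is the total of aligned read bases divided by contig length.
    """
    bases = defaultdict(int)
    for contig, read_len in read_hits:
        bases[contig] += read_len
    return {name: bases[name] / length
            for name, length in contig_lengths.items()}


if __name__ == "__main__":
    # Made-up example: two contigs of equal length, very different read support.
    lengths = {"contig_A": 1000, "contig_B": 1000}
    hits = [("contig_A", 100)] * 500 + [("contig_B", 100)] * 20
    for name, depth in mean_depth(lengths, hits).items():
        print(f"{name}: {depth:.1f}x")
```

A genome assembler might treat the much deeper contig as a collapsed repeat. In a transcriptome, the more likely reading is that its transcript is simply expressed at a higher level, so depth cannot be interpreted the same way.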
Functional annotation of assembled transcripts provides insight into the molecular functions, cellular components, and biological processes in which the putative proteins are involved. Blast2GO (B2G) can annotate sequence data that do not yet have Gene Ontology (GO) annotations: it aligns the assembled contigs against NCBI's non-redundant protein database and annotates them on the basis of sequence similarity. The tool is frequently used in functional genomic studies of non-model species.
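The general idea of similarity-based annotation can be sketched as follows. This is not Blast2GO's actual procedure: it is a simplified best-hit transfer, the hit tuples use a reduced hypothetical layout (contig, subject, E-value, bit score), and the protein names and the protein-to-GO mapping are invented for illustration.

```python
def best_hits(hits, max_evalue=1e-5):
    """For each contig, keep the hit with the highest bit score.

    hits is an iterable of (contig, subject, evalue, bitscore) tuples.
    Hits with an E-value above max_evalue are ignored. The cutoff is an
    arbitrary illustrative default, not a recommended setting.
    """
    best = {}
    for contig, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if contig not in best or bitscore > best[contig][2]:
            best[contig] = (subject, evalue, bitscore)
    return best


def transfer_go(best, subject_to_go):
    """Copy the GO terms of each contig's best-matching protein to the contig."""
    return {contig: sorted(subject_to_go.get(subject, []))
            for contig, (subject, _evalue, _score) in best.items()}


if __name__ == "__main__":
    # Invented similarity-search results.
    hits = [
        ("contig_1", "prot_A", 1e-30, 210.0),
        ("contig_1", "prot_B", 1e-8, 60.0),
        ("contig_2", "prot_C", 0.5, 25.0),   # too weak, filtered out
    ]
    # Hypothetical mapping from database proteins to GO identifiers.
    subject_to_go = {
        "prot_A": {"GO:0003677", "GO:0005634"},
        "prot_B": {"GO:0016301"},
    }
    print(transfer_go(best_hits(hits), subject_to_go))
```

Contigs with no hit below the cutoff stay unannotated, which is common for non-model species whose genes have no close match in the database.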
Because good reference genomes are rarely available for non-model organisms, the quality of a computational assembly is usually checked in one of two ways: by comparing the assembled sequences to the reads used to generate them (reference-free), or by aligning the sequences of conserved gene domains to the transcriptome or genome of a closely related species (reference-based). Tools such as Transrate and DETONATE use these approaches to assess assembly quality statistically.
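As a rough illustration of the reference-free idea, the sketch below measures what fraction of reads is found in the assembled contigs on either strand. It uses exact substring matching on invented sequences and does not reproduce the scoring used by Transrate or DETONATE; `read_support` and `reverse_complement` are hypothetical helper names.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_support(reads: list[str], contigs: list[str]) -> float:
    """Fraction of reads that occur exactly in some contig, on either strand."""
    if not reads:
        return 0.0
    supported = 0
    for read in reads:
        rc = reverse_complement(read)
        if any(read in contig or rc in contig for contig in contigs):
            supported += 1
    return supported / len(reads)


if __name__ == "__main__":
    # Invented toy data: one contig and four short reads.
    contigs = ["ATGGCCATTCTAATGG"]
    reads = [
        "GGCCATT",   # matches the forward strand
        "AATGGCC",   # matches only as a reverse complement
        "TTTTTTT",   # not represented in the assembly
        "CTAATGG",   # matches the forward strand
    ]
    print(f"read support: {read_support(reads, contigs):.2f}")
```

A low fraction suggests that the assembly fails to represent much of the sequenced material. Real tools work from read alignments and tolerate mismatches instead of requiring exact matches.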
In this rapidly developing field of genomic research, transcriptome assembly is undoubtedly one of the core tools for understanding the diversity of life. With such rich biodiversity, how can we apply these findings to future biotechnology and conservation efforts?