Of those, 21 transcripts remained after removing redundant isoforms on the basis of their similarity at the amino acid level. Phylogenetic analysis of the deduced protein sequences of the 21 transcripts revealed that some transcripts were closely grouped with the three reported UGT proteins (Fig. 6A). In particular, three transcripts, CP_comp126017, CP_comp142900, and CP_comp82124, showed much higher similarity to the reported UGT proteins than the other transcripts. Overall, the expression patterns of the 21 transcripts were similar between CP and CS, with the exception of CP_comp144124, which showed about two-fold higher expression in CP
(2.5 in CP vs. 1.3 in CS; Fig. 6B). To investigate the transcript expression differences between adventitious roots and primary roots, we compared 35,527 CP reference transcripts with 38,966 transcripts from 11-year-old ginseng primary roots, after assembly of 454 reads from the NCBI SRA database (accession no. SRX017443) [35]. When their sequence similarity was analyzed, 6,057 (17.0%) transcripts in adventitious roots and 6,354 (16.3%) in primary roots were found to be uniquely expressed. A total of 62,082 transcripts, 29,470 (83.0%) from adventitious roots and 32,612 (83.7%)
from primary roots, were commonly expressed. GO analysis of unique transcripts was performed to characterize their functional category. As shown in Fig. 7, more transcripts from adventitious roots were assigned GO terms than from normal roots. Overall, the proportion of GO assignment in adventitious root transcriptomes was two-fold higher than
that in normal roots, although the most frequent GO terms such as binding, response to other organisms, and nuclear lumen were generally similar between both datasets. In particular, 11 out of 20 GO terms for biological processes had more transcripts in adventitious roots than in normal roots. Terms such as response to metal ion, transcription, multicellular organismal development, and reproductive developmental process showed more than eight-fold higher proportions compared to those in normal roots. By contrast, only two biological process terms, regulation of growth rate and response to stress, accounted for higher proportions in normal roots than in adventitious roots.

Transcriptome profiling using NGS technology, the so-called RNA-Seq, is one of the most efficient tools for gene discovery and various functional studies. Illumina transcriptome sequencing and assembly have been used successfully for several nonmodel organisms [36], [37], [38] and [39], but transcriptome assembly has many challenges, including misassembled or chimeric contigs (i.e., assembled contigs containing reads from different transcripts [40]). Here, we describe a method for choosing the assembly result that is most meaningful both biologically and computationally.
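
The unique and commonly expressed transcript counts reported above for adventitious and primary roots can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the counts are copied from the text, while the dictionary layout and the names `reported`, `summarize`, `summary`, and `shared_total` are illustrative assumptions and not part of the original analysis pipeline.

```python
# Counts as reported in the text; the structure and names are illustrative only.
reported = {
    "adventitious": {"total": 35_527, "unique": 6_057, "shared": 29_470},
    "primary": {"total": 38_966, "unique": 6_354, "shared": 32_612},
}


def summarize(counts):
    """Return (unique %, shared %) of one transcript set, rounded to one decimal."""
    total = counts["total"]
    # Unique and shared transcripts should partition the full set.
    assert counts["unique"] + counts["shared"] == total
    unique_pct = round(100 * counts["unique"] / total, 1)
    shared_pct = round(100 * counts["shared"] / total, 1)
    return unique_pct, shared_pct


# Percentages per root type, and the combined number of shared transcripts.
summary = {root: summarize(c) for root, c in reported.items()}
shared_total = sum(c["shared"] for c in reported.values())

for root, (unique_pct, shared_pct) in summary.items():
    print(f"{root}: {unique_pct}% unique, {shared_pct}% shared")
print("shared transcripts across both sets:", shared_total)
```

Under these assumptions, the unique and shared counts sum to each reference set, and the computed percentages and combined shared total agree with the values stated in the text.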