DNA is available through the DNA Bank Network [43].
Genome sequencing and assembly
The draft genome sequence was generated using Illumina sequencing technology. For this genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 270 bp, which generated 5,484,184 reads, and an Illumina long-insert paired-end library with an average insert size of 7,670 ± 2,475 bp, which generated 4,839,808 reads, for a total of 1,549 Mb of Illumina data (Feng Chen, unpublished). General details of library construction and sequencing can be found at the JGI web site [44]. The initial draft assembly contained 54 contigs in 17 scaffolds. The initial draft data were assembled with Allpaths [45], and the consensus was computationally shredded into 10 kbp overlapping fake reads (shreds).
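The read counts and data volume above can be cross-checked with simple arithmetic. The sketch below is a back-of-envelope check, not part of the authors' pipeline. It assumes 1 Mb = 10^6 bases and that the 1,549 Mb total covers both libraries. Given those assumptions, it derives the mean yield per read and the genome size implied by the 287× average coverage reported for the final assembly (average depth = total sequenced bases / genome length). The function names are illustrative only.

```python
# Figures stated in the text; 1 Mb is assumed to mean 10**6 bases
SHORT_INSERT_READS = 5_484_184
LONG_INSERT_READS = 4_839_808
TOTAL_BASES = 1_549 * 10**6   # "1,549 Mb of Illumina data"
REPORTED_COVERAGE = 287        # "average 287x coverage of the genome"


def total_reads() -> int:
    """Reads from both libraries combined."""
    return SHORT_INSERT_READS + LONG_INSERT_READS


def mean_bases_per_read() -> float:
    """Average number of sequenced bases contributed by each read."""
    return TOTAL_BASES / total_reads()


def implied_genome_size(coverage: float = REPORTED_COVERAGE) -> float:
    """Solve coverage = total bases / genome length for the genome length."""
    return TOTAL_BASES / coverage


if __name__ == "__main__":
    print(f"total reads:         {total_reads():,}")
    print(f"mean bases per read: {mean_bases_per_read():.1f}")
    print(f"implied genome size: {implied_genome_size() / 1e6:.2f} Mbp")
```

Run as a script, this reports 10,323,992 reads, about 150 bases per read, and an implied genome size of about 5.40 Mbp. That size is larger than the 4.53 Mbp of the three chromosomal scaffolds alone, which is consistent with the additional plasmid sequence. The exact total genome length is not stated in this section.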
The Illumina draft data were also assembled with Velvet [46], and the consensus sequences were computationally shredded into 1.5 kbp overlapping fake reads (shreds). A second Velvet assembly of the Illumina draft data was then performed, using the shreds from the first Velvet assembly as a guide. The consensus from the second Velvet assembly was likewise shredded into 1.5 kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies, together with a subset of the Illumina CLIP paired-end reads, were assembled using parallel phrap (High Performance Software, LLC) [47]. Possible mis-assemblies were corrected by manual editing in Consed [47]. Gaps were closed using repeat resolution software (Wei Gu, unpublished) and by sequencing bridging PCR fragments with PacBio technology (Cliff Han, unpublished).
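"Shredding" a consensus means cutting each contig into fixed-length, overlapping pseudo-reads so that a read-based assembler such as phrap can take it as input alongside real reads. The text gives only the shred lengths (10 kbp for Allpaths, 1.5 kbp for Velvet), not the overlap. The `shred` function and the step sizes used below are therefore hypothetical, and this is a minimal sketch of the idea rather than JGI's actual shredding tool.

```python
from typing import List


def shred(consensus: str, shred_len: int, step: int) -> List[str]:
    """Cut a consensus sequence into overlapping fixed-length fake reads.

    Consecutive shreds start `step` bases apart, so neighbours overlap by
    shred_len - step bases. If stepping would leave the 3' end uncovered,
    one final shred anchored at the end of the sequence is appended.
    """
    if shred_len <= 0 or not 0 < step <= shred_len:
        raise ValueError("need shred_len > 0 and 0 < step <= shred_len")
    if len(consensus) <= shred_len:
        # A contig no longer than one shred is kept whole
        return [consensus] if consensus else []
    last_start = len(consensus) - shred_len
    shreds = [consensus[i:i + shred_len] for i in range(0, last_start + 1, step)]
    if last_start % step != 0:
        shreds.append(consensus[last_start:])
    return shreds


if __name__ == "__main__":
    # Toy contigs; the step sizes (9,000 and 1,000) are hypothetical
    allpaths_contig = "ACGT" * 6_000     # 24 kbp
    velvet_contig = "GATTACA" * 700      # 4.9 kbp
    print(len(shred(allpaths_contig, 10_000, 9_000)))
    print(len(shred(velvet_contig, 1_500, 1_000)))
```

Anchoring the last shred at the contig end keeps every shred the same length while guaranteeing that the terminal bases are represented. A smaller step gives more overlap and more redundant coverage of each consensus base.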
A total of 45 additional sequencing reactions were completed to close gaps and to raise the quality of the final sequence. The final assembly is based on 1,549 Mbp of Illumina draft data, which provides an average 287× coverage of the genome.
Genome annotation
Genes were identified using Prodigal [48] as part of the JGI genome annotation pipeline [49], followed by a round of manual curation using the JGI GenePRIMP pipeline [50]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFAMs, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes – Expert Review (IMG-ER) platform.
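Translating a predicted CDS into a protein is what allows it to be searched against protein databases such as UniProt or Pfam. The sketch below shows that translation step. It assumes NCBI translation table 11 (bacterial, archaeal and plant plastid). That table uses the same amino-acid assignments as the standard code but allows TTG, CTG, ATT, ATC, ATA, ATG and GTG as initiation codons, and each of these is read as Met when it is the first codon. This is an illustration only, not the code used by Prodigal or the JGI pipeline. `translate_cds` is a hypothetical helper.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons in TCAG order (first base varies slowest);
# table 11 shares these assignments with the standard genetic code
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Initiation codons permitted by table 11; translated as Met at position 1
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds(cds: str) -> str:
    """Translate a CDS to protein, stopping at the terminal stop codon."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    protein = []
    for idx, codon in enumerate(codons):
        if idx == 0 and codon in START_CODONS:
            protein.append("M")
            continue
        aa = CODON_TABLE.get(codon, "X")  # codons with ambiguous bases -> X
        if aa == "*":
            if idx != len(codons) - 1:
                raise ValueError(f"internal stop codon at codon {idx}")
            break  # terminal stop is not included in the protein
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("GTGAAAGCTTAA"))  # GTG start read as Met
```

The special case for the first codon is the main way a bacterial translation differs from a naive codon lookup. Without it, a gene starting with GTG would begin with Val instead of Met.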
Genome properties
The genome statistics are provided in Table 3 and Figure 3. The genome assembly consists of three large scaffolds for the chromosome (3,520,924 bp, 564,457 bp and 447,629 bp in length) and six plasmids ranging in size from 21,535 bp to 270,810 bp, with an overall G+C content of 63.3%. Of the 5,335 genes predicted, 5,227 were protein-coding genes and 108 were RNA genes; 81 pseudogenes were also identified.
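The summary statistics above are simple sums and ratios. The sketch below reproduces the checks that can be made from the numbers given here: the combined length of the three chromosomal scaffolds, and whether the protein-coding and RNA gene counts add up to the total. It also includes a G+C content function of the kind used to obtain the 63.3% figure. The function counts only unambiguous A/C/G/T bases. That is an assumption on our part, since the exact convention used by IMG-ER is not stated. The constant and function names are illustrative.

```python
# Values stated in the text
CHROMOSOME_SCAFFOLDS_BP = [3_520_924, 564_457, 447_629]
TOTAL_GENES = 5_335
PROTEIN_CODING_GENES = 5_227
RNA_GENES = 108


def gc_content(seq: str) -> float:
    """Percent G+C, counting only unambiguous A/C/G/T bases (assumed convention)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    print(f"chromosomal scaffolds: {sum(CHROMOSOME_SCAFFOLDS_BP):,} bp")
    print("gene counts consistent:", PROTEIN_CODING_GENES + RNA_GENES == TOTAL_GENES)
    print(f"toy sequence: {gc_content('GGCCATNN'):.1f}% G+C")
```

The three chromosomal scaffolds sum to 4,533,010 bp, and 5,227 protein-coding plus 108 RNA genes account for all 5,335 predicted genes. Applying `gc_content` to the concatenated chromosome and plasmid sequences would give the genome-wide figure.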