Comparative genomic analysis of the two strains identified a number of genomic regions and genes containing virulence factors. Of particular interest was the discovery of a novel plasmid
pPAA3 that was previously unknown in the genus Photorhabdus. The pPAA3 plasmid contains a Type IV secretion system similar to that of the pCRY plasmid in Yersinia pestis. Type IV secretion systems are well-known virulence factors, involved in delivering ‘effectors’ such as toxins into eukaryotic cells. We speculate that this plasmid may be responsible for the ability of the Australian isolates to invade nonphagocytic cells in tissue culture, an ability not seen in the closely related US isolates that lack this plasmid (Costa et al., 2009).

We used a combination of Illumina, 454 and Sanger sequencing
to gather primary sequence data for the P. asymbiotica Kingscliff genome. We also constructed both Illumina paired-end libraries and large-insert fosmid libraries, the latter for conventional Sanger-based end sequencing (see Supporting Information, Appendix S1 for details). We used three different workflows, combining different types of sequence data with different assembly algorithms, to look for the optimal de novo assembly (see Fig. S1). The first (Workflow A) used the VCAKE pipeline (Reinhardt et al., 2009) to perform a hybrid assembly of the 454 data and the Illumina paired and unpaired reads. Illumina reads were clustered de novo with VCAKE version 1.03 into VCAKE contigs. Newbler then assembled the VCAKE contigs and 454 long reads into hybrid contigs. The Newbler scaffolder oriented the hybrid contigs into larger hybrid scaffolds using
454 paired-end data. Hybrid scaffolds were cleaned of 1–2 base pair (bp) indels using Illumina read depth; longer gaps within scaffolds were filled with unused VCAKE and hybrid contigs in Finisher. Finally, polymorphism and coverage in the scaffolds were used to identify any putative repeat regions. The second workflow (Workflow B) used the Velvet assembler (version 0.7.27) to produce an assembly of the Illumina paired read data. The third workflow (Workflow C) was a hybrid assembly of Illumina paired read data and 454 data using the Velvet assembler. Once the assemblies were complete, Sanger-derived fosmid end sequences were aligned to the different assemblies to verify that the contigs were in the correct orientation. Sequence alignments were performed using the Newbler gsMapper software, with the assembly contigs provided as the reference sequence. Alignments were visualized using SeqMan (DNASTAR version 8). The optimal draft assembly was selected as the output with the best combination of high N50, low N and a sum of contig lengths equal to the estimated genome size of ∼5 Mb. For comparisons with the finished genome of P. asymbiotica ATCC43949, Illumina paired-end reads were mapped both to the genome and to the pPAU1 and pCRY plasmids using the MAQ assembler version 0.6.
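
The assembly selection criteria above can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The function name assembly_stats, the returned keys, the default expected size of 5,000,000 bp and the reading of 'low N' as a low contig count are all assumptions made for illustration. N50 is computed in the usual way, as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length.

```python
from typing import Dict, Iterable


def assembly_stats(contig_lengths: Iterable[int],
                   expected_genome_size: int = 5_000_000) -> Dict[str, int]:
    """Summarise an assembly from its contig lengths (hypothetical helper).

    Returns the contig count, the summed contig length, the N50 and the
    absolute difference between the summed length and the expected genome
    size (assumed here to be about 5 Mb, as estimated in the text).
    """
    lengths = sorted((int(n) for n in contig_lengths), reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)

    # Walk the contigs from longest to shortest; the N50 is the length of
    # the contig at which the running sum first reaches half of the total.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "n_contigs": len(lengths),
        "total_bp": total,
        "n50": n50,
        "size_deviation_bp": abs(total - expected_genome_size),
    }
```

Under these assumptions, candidate assemblies could be compared by calling assembly_stats on the contig lengths from each workflow and preferring the output with a higher n50, a lower n_contigs and a smaller size_deviation_bp. How these criteria should be weighted against each other is not stated in the text and is left open here.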