Although gene assignments in draft genomes are more error-prone, we decided to include them to expand the scope of our genomic comparison. A whole-genome scan was performed using a PWM derived from the region comprising several experimentally validated VirR binding sites [7, 8]. A new PWM was then generated from 30 motifs identified in the first scan, all found in the promoters of genes orthologous to known targets, and used for a second genome scan. In this way we avoided the biases that affect the first
matrix, which was obtained from only a few sequences, mainly coming from one strain. After this two-step strategy, we collected all genes with a motif scoring more than 0.88, which is the lowest value observed for an experimentally
tested VirR target gene (CPF_1074 [8]). At this threshold we retained 53 occurrences of the VirR motif. Analysis of their location with respect to the start codon of the downstream coding sequence revealed that most of them lie about 100 bp from the start of the gene (Figure 2). The larger distance observed for some of the motifs may be due to longer 5′ untranslated regions, or may reflect a different level of regulation for those genes. The list of genes putatively regulated by VirR was split into three groups after clustering similar sequences (see Methods): i) the conserved VirR regulon, formed by chromosomal genes retrieved in at least two different genomes; ii) the accessory regulon, with chromosomal genes present in a single strain; iii) the mobile regulon, including genes found on plasmids.

Figure 2 Distribution of distances from gene. The distance of the motifs with respect to the translation start site (x-axis) is shown. Motifs are grouped by homology of the downstream gene (cluster identifier is on the y-axis). Most of the targets are located in the first 200 nt from the start of the gene, but some of them (and notably several corresponding to characterized ones) are
located at larger distances. Red circles correspond to orthologous groups from Table 2.

The conserved VirR regulon

The conserved regulon (Table 2) appeared to contain all known target genes [7, 8] with the exception of CPR_0761 and virT. The former can be identified in the genome of strain SM101 only, while the latter has been found in strains 13 and ATCC3626; in both cases we were able to identify a VirR binding motif in their promoters (Table 3).

Table 2 Conserved VirR regulon
Product | ATCC13124 | Str.13 | SM101 | F4969† | JGS1721† | JGS1495† | JGS1987† | ATCC3626† | REF
α-clostripain | CPF_0840 | CPE0846 | CPR_0833 | AC5_0918 | CJD_0991 | CPC_0878 | AC3_1028 | AC1_0991 | [7]
ccp | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52 | 1.52
Reg.