This tendency is confirmed by the fact that a similar study carried out with the data set in which the O-glycosylation positions were randomized (Figure 4B) resulted in a completely different distribution, with pHGRs more homogeneously scattered along the length of the proteins.

Figure 4. Distribution of pHGRs along the length of proteins. For each organism, the relative position of the center of each pHGR along the length of its respective protein was calculated as percent distance from the N-terminus. The graph displays the frequency distribution of these pHGR centers in ten groups. A: distribution
obtained with the positions of O-glycosylation sites predicted by NetOGlyc. B: distribution obtained when the positions of the O-glycosylation sites were randomized. C: distribution obtained for the group of B. cinerea secretory enzymes active on polysaccharides, using the
non-randomized O-glycosylation positions.

The location of pHGRs towards protein ends can be seen more clearly when only secretory enzymes are considered. This was studied by analyzing a specific set of proteins from B. cinerea predicted to have a signal peptide and classified as enzymes active on polysaccharides in the CAZy database [16, 17]. This list of proteins contains 177 members with a signal peptide and at least one O-glycosylation site, as predicted by SignalP and NetOGlyc, respectively. Among them, we found 72 enzymes displaying pHGRs (not shown). The distribution of these regions along the length of the respective proteins (Figure 4C) clearly shows a much more marked tendency to be located at the ends, especially at the C-terminus.

Discussion

We have shown here that the most popular in silico tool to predict O-glycosylation, NetOGlyc, is able to predict O-glycosylation
for fungal proteins, although with less accuracy than for mammalian proteins, and has a fairly good ability to predict regions with a high density of O-glycosylation, better than the mere search for Ser/Thr-rich regions. We have also shown that fungal secretory proteins are rich in regions with a high Ser/Thr content and are frequently predicted to have pHGRs of varying length, averaging 24 residues but reaching up to 821, that can be found anywhere along the proteins but have a slight tendency to be located at one of the two ends. The coincidence between Ser/Thr-rich regions and pHGRs was studied for a representative number (361) of B. cinerea proteins (not shown), and the results obtained are similar to those shown in Figure 1: 91% of residues within pHGRs also belonged to a Ser/Thr-rich region, while only 25% of residues inside a Ser/Thr-rich region were also within a pHGR. Although the abundance of Thr, Ser, and Pro residues has been used before to search for mucin-type regions in mammalian proteins [10], these results and the comparison of predicted vs.