For this, BLASTP is opened from the DEG home page, and the probable proteins isolated in the above step are entered in FASTA format as the query sequences, with the default parameters. All the genes having similarity with Mycoplasma genitalium were selected. The selected genes were then subjected to BLASTP again, this time against the human proteome. This is necessary to remove any protein common to the human and bacterial proteomes, because targeting that very
protein may have adverse effects on humans, such as allergic reactions or toxicity. In the study, all the virulence genes, 21 in number, were extracted from the Virulence Factor Database (VFDB) [17, 18]. To predict new virulence genes, the available microarray data were retrieved from the Stanford Microarray Database. These
genes were subjected to clustering, which helped identify many more genes that are co-expressed with the virulence genes isolated from VFDB. According to cluster theory, all co-expressed genes are grouped in the same cluster. Clustering produced 450 clusters, of which the 21 clusters containing already known virulence genes were selected. Some genes were found in more than one cluster, from which we can infer that a large number of genes are expressed at the same time as the corresponding gene, which might have a vital role in the survival of the bacteria. To identify paralogous genes, the above genes were subjected to BLAST2. Since gene duplication is a rare phenomenon, no such gene was identified for S. pneumoniae. Target proteins should be essential to the pathogenic bacterium concerned, i.e., any disruption in the functioning of those genes will lead to bacterial death. To identify the essential proteins, all the proteins were subjected to BLASTP against DEG. Proteins showing a hit score above 90, with the e-value threshold set at 0.1, were selected as essential genes. Only 50 fulfilled this requirement. The small number of hits indicates that only a few of the proteins encoded by genes co-expressed with the reported virulence factors are essential for the survival
of the bacteria. Since the host of S. pneumoniae is human, it is essential to check these proteins for hits against Homo sapiens and Escherichia coli (gut flora). Proteins similar to the host proteome must be identified early to prevent dead ends later on. Any such similarity can hamper the host's survival, because a drug developed against a bacterial gene that resembles a host gene can disturb the normal functioning of the host. Such similarity arises from horizontal and vertical gene transfer during the course of evolution. Proteins showing sequence similarity with any human protein may cause the drug to react with the host, which can be responsible for toxic effects.
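
The two filtering steps described above (keep proteins with a qualifying DEG hit, then discard those that also resemble host proteins) can be sketched as a small script. This is only an illustrative sketch, not the procedure used in the study: it assumes the BLASTP results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), it reads the "hit of more than 90" as a bit score above 90, and it reuses the 0.1 e-value cutoff for the host screen. The function names `queries_with_hits` and `select_targets` are hypothetical.

```python
from typing import Iterable, Set


def queries_with_hits(rows: Iterable[str],
                      max_evalue: float,
                      min_bitscore: float = 0.0) -> Set[str]:
    """Return the query IDs that have at least one hit passing both cutoffs.

    Each row is one line of BLAST tabular output (-outfmt 6), whose columns
    are: qseqid sseqid pident length mismatch gapopen qstart qend sstart
    send evalue bitscore.
    """
    passing = set()
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue <= max_evalue and bitscore > min_bitscore:
            passing.add(query_id)
    return passing


def select_targets(deg_rows: Iterable[str],
                   host_rows: Iterable[str],
                   max_evalue: float = 0.1,
                   min_bitscore: float = 90.0) -> Set[str]:
    """Keep proteins with a qualifying DEG hit and no hit against the host.

    ASSUMPTION: "hit of more than 90" is interpreted as bit score > 90,
    and the host screen uses the same e-value cutoff with no score floor.
    """
    essential = queries_with_hits(deg_rows, max_evalue, min_bitscore)
    host_like = queries_with_hits(host_rows, max_evalue)
    return essential - host_like


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not data from the study.
    deg = [
        "protA\tDEG1\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t180",
        "protB\tDEG2\t30.0\t100\t70\t0\t1\t100\t1\t100\t0.05\t45",
        "protC\tDEG3\t75.0\t300\t75\t0\t1\t300\t1\t300\t1e-80\t250",
    ]
    host = [
        "protC\tHUM1\t60.0\t300\t120\t0\t1\t300\t1\t300\t1e-40\t150",
    ]
    print(sorted(select_targets(deg, host)))
```

In this toy input, protB fails the score cutoff and protC is dropped because it also matches a host protein, so only protA remains. Hits against E. coli could be appended to the host rows to apply the gut flora check in the same pass.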