
Efficient System Wide Metabolic Pathway Comparisons in Multiple
Microbes Using Genome to KEGG Orthology (G2KO) Pipeline Tool
Chandrakant Joshi1 · Swati Sharma2 · Neil MacKinnon3 · Shyam Kumar Masakapalli1
Received: 31 July 2019 / Revised: 17 May 2020 / Accepted: 25 May 2020
© International Association of Scientists in the Interdisciplinary Areas 2020
Abstract
Comparing system-wide metabolic pathways among microbes provides valuable insight into the organisms’ metabolic capabilities, which can in turn assist in rationally screening organisms in silico for various applications. In this work, we present an efficient and user-friendly Genome to KEGG Orthology (G2KO) pipeline tool that facilitates comparison of the system-wide metabolic networks of multiple organisms simultaneously. The strategy primarily involves automatic retrieval of the KEGG Orthology (KO) identifiers of user-defined organisms from the KEGG database, followed by overlaying and visualization of the metabolic genes using the KEGG Mapper reconstruct pathway tool. We demonstrate the applicability of G2KO via two case studies in which we processed 24,314 genes across 15 organisms, mapped onto 530 reference pathways in KEGG, while focusing on pathways of interest. First, we designed synthetic microbial consortia in silico for bioprocessing cellulose into valuable products by comparing the cellulose degradation and fermentative pathways of microbes. Second, we comprehensively compared the amino acid biosynthetic pathways of multiple microbes and demonstrated the potential of G2KO as an efficient tool for metabolic studies. We envisage that the tool will be useful to metabolic engineers as well as systems biologists. The tool’s web server, along with a tutorial, is publicly available at https://faculty.iitmandi.ac.in/~shyam/tools/g2ko/g2ko.cgi. A standalone version can be downloaded freely from https://sourceforge.net/projects/g2ko/ and from the supplementary material.
Keywords KEGG · Cellulose degradation · Metabolic networks · Bioprocess
1 Introduction
The exponential rise in the number of genome sequences provides opportunities to annotate and compare the various functions and capabilities of different organisms. This comparison is feasible even at the level of metabolic networks, by comparing the functions of the genes (by orthology) involved in various pathways. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a valuable resource that integrates various types of biological data. Each gene of an organism is assigned a KO identifier based on its functional characterization. Across the 6030 organisms in the KEGG database, genes with the same function (orthologs) carry the same KO identifier. The KO identifiers can be extracted from annotated genomes in several ways, either manually or automatically. Alternatively, tools like KEGG Orthology and Links Annotation (KOALA) [1] and KO-Based Annotation System (KOBAS) [2] can first be used to assign KO identifiers to newly sequenced genomes or microarray data, after which KEGG Mapper reconstruct pathway can be used for pathway identification and visualization. Another pipeline, the Gene Function Identification Tool (GFIT) [3], searches for the genes in the genomes present in the KEGG database and maps them onto KEGG networks. The presence and absence of genes in annotated genomes can help in understanding the completeness of the metabolic networks.
Electronic supplementary material The online version of this article contains supplementary material, which is available to authorized users.
* Shyam Kumar Masakapalli
1 School of Basic Sciences, Indian Institute of Technology
Mandi, Kamand, Himachal Pradesh 175005, India
2 School of Engineering, Indian Institute of Technology
Mandi, Kamand, Himachal Pradesh 175005, India
3 Institute of Microstructure Technology, Karlsruhe
Institute of Technology, Hermann-von-Helmholtz-Platz 1,
76344 Karlsruhe, Baden-Württemberg, Germany
Combining the genome data with metabolic networks using KEGG tools can provide information about metabolic features, such as the types of sugars different microbes can utilize or the different molecules they can synthesize [4–6]. Pathway comparisons can also help uncover striking contrasts in cellular metabolism by identifying missing key enzymes and their alternatives [7]. Currently, the KEGG pathway reconstruction tool can map KEGG Orthology (KO) data onto existing KEGG metabolic maps and compare the metabolic pathways of multiple organisms, but the input file must contain the KO identifiers in a specific format. The KEGG database provides the KO annotations, but alongside a wide range of other data and not in the form and format required by the KEGG Mapper pathway reconstruct tool. Extracting the KO identifiers manually (one by one) from the KEGG database and arranging them into the input format is cumbersome. To overcome this limitation, we developed Genome to KEGG Orthology (G2KO), a pipeline tool designed to simplify and facilitate the pathway comparison of multiple organisms (up to 10). In principle, G2KO makes the KEGG pathway reconstruct tool easier to use by acting as a pipeline that shortens the screening of desired metabolic capabilities of multiple microbes to as little as a few minutes. This allows for better utilization of KEGG by bridging the gap between different KEGG resources. Comparing multiple metabolic networks at the level of targeted pathways of genome-annotated organisms could provide valuable insight into evolutionary relationships [8], versatility and metabolic capacity [9]. These comparisons can have applications in agriculture, health and environment.
In this contribution, we demonstrate the G2KO pipeline and KEGG Mapper-based analysis to identify microbes for synthetic microbial consortia based on their metabolic capabilities, by comparing the cellulose breakdown and fermentation capabilities of multiple organisms. Based on the pathway analysis, microbial partners are identified as potential candidates for lab-based consortia experiments. The tool and the developed pipeline are not limited to cellulose degradation; they can be extended to other metabolic networks and reduce the time involved in metabolic pathway comparison.
2 Materials and Methods
G2KO is a pipeline tool developed in Perl; the web server is implemented as a CGI application that combines HTML and Perl. It also utilizes the KEGG Representational State Transfer (REST) Application Programming Interface (API). G2KO retrieves all the KO identifiers of multiple organisms and writes them to an output text file, which can be uploaded directly into the KEGG pathway reconstruction tool.
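As an illustration of the retrieval step, the following Python sketch (not the G2KO Perl code, which is not reproduced in this article) shows how gene-to-KO links could be requested from the KEGG REST API and parsed. It assumes the `link/ko/<organism code>` operation at https://rest.kegg.jp, which returns one tab-separated gene identifier and KO identifier per line. The function names `parse_link_ko` and `fetch_ko_pairs` are hypothetical.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # public KEGG REST endpoint (assumed base URL)


def parse_link_ko(text):
    """Parse the tab-separated body of a KEGG `link/ko/<org>` response.

    Each useful line is expected to look like 'eco:b0002<TAB>ko:K12524'.
    Returns a list of (gene_id, ko_id) pairs with the 'ko:' prefix removed.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 2 or not fields[1].startswith("ko:"):
            continue  # skip blank or malformed lines
        pairs.append((fields[0], fields[1][len("ko:"):]))
    return pairs


def fetch_ko_pairs(org_code, timeout=60):
    """Download the gene-to-KO links of one organism (needs internet access)."""
    with urlopen(f"{KEGG_REST}/link/ko/{org_code}", timeout=timeout) as resp:
        return parse_link_ko(resp.read().decode("utf-8"))
```

Keeping the parser separate from the download makes it possible to check the parsing logic on a small text sample without contacting KEGG.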
2.1 Package Installation
The G2KO web server does not require any installation, and all the processing is done server side. The user needs to visit the G2KO page (https://faculty.iitmandi.ac.in/~shyam/tools/g2ko/g2ko.cgi) and follow the simple instructions there. The web server combines all the functionalities of the standalone tool and is accessible worldwide through an internet connection and a browser. We have tested the server on Google Chrome, Mozilla Firefox and Apple Safari on different operating systems. The G2KO standalone package can be downloaded freely from SourceForge (https://sourceforge.net/projects/g2ko/) and installed on a computer by unpacking the compressed file. The G2KO package contains the main tool, one Perl script to download the list of organisms, a list of all annotated organisms in the KEGG database, and a readme file with instructions and supporting information. The tool is independent of the operating system and can be run on any OS, provided that an appropriate version of Perl is installed (tested on Ubuntu with Perl version 5.22.1, Windows with ActivePerl version 5.24.3.2404 and macOS with Perl version 5.18.2).
2.2 Input
The standalone tool takes input on the command line, and the web server has appropriate text boxes for input. The three- or four-letter KEGG organism codes are the only inputs required; they can be found in the KEGG database, on the web server and in the file “organisms.txt” provided with the package. A simple script, “getorg.pl”, can be run to update “organisms.txt” so that it includes any organisms recently added to the KEGG database. G2KO does not accept any arguments at the time of running the command. The G2KO standalone tool currently limits processing to the KO identifiers of up to 10 organisms because KEGG Mapper reconstruct pathway can handle 10 organisms. The web server does not impose any limit on data retrieval; however, the limitations of KEGG Mapper reconstruct pathway should be kept in mind.
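The input constraints described above can be sketched in Python as follows. This is an illustrative example rather than part of G2KO: it assumes that organism codes consist of three or four lowercase letters, and it splits a longer list into batches of at most 10 so that each batch fits the stated limit of KEGG Mapper reconstruct pathway. The names `validate_codes` and `batches` are hypothetical.

```python
import re

MAX_PER_BATCH = 10  # limit of KEGG Mapper reconstruct pathway stated in the text
CODE_PATTERN = re.compile(r"^[a-z]{3,4}$")  # assumed shape of a KEGG organism code


def validate_codes(codes):
    """Normalize organism codes and reject anything that is not 3 or 4 letters."""
    cleaned = [code.strip().lower() for code in codes]
    bad = [code for code in cleaned if not CODE_PATTERN.match(code)]
    if bad:
        raise ValueError(f"not a three- or four-letter KEGG code: {bad}")
    return cleaned


def batches(codes, size=MAX_PER_BATCH):
    """Split a list of codes into consecutive batches of at most `size` items."""
    return [codes[i:i + size] for i in range(0, len(codes), size)]
```

With 15 organisms, for example, `batches` yields one batch of 10 and one of 5, which can be analysed separately and compiled afterwards.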
2.3 Output
After the KEGG organism codes are entered, the code fetches the KO identifiers from KEGG and processes them. The web server prompts the user to download a file named “output.txt”. The standalone tool writes its output to a text file named “outfile.txt” in the folder from which the tool is run. The output is in the format required for the subsequent step, with the KO identifiers arranged separately for each organism. This “outfile.txt” or “output.txt” file can simply be uploaded into the KEGG pathway reconstruct tool (https://www.genome.jp/kegg/tool/map_pathway.html). The KEGG pathway reconstruct tool does not have any parameters to modify while running. KEGG Mapper reconstruct pathway provides colour-coded images showing the presence and absence of genes in user-selected pathways, which is the basis of the metabolic network comparisons. The pathways of interest can then be manually derived and drawn on the basis of the different pathway outcomes of KEGG Mapper.
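A possible layout of such an output file is sketched below in Python. The exact layout written by G2KO is not specified in this article, so the sketch rests on an assumption about the KEGG Mapper input format: each organism's dataset starts with a line beginning with “#” followed by a label, and every following line holds a gene identifier and a KO identifier separated by a tab. The function name `format_mapper_input` is hypothetical.

```python
def format_mapper_input(datasets):
    """Build a multi-organism text block for KEGG Mapper reconstruct pathway.

    `datasets` is a list of (organism_label, [(gene_id, ko_id), ...]) tuples.
    Assumed layout: a '#label' header line per organism, followed by one
    'gene_id<TAB>ko_id' line per annotated gene.
    """
    lines = []
    for label, pairs in datasets:
        lines.append(f"#{label}")
        for gene_id, ko_id in pairs:
            lines.append(f"{gene_id}\t{ko_id}")
    return "\n".join(lines) + "\n"


def write_mapper_input(datasets, path="outfile.txt"):
    """Write the formatted block to a text file (file name as used in the text)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_mapper_input(datasets))
```

Grouping the lines per organism mirrors the statement that the KO identifiers are arranged separately for each organism.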
2.4 Dependencies
The G2KO web server has no software dependencies; it requires only a working internet connection and a suitable browser such as Internet Explorer, Google Chrome, Apple Safari or Mozilla Firefox. The G2KO standalone tool is platform independent but is implemented in Perl, so a current version of Perl must be installed for the tool to run. The tool also requires a working internet connection to retrieve the KO data from KEGG. The tool does not use any additional Perl modules; all the modules used by G2KO are installed with Perl.
3 Results
3.1 Genome to KEGG Orthology Pipeline Tool
The pipeline developed (Fig. 1) can compare the metabolic networks of multiple organisms based on the KO identifiers of annotated genomes. The tool can either be accessed online through a web browser with a graphical user interface (GUI) or be installed as a standalone package. In the web server option, the user first visits the G2KO home page (https://faculty.iitmandi.ac.in/~shyam/tools/g2ko/g2ko.cgi) in an internet browser. The tool first asks for the number of organisms the user wants to compare. Based on this input, text boxes are generated for the name and code of each organism. After entering the organisms’ names and codes, the user clicks the “Download KO identifiers” button to download the text file containing the KO identifiers of the selected organisms. This button retrieves data from KEGG in real time, so the KO annotation is always up to date. The web server also contains detailed steps, as well as an option to download, in real time, information on all the organisms and their KEGG codes available in the KEGG genome database. Once the KO file of all the organisms is saved on the user’s computer, it can simply be uploaded to the KEGG Mapper pathway reconstruction tool (https://www.genome.jp/kegg/tool/map_pathway.html) to visualize the metabolic pathway comparison; a link to this tool is also available on the web server results page. Detailed visual instructions for using the G2KO web server are also provided in the supplementary file (Supplementary text S3). In the case of the standalone tool, the organism list file can be updated by running the command “perl getorg.pl”. The main tool does not accept any run-time parameters and can be run using the command “perl G2KO.pl”. The tool first asks for the number of organisms to be compared, followed by each organism’s code and name. To validate the pipeline, we applied G2KO to compare the metabolic networks of multiple organisms and to identify their potential for cellulose degradation and fermentation as well as their amino acid biosynthetic capabilities.
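How an organism list such as “organisms.txt” could be refreshed is sketched below in Python. The implementation of “getorg.pl” is not shown in this article, so this is only an assumption-based illustration: it uses the KEGG REST `list/organism` operation, whose lines are assumed to hold a T number, the organism code, the organism name and the lineage, separated by tabs. The names `parse_organism_list` and `fetch_organisms` are hypothetical.

```python
from urllib.request import urlopen


def parse_organism_list(text):
    """Map KEGG organism codes to organism names.

    Each line is assumed to be 'T number<TAB>code<TAB>name<TAB>lineage';
    lines with fewer than three fields are ignored.
    """
    organisms = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        organisms[fields[1]] = fields[2]
    return organisms


def fetch_organisms(timeout=60):
    """Download the current organism list from KEGG (needs internet access)."""
    with urlopen("https://rest.kegg.jp/list/organism", timeout=timeout) as resp:
        return parse_organism_list(resp.read().decode("utf-8"))
```

The resulting dictionary can be used to look up the code of an organism by name before the codes are entered into the tool.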
3.2 Case Study of Cellulose to Fermentative
Products Using G2KO
Cellulose is one of the most abundant biomolecules on earth and is used as raw material for the production of a variety of chemicals and biochemicals, including biofuels, antibiotics, alcohols and many organic acids. In most cases, the microbial processes are not efficient enough to convert the entire raw material into sugars, resulting in large quantities of cellulosic waste. In addition, cellulose-containing waste is generated in the form of agriculture and forestry byproducts [10], as cellulose is the dominant component of various crops and plants. Although this natural polymer is biodegradable, its large quantities may raise environmental and human health concerns [11] because its natural decomposition is relatively slow. Several existing technologies for the immediate treatment of cellulosic waste rely on converting it further into profitable chemicals using specifically designed bioprocesses. While worldwide research efforts are directed towards the complete utilization of cellulosic and lignocellulosic biomass [12], there is still plenty of scope for cost reduction and process efficiency improvement. To maximize the production yield of profitable chemicals from both raw and waste cellulose, there is a compelling need to identify microbes that can break down cellulose into fermentable sugars with high conversion efficiency. These simpler sugars can then be utilized by either a single organism or multiple organisms in a consortium to produce the desired final products.

Fig. 1 G2KO pipeline: a tool to facilitate comparison of the metabolic networks of multiple organisms. The web server and the standalone tool work in largely the same way, except for the difference in input: the web server takes input in graphical user interface (GUI) text boxes on the web page, while the standalone tool takes input on the command line. a The organisms’ KEGG codes are given as input in the G2KO web server text boxes or on the command line. b G2KO sends a request to the KEGG database to retrieve the KO identifier data of the selected organisms. c The retrieved data are processed into a text file in the format suitable for KEGG Mapper reconstruct pathway. This text file is offered for download in the browser or generated in the folder where the standalone tool was run. d The generated text file can be uploaded to KEGG Mapper reconstruct pathway to facilitate the comparison. e A list of all pathways in KEGG is given. Any pathway (such as glycolysis) can be selected for visualization. f The selected pathway is visualized by KEGG Mapper

Fig. 2 Validation of the G2KO pipeline together with the KEGG Mapper reconstruct pathway tool to investigate and compare multiple metabolic networks. A Pathways of typical cellulose breakdown (reactions 1–7) and fermentation (reactions 8–27). B Heatmap showing the presence or absence of reactions in 15 selected microbes, derived from KO identifiers. The completeness of cellulose degradation and fermentation, calculated as a percentage, is presented. Glu/6P, Glucose or Glucose 6P; Fru/6P, Fructose or Fructose 6P; G/3P, Glyceraldehyde or Glyceraldehyde 3P; Pyr, Pyruvate; 2-Hyd-ThPP, 2-Hydroxyethyl-ThPP; AcCoa, Acetyl CoA; Act, Acetate; Actld, Acetaldehyde; Lac, Lactate; OAA, Oxaloacetic acid; For, Formate; Cit, Citrate; Mal, Malate; Fum, Fumarate; Succ, Succinate. The colours in the heatmap correspond to the reactions. The absence of an annotated gene product (enzyme) is indicated by a colourless box. The pathway figure and heatmap were drawn using Microsoft Office PowerPoint. An example of the raw image output from the KEGG Mapper reconstruct pathway tool, which was used to compile this figure, is provided in Supplementary Figure S1. A list of all the reactions along with their EC numbers is provided in Supplementary Table S1, and a detailed description of the organisms selected for the comparison is provided in Supplementary Table S2. (Note: KEGG Mapper reconstruct pathway supports comparing the metabolic networks of 10 microbes at a time. Hence, the analysis here was performed twice, using 10 and 5 microbes separately, and the results were then compiled for comparison. This strategy allows any number of microbial metabolic networks to be compared)
Based on the literature [13, 14], industrial microbes with well-annotated genomes that are known to be either cellulose degraders or fermenters were selected to validate G2KO and identify potential microbes of interest (Fig. 2). Thermococcus kodakarensis was included as a negative control, as it encodes a variety of glucosidases and glucanases but does not degrade cellulose [15]. G2KO extracted the KO identifiers, and KEGG pathway reconstruction and visualization then allowed pathway comparisons across the microbes. We focused mainly on the pathways for cellulose breakdown and fermentation. The completeness of each pathway was derived by assigning scores (the presence of a reaction is given a score of 1) and calculating the overall percentage for comparison between organisms. A list of all the reactions along with their EC numbers is provided in Supplementary Table S1.
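The scoring rule can be written out as a short Python sketch. It assumes that every reaction present scores 1, every absent reaction scores 0, and the percentage is rounded to the nearest integer; the rounding convention and the function name `completeness` are assumptions, and the reaction sets in the usage lines are placeholders rather than data from Fig. 2.

```python
def completeness(present, reactions):
    """Percentage of `reactions` found in `present`, rounded to an integer.

    A reaction that is present scores 1 and an absent one scores 0, so the
    result is 100 * (number of reactions present) / (number of reactions).
    """
    reactions = list(reactions)
    if not reactions:
        raise ValueError("the pathway must contain at least one reaction")
    score = sum(1 for reaction in reactions if reaction in present)
    return round(100 * score / len(reactions))


cellulose = range(1, 8)      # reactions 1-7 (cellulose breakdown)
fermentation = range(8, 28)  # reactions 8-27 (fermentation)

print(completeness({1, 2, 3, 4, 5, 6}, cellulose))  # 6 of 7 reactions -> 86
print(completeness({1, 2, 3, 4, 5}, cellulose))     # 5 of 7 reactions -> 71
print(completeness(set(range(8, 23)), fermentation))  # 15 of 20 reactions -> 75
```

Under this rule, percentages of 86% and 71% for the seven cellulose breakdown reactions correspond to six and five reactions present, respectively, and 75% for the 20 fermentation reactions corresponds to 15 reactions present.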
G2KO was validated by a case study on identifying microbial consortia for cellulose degradation. The study highlights Clostridium cellulovorans and Thermobifida fusca as promising candidates for the breakdown of cellulose, as they show a high completeness (86%) of the cellulose degradation pathway (Fig. 2). Chaetomium thermophilum, Trichoderma reesei, Caldicellulosiruptor saccharolyticus and Myceliophthora thermophila, with a completeness score of 71%, are also good candidates for exploring cellulose degradation capabilities. Thermococcus kodakarensis does not have any annotated KO identifiers related to the cellulose degradation pathway, indicating a complete inability to degrade cellulose or cellobiose, although the organism encodes a variety of glucosidases and glucanases [13, 15]. The KO identifier-based annotation also indicated that neither Geobacillus kaustophilus nor Parageobacillus thermoglucosidasius can degrade cellulose. However, both can take up extracellular cellobiose and break it down into glucose, which can be further oxidized via central metabolism. In terms of fermentative ability, Parageobacillus thermoglucosidasius showed a very high completeness (75%), indicating that it is a versatile fermenter capable of producing products such as ethanol, lactic acid and acetate [16].
For a synthetic consortium to convert cellulosic waste into industrial products, at least one organism needs to break down the cellulose efficiently (indicated by a high cellulose breakdown completeness score), and at least one organism needs to utilize the resulting sugars to produce industrial products (indicated by a high fermentation completeness score). The completeness scores derived for the organisms evaluated in this study suggest that a consortium of T. fusca and P. thermoglucosidasius, both thermophiles with the best cellulose degradation and fermentation pathway completeness scores, respectively, among the selected organisms, could be effective partners. In this hypothetical thermophilic consortium, T. fusca would break down the cellulose into cellobiose and glucose, and P. thermoglucosidasius would utilize the resulting sugars for fermentation. C. cellulovorans could also be used as a cellulose degrader in consortia, but its optimum growth temperature of 37 °C [17] limits its functionality in a thermophilic consortium, and its strictly anaerobic nature poses another challenge. In a separate approach, one can hypothesize fungal-bacterial (T. reesei and P. thermoglucosidasius) and bacterial-bacterial (C. cellulovorans/T. fusca and P. thermoglucosidasius) sequential consortia operated at different temperatures in a bioreactor setup for converting cellulose into valuable products. However, these results need to be validated in vitro and the compatibility of the microbes needs to be evaluated, which is the current focus of our group. Overall, G2KO can assist in designing evidence-based consortia for various bioprocessing and other applications.
3.3 Case Study of Amino Acid Biosynthetic
Pathways Using G2KO
The biosynthetic pathways of all 20 amino acids are well established in typical model organisms such as E. coli. Alternative metabolic pathways leading to amino acid biosynthesis have also been reported in some organisms [18, 19]. G2KO can assist researchers in comparing the amino acid biosynthesis capabilities of various microbes. Such comparisons of amino acid biosynthetic pathways can shed light on the lifestyle and adaptation of the organisms [20]. In addition, understanding the amino acid biosynthetic pathways and their precursor metabolites provides key information that is essential for 13C metabolic flux analysis [21], which further supports rational metabolic engineering [22].
In this case study, we compared the amino acid biosynthesis capabilities of 15 selected microbes with distinct lifestyles (mesophiles, thermophiles and acidophiles) across different domains of life (bacteria, archaea and eukaryotes), anticipating that it would shed some light on their metabolic adaptation. The biosynthetic pathways of all 20 amino acids were constructed (Fig. 3) using the KEGG pathway database as reference. The amino acid biosynthetic pathways are conserved in the majority of the 15 selected organisms, although some unique pathway features are also present. G. kaustophilus, T. fusca and S. solfataricus can convert arginine into citrulline, a commercial amino acid used in pharmaceuticals [23]. C. thermophilum, T. reesei and A. saccharovorans contain an alternative pathway of cysteine biosynthesis from homocysteine. Pyrococcus horikoshii is an archaeon that is an obligate heterotroph and cannot grow in a medium without a peptide source [24], which is also indicated by its lack of amino acid biosynthetic abilities. Acidilobus saccharovorans is another archaeon lacking many of the amino acid biosynthetic enzymes, but its genome annotation reveals that tRNA-coding genes for 20 amino acids are present in the genome [25]. Lysine biosynthesis is another feature of interest, as different types of organisms utilize different pathways. Bacteria synthesize lysine from arginine, while archaea and eukaryotes synthesize it from α-ketoglutarate. The archaeal and eukaryotic biosynthetic pathways also split partway (Fig. 3a), while still indicating similarities between archaea and eukaryotes. Threonine can be synthesized from either glycine or aspartate; most organisms seem to synthesize it from aspartate, although this would need further validation. T. fusca, a cellulose-degrading organism selected in the previous case study (see Sect. 3.2), seems to lack the enzymes for cysteine biosynthesis, which can be further validated through experiments. Another organism selected for consortia, P. thermoglucosidasius, seems to lack the key enzyme to synthesize serine, similar to other Geobacillus spp. [26]. Intriguingly, labelled serine was observed in 13C experiments [27], indicating a homologous enzyme or a novel pathway for serine biosynthesis that is yet to be deciphered.

Fig. 3 The amino acid biosynthetic capabilities of selected organisms, compared to validate the G2KO pipeline. a The possible amino acid biosynthetic pathways in microbial metabolism. Reaction numbers are given for every reaction. Some reactions are simplified for a better summary, which is indicated by multiple arrows. b Heatmap of the amino acid biosynthetic pathway comparison. Every amino acid is colour coded differently, and the absence of colour indicates the absence of the gene for the enzyme needed to perform that reaction or set of reactions. In the case of multiple reaction steps, presence is indicated only if all the enzymes needed to perform all the intermediate reactions are present in an organism. The pathway figure and heatmap were drawn using Microsoft Office PowerPoint. G6P, Glucose 6 phosphate; F6P, Fructose 6 phosphate; 3PG, 3-Phosphoglyceric acid; PEP, Phosphoenolpyruvate; Pyr, Pyruvate; Gluc6P, Gluconate 6 phosphate; Ru5P, Ribulose 5 phosphate; R5P, Ribose 5 phosphate; X5P, Xylulose 5 phosphate; E4P, Erythrose 4 phosphate; S7P, Sedoheptulose 7 phosphate; PRPP, 5-Phosphoribosyl 1-pyrophosphate; DAHP, 7P-2-Dehydro-3-deoxy-arabino-heptonate; AcCoA, Acetyl CoA; OAA, Oxaloacetic acid; AKG, α-ketoglutarate; 3PHP, 3P-Hydroxypyruvate; SEP, Phosphoserine; AcSer, Acetyl-serine; AKB, α-ketobutyrate; AKV, α-ketovalerate; HomoCys, Homocysteine

Interdisciplinary Sciences: Computational Life Sciences

Table 1 Features of the tools that facilitate the comparison of metabolic pathways of multiple organisms, including the Genome to KEGG Orthology (G2KO) tool from this work

| Tool and references | Goal/type | No. of organisms that can be compared | Download required | Dependencies | Target data/database | Ontology | Input | Output | Updates/maintenance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| G2KO (current work) | Pathway comparison for screening of desired metabolic capabilities | User defined (retrieves KO numbers of any number of organisms; pathways can be compared 10 KO datasets at a time) | No (online mode); standalone also provided | No | KEGG | KO | Organism's KEGG code | KO list in KEGG Mapper format | Yes (real time data retrieval from KEGG) |
| Pathway Booster [28] | Manual comparison and curation of metabolic models | 6 (multiple) | Yes | Python | BRENDA/KEGG | EC | GenBank/EMBL or FASTA file | Browsable HTML | Yes |
| ComPath [29] | Integrative software for pathway analysis | Multiple (unspecified) | No | Java Runtime Environment | KEGG | EC, GO | Multiple (in tool selection) | Interactive spreadsheet | Yes |
| EC2KEGG [30] | Automated comparison of enzymes from newly sequenced organisms against annotated reference genomes | 2 | Yes | Perl and several other Perl modules | KEGG | EC | Third party obtained list of EC numbers | Report file containing annotations and statistics | No (last update 6 years ago) |
| KOBAS 2.0 [2] | Annotation of genes with putative pathways and disease relationships based on mapping to genes with known annotations | – | Yes, web server also available | Python for downloaded version, web server is dependency free | KEGG, PID, BioCyc, Reactome, Panther and disease databases | KO | ID, FASTA sequence, or tabular BLAST output | Table with mapped KEGG GENE IDs and other statistics | Yes |
| Comparative Pathway Analyzer [36] | Finding the differences in the metabolic networks between two groups of organisms | 2 groups of organisms (No. unspecified) | No | No | KEGG | EC, KEGG | EC numbers in EMBL format or KEGG reaction identifier | Colour coded | |
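The presence rule used for the heatmap in Fig. 3b, where a simplified multi-step conversion is marked present only if every intermediate enzyme is annotated, can be illustrated with a short Python sketch. The KO labels below are placeholders rather than real KO identifiers, the function names are hypothetical, and the sketch ignores the possibility that alternative enzymes catalyse the same step.

```python
def step_present(required_kos, organism_kos):
    """True only if every KO identifier needed for the step is annotated."""
    required = set(required_kos)
    if not required:
        raise ValueError("a step needs at least one required KO identifier")
    return required <= set(organism_kos)


def heatmap_row(steps, organism_kos):
    """Presence (True) or absence (False) of each step for one organism.

    `steps` maps a step name to the KO identifiers of all enzymes it needs.
    """
    return {name: step_present(kos, organism_kos) for name, kos in steps.items()}


# Placeholder labels, not real KO identifiers.
steps = {"single-enzyme step": ["KO_A"], "three-enzyme step": ["KO_B", "KO_C", "KO_D"]}
print(heatmap_row(steps, {"KO_A", "KO_B", "KO_C"}))
# {'single-enzyme step': True, 'three-enzyme step': False}
```

In this example the three-enzyme step is reported as absent because one of its three enzymes is missing, even though the other two are annotated.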
4 Discussion
The G2KO pipeline offers a robust approach to comparing multiple microbial pathways derived from KO annotations, followed by overlaying them on KEGG reference pathways. Other similar tools (Table 1) can compare the pathways of multiple organisms, each with its own limitations and merits. G2KO mainly facilitates visual comparison of metabolic pathways in several user-defined microbes. Pathway Booster [28] is a downloadable package for the comparison and curation of pathways; however, it requires additional database downloads as well as software such as Python. ComPath [29] is an integrated software for data analysis that offers integration of various databases and statistical analysis of datasets. EC2KEGG [30] can compare only two organisms at a time, based on EC annotations and a reference dataset of E. coli. G2KO can be used with the existing annotated genomes available in the KEGG database. When a newly sequenced genome is of interest for pathway comparison, KEGG Orthology and Links Annotation (KOALA) can first be used to generate KO identifiers for the new genome, and KEGG Mapper reconstruct pathway can then be used for pathway identification and visualization [1]. Alternatively, other tools such as the KO-Based Annotation System (KOBAS) [31] and the KOBAS 2.0 web server [2] have been developed to obtain KO annotations from DNA sequencing and microarray data. These tools mainly focus on KO identifier annotation of new sequences and on pathway identification, not on the comparison of pathways of multiple organisms.

Table 1 (continued)

| Tool and references | Goal/type | No. of organisms that can be compared | Download required | Dependencies | Target data/database | Ontology | Input | Output | Updates/maintenance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FMM [37] | Reconstructing metabolic pathways between two metabolites | Multiple | No | No | KEGG and others | KEGG identifiers, EC | Compound keywords or KEGG ID | Colour coded pathway visualization | Yes |
| KAAS [38] | Gene annotation server for KO | – | No | No | KEGG | KO | MultiFASTA | List of assigned KO and pathways | Yes |
| Path-A [39] | Annotation against 10 model pathways | 1 against model pathways | No | No | Own set of model pathways (10) | EC | MultiFASTA of proteins | | |
To demonstrate the utility of G2KO, we employed it in two case studies comparing 15 organisms to comprehensively describe metabolic features such as cellulose degradation and fermentation abilities. These case studies indicate the utility of G2KO wherever metabolic pathway information has applications, ranging from early screening of microbes to aiding pathway curation. We selected the bioprocessing of cellulose as the first case study given the recent evidence that synthetic microbial consortia can be used to convert cellulose to value-added products [32]. Although it is possible to convert cellulose into industrial products like ethanol [33] using just one strain, studies have shown that using a consortium can have a better effect on the combined growth rates of the individual microbes [34, 35]. In a hypothetical consortium, one or multiple organisms could break down the cellulose, and other organisms could then take up the resulting simple sugars and biochemically synthesize industrial products. Isolation and characterization of natural microbial consortia is a challenging and time-consuming task. Alternatively, creating synthetic microbial consortia could accelerate this process. The second case study comprehensively compares the amino acid biosynthetic pathways in the selected 15 organisms. The comparison sheds some light on the differences between bacteria, archaea and fungi, as well as on some unique biosynthetic pathways and abilities of the organisms.
Overall, in this study using the G2KO pipeline, we processed 24,314 genes of 22,720 KO groups across 15 organisms, mapped onto 530 reference pathways in KEGG, while focusing on case studies of cellulose conversion to valuable products by microbial consortial bioprocessing and of amino acid biosynthetic pathways. G2KO, together with KEGG Mapper reconstruct pathway, facilitates the pathway comparison of multiple annotated genomes with ease.
5 Conclusion
The G2KO pipeline fills a gap in metabolic pathway comparison by providing a simplified way to retrieve and process KO identifiers. This comparison can provide early insight into the metabolic capabilities of organisms and the metabolic compatibility between them. We have demonstrated two case studies in this work. First, the comparative analysis of cellulose-degrading pathways in multiple organisms using G2KO allowed us to design a synthetic consortium for cellulose degradation. Second, the amino acid biosynthetic capabilities of multiple organisms could be readily compared. The tool can be utilized in various other applications where metabolic comparison is of primary interest. The simplicity of G2KO provides an easy and fast workflow for computational analysis of metabolic pathways.
Acknowledgements The authors sincerely thank the Department of
Biotechnology (DBT), India, and the Federal Ministry of Education and
Research (BMBF), Germany, for financial support via the Indo-German
cooperative project BioPEC. CJ is thankful to the MHRD, Government
of India and IIT Mandi for a Ph.D. scholarship and to the BioX centre,
IIT Mandi for resources. SS also acknowledges the Ministry of Science,
Research and Arts, Baden-Württemberg, Germany, for funds.
Author contributions Conceptualization: C.J., S.K.M., S.S. and N.M.;
Methodology: C.J. and S.K.M.; Software: C.J. and S.K.M.;
Validation: C.J., S.K.M., S.S. and N.M.; Formal analysis: C.J. and S.K.M.;
Investigation: C.J. and S.K.M.; Resources: S.K.M.; Data curation: C.J.;
Writing—original draft preparation: C.J. and S.K.M.; Writing—review
and editing: C.J., S.K.M., S.S. and N.M.; Visualization: C.J. and S.K.M.;
Supervision: S.K.M., S.S. and N.M.; Project administration: S.K.M., S.S.
and N.M.; Funding acquisition: S.K.M., S.S. and N.M.
Funding This research was funded by the Department of Biotechnology
(DBT), India, ref. no. BT/IN/BMBF-Germany/29/SKM/2016-17
(SKM); and the Federal Ministry of Education and Research (BMBF),
Germany, ref. no. 01DQ17014 (SS, NM), via the Indo-German cooperative
project BioPEC. CJ receives a PhD scholarship from the MHRD,
Government of India and IIT Mandi. SS also acknowledges funding
from the Ministry of Science, Research and Arts, Baden-Württemberg,
Germany via Grant No. Az: 33-7533-30-20/3/3, HEiKA Center
FunTECH-3D.
Compliance with ethical standards
Conflict of interest The authors declare no conflict of interest.
References
1. Kanehisa M (2017) Enzyme annotation and metabolic reconstruction
using KEGG. Protein Funct Predict Methods Protoc.
https://doi.org/10.1007/978-1-4939-7015-5_11
2. Xie C, Mao X, Huang J et al (2011) KOBAS 2.0: a web server for
annotation and identification of enriched pathways and diseases.
Nucleic Acids Res 39:W316–W322.
https://doi.org/10.1093/nar/gkr483
3. Bono H, Ogata H, Goto S, Kanehisa M (1998) Reconstruction
of amino acid biosynthesis pathways from the complete genome
sequence. Genome Res 8:203–210.
https://doi.org/10.1101/GR.8.3.203
4. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of
genes and genomes. Nucleic Acids Res 28:27–30.
https://doi.org/10.1093/nar/28.1.27
5. Kanehisa M, Sato Y, Kawashima M et al (2015) KEGG as a reference
resource for gene and protein annotation. Nucleic Acids Res
6. Kanehisa M, Furumichi M, Tanabe M et al (2016) KEGG: new
perspectives on genomes, pathways, diseases and drugs. Nucleic
Acids Res 45:D353–D361. https://doi.org/10.1093/nar/gkw1092
7. Jyoti P, Shree M, Joshi C et al (2020) The Entner-Doudoroff and
nonoxidative pentose phosphate pathways bypass glycolysis and
the oxidative pentose phosphate pathway in Ralstonia solanacearum.
mSystems 5:e00091.
https://doi.org/10.1128/mSystems.00091-20
8. Heymans M, Singh AK (2003) Deriving phylogenetic trees from
the similarity analysis of metabolic pathways. Bioinformatics.
https://doi.org/10.1093/bioinformatics/btg1018
9. Lee SJ, Lee D-Y, Kim TY et al (2005) Metabolic engineering of
Escherichia coli for enhanced production of succinic acid, based
on genome comparison and in silico gene knockout simulation.
Appl Environ Microbiol 71:7880–7887.
https://doi.org/10.1128/AEM.71.12.7880-7887.2005
10. Thakur VK, Singha AS (2011) Physicochemical and mechanical
behavior of cellulosic pine needle-based biocomposites. Int J
Polym Anal Charact 16:390–398.
https://doi.org/10.1080/1023666X.2011.596303
11. Mittal SK, Singh N, Agarwal R et al (2009) Ambient air quality
during wheat and rice crop stubble burning episodes in Patiala.
Atmos Environ 43:238–244.
https://doi.org/10.1016/j.atmosenv.2008.09.068
12. Lynd LR, Weimer PJ, van Zyl WH, Pretorius IS (2002) Microbial
cellulose utilization: fundamentals and biotechnology.
Microbiol Mol Biol Rev 66:506–577.
https://doi.org/10.1128/MMBR.66.3.506-577.2002
13. Blumer-Schuette SE, Kataeva I, Westpheling J et al (2008)
Extremely thermophilic microorganisms for biomass conversion:
status and prospects. Curr Opin Biotechnol 19:210–217.
https://doi.org/10.1016/j.copbio.2008.04.007
14. Blumer-Schuette SE, Brown SD, Sander KB et al (2014) Thermophilic
lignocellulose deconstruction. FEMS Microbiol Rev
38:393–448. https://doi.org/10.1111/1574-6976.12044
15. Fukui T, Atomi H, Kanai T et al (2005) Complete genome
sequence of the hyperthermophilic archaeon Thermococcus
kodakaraensis KOD1 and comparison with Pyrococcus genomes.
Genome Res 15:352–363. https://doi.org/10.1101/gr.3003105
16. Cripps RE, Eley K, Leak DJ et al (2009) Metabolic engineering
of Geobacillus thermoglucosidasius for high yield ethanol production.
Metab Eng 11:398–408.
https://doi.org/10.1016/j.ymben.2009.08.005
17. Sleat R, Mah RA, Robinson R (1984) Isolation and characterization
of an anaerobic, cellulolytic bacterium, Clostridium cellulovorans
sp. nov. Appl Environ Microbiol 48:88–93
18. Price MN, Zane GM, Kuehl JV et al (2018) Filling gaps in bacterial
amino acid biosynthesis pathways with high-throughput
genetics.
19. Toh-e A, Ohkusu M, Shimizu K et al (2018) Novel biosynthetic
pathway for sulfur amino acids in Cryptococcus neoformans. Curr
Genet. https://doi.org/10.1007/s00294-017-0783-7
20. Payne SH, Loomis WF (2006) Retention and loss of amino
acid biosynthetic pathways based on analysis of whole-genome
sequences. Eukaryot Cell 5:272–276.
https://doi.org/10.1128/EC.5.2.272-276.2006
21. Azizan KA, Ressom HW, Mendoza ER, Baharum SN (2017)
(13)C based proteinogenic amino acid (PAA) and metabolic flux
ratio analysis of Lactococcus lactis reveals changes in pentose
phosphate (PP) pathway in response to agitation and temperature
related stresses. PeerJ 5:e3451–e3451.
https://doi.org/10.7717/peerj.3451
22. Ghosh A, Ando D, Gin J et al (2016) 13C metabolic flux analysis
for systematic metabolic engineering of S. cerevisiae for
overproduction of fatty acids. Front Bioeng Biotechnol 4:76.
https://doi.org/10.3389/fbioe.2016.00076
23. Curis E, Nicolis I, Moinard C et al (2005) Almost all about
citrulline in mammals. Amino Acids 29:177–205.
https://doi.org/10.1007/s00726-005-0235-4
24. González JM, Masuchi Y, Robb FT et al (1998) Pyrococcus
horikoshii sp. nov., a hyperthermophilic archaeon isolated from a
hydrothermal vent at the Okinawa Trough. Extremophiles 2:123–130.
https://doi.org/10.1007/s007920050051
25. Mardanov AV, Svetlitchnyi VA, Beletsky AV et al (2010) The
genome sequence of the crenarchaeon Acidilobus saccharovorans
supports a new order, Acidilobales, and suggests an important
ecological role in terrestrial acidic hot springs. Appl Environ
Microbiol. https://doi.org/10.1128/AEM.00599-10
26. Cordova LT, Long CP, Venkataramanan KP, Antoniewicz MR
(2015) Complete genome sequence, metabolic model construction
and phenotypic characterization of Geobacillus LC300, an
extremely thermophilic, fast growing, xylose-utilizing bacterium.
Metab Eng 32:74–81.
https://doi.org/10.1016/j.ymben.2015.09.009
27. Cordova LT, Cipolla RM, Swarup A et al (2017) (13)C metabolic
flux analysis of three divergent extremely thermophilic bacteria:
Geobacillus sp. LC300, Thermus thermophilus HB8, and Rhodothermus
marinus DSM 4252. Metab Eng 44:182–190.
https://doi.org/10.1016/j.ymben.2017.10.007
28. Liberal R, Lisowska BK, Leak DJ, Pinney JW (2015) PathwayBooster:
a tool to support the curation of metabolic pathways.
BMC Bioinform 16:86.
https://doi.org/10.1186/s12859-014-0447-2
29. Choi K, Kim S (2008) ComPath: comparative enzyme analysis
and annotation in pathway/subsystem contexts. BMC Bioinform
9:145. https://doi.org/10.1186/1471-2105-9-145
30. Porollo A (2014) EC2KEGG: a command line tool for comparison
of metabolic pathways. Source Code Biol Med 9:19.
https://doi.org/10.1186/1751-0473-9-19
31. Mao X, Cai T, Olyarchuk JG, Wei L (2005) Automated genome
annotation and pathway identification using the KEGG Orthology
(KO) as a controlled vocabulary. Bioinformatics 21:3787–3793.
https://doi.org/10.1093/bioinformatics/bti430
32. Minty JJ, Singer ME, Scholz SA et al (2013) Design and characterization
of synthetic fungal-bacterial consortia for direct production
of isobutanol from cellulosic biomass. Proc Natl Acad Sci
110:14592–14597. https://doi.org/10.1073/pnas.1218447110
33. Singh N, Mathur AS, Tuli DK et al (2017) Cellulosic ethanol
production via consolidated bioprocessing by a novel thermophilic
anaerobic bacterium isolated from a Himalayan hot
spring. Biotechnol Biofuels 10:73.
https://doi.org/10.1186/s13068-017-0756-6
34. Hulme MA, Stranks DW (1970) Induction and the regulation of
production of cellulase by fungi. Nature 226:469–470.
https://doi.org/10.1038/226469a0
35. Zuroff TR, Xiques SB, Curtis WR (2013) Consortia-mediated
bioprocessing of cellulose to ethanol with a symbiotic Clostridium
phytofermentans/yeast co-culture. Biotechnol Biofuels.
https://doi.org/10.1186/1754-6834-6-59
36. Oehm S, Gilbert D, Tauch A et al (2008) Comparative pathway
Analyzer–a web server for comparative analysis, clustering
and visualization of metabolic networks in multiple organisms.
Nucleic Acids Res 36:W433–W437.
https://doi.org/10.1093/nar/gkn284
37. Chou C-H, Chang W-C, Chiu C-M et al (2009) FMM: a web
server for metabolic pathway reconstruction and comparative
analysis. Nucleic Acids
38. Moriya Y, Itoh M, Okuda S et al (2007) KAAS: an automatic
genome annotation and pathway reconstruction server. Nucleic
Acids Res 35:W182–W185. https://doi.org/10.1093/nar/gkm321
39. Pireddu L, Szafron D, Lu P, Greiner R (2006) The Path-A
metabolic pathway prediction web server. Nucleic Acids Res
34:W714–W719. https://doi.org/10.1093/nar/gkl228