In addition, the distribution of the gene duplications (Figure 4) revealed that clusters of gene duplications of the same COG function exist on both CI and CII and that most of the gene duplications in a cluster possessed roughly similar levels of sequence conservation. As such, it is possible that these highlighted chromosomal segments are locally selected for, especially as these gene duplications possess similar functions.
The sequence similarity and evolutionary constraints of the duplicate gene-pair are indicative of the essential or nonessential nature of gene function. Previous studies have shown that the type II topoisomerases gyrase and topoisomerase IV share 40 to 60% amino acid sequence identity, but each protein has a distinct function essential for cell survival [55, 56], highlighting the limitations of bioinformatics approaches. Similarly, duplicate protein pairs with very little amino acid identity can share similar functions. In Bacillus subtilis, the peptide deformylases (Def and YkrB) show similarity only across short sequences (motifs), but both independently carry out a deformylase reaction
essential for cell viability [57]. Therefore, further gene disruption analysis is required to determine the definitive function of isologous gene-pairs. In the specific analysis involving the carbon metabolism genes, it is likely that the cluster in CI containing cbbA, cbbF, cbbM, and cbbP duplicated first, and that the cbbG and cbbT duplications then arose from CI and were inserted between the duplicated cbbA and cbbP genes on CII. In addition, the two genes that
code for hypothetical proteins found between cbbT and cbbG on CI may have arisen through an additional insertion or transposition event. Although these duplicated genes exhibit varying levels of protein divergence, these protein-pairs are under negative selection, as evidenced by the functional constraints analysis in Figure 10. Additionally, the identity between the cbbM genes was low (31%). This is most probably due to the high degree of difference between cbbM I and cbbM II. More specifically, it has been shown that cbbM, which performs the first critical step in carbon fixation, has two forms (cbbM I and cbbM II). The form I enzyme possesses large and small subunits, whereas the form II enzyme possesses only large subunits that are different from the form I large subunits [58]. Discrimination between CO2 and O2 is primarily accomplished by loop 6 of the large subunit, which contains a conserved element of 11 amino acid residues. Form II enzymes are primarily anaerobic and unable to function in aerobic environments, whereas form I enzymes can function in aerobic environments [59, 60].
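The amino acid identity values quoted above (40 to 60% for the topoisomerases, 31% for the cbbM pair) depend on how identity is counted over an alignment. The following is a minimal sketch, not the procedure used in this study: the function name `percent_identity`, the gap character, and the choice to count only columns in which neither sequence has a gap are all assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: only columns where neither sequence has a gap are counted.
    Other conventions (e.g. dividing by the full alignment length) give
    different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not taken from the genomes analyzed).
print(percent_identity("MKV-LA", "MRVTLA"))
```

In the toy alignment, five columns are compared and four match. Because the denominator convention changes the result, identity values from different studies are only comparable when the same convention was used.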