Furthermore, we provided strategies for identifying new GIs in different groups of bacteria, some of which may be potential pathogens causing infectious diseases.

Figure 1. Relation between sGCSs and GIs. Three genomic islands in Vibrio cholerae N16961, Streptococcus suis ZY05, and Escherichia coli O157 were plotted together with sGCSs.

Methods

2.1 Complete genomic sequences and their bias features

Complete bacterial genomes and annotation files were downloaded from the NCBI database (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). The features of the genomes (e.g., organism names, lineages, chromosome topologies, dnaA gene locations, GC contents, and GC coordinates) were used in the comparative
genomic analysis. Genome bias switch signals were detected from the GC skew along each genome, calculated as [G - C]/[G + C] in 100-kb windows with a 50-kb step. Here, sGCSs are defined as the sites where the GC skew curve crosses the average GC content.

2.2 GIs and their physical distances

For each genome, we calculated the GC content in 2000-bp windows with a 1000-bp step. In our analysis, pGIs were usually longer than 5 kb. As controls, data for pathogenicity islands (PAIs), PAI-like sequences overlapping with GIs (candidate PAIs, cPAIs), and PAI-like sequences not overlapping with GIs (non-probable PAIs, nPAIs) were downloaded from the PAI database (http://www.gem.re.kr/).

2.3 Genomic and evolutionary distances

The genomic distances between GIs and sGCSs were calculated
using their genomic coordinates. For each GI, the distance to the nearest sGCS was used. To compare genomic distances across species, we converted physical distances into relative distances by dividing them by the length of the corresponding genome. Relative distances in different genomes are thus on the same scale (0 to 1) and directly comparable. GI homologues were obtained by searching evolutionarily closely related bacterial genomes, and GIs found in at least two strains were selected for analysis. For each pair, BLASTN was used to evaluate sequence similarity, and GIs with ≥ 80% overlap with each other were considered homologous pairs. The evolutionary distance between each pair was computed as the sequence distance under the HKY85 substitution model using PAUP [23, 24], and the resulting distance matrix was parsed into a list of pairwise evolutionary distances. Next, the correlations between the evolutionary distances of homologous GIs and their corresponding genomic distances were calculated in R. A phylogenetic tree was also constructed by the neighbor-joining method using PAUP.

Results

3.1 Identifying special features in bacterial genomes: switch signals of GC skews and GIs

The dataset used in this study includes 1090 bacterial chromosomes (from 1009 bacterial species) as samples and 83 chromosomes (from 79 archaeal species) as controls.
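The sliding-window GC skew of Section 2.1 can be illustrated with a short Python sketch. It computes [G - C]/[G + C] over full-length windows (100 kb with a 50-kb step by default) and then reports the positions where the skew curve crosses a reference level. The crossing criterion is an assumption: the sketch uses a sign change of (skew - level) with a default level of 0, whereas the text defines sGCSs against the average GC content, which may involve a normalization not described here. The function names are hypothetical, and wrap-around crossings on circular chromosomes are not handled.

```python
def gc_skew_profile(seq, window=100_000, step=50_000):
    """Sliding-window GC skew, (G - C) / (G + C), as (window_centre, skew) pairs."""
    seq = seq.upper()
    profile = []
    # Only full-length windows are used; a trailing partial window is ignored.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if g + c else 0.0
        profile.append((start + window / 2, skew))
    return profile


def crossing_sites(profile, level=0.0):
    """Positions where the profile crosses `level` (assumed sGCS criterion).

    A crossing is recorded midway between the centres of the last window
    above/below the level and the next window on the opposite side.
    """
    sites = []
    prev_pos, prev_sign = None, 0
    for pos, value in profile:
        diff = value - level
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue  # a window lying exactly on the level does not decide a crossing
        if prev_sign and sign != prev_sign:
            sites.append((prev_pos + pos) / 2)
        prev_pos, prev_sign = pos, sign
    return sites


if __name__ == "__main__":
    # Toy sequence: G-rich half followed by a C-rich half, tiny windows for illustration.
    toy = "G" * 10 + "C" * 10
    print(crossing_sites(gc_skew_profile(toy, window=5, step=5)))
```

On real chromosomes the default window and step match the values given in Section 2.1; the toy call only shows that the skew switches sign once, at the G/C boundary.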
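The windowed GC-content scan of Section 2.2 (2000-bp windows, 1000-bp step) can be sketched as below. The text does not state how GI candidates were called from this profile, so the flagging rule here is a generic heuristic and an assumption: windows whose GC fraction differs from the genome-wide GC fraction by more than a hypothetical threshold `min_dev` are flagged, overlapping or abutting flagged windows are merged, and regions shorter than a hypothetical `min_len` (set to 5000 bp, echoing the observation that pGIs were usually longer than 5 kb) are discarded. This is not presented as the method used to obtain the GIs analyzed here.

```python
def gc_content_profile(seq, window=2000, step=1000):
    """(start, end, gc_fraction) for each full-length window."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        out.append((start, start + window, gc))
    return out


def gc_deviant_regions(seq, window=2000, step=1000, min_dev=0.05, min_len=5000):
    """Merge windows whose GC fraction deviates from the genome mean by > min_dev.

    min_dev and min_len are hypothetical parameters chosen for illustration.
    """
    if not seq:
        return []
    upper = seq.upper()
    genome_gc = (upper.count("G") + upper.count("C")) / len(upper)
    regions = []
    for start, end, gc in gc_content_profile(upper, window, step):
        if abs(gc - genome_gc) <= min_dev:
            continue
        if regions and start <= regions[-1][1]:
            # Overlaps or abuts the previous flagged region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [(s, e) for s, e in regions if e - s >= min_len]


if __name__ == "__main__":
    # Toy genome: 50% GC background with an 8-kb GC-rich insert at 20,000-28,000.
    toy = "ACGT" * 5000 + "GC" * 4000 + "ACGT" * 5000
    print(gc_deviant_regions(toy, min_dev=0.2))
```

In practice the threshold would need calibration per genome; a fixed absolute deviation is used here only to keep the sketch short.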
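The relative-distance normalization and the correlation step of Section 2.3 can be sketched as follows. Two assumptions are made: the GI-to-sGCS distance is measured from the GI midpoint on a linear coordinate axis (the text does not specify the reference point or how circular topology is treated), and the correlation is Pearson's coefficient, used here as a stand-in for the R analysis, whose exact method is not stated. `relative_distance` and `pearson` are hypothetical helper names.

```python
import math


def relative_distance(gi_start, gi_end, sgcs_positions, genome_length):
    """Distance from a GI to its nearest sGCS, divided by the genome length.

    Assumption: measured from the GI midpoint on a linear axis, so the
    result lies on a 0-to-1 scale comparable across genomes.
    """
    mid = (gi_start + gi_end) / 2
    nearest = min(abs(mid - p) for p in sgcs_positions)
    return nearest / genome_length


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # GI spanning 100-200 in a 2000-bp toy genome with sGCSs at 0 and 1000.
    print(relative_distance(100, 200, [0, 1000], 2000))
    # Toy pairs of (genomic distance, evolutionary distance) for homologous GIs.
    genomic = [0.05, 0.10, 0.20, 0.30]
    evolutionary = [0.01, 0.03, 0.04, 0.08]
    print(pearson(genomic, evolutionary))
```

Dividing by genome length is what makes distances from genomes of different sizes comparable; the toy values are illustrative only and are not results from this study.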