Diversity of the abundant pKLC102 / PAGI-2 family of genomic islands in Pseudomonas aeruginosa

Jens Klockgether, Dieco Würdemann, Oleg Reva,¹ Lutz Wiehlmann, and Burkhard Tümmler*

Klinische Forschergruppe, OE 6711, Medizinische Hochschule Hannover, D-30625 Hannover, Germany

¹Current address: Bioinformatics and Computational Biology Unit, University of Pretoria, Pretoria 0002, South Africa

Running title: P. aeruginosa genomic islands

Correspondence: Burkhard Tümmler, Klinische Forschergruppe, OE 6710, Medizinische Hochschule Hannover, Carl-Neuberg-Str. 1, D-30625 Hannover, Germany, phone: +49-511-5322920, FAX: +49-511-5326723, email: tuemmler.burkhard@mh-hannover.de

Copyright © 2006, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.
J. Bacteriol. doi:10.1128/JB.01688-06
JB Accepts, published online ahead of print on 28 December 2006
Downloaded from http://jb.asm.org/ on February 11, 2020 by guest


Abstract.

The known genomic islands of Pseudomonas aeruginosa clone C strains are integrated into tRNALys (pKLC102) or tRNAGly genes (PAGI-2, PAGI-3) and differ from their core genomes by distinctive tetranucleotide usage patterns. pKLC102 and the related island PAPI-1 from P. aeruginosa PA14 were spontaneously mobilized from their host chromosomes at frequencies of 3% and 0.3%, respectively, which makes pKLC102 the most mobile genomic island known to date, with a copy number of 10-30 episomal circular pKLC102 molecules per cell. The incidence of islands of the pKLC102 / PAGI-2 type was investigated in 71 unrelated P. aeruginosa strains of diverse habitats and geographic origin. pKLC102- and PAGI-2-like islands were identified in 50 and 31 strains, respectively, and 15 and 10 subtypes, respectively, were differentiated by hybridization on pKLC102 and PAGI-2 macroarrays. The diversity of PAGI-2 type islands was mainly caused by one large block of strain-specific genes, whereas the diversity of pKLC102 type islands was primarily generated by subtype-specific combinations of gene cassettes. Chromosomal loss of PAGI-2 could be documented in sequential P. aeruginosa isolates from individuals with cystic fibrosis. PAGI-2 was present in most of the tested Cupriavidus metallidurans and C. campinensis isolates from polluted environments, demonstrating the spread of PAGI-2 across habitats and species barriers. The pKLC102 / PAGI-2 family is prevalent in numerous beta- and gammaproteobacteria and is characterized by high asymmetry of the complementary DNA strands. This evolutionarily ancient family of genomic islands retained its oligonucleotide signature during horizontal spread within and among taxa.


Introduction.

The genome of a bacterium consists of a core that is common to all strains of a taxon and an accessory part that varies within and among clones of a taxon. The accessory genome represents the flexible gene pool that frequently undergoes acquisition and loss of genetic information and hence plays an important role in the adaptive evolution of bacteria (10). The flexible gene pool is made up of elements such as bacteriophages, plasmids, IS elements, transposons, conjugative transposons, integrons and genomic islands.

Genomic islands are chromosomal regions that are typically flanked by direct repeats and inserted at the 3′ end of a tRNA gene. They contain transposase or integrase genes that are required for chromosomal integration and excision, as well as further mobility-related genes. Genomic islands are clone- or strain-specific and are never found in all clones of a taxon. Most islands are easily differentiated from the core genome by their atypical G+C content and atypical oligonucleotide composition, with steep gradients of both at their boundaries (37, 38). First identified in pathogenic bacteria ('pathogenicity islands'), genomic islands have since been detected in numerous non-pathogenic species. Genomic islands may confer fitness traits, increase metabolic versatility or adaptability, or promote bacterium-host interactions such as symbiosis, commensalism or virulence (10).

The ubiquitous and metabolically versatile Pseudomonas aeruginosa is an important opportunistic pathogen of humans, plants and animals (34). Several large genomic islands have been detected in strains from human infections and aquatic habitats. All known large genomic islands of P. aeruginosa except one (28) are integrated into tRNA genes. Two different types have been identified: the islands PAGI-2 / PAGI-3 (25) and pKLC102 (21) / PAPI-1 (16). PAGI-2 and PAGI-3 were sequenced in strains C and SG17M of the major clone C (41), an isolate from the lungs of a patient with cystic fibrosis and an isolate from a river, respectively. PAGI-2 and PAGI-3 integrate into tRNAGly genes adjacent to the PAO homolog PA2820. In both islands the first open reading frame (ORF) adjacent to the tRNAGly gene encodes a bacteriophage P4-related multidomain integrase. PAGI-2 and PAGI-3 have a modular bipartite structure. The first part, adjacent to the tRNA gene, consists of strain-specific ORFs encoding metabolic functions and transporters, the majority of which have homologs of known function in other eubacteria (cargo region). The second part is made up of a syntenic set of ORFs, the majority of which are either classified as conserved hypotheticals or are related to DNA replication or mobility genes (conserved part). Forty-seven of these ORFs are arranged in the same order in both islands, with an amino acid identity of 35-88%.

The other known large genomic islands are integrated into one of the two identical tRNALys genes, which are located adjacent to the PAO1 homologs PA0976 and PA4541, respectively. The sequenced islands integrated adjacent to PA4541 are the pathogenicity island PAPI-1 of strain PA14 (16) and the mobile genetic element pKLC102 of the clone C strain SG17M (21). The 104 kb pKLC102 and the 108 kb PAPI-1 share a phage module that comprises the integrase gene, the att element and a syntenic set of conserved genes similar to those detected in PAGI-2 and PAGI-3 (21). The other tRNALys gene, adjacent to PA0976, is targeted by genomic islands of variable size (4-81 kb in six sequenced strains) and of variable gene content (16, 21, 24). In exoU-positive strains (24), these islands encode the type III secretion effector protein ExoU, a potent cytotoxic lipase (43). Sequence analysis suggests that the exoU-containing genomic islands evolved from an ancestral plasmid similar to pKLC102. Subsequent integrations of IS elements, deletions and rearrangements may then have led to the contemporary diversity of the islands (24).

The integration sites of all these large genomic islands are located in the three hypervariable regions of the P. aeruginosa chromosome (17, 39). Since the PAO gene contig of these regions spans genomic segments of variable size in other clones (17), we hypothesized that genomic islands account for their pronounced plasticity. We therefore asked whether, and to what extent, the sequenced pKLC102 and PAGI-2 are prototypes for these suspected genomic islands. PAGI-2 and pKLC102 share a set of 36 homologous genes, fifteen of which have been identified in numerous genomic islands of other proteobacteria (29). In this study the presence of homologs of all ORFs of pKLC102 and PAGI-2 was investigated in a panel of 71 genetically unrelated strains of diverse habitats and geographic origin (30) to assess the abundance and conservation of these types of genomic islands in P. aeruginosa.

Genomic islands are typically stably integrated in the host chromosome. The reversible integration and excision of genomic islands has so far been documented in only a few cases, such as the clc element of Pseudomonas putida strain RR21 (14), pathogenicity islands of Vibrio cholerae (32), Shigella flexneri (42) and Yersinia pseudotuberculosis (27, 33), integrative and conjugative elements (ICEs) of Escherichia coli strain ECOR31 (44) and of Vibrio cholerae (7, 8), and the SaPIbov2 pathogenicity island of Staphylococcus aureus (51), the latter two not being integrated into a tRNA gene. Among the P. aeruginosa islands, pKLC102 is known to coexist in episomal and chromosome-integrated forms in clone C strains (21, 41), but no information was available about the chromosomal stability of the other three sequenced large genomic islands. Hence the relative amounts of integrated and episomal forms were determined for PAGI-2, PAGI-3, pKLC102 and PAPI-1 during growth in vitro. In parallel, the oligonucleotide usage patterns of the four genomic islands were analyzed to unravel their genomic signatures and any commonalities with each other and with their P. aeruginosa host chromosome. In particular, pKLC102 turned out to behave like a foreign selfish element, consistent with its exceptionally high mobility.


Materials and Methods.

Oligonucleotide usage statistics. Overlapping oligonucleotide words of a given length lw were counted in a sequence of Lseq nucleotides by shifting the window in steps of 1 nucleotide. The total word number Wtotal is Lseq − lw + 1 in a linear sequence and Wtotal = Lseq in a circular sequence. Since Lseq >> lw, Wtotal ≅ Lseq in all cases. For a given word length lw, $N_w = 4^{l_w}$ different words are possible in a sequence of the four letters A, T, G and C. The observed counts of words (Co) were compared with the expected counts of words (Ce). Assuming the same frequency for all words of a common length lw irrespective of their composition and sequence, Ce equals the standard count number Cn0:

$$C_e = C_{n0} = W_{total} \times N_w^{-1} \qquad (1)$$

Correspondingly, if oligonucleotide usage (OU) is normalized by mononucleotide content using the zero-order Markov method (1), Ce becomes

$$C_e = C_{n1}$$

The deviation Δw of observed from expected counts is given by

$$\Delta_w = (C_o - C_e) \times C_{n0}^{-1} \qquad (2)$$

In the present work the following abbreviations were used for the different types of patterns: type_lwmer. Types are called 'n0' if they are not normalized by mononucleotide frequency, or 'n1' if they are normalized by the zero-order Markov method. For example, the non-normalized tetranucleotide usage pattern is an n0_4mer type, and the normalized tetranucleotide usage pattern is an n1_4mer type.

The variance OUV of word deviations was determined as follows:

$$OUV = \frac{1}{N_w} \sum_{w} \Delta_w^{2} \qquad (3)$$
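Word counting and equations 1-3 can be sketched in Python as follows. This is a minimal illustration, not the authors' Python 2.2 program; the helper names are hypothetical, and because Cn1 is not spelled out above, the n1 expectation is assumed here to be the zero-order Markov estimate, i.e. Wtotal multiplied by the product of the mononucleotide frequencies of the word's letters.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def count_words(seq, lw, circular=False):
    """Count overlapping words of length lw, shifting the window by 1 nt.

    Words containing letters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    # Wrapping a circular sequence yields Wtotal = Lseq words;
    # a linear sequence yields Lseq - lw + 1 words.
    s = seq + seq[:lw - 1] if circular else seq
    words = (s[i:i + lw] for i in range(len(s) - lw + 1))
    return Counter(w for w in words if set(w) <= set(BASES))


def deviations(seq, lw=4, normalized=True, circular=False):
    """Return {word: Delta_w} following equation 2.

    normalized=False gives an n0 pattern (Ce = Cn0, equation 1).
    normalized=True gives an n1 pattern; Ce = Cn1 is ASSUMED to be the
    zero-order Markov expectation Wtotal * product of base frequencies.
    """
    seq = seq.upper()
    counts = count_words(seq, lw, circular)
    w_total = sum(counts.values())
    n_w = 4 ** lw
    c_n0 = w_total / n_w                          # equation 1
    mono = Counter(b for b in seq if b in BASES)
    n_bases = sum(mono.values())
    freq = {b: mono[b] / n_bases for b in BASES}
    delta = {}
    for letters in product(BASES, repeat=lw):
        word = "".join(letters)
        if normalized:
            c_e = w_total
            for b in word:
                c_e *= freq[b]
        else:
            c_e = c_n0
        delta[word] = (counts.get(word, 0) - c_e) / c_n0   # equation 2
    return delta


def ouv(delta):
    """Equation 3: mean squared deviation over all Nw words."""
    return sum(d * d for d in delta.values()) / len(delta)
```

Because both expectations sum to Wtotal over all Nw words, the deviations of either pattern type sum to approximately zero; a strongly repetitive sequence gives a much larger OUV than one in which all tetranucleotides occur at similar frequencies.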


For the comparison of sequences by OU patterns of the same type, the words in each sequence were ranked by their Δw values according to equation 2. Rank numbers instead of word counts were used to simplify pattern comparison.

The distance D between two patterns was calculated from the sum of absolute differences between the ranks of identical words in patterns i and j as follows:

$$D(\%) = \frac{\sum_{w} \lvert rank_{w,i} - rank_{w,j} \rvert - D_{min}}{D_{max} - D_{min}} \times 100 \qquad (4)$$

whereby

$$D_{max} = \frac{N_w (N_w - 1)}{2} \qquad (5)$$

Dmax is the maximal distance that is theoretically possible between two patterns of lw-long words (equation 5). Dmin is the minimal distance between two patterns. The minimal distance is zero for two independent sequences, but has a positive value for the two complementary strands of the same DNA sequence, because the OU patterns determined for both strands of the same DNA molecule cannot be identical. The pattern skew (PS) describes this distance between opposite strands of the same DNA and is a measure of OU symmetry. The minimal theoretical distance between two patterns of opposite strands is realized if the words and their reverse complements are distributed with similar frequencies in the sequence; it is

$$D_{min} = 4^{l_w} \quad \text{if } l_w \text{ is odd} \qquad (6a)$$

but

$$D_{min} = 4^{l_w} - 2^{l_w} \quad \text{if } l_w \text{ is even} \qquad (6b)$$

because palindromes, which occur in both strands with the same frequency, exist only among words with an even number of nucleotides, and the total number of all possible palindromes is $2^{l_w}$.
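Equations 4-6 amount to a rank comparison of two word lists. The Python sketch below illustrates it under stated assumptions and is not the authors' program: it ranks n0 patterns only (for which ordering by Δw equals ordering by raw count, because Ce is the same for every word), breaks ties alphabetically (the tie rule is not given in the text), and uses hypothetical helper names. Dmax follows equation 5 as written.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def rank_pattern(seq, lw):
    """Rank all 4**lw words of an n0 pattern (rank 1 = most frequent).

    Ties are broken alphabetically, an assumption made for this sketch.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + lw] for i in range(len(seq) - lw + 1))
    words = ["".join(p) for p in product(BASES, repeat=lw)]
    ordered = sorted(words, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def d_max(lw):
    n_w = 4 ** lw
    return n_w * (n_w - 1) / 2                  # equation 5


def d_min_strands(lw):
    # Equations 6a/6b: 2**lw palindromes exist only for even lw.
    return 4 ** lw if lw % 2 else 4 ** lw - 2 ** lw


def distance(rank_i, rank_j, lw, d_min=0):
    """Equation 4: normalized sum of absolute rank differences, in %."""
    raw = sum(abs(rank_i[w] - rank_j[w]) for w in rank_i)
    return (raw - d_min) / (d_max(lw) - d_min) * 100


def pattern_skew(seq, lw=4):
    """PS: distance between the patterns of the two strands of one sequence."""
    return distance(rank_pattern(seq, lw),
                    rank_pattern(reverse_complement(seq), lw),
                    lw, d_min=d_min_strands(lw))
```

Applied to a 5 kb window, pattern_skew returns a local PS value in percent. Since equation 4 subtracts Dmin without a lower bound, two nearly identical strand patterns can yield a slightly negative value.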

The computational program for determining OU patterns, their comparative analysis and storage in a database was written in Python 2.2 [http://www.python.org/] (38).


Strains. Seventy-one P. aeruginosa strains of diverse origin and unrelated SpeI genotype (Table 1) (30) were selected from an in-house strain collection. In addition, the sequenced reference strains PAO1 and PA14 were included. Multilocus genotyping was performed at informative SNPs of the loci oriC, oprL, alkB2, gltA, oprI, ampC, fliC, exoS and exoU as described previously (5, 30). The genotypes of the 71 strains at the 16 binary SNPs (Table 1) were represented by a 4-digit hexadecimal code (see supplementary material, Table S1): the 16 SNPs were divided into 4 groups of 4 SNPs each, and the 16 possible combinations in each group were differentiated by 16 characters (0-9, A-F). Sequential P. aeruginosa isolates were collected from the airways of 36 individuals with cystic fibrosis at half-year intervals from the onset of airway colonization over a period of up to 21 years. Strains were screened for the presence of PAGI-2 by PCR with primers specific for the gene C10 (25). Cupriavidus strains were supplied by Max Mergeay, Mol, Belgium (Table 1). Unless otherwise stated, strains were grown in liquid LB medium or on LB agar plates.
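The hexadecimal genotype code can be illustrated with a short Python sketch. The SNP order, the allele scored as 1 and the bit order within each group of four (first SNP as most significant bit) are assumptions; Table S1 defines the actual assignment, and the function names are hypothetical.

```python
HEX = "0123456789ABCDEF"


def snp_code(states):
    """Encode 16 binary SNP states (0/1) as a 4-digit hexadecimal code.

    SNPs are split into 4 groups of 4; each group (16 combinations) maps
    to one character 0-9, A-F. ASSUMED: first SNP of a group = highest bit.
    """
    if len(states) != 16 or any(s not in (0, 1) for s in states):
        raise ValueError("expected exactly 16 binary SNP states")
    code = ""
    for start in range(0, 16, 4):
        value = 0
        for bit in states[start:start + 4]:
            value = value * 2 + bit
        code += HEX[value]
    return code


def snp_states(code):
    """Decode a 4-digit hexadecimal code back into 16 binary SNP states."""
    value = int(code, 16)
    return [(value >> (15 - k)) & 1 for k in range(16)]
```

For example, the states 0001 1010 1111 0000 encode to "1AF0", and decoding "1AF0" returns the same 16 states.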

DNA preparation. DNA manipulations followed standard procedures (2). High-molecular-weight chromosomal DNA of P. aeruginosa was prepared following the protocol of Goldberg & Ohman (15). Small-scale isolations of plasmid and cosmid DNA were performed using QIAprep spin miniprep kits (Qiagen), while larger amounts of cosmid DNA were purified using QIAtip100 columns (Qiagen) following the instructions of the supplier.

Combinatorial PCR. PCR was performed with PA14-, PAPI-1-, SG17M-, pKLC102-, C-, PAGI-2- or PAGI-3-derived target-specific primer sequences (see Table S2, supplementary material) and 50 ng P. aeruginosa DNA in a 50 µl reaction mixture (5 µl 10x reaction buffer (Eurogentec), 3.3 µl 25 mM MgCl2, 1 µl DMSO, 10 µl primer solution (5 µM each), 3 µl dNTPs (2 mM each), 1 U Goldstar™ DNA polymerase (Eurogentec)). For PCR kinetics, aliquots of 5 µl were withdrawn at the indicated cycles, separated by electrophoresis and stained with ethidium bromide. The relative amounts Ni and Nj of the template DNA sequences i and j in the reaction mixture were determined by titrating the first reaction cycle n at which the PCR products became visible by ethidium bromide fluorescence during the late exponential phase of PCR, according to the equation

$$N_i / N_j = (1 + R)^{n(j) - n(i)} \qquad (7)$$

whereby the efficiency R of the thermocycler used during the exponential phase of PCR was determined to be R = 0.78 ± 0.02 for PCR products of 100-800 bp in length within the interval of reaction cycles 10 < n < 35 (6, 18).
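Equation 7 can be evaluated with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and it simply propagates the stated uncertainty of R (± 0.02) by evaluating the ratio at both ends of that interval.

```python
R = 0.78        # thermocycler efficiency reported in the text
R_ERROR = 0.02  # reported uncertainty of R


def template_ratio(n_i, n_j, r=R):
    """Equation 7: Ni / Nj = (1 + R) ** (n(j) - n(i)).

    n_i, n_j: first cycles at which the products of templates i and j
    become visible. R was calibrated for 10 < n < 35 only.
    """
    for n in (n_i, n_j):
        if not 10 < n < 35:
            raise ValueError("cycle number outside the calibrated range 10 < n < 35")
    return (1 + r) ** (n_j - n_i)


def ratio_range(n_i, n_j):
    """Ni / Nj evaluated at R - 0.02 and R + 0.02, returned as (low, high)."""
    low = template_ratio(n_i, n_j, R - R_ERROR)
    high = template_ratio(n_i, n_j, R + R_ERROR)
    return min(low, high), max(low, high)
```

With hypothetical cycle numbers, a template visible at cycle 15 versus one visible at cycle 19 corresponds to an abundance ratio of 1.78^4, i.e. roughly 10-fold.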

Southern hybridization analysis. To visualize the copy number of PAGI-2 and pKLC102 type islands in P. aeruginosa strains, XhoI- or NcoI-restricted genomic DNA was separated by agarose gel electrophoresis, blotted onto Hybond N+ membranes (Amersham), hybridized with digoxigenin (DIG)-labelled PCR-generated probes and detected by chemiluminescent immunoreactive signals applying standard procedures (40). According to BlastN analysis, the primer sequences were specific for PAGI-2 or pKLC102 and showed no homology to the PAO1 genomic sequence (49).

Macroarrays.

Design. PCR products generated with PAGI-2- or pKLC102-derived primer sequences were spotted onto nylon membranes. The scheme is shown in Figure 1. For the PAGI-2 macroarrays, 91 PCR products representing 93 of the 111 predicted ORFs were distributed onto the membrane (Figure 1A). ORF C47 was represented by two different products ("C47a", "C47b"), and the adjacent gene pairs C54 and C55, C76 and C77, and C82 and C83 were each represented by a single ORF-spanning PCR product.

Eighty-five ORFs were represented on the pKLC102 macroarray (Figure 1B), whereby ORFs CP94 and CP103 were represented by three ("CP94a", "CP94b", "CP94c") and two PCR products ("CP103a", "CP103b"), respectively. Three PCR products span two ORFs each (CP47 and CP48, CP52 and CP53, CP73 and CP74). One spot ("ori2") contained a part of oriV (21). Control PCR products were spotted in the lower left corner. In the case of the PAGI-2 array, the five dots contained partial sequences of (from top to bottom) the P. aeruginosa genes gltA, fliC (type A) and fliC (type B) (positive controls), an intergenic sequence of Pseudomonas putida KT2440 and the human ob gene (negative controls). In addition to these five control dots, the second lane from the left of the pKLC102 macroarray contained five further controls: "ori1" of pKLC102, PA0977 and PA0981 of P. aeruginosa PAO1, and two PAGI-2 homologs of P. aeruginosa TB.
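The spot counts given for the two layouts can be cross-checked arithmetically. The hypothetical helper below is not part of the described protocol; assuming that every control dot is one PCR product, it reproduces the 91 ORF-derived PAGI-2 products, and both layouts come to 96 products, consistent with the 96 pooled PCR products and the 96-well plate used in the production step.

```python
def spotted_products(n_orfs, split_orfs=None, merged_pairs=0,
                     extra_spots=0, controls=0):
    """Count PCR products spotted on one macroarray layout.

    n_orfs: ORFs represented on the array
    split_orfs: {orf: products} for ORFs covered by more than one product
    merged_pairs: single products that each span two adjacent ORFs
    extra_spots: non-ORF island spots (e.g. a part of oriV)
    controls: control products (ASSUMED: one product per control dot)
    """
    split_orfs = split_orfs or {}
    products = n_orfs - merged_pairs               # a merged pair: 2 ORFs, 1 product
    products += sum(n - 1 for n in split_orfs.values())
    return products + extra_spots + controls


# PAGI-2 array: 93 ORFs, C47 split in two, three merged pairs, five controls
pagi2_orf_products = spotted_products(93, {"C47": 2}, merged_pairs=3)
pagi2_total = spotted_products(93, {"C47": 2}, merged_pairs=3, controls=5)

# pKLC102 array: 85 ORFs, CP94 in three and CP103 in two products,
# three merged pairs, the "ori2" spot, and 5 + 5 controls
pklc102_total = spotted_products(85, {"CP94": 3, "CP103": 2}, merged_pairs=3,
                                 extra_spots=1, controls=10)
```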

Production of macroarrays. Probe sequences 208 bp to 805 bp in length were generated by four PCRs with cosmids encoding pKLC102 (21) or PAGI-2 sequences (25) as template. The contig CP39-CP41 was amplified from P. aeruginosa C genomic DNA. Primer sequences are listed in Table S2, supplementary material. All PCR reactions were performed with 40-200 ng cosmid DNA or 100-200 ng genomic DNA in a final volume of 100 µl (10 µl 10x buffer (500 mM Tris/HCl, 160 mM NaNH4SO4, 0.1% (v/v) Tween 20, pH 8.8), 2 µl 50 mM MgCl2, 6 µl each of 5 µM primer A and B stock solutions, 2 µl DMSO, 6 µl 8 mM dNTPs (2 mM each nucleotide), 2 U Taq DNA polymerase (InViTek)). After denaturation for 300 s at 96°C, 35 cycles were run (annealing for 45 s at 60°C or 58°C, elongation for 45-90 s at 72°C, denaturation for 120 s at 94°C). According to agarose gel electrophoresis and subsequent ethidium bromide staining, more than 80% of all PCR products were at least 99.9% pure and all other PCR products were at least 98% pure. Macroarray copies were produced in parallel from the same stock of pooled PCR products to ensure that the corresponding ORFs were represented by identical amounts of DNA on each membrane. For this purpose, for each of the 96 PCR products an aliquot of 50 µl of pooled PCR product, 85 µl TE buffer and 15 µl 3 M NaOH was dispensed into a well of a 96-well plate, denatured for 30 min at 65°C and chilled on ice. After addition of 100 µl 3 M ammonium acetate, aliquots of 100 µl each were transferred with a Minifold vacuum dot-blot apparatus (Schleicher & Schüll) onto Hybond N+ nylon membranes soaked in 1 M ammonium acetate. The membrane was dried and the DNA immobilized by irradiation with UV light.


Hybridization of macroarrays. Membranes were incubated for 2-16 h at 68°C with hybridization buffer (0.5 M sodium phosphate, 7% SDS, 1 mM EDTA, 0.5% blocking reagent (Roche), pH 7.2), hybridized for 16-24 h at 68°C in the same buffer with DIG-labelled genomic DNA and then washed twice for 30-45 min each at 68°C in washing buffer (40 mM sodium phosphate, 1% SDS, 1 mM EDTA, pH 7.2). Detection of DIG-labelled fragments by anti-DIG antibody conjugate, enzymatic cleavage of CDP-Star™ and exposure to X-ray films were performed as described previously (40).

Evaluation of macroarray hybridization signals. Signals were classified as strong, weak or negative according to the signal intensity of the hybridization of labelled PCR products of known sequence onto restricted cosmid DNA. Strong hybridization signals were obtained for homologs with 85% sequence identity or more. Control hybridizations of PAGI-2 onto the pKLC102 macroarray gave negative signals for all pKLC102-derived gene fragments of the array, whereas the reciprocal hybridization of pKLC102 onto the PAGI-2 array revealed weak signals for four of the 34 homologs. The nucleotide sequence identities of the PCR-amplified fragments with their homologous genes were 72%, 76%, 74% and 63% for C49, C65, C71 and C108, respectively. The E-values of the corresponding BlastN comparisons were 1E-80, 2E-115, 2E-127 and 2E-55. Notably, the weak homolog of C108 in pKLC102 carried a 28 bp stretch of identical sequence, which may explain the weak cross-hybridization signal despite the lower overall homology. In general, however, under the stringent hybridization conditions applied, a minimal sequence identity of 75% between the membrane-bound PCR product and the DIG-labelled genomic sample was estimated to be the threshold for generating hybridization signals.

Parsimony analysis was performed with the program "pars" from the software package PHYLIP 3.66 (http://evolution.genetics.washington.edu/phylip.html). Signals obtained with the positive controls PAGI-2 and pKLC102 were defined as the standard, normalized to '1', for all island ORFs on the macroarrays. In the case of PAGI-2 subtypes, signals of C1 (integrase gene), C84 and C85 (transposon genes) and C68/C69 were excluded from the analysis because of possible cross-hybridization of homologs or occasional false-negative signals (C68/C69). Similarly, the ORFs CP84, CP85, CP86 and CP103 of pKLC102 were excluded because homologs are encoded elsewhere in the genome. The purified datasets of all strains were then either combined or evaluated separately by parsimony analysis with PAGI-2 and/or pKLC102 as reference.
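A binary presence/absence matrix for pars can be assembled as sketched below. This is a hypothetical illustration, not the authors' pipeline: it assumes that strong and weak signals are both scored as present, writes missing calls as '?' (unknown state), and uses the PHYLIP input convention of a first line giving the numbers of taxa and characters followed by rows whose names occupy 10 columns. Spot names passed to it (e.g. "CP103a") are illustrative.

```python
# ASSUMED scoring: strong and weak signals both count as "present".
SCORES = {"strong": "1", "weak": "1", "negative": "0"}


def pars_infile(calls, excluded=()):
    """Build a PHYLIP discrete-character matrix for the 'pars' program.

    calls: {strain: {spot: "strong" | "weak" | "negative"}}, with the
    reference island (e.g. PAGI-2 or pKLC102) listed first.
    excluded: spots dropped before the analysis.
    Spots missing for a strain are written as '?' (unknown state).
    """
    spots = []
    for strain_calls in calls.values():
        for spot in strain_calls:
            if spot not in excluded and spot not in spots:
                spots.append(spot)
    lines = [f"{len(calls)} {len(spots)}"]
    for strain, strain_calls in calls.items():
        name = strain[:10].ljust(10)      # PHYLIP names occupy 10 columns
        states = "".join(SCORES[strain_calls[s]] if s in strain_calls else "?"
                         for s in spots)
        lines.append(name + states)
    return "\n".join(lines) + "\n"
```

The resulting text can be saved as the pars input file; the excluded set would hold the spots of the ORFs excluded above.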


Results and Discussion.

Local tetranucleotide signature. The local tetranucleotide usage was calculated for the four genomic islands pKLC102, PAPI-1, PAGI-2 and PAGI-3 (Figure 2). Values for a 5 kb sliding window were compared with the global tetranucleotide usage of the whole P. aeruginosa PAO1 chromosome. The variance of tetranucleotide frequencies, OUV, measures the deviation of the empirical frequencies from the null hypothesis of an equal frequency of all 256 tetranucleotides (52). OUV is primarily shaped by the local G+C content in P. aeruginosa (52), and hence we calculated OUV:n1_4mer normalized for mononucleotide frequencies (37). These OUV:n1_4mer values reflect the species-specific selection of tetranucleotides, normalized for the high G+C content of P. aeruginosa, and are an appropriate measure of the oligonucleotide signature of the genome. The local OUV:n1_4mer values of all four islands were consistently below the median OUV value of 0.37 of the P. aeruginosa PAO1 chromosome. In other words, the selection of tetranucleotides is less biased in the islands than in the P. aeruginosa core genome, which may facilitate the horizontal spread of the islands to bacterial species with a different oligonucleotide signature.
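A local OUV:n1_4mer profile of the kind summarized above can be approximated by recomputing the statistic in a sliding 5 kb window and comparing each value with the host median of 0.37. The sketch restates equations 1-3 for tetranucleotides so that it runs on its own; the 1 kb step, the function names and the zero-order Markov form of Cn1 are assumptions rather than details given in the text.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
WORDS = ["".join(p) for p in product(BASES, repeat=4)]


def ouv_n1_4mer(seq):
    """OUV of an n1_4mer pattern (equations 1-3; Cn1 ASSUMED zero-order Markov)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    w_total = sum(counts[w] for w in WORDS)        # words made of A, C, G, T only
    c_n0 = w_total / len(WORDS)
    mono = Counter(b for b in seq if b in BASES)
    n_bases = sum(mono.values())
    freq = {b: mono[b] / n_bases for b in BASES}
    total = 0.0
    for w in WORDS:
        c_e = w_total * freq[w[0]] * freq[w[1]] * freq[w[2]] * freq[w[3]]
        total += ((counts[w] - c_e) / c_n0) ** 2
    return total / len(WORDS)


def local_ouv_profile(seq, window=5000, step=1000):
    """(start, OUV:n1_4mer) for each window; the step size is an assumption."""
    return [(start, ouv_n1_4mer(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def windows_below(profile, host_median=0.37):
    """Fraction of windows whose OUV lies below the host-chromosome median."""
    return sum(v < host_median for _, v in profile) / len(profile)
```

For an island sequence, windows_below(local_ouv_profile(island)) close to 1 would correspond to the observation that local values stay below the PAO1 median throughout.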

The parameter 'distance' D compares the rank order of tetranucleotide frequencies in two patterns (37), i.e. in this case the rank order in a 5 kb window compared with that of the whole genome (see Materials and Methods). Local D values were similar to that of the P. aeruginosa core genome throughout the whole genomic island PAGI-2 (Figure 2). The other three islands, however, showed high peak values in several regions. In the case of PAGI-3, almost all genes in the strain-specific cargo regions (see supplementary material, Table S3) (25), but none of the genes that are conserved among members of this family of genomic islands (39), had an atypical oligonucleotide composition. Peaks of D values are flanked by various small transposable elements, highlighting the complex architecture of PAGI-3 (see Table S3, supplementary material) (25). In pKLC102 the loci with atypical local oligonucleotide composition were predominantly associated with genes that are necessary for conjugation and integration, such as sex pili, relaxase and integrase (Table 2) (21). In PAPI-1 these loci were either flanked on one side by a direct or inverted repeat or were part of a type IV pilus biogenesis machinery (16). In summary, regions with an atypical oligonucleotide composition contain repeats and/or encode elements of genetic mobility.

The parameter 'pattern skew' PS describes the distance D between the opposite strands of the same DNA and is a measure of oligonucleotide symmetry (37, 38). Comparatively low local PS values, such as the 21% calculated for 5 kb sliding windows in the P. aeruginosa PAO1 chromosome, are typical of bacterial chromosomes that are characterized by strand symmetry and intrastrand parity of complementary oligonucleotides (37). The profiles of local PS values roughly followed those of local D values in all four genomic islands, but, more importantly, the absolute values were within or above the upper outer quartile of local PS in the host chromosome. With the exception of one small peak, local PS scattered between 20% and 30% throughout PAGI-2. Of the four islands, PAGI-2 has the PS values most similar to those of its host chromosome. In contrast, higher basal values of about 30% and numerous peaks with anomalously high local PS are typical of the other three islands. The maximal values were close to or above the value of 60% of a random sequence, implying that no strand symmetry exists in these peak regions. In other words, oligonucleotide frequencies on the two strands are only weakly correlated in all four islands, and the correlation is completely lost in the peak regions of pKLC102 (three segments), PAPI-1 (two segments) and PAGI-3 (three segments).

In summary, the local tetranucleotide signature of all four islands is distinct from that of the P. aeruginosa chromosome. PAGI-2 is homogeneous in its tetranucleotide composition throughout the island, but pKLC102, PAPI-1 and PAGI-3 each contain regions of highly atypical tetranucleotide composition.

Chromosomal stability of island integration. The atypical oligonucleotide signature, 323

particularly the pronounced strand asymmetry, prompted us to investigate whether the islands 324

could spontaneously excise from their host chromosomes. All four islands are endowed with genetic elements of mobility. They harbour phage modules (Table 2) that encode chromosome partitioning proteins (soj) at one terminus and integrases of the bacteriophage P4 subfamily (PAGI-2, PAGI-3 (25)) or a phage tyrosine integrase (pKLC102 (21); PAPI-1 (16)) at the other end. PAPI-1 and pKLC102 moreover encode numerous ORFs related to plasmid-encoded replication and recombination functions.

Combinatorial PCR spanning the integration sites of the islands was applied to detect excised circularized islands and island-free chromosomes alongside integrated genomic islands. Overnight cultures were diluted with fresh liquid LB medium, and samples were taken from early exponential to late stationary phase of growth. The relative copy number of circularized PAPI-1 was estimated to be 2% of that of PA14 chromosomes, and about 0.3–1% of PA14 chromosomes did not carry an integrated PAPI-1 island. A copy number of 10–30 circular pKLC102 molecules per SG17M host chromosome was estimated from semiquantitative PCR kinetics (Figure 3). During growth, the percentage of pKLC102-free chromosomes increased from about 0.1% in early exponential phase to approximately 3% in stationary phase (Figure 3). In contrast, no circular forms of PAGI-2 or PAGI-3 were detected by combinatorial PCR. Hence spontaneous excision of these two islands, if it occurs at all, is below the sensitivity threshold of the assay (1 × 10⁻⁷). Consistent with this finding, PCR identified no strain C or strain SG17M chromosomes that had lost PAGI-2 or PAGI-3, respectively.
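
The copy-number estimates above rest on semiquantitative PCR kinetics. As a hedged illustration of the underlying arithmetic, assuming (as a simplification) exponential amplification with the same per-cycle efficiency E for both amplicons and read-out at a common detection threshold, the initial template ratio follows from the difference in threshold cycles: N_target / N_reference = E^(C_reference − C_target). The function name and the cycle numbers below are hypothetical and are not data from this study.

```python
def relative_copy_number(ct_target: float, ct_reference: float, efficiency: float = 2.0) -> float:
    """Target/reference template ratio from the threshold cycles of two PCR kinetics curves.

    Assumes both products grow as N_c = N_0 * efficiency**c and are read at the same
    detection threshold, so N0_target / N0_reference = efficiency**(ct_reference - ct_target).
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be > 1 (2.0 means perfect doubling per cycle)")
    return efficiency ** (ct_reference - ct_target)


# Illustrative numbers only: the junction of the circular island crosses the
# threshold 4 cycles before the junction of the island-free chromosome.
ratio = relative_copy_number(ct_target=18.0, ct_reference=22.0)
print(f"{ratio:.0f} circular copies per chromosome")  # 16 circular copies per chromosome
```

Because the ratio is exponential in the cycle difference, a one-cycle error in reading the threshold changes the estimate about twofold, which is one reason such estimates are best reported as ranges.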

If we assume that PA14 and SG17M cells grow in rich medium at statistically indistinguishable rates irrespective of the presence or absence of the genomic island in their chromosome, the spontaneous excision rates can be estimated from the semiquantitative PCR kinetics (Figure 3) to be at least 3 × 10⁻³ for PAPI-1 in strain PA14 and at least 3 × 10⁻² for pKLC102 in strain SG17M. The latter estimate relies on steady-state values and hence is
probably too low, because pKLC102, like its relative pKLK106, can reversibly integrate into and excise from its tRNALys site (20).

The precise excision of enterobacterial pathogenicity islands has been reported to occur spontaneously at a frequency of 10⁻⁵ to 10⁻⁴ (27, 32, 42, 44), although mutations, deletions and genome rearrangements are likely to prevent the precise excision and mobilization of most genomic islands. For PAPI-1 and pKLC102, the spontaneous excision frequencies from the host chromosome are at least one and two orders of magnitude higher, respectively. pKLC102 and PAPI-1 harbour the phage module with the xerC integrase, some plasmid-related genes, a type IV pilus biogenesis gene cassette and a syntenic set of conserved ORFs similar to those detected in PAGI-2 and PAGI-3 (Table 2). These features probably enable the islands to excise precisely from the chromosome and to form a circular extrachromosomal intermediate of sufficient stability. The lower copy number of PAPI-1 indicates that circular forms were present in only a few percent of cells and will probably modulate the phenotype of the PA14 community only to a minor extent. The opposite conclusion applies to pKLC102. Circular forms were in at least tenfold excess over chromosomal forms, demonstrating that circular pKLC102 replicates in its host cell. Moreover, a substantial number of SG17M chromosomes became devoid of pKLC102 during growth to higher cell densities. These data confirm the previous assignment of pKLC102 as a plasmid (20, 21, 41). The functional plasmid module of pKLC102 is apparently responsible for the highest mobility reported for a genomic island to date, to our knowledge. Being a hybrid of phage and plasmid origin (Table 2), pKLC102 may be considered an intermediate between a mobile genetic element and a genomic island.

Epidemiology of PAGI-2- and pKLC102-like genomic islands in P. aeruginosa. PAGI-2 and pKLC102 share a syntenic set of ORFs (21), homologs of which have since been detected in more than 30 genomic islands of other beta and gamma proteobacteria (29). The presence of these island types in numerous taxa suggests that they form a family with a
deep evolutionary origin (29). However, since no epidemiological data have yet been reported, the role of PAGI-2 and pKLC102 in the contemporary P. aeruginosa population is unknown. We therefore investigated the abundance and diversity of PAGI-2- and pKLC102-like genomic islands in 71 strains of unrelated SpeI genotypes (Table 1) (30). The panel includes isolates from diverse habitats and geographic origins and is a representative sample of present-day P. aeruginosa clones. The reader may note that 36 of the 71 strains share their SNP genotype with at least one other strain in the panel (7 pairs, 6 trios and 1 quartet; see the hexadecimal genotypes of the strains listed in Table 1). This finding implies that differences in the accessory genome frequently give rise to macrorestriction fragment patterns that are classified as distinct P. aeruginosa genotypes by accepted criteria (40), although the SNP genotype of the core genome is identical.
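
The count of strains sharing a SNP genotype can be reproduced mechanically from the genotype codes in Table 1. The sketch below assumes each strain is represented by a hexadecimal genotype string; the function names, strain names and codes in the example are hypothetical placeholders, not entries from Table 1.

```python
from collections import Counter, defaultdict


def shared_genotype_groups(genotypes: dict[str, str]) -> dict[str, list[str]]:
    """Group strain IDs by (hexadecimal) SNP genotype code; keep codes seen in >= 2 strains."""
    groups = defaultdict(list)
    for strain, code in genotypes.items():
        groups[code.upper()].append(strain)  # hex codes compared case-insensitively
    return {code: sorted(members) for code, members in groups.items() if len(members) > 1}


def group_size_summary(groups: dict[str, list[str]]) -> tuple[dict[int, int], int]:
    """Return ({group size: number of groups}, total number of strains in shared groups)."""
    sizes = Counter(len(members) for members in groups.values())
    return dict(sizes), sum(size * n for size, n in sizes.items())


# Hypothetical strains and codes, for illustration only
panel = {"strain_1": "A1B2", "strain_2": "a1b2", "strain_3": "C3D4",
         "strain_4": "E5F6", "strain_5": "E5F6", "strain_6": "E5F6"}
print(shared_genotype_groups(panel))
# {'A1B2': ['strain_1', 'strain_2'], 'E5F6': ['strain_4', 'strain_5', 'strain_6']}
print(group_size_summary(shared_genotype_groups(panel)))  # ({2: 1, 3: 1}, 5)
```

For the panel described above, the reported grouping corresponds to 7 × 2 + 6 × 3 + 1 × 4 = 36 strains in shared-genotype groups.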

Macroarrays of PAGI-2 and pKLC102 ORFs (Figure 1) were hybridized with the strains' DNA under high stringency to suppress equivocal cross-hybridization signals from homologous genes (see 'Materials and Methods'). The hybridization analyses were calibrated with samples and probes of known sequence such that a sequence identity of at least 75% was required for a positive signal; an identity of 85% or more between the two sequences yielded strong hybridization signals. Tables 3 and 4 show the results of macroarray hybridizations for strains with positive hybridization signals.
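
The calibration thresholds stated above translate into a simple decision rule. The sketch assumes an ungapped, pre-aligned probe/target pair and uses our own labels ('strong', 'positive', 'negative'); actual hybridization behaviour also depends on probe length, the distribution of mismatches and the washing stringency, none of which this rule models.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (gap '-' never counts as a match)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    matches = sum(x == y and x != "-" for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def expected_call(identity: float, positive: float = 75.0, strong: float = 85.0) -> str:
    """Map probe-target identity to the calibrated macroarray call described in the text."""
    if identity >= strong:
        return "strong"
    if identity >= positive:
        return "positive"
    return "negative"


print(expected_call(percent_identity("ACGTACGTAC", "ACGTACGTTT")))  # positive (80% identity)
```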

PAGI-2 type islands were detected in 31 of the 71 strains (44%). Twelve strains harboured one island, 11 strains two islands, 7 strains three islands and one strain four islands. The identified islands were grouped into ten subtypes according to their hybridization patterns (see Figure S1, supplemental material). Typical examples are shown in the upper panel of Figure 4. Two environmental isolates from aquatic habitats in the Rhine-Ruhr area (Germany) and an ear infection isolate from the USA carried PAGI-2 (Fig. 4B). The pattern of Fig. 4C was typical of 13 strains: the strain C-specific cargo genes C2 to C35, which encode metabolic functions, were absent (Table S3, supplementary
material), but the region of conserved hypotheticals was present in the genomic DNA. Subtypes differed by variable hybridization signals of ORFs C76, C77, C82, C83 and C92. Ten strains, like that shown in Fig. 4D, additionally lacked the gene contig C56–C63, which is characterized by a consistently low G+C content of 59% (25). Figures 4E and 4F visualize singular cases: the combination of strain C cargo genes with the lack of C56–C63 (Fig. 4E) and the strain with the fewest hybridization signals (Fig. 4F).
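
Grouping islands into subtypes starts from their per-ORF hybridization calls. As a minimal sketch (our simplification: subtypes are formed by exact identity of call patterns, whereas the assignment in Figure S1 was made by inspecting the patterns), identical patterns can be collected and the probes that separate two subtypes listed. The island names and calls below are hypothetical; only the probe names are taken from the text.

```python
from collections import defaultdict

# Calls per ORF probe: "+" strong, "w" weak, "-" absent (labels are our own convention).
Pattern = tuple[str, ...]


def group_by_pattern(patterns: dict[str, Pattern]) -> list[list[str]]:
    """Group islands with identical hybridization patterns; largest groups first."""
    groups = defaultdict(list)
    for island, pattern in patterns.items():
        groups[pattern].append(island)
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: (-len(members), members[0]))


def differing_probes(a: Pattern, b: Pattern, probes: list[str]) -> list[str]:
    """Probes whose calls differ between two patterns (what separates two subtypes)."""
    return [p for p, x, y in zip(probes, a, b) if x != y]


probes = ["C76", "C77", "C82", "C83", "C92"]
calls = {"island_a": ("+", "+", "-", "+", "+"),
         "island_b": ("+", "+", "-", "+", "+"),
         "island_c": ("+", "w", "-", "-", "+")}
print(group_by_pattern(calls))  # [['island_a', 'island_b'], ['island_c']]
print(differing_probes(calls["island_a"], calls["island_c"], probes))  # ['C77', 'C83']
```

Exact matching splits patterns that differ by a single borderline signal, so in practice one would allow weak calls to be treated as ambiguous before grouping.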

pKLC102 type islands were identified in 50 of the 71 strains (70%). Fifteen subtypes were differentiated by hybridization pattern (see Figure S1b, supplemental material), eight of which were represented by a single strain. Nine clinical and environmental isolates, including an oil field isolate from Japan, harboured pKLC102 (Fig. 4H). Strong oriV-reacting signals were observed in ten subtypes, suggesting that these 26 strains may also harbour mobile genomic islands, as strain SG17M does. The most common subtype, K1d (see supplemental material), was shared by 14 strains. It lacked homologs of eight pKLC102 ORFs, including the major putative virulence factor chvB (CP94) (21). Weak hybridization signals indicated that the sequences of oriV and 14 further ORFs probably deviate substantially from those of the pKLC102 blueprint. Combinatorial PCR of DNA from subtype K1d strains revealed extrachromosomal circular forms in yields similar to those obtained with strain SG17M (data not shown), suggesting that the most abundant subtype can also replicate in its host cell.

Figure 5 summarizes the hybridization results. The signal patterns of the PAGI-2 macroarray are in accordance with the known bipartite structure of individual cargo genes and syntenic homologs in the sequenced islands PAGI-2 and PAGI-3 (25). The 'cargo' genes C2–C35, which have homologs of known function in other eubacteria (Table S3, supplementary material), were detected only in PAGI-2 and a close derivative thereof (subtype G2c). PAGI-2 subtypes vary in the attributes encoded by their accessory clusters of 'cargo' genes. Common to all PAGI-2 type islands are 68–77 homologs that encode functions related to replication or genetic mobility or that are conserved hypotheticals of unknown function.
Thirty-six of these ORFs have homologs in pKLC102. pKLC102 type islands were more diverse than PAGI-2 types in their combination of gene cassettes, in accordance with their nested arrangement of island- and subtype-specific ORFs (21), but they apparently carried fewer strain-specific cargo genes. A backbone of more than 50% of the ORFs, including the 36 PAGI-2 homologs, was found to be highly conserved among all pKLC102 type islands. For 90% of the pKLC102 ORFs, homologous sequences were identified in the majority of islands. Only the contig CP94–CP101 and ORF CP32 were missing in most strains (Table 4). The least abundant ORF, CP32, has the lowest G+C content (41.6%) of all ORFs in pKLC102 and served as the integration site for an integron in strain C (21), which led to large genome rearrangements in sequential isolates from individuals with cystic fibrosis (22). In summary, the diversity of PAGI-2 islands is mainly caused by the insertion of one large block of strain-specific cargo genes, whereas the diversity of pKLC102 islands is primarily generated by subtype-specific combinations of gene cassettes.

Figure 6 visualizes the outcome of the parsimony analysis of the relatedness of strains classified by their PAGI-2 and pKLC102 hybridization patterns. The broad diversity introduced by subtype-typical ORFs is highlighted by the multiple nodes in the dendrogram. Importantly, no segregation of PAGI-2 and pKLC102 subtypes was noted; in other words, no restrictions in the combination of subtypes of the two classes were observed in the 26 strains that harbour both PAGI-2 and pKLC102 type islands in their chromosome. Each node is occupied by a single strain, implying that microevolution in the large genomic islands contributed substantially to the interclonal diversity of our P. aeruginosa strain panel.
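
Parsimony analysis scores candidate trees by the minimum number of state changes they require. The sketch below shows Fitch's small-parsimony algorithm for presence/absence characters (one character per ORF, '1' = hybridization signal, '0' = none) on a fixed rooted binary tree. The search over tree topologies that parsimony programs perform is not shown, and the strains, tree shapes and matrix are hypothetical.

```python
from typing import Union

# A rooted binary tree: a leaf is a strain name, an inner node is a (left, right) pair.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch small parsimony for one character: (state set at node, minimal changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one gain or loss needed here


def parsimony_length(tree: Tree, matrix: dict[str, str]) -> int:
    """Sum of Fitch lengths over all characters; matrix maps strain -> '0'/'1' string per ORF."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {strain: row[i] for strain, row in matrix.items()})[1]
               for i in range(n_chars))


matrix = {"A": "10", "B": "10", "C": "01", "D": "00"}
print(parsimony_length((("A", "B"), ("C", "D")), matrix))  # 2
print(parsimony_length((("A", "C"), ("B", "D")), matrix))  # 3
```

Here the tree grouping A with B needs two changes and the alternative needs three, so the first would be preferred under parsimony.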

Spread and loss of PAGI-2. No spontaneous excision of PAGI-2 from its strain C chromosome was detected during growth in vitro (see above). Nevertheless, we suspected that PAGI-2 type islands are mobilized from their host chromosomes in vivo and can spread to other strains, at least at low frequency, because the closely related clc element of Pseudomonas putida RR21, which shares 85-100% nucleotide sequence identity in the
conserved region with PAGI-2 (14), was found to be capable of self-transfer to other beta and gamma proteobacteria (35, 36). Hence we searched our P. aeruginosa strain collection of sequential airway isolates from 36 individuals with cystic fibrosis for excision events of PAGI-2 type islands. PAGI-2-typical genes were detected in fifty isolates collected from six chronically colonized patients. Two index cases of loss of PAGI-2 were identified (Figure 7). One patient was chronically co-colonized with PAGI-2-positive and PAGI-2-negative clone C strains (Fig. 7A, 7B); the first PAGI-2-negative strain (7B) was isolated from the patient's airways two years after the acquisition of P. aeruginosa clone C. At least one other PAGI-2 subtype was retained in the PAGI-2-negative clone C strain (7B). The reader may note that the hybridization pattern shown in Fig. 7A represents the sequenced PAGI-2 of the strain C genome (25). In other words, the sequenced PAGI-2 had been spontaneously excised from its host chromosome in the cystic fibrosis lung. The other case was another clone C carrier who had lost PAGI-2 in her last clone C-positive culture, 17 years after the acquisition of clone C (Fig. 7C, 7D), and subsequently became superinfected with two other P. aeruginosa clones.

The cystic fibrosis lung is an atypical and evolutionarily very recent niche for P. aeruginosa, which inhabits numerous aquatic as well as animal- and plant-host-associated environments, but to date sequential isolates from CF patients are the only source available for longitudinal studies on the microevolution of the P. aeruginosa genome over years or decades (20, 47, 50). For all other habitats of P. aeruginosa, the natural history and spread of a genomic island can only be deduced from cross-sectional sequence comparisons of well-characterized strains from a documented source. The PAGI-2 type clc element, for example, is almost 100% identical over its whole length to a chromosomal region in the beta proteobacterium Burkholderia xenovorans LB400 (14, 25). Similarly, a contig with close to 100% nucleotide sequence identity to PAGI-2 was identified in the genome of the beta proteobacterium Cupriavidus (formerly Ralstonia) metallidurans CH34 (25). To test
whether this finding is typical of Cupriavidus, a collection of six C. metallidurans strains from Belgium and Germany and two C. campinensis strains from the USA was tested for the presence of PAGI-2 by macroarray hybridization (Figure 8). Five isolates from sewage and heavy-metal-polluted environments, including the two strains from North America, harboured complete copies of PAGI-2 (Figure 8A). Strain C. metallidurans CH79 carried a PAGI-2 subtype with a complete set of cargo genes. These results demonstrate the transcontinental spread of PAGI-2 type islands into diverse clinical and environmental habitats (see Table 1) and into phylogenetically unrelated beta (genera Cupriavidus and Burkholderia) and gamma proteobacteria (genus Pseudomonas).

Diversity of pKLC102 and PAGI-2 type islands in proteobacteria. More than 30 contiguous sequences in the databases were found to exhibit significant sequence similarity to at least 15 ORFs of the syntenic set of conserved hypotheticals in pKLC102 and PAGI-2. For proteobacteria other than P. aeruginosa, the boundaries of the genomic islands could be precisely defined for 17 strains by oligonucleotide signature and tRNA integration sites. The similarity of these genomic islands was evaluated by the distance D:n0_4mer between their tetranucleotide selection patterns (Figure 9). Three branches could be distinguished. Haemophilus pathogenicity islands clustered with that of Neisseria gonorrhoeae, and enterobacterial pathogenicity islands clustered with islands of the phytopathogen Erwinia carotovora and of the entomopathogen Photorhabdus luminescens. PAGI-2 and pKLC102 belonged to the most clearly separated group, characterized by a rather homogeneous profile of tetranucleotide selection (Figure 9). The beta and gamma proteobacterial host strains of this group are endowed with broad metabolic versatility, particularly the capability to degrade complex aliphatic and aromatic hydrocarbons, and as a corollary they share highly similar metabolic or fitness islands that in the extreme case can be identical in sequence, as seen for PAGI-2 in Cupriavidus and P. aeruginosa (this work) or for the clc element in P. putida RR21 and Burkholderia xenovorans LB400 (14). The host strains of the islands belong to taxa that prior
to the introduction of rDNA-based taxonomy had been classified as pseudomonads by lifestyle and metabolic features (48). Consistent with this abandoned chemotaxonomic classification, the genomic islands of the 'Pseudomonas' group encode a wide range of metabolic features and resistance determinants. The similar oligonucleotide usage of the islands probably facilitates the spread of this gene pool among strains that had previously been lumped together as 'pseudomonads' because of their metabolic versatility.
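
Figure 9 arranges islands by their pairwise tetranucleotide distances. One common way to turn such a distance matrix into a dendrogram is average-linkage (UPGMA) clustering, sketched below; whether Figure 9 was built with this method is not stated in this section, and the island labels and distance values in the example are invented for illustration.

```python
from itertools import combinations


def upgma(labels: list[str], dist: dict[frozenset, float]):
    """Average-linkage (UPGMA) clustering; returns the dendrogram as nested tuples.

    `dist` maps frozenset({label_a, label_b}) to their distance (e.g., a tetranucleotide D value).
    Labels must be distinct.
    """
    clusters = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (subtree, size)
    d = {frozenset((i, j)): dist[frozenset((labels[i], labels[j]))]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties broken by cluster id for determinism)
        i, j = sorted(min(d, key=lambda pair: (d[pair], sorted(pair))))
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # Distance of the merged cluster = size-weighted mean of its parts' distances
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]) / (n_i + n_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


names = ["island_1", "island_2", "island_3", "island_4"]
D = {frozenset(pair): value for pair, value in [
    (("island_1", "island_2"), 10.0), (("island_3", "island_4"), 12.0),
    (("island_1", "island_3"), 40.0), (("island_1", "island_4"), 42.0),
    (("island_2", "island_3"), 38.0), (("island_2", "island_4"), 41.0)]}
print(upgma(names, D))  # (('island_1', 'island_2'), ('island_3', 'island_4'))
```

UPGMA assumes a roughly constant rate of change along all branches; for distances that violate this, neighbour-joining is a frequently used alternative.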

Similar profiles of tetranucleotide usage of pKLC102 type islands from phylogenetically distinct hosts suggest that horizontal transfer of pKLC102 ancestors between 'pseudomonads' happened in the past, but the data provide no clue as to why pKLC102 remained so mobile and was not irreversibly captured by its host chromosomes. Unlike the exoU-encoding islands, which became irreversibly fixed in the chromosome by secondary events such as rearrangements and the acquisition and loss of genetic elements (24), pKLC102 did not lose its phage and plasmid modules. Inspection of the pattern skew of oligonucleotide composition (Figure 10) suggested an explanation for why the genomic islands of the pKLC102 / PAGI-2 family are endowed with exceptionally high mobility. Figure 10 displays the pattern skew PS n0_4mer of 22 pKLC102 type genomic islands and their host chromosomes. Pattern skew was only a few percent for most chromosomes, in accordance with previous analyses showing that bacterial chromosomes are characterized by strand symmetry and intrastrand parity of complementary oligonucleotides (37). Oligonucleotides and their reverse complements share physicochemical properties such as base stacking energy, propeller twist angle, bendability and position preference (3, 4) and occur with similar frequency in bacterial chromosomes, in accordance with Chargaff's second parity rule (9). In contrast, no such correlation is observed for a random sequence (PS ≈ 50%). The 22 studied genomic islands of the pKLC102 family were intermediate between chromosome and random sequence. The PS n0_4mer values of 18 islands were above the 95% confidence intervals of n0_4mer PS values of 155 completely sequenced bacterial chromosomes and 316 plasmids (37) (Figure 10). PAGI-2 belonged to the
four islands with a PS value within the confidence interval. PS of PAGI-3 was within the inner quartiles of PS of the pKLC102 family. PAPI-1 and pKLC102, together with four other islands from Erwinia carotovora SCRI1043, Photorhabdus luminescens TT01, Pseudomonas fluorescens Pf-5 and Yersinia enterocolitica 8081, exhibited the highest PS values, 26% or more. Thus the tetranucleotide frequencies of the complementary strands were least correlated in these six members of the pKLC102 family. Stably integrated genomic islands have an atypical oligonucleotide composition compared to that of the core genome, but strand symmetry is locally maintained (38). The conjugative islands of the pKLC102 / PAGI-2 family, particularly the six islands with the highest PS, do not adhere to this rule: they perturb strand symmetry not only locally (see Fig. 2) but also globally. Typically, the pattern skew values of the two chromosomal replichors are approximately the same (37) and hence compensate each other to values close to zero for the whole chromosome (see Figure 10). These data indicate that both replichors carry the same burden of physicochemical constraints exerted by strand asymmetry (37). The integration of a pKLC102 element perturbs this subtle balance. As long as the PS of the foreign genetic element remains high, spontaneous excision from the chromosome will occur. In summary, the high pattern skew together with functional phage and/or plasmid modules may account for the high mobility of the islands of the pKLC102 / PAGI-2 family, suggesting that they behave like selfish parasitic DNA that is prone to horizontal spread within and among taxa.
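
Strand symmetry in the sense of Chargaff's second parity rule can also be checked more directly than with the rank-based PS used in this paper. The sketch below is a substitute measure rather than the PS n0_4mer statistic itself: it computes the Pearson correlation between the count of each tetranucleotide and the count of its reverse complement on the same strand, so values close to 1 indicate intrastrand parity. The function names are our own.

```python
from itertools import product
from math import sqrt

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parity_correlation(seq: str, k: int = 4) -> float:
    """Pearson correlation between the count of each k-mer and of its reverse complement
    on the same strand; values near 1 mean intrastrand parity (Chargaff's second rule)."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words with N or other ambiguity codes
            counts[word] += 1
    # One (word, reverse complement) pair per non-palindromic word; palindromes are excluded
    pairs = [(counts[w], counts[reverse_complement(w)])
             for w in counts if w < reverse_complement(w)]
    xs, ys = [x for x, _ in pairs], [y for _, y in pairs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("k-mer counts are constant; correlation undefined")
    return cov / (sd_x * sd_y)
```

A sequence joined to its own reverse complement gives a correlation of 1 by construction, whereas an i.i.d. random sequence with uniform base composition gives values near 0, mirroring the contrast between chromosomes and random sequence described above.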

The role of pKLC102 / PAGI-2 type islands in bacterial evolution. The gene repertoire of a bacterial cell consists of genes that have been transmitted vertically over long periods of time and of genes that were acquired or generated at various points in the lineage, including some very recently (26). Horizontal gene transfer provides most of the diversity in the genomic repertoire, but most of the horizontally acquired genes that persist in genomes are subsequently transmitted strictly vertically (26). Hence, despite substantial horizontal gene transfer, the phylogenetic relationships between taxa are robust, as indicated by the congruence
of trees based on rDNA sequence, gene content or average amino acid identity of shared genes (23).

Genomic islands of the pKLC102 / PAGI-2 family are a major exception to this rule. The islands spread across the barriers of taxa and genera while retaining their characteristic oligonucleotide signature. Family members were identified at high frequency in the global P. aeruginosa population, but were also widespread among other beta and gamma proteobacteria. Identical islands were detected in phylogenetically distinct clades and in isolates of diverse habitats and geographic origin. High strand asymmetry together with phage and/or plasmid modules makes up the signature of this evolutionarily ancient island family.

pKLC102 / PAGI-2 family members share a syntenic set of homologues. Accessory gene clusters are nestled among this core and encode island-specific features. The sequence diversity of these family-typical core genes is higher than that of vertically transmitted orthologs, probably because divergent evolutionary forces act on sequences during horizontal and vertical transmission: genes that are irreversibly captured by the host chromosome will minimize strand asymmetry and become subject to purifying selection like the genes of the core genome (13, 19), whereas the self-transfer of genomic islands into phylogenetically distinct host chromosomes will counterselect strand symmetry, loss of the island-typical oligonucleotide signature and loss of sequence diversity. Thus, ongoing horizontal transfer maintains a higher sequence diversity of a genetic element than its irreversible incorporation into a host genome.

The function of most genes of the conserved module of the pKLC102 / PAGI-2 family is still unknown, although at least a subset is presumably involved in the excision, transfer, integration or stabilization of the island (45, 46). Moreover, mutagenesis studies in PAPI-1 demonstrated that genes of the conserved core are involved in the animal and plant virulence of strain PA14 (16). Future research should unravel in more detail to what extent the syntenic gene set is not only essential for the maintenance of the genomic islands of the pKLC102 /
PAGI-2 family, but also affects the phenotype of its host strain, as has been demonstrated for individual cargo genes (12, 14, 31).

Acknowledgments.

The authors cordially thank Max Mergeay, Laboratories for Microbiology and Radiobiology, SCK.CEN, Mol, Belgium, for the supply of Cupriavidus strains. JK, DW and OR were members of the International Research Training Group 'Pseudomonas: Pathogenicity and Biotechnology' (IRTG 653 of the Deutsche Forschungsgemeinschaft). Financial support by the priority program 'Ecology of Bacterial Pathogens: Molecular and Evolutionary Aspects' and by the Collaborative Research Program SFB 587 (project A9) of the Deutsche Forschungsgemeinschaft is gratefully acknowledged.

References.

1. Almagor, H. 1983. A Markov analysis of DNA sequences. J. Theor. Biol. 104:633-645.

2. Ausubel, F.M., R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, and K. Struhl, ed. 1994. Current protocols in molecular biology. Wiley, New York, N.Y.

3. Baisnee, P.F., S. Hampson, and P. Baldi. 2002. Why are complementary DNA strands symmetric? Bioinformatics 18:1021-1033.

4. Baldi, P., and P.F. Baisnee. 2000. Sequence analysis by additive scales: DNA structure for sequences and repeats of all lengths. Bioinformatics 16:865-889.

5. Bragonzi, A., L. Wiehlmann, J. Klockgether, N. Cramer, D. Worlitzsch, G. Döring, and B. Tümmler. 2006. Sequence diversity of the mucABD locus in Pseudomonas aeruginosa isolates from patients with cystic fibrosis. Microbiology 152:3261-3269.

6. Bremer, S., T. Hoof, M. Wilke, R. Busche, B. Scholte, J.R. Riordan, G. Maass, and B. Tümmler. 1992. Quantitative expression patterns of multidrug-resistance P-glycoprotein (MDR1) and differentially spliced cystic-fibrosis transmembrane-conductance regulator mRNA transcripts in human epithelia. Eur. J. Biochem. 206:137-149.

7. Burrus, V., J. Marrero, and M.K. Waldor. 2006. The current ICE age: biology and evolution of SXT-related integrating conjugative elements. Plasmid 55:173-183.

8. Burrus, V., and M.K. Waldor. 2003. Control of SXT integration and excision. J. Bacteriol. 185:5045-5054.

9. Chargaff, E. 1951. Structure and function of nucleic acids as cell constituents. Fed. Proc. 10:344-360.

10. Dobrindt, U., B. Hochhut, U. Hentschel, and J. Hacker. 2004. Genomic islands in pathogenic and environmental microorganisms. Nat. Rev. Microbiol. 2:414-424.

11. Felsenstein, J. 1986. Distance methods: reply to Farris. Cladistics 2:130-143.

12. Frantz, B., and A.M. Chakrabarty. 1987. Organization and nucleotide sequence determination of a gene cluster involved in 3-chlorocatechol degradation. Proc. Natl. Acad. Sci. U. S. A. 84:4460-4464.

13. Friedman, R., J.W. Drake, and A.L. Hughes. 2004. Genome-wide patterns of nucleotide substitution reveal stringent functional constraints on the protein sequences of thermophiles. Genetics 167:1507-1512.

14. Gaillard, M., T. Vallaeys, F.J. Vorhölter, M. Minoia, C. Werlen, V. Sentchilo, A. Pühler, and J.R. van der Meer. 2006. The clc element of Pseudomonas sp. strain B13, a genomic island with various catabolic properties. J. Bacteriol. 188:1999-2013.

15. Goldberg, J.B., and D.E. Ohman. 1984. Cloning and expression in Pseudomonas aeruginosa of a gene involved in the production of alginate. J. Bacteriol. 158:1115-1121.

16. He, J., R.L. Baldini, E. Deziel, M. Saucier, Q. Zhang, N.T. Liberati, D. Lee, J. Urbach, H.M. Goodman, and L.G. Rahme. 2004. The broad host range pathogen Pseudomonas aeruginosa strain PA14 carries two pathogenicity islands harboring plant and animal virulence genes. Proc. Natl. Acad. Sci. U. S. A. 101:2530-2535.

17. Heuer, T., C. Bürger, G. Maass, and B. Tümmler. 1998. Cloning of prokaryotic genomes in yeast artificial chromosomes: application to the population genetics of Pseudomonas aeruginosa. Electrophoresis 19:486-494.

18. Hoof, T., J.R. Riordan, and B. Tümmler. 1991. Quantitation of mRNA by the kinetic polymerase chain reaction assay: a tool for monitoring P-glycoprotein gene expression. Anal. Biochem. 196:161-169.

19. Hughes, A. 2004. Evidence for abundant slightly deleterious polymorphisms in bacterial populations. Genetics 169:533-538.

20. Kiewitz, C., K. Larbig, J. Klockgether, C. Weinel, and B. Tümmler. 2000. Monitoring genome evolution ex vivo: reversible chromosomal integration of a 106 kb plasmid at two
tRNALys

gene loci in sequential Pseudomonas aeruginosa airway isolates. Microbiology 643

146: 2365-2373. 644

21. Klockgether, J., O. Reva, K. Larbig, and B. Tümmler. 2004. Sequence analysis of the mobile genome island pKLC102 of Pseudomonas aeruginosa C. J. Bacteriol. 186:518-534.

22. Kresse, A.U., S.D. Dinesh, K. Larbig, and U. Römling. 2003. Impact of large chromosomal inversions on the adaptation and evolution of Pseudomonas aeruginosa chronically colonizing cystic fibrosis lungs. Mol. Microbiol. 47:145-158.

23. Konstantinidis, K.T., and J.M. Tiedje. 2005. Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 187:6258-6264.

24. Kulasekara, B.R., H.D. Kulasekara, M.C. Wolfgang, L. Stevens, D.W. Frank, and S. Lory. 2006. Acquisition and evolution of the exoU locus in Pseudomonas aeruginosa. J. Bacteriol. 188:4037-4050.

25. Larbig, K.D., A. Christmann, A. Johann, J. Klockgether, T. Hartsch, R. Merkl, L. Wiehlmann, H.J. Fritz, and B. Tümmler. 2002. Gene islands integrated into tRNAGly genes confer genome diversity on a Pseudomonas aeruginosa clone. J. Bacteriol. 184:6665-6680.

26. Lerat, E., V. Daubin, H. Ochman, and N.A. Moran. 2005. Evolutionary origins of genomic repertoires in bacteria. PLoS Biol. 3:e130.

27. Lesic, B., S. Bach, J.M. Ghigo, U. Dobrindt, J. Hacker, and E. Carniel. 2004. Excision of the high-pathogenicity island of Yersinia pseudotuberculosis requires the combined actions of its cognate integrase and Hef, a new recombination directionality factor. Mol. Microbiol. 52:1337-1348.


28. Liang, X., X.Q. Pham, M.V. Olson, and S. Lory. 2001. Identification of a genomic island present in the majority of pathogenic isolates of Pseudomonas aeruginosa. J. Bacteriol. 183:843-853.

29. Mohd-Zain, Z., S.L. Turner, A.M. Cerdeno-Tarraga, A.K. Lilley, T.J. Inzana, A.J. Duncan, R.M. Harding, D.W. Hood, T.E. Peto, and D.W. Crook. 2004. Transferable antibiotic resistance elements in Haemophilus influenzae share a common evolutionary origin with a diverse family of syntenic genomic islands. J. Bacteriol. 186:8114-8122.

30. Morales, G., L. Wiehlmann, P. Gudowius, C. van Delden, B. Tümmler, J.L. Martinez, and F. Rojo. 2004. Structure of Pseudomonas aeruginosa populations analyzed by single nucleotide polymorphism and pulsed-field gel electrophoresis genotyping. J. Bacteriol. 186:4228-4237.

31. Müller, T.A., C. Werlen, J. Spain, and J.R. van der Meer. 2003. Evolution of a chlorobenzene degradative pathway among bacteria in a contaminated groundwater mediated by a genomic island in Ralstonia. Environ. Microbiol. 5:163-173.

32. Rajanna, C., J. Wang, D. Zhang, Z. Xu, A. Ali, Y.M. Hou, and D.K. Karaolis. 2003. The vibrio pathogenicity island of epidemic Vibrio cholerae forms precise extrachromosomal circular excision products. J. Bacteriol. 185:6893-6901.

33. Rakin, A., C. Noelting, P. Schropp, and J. Heesemann. 2001. Integrative module of the high-pathogenicity island of Yersinia. Mol. Microbiol. 39:407-415.

34. Ramos, J.L. (ed.). 2004. Pseudomonas, vol. 1-3. Kluwer Academic/Plenum Publishers, New York, N.Y.

35. Ravatn, R., S. Studer, D. Springael, A.J.B. Zehnder, and J.R. van der Meer. 1998. Chromosomal integration, tandem amplification, and deamplification in Pseudomonas putida F1 of a 105-kilobase genetic element containing the chlorocatechol degradative genes from Pseudomonas sp. strain B13. J. Bacteriol. 180:4360-4369.

36. Ravatn, R., S. Studer, A.J.B. Zehnder, and J.R. van der Meer. 1998. Int-B13, an unusual site-specific recombinase of the bacteriophage P4 integrase family, is responsible for chromosomal insertion of the 105-kilobase clc element of Pseudomonas sp. strain B13. J. Bacteriol. 180:5505-5514.

37. Reva, O.N., and B. Tümmler. 2004. Global features of sequences of bacterial chromosomes, plasmids and phages revealed by analysis of oligonucleotide usage patterns. BMC Bioinformatics 5:90.

38. Reva, O.N., and B. Tümmler. 2005. Differentiation of regions with atypical oligonucleotide composition in bacterial genomes. BMC Bioinformatics 6:251.

39. Römling, U., J. Greipel, and B. Tümmler. 1995. Gradient of genomic diversity in the Pseudomonas aeruginosa chromosome. Mol. Microbiol. 17:323-332.

40. Römling, U., T. Heuer, and B. Tümmler. 1994. Bacterial genome analysis by pulsed field gel electrophoresis techniques. Adv. Electrophoresis 7:353-406.

41. Römling, U., K.D. Schmidt, and B. Tümmler. 1997. Large genome rearrangements discovered by the detailed analysis of 21 Pseudomonas aeruginosa clone C isolates found in environment and disease habitats. J. Mol. Biol. 271:386-404.

42. Sakellaris, H., S.N. Luck, K. Al-Hasani, K. Rajakumar, S.A. Turner, and B. Adler. 2004. Regulated site-specific recombination of the she pathogenicity island of Shigella flexneri. Mol. Microbiol. 52:1329-1336.

43. Sato, H., D.W. Frank, C.J. Hillard, J.B. Feix, R.R. Pankhaniya, K. Moriyama, V. Finck-Barbancon, A. Buchaklian, M. Lei, R.M. Long, J. Wiener-Kronish, and T. Sawa. 2003. The mechanism of action of the Pseudomonas aeruginosa-encoded type III cytotoxin, ExoU. EMBO J. 22:2959-2969.


44. Schubert, S., S. Dufke, J. Sorsa, and J. Heesemann. 2004. A novel integrative and conjugative element (ICE) of Escherichia coli: the putative progenitor of the Yersinia high-pathogenicity island. Mol. Microbiol. 51:837-848.

45. Sentchilo, V., R. Ravatn, C. Werlen, A.J. Zehnder, and J.R. van der Meer. 2003. Unusual integrase gene expression on the clc genomic island in Pseudomonas sp. strain B13. J. Bacteriol. 185:4530-4538.

46. Sentchilo, V., A.J. Zehnder, and J.R. van der Meer. 2003. Characterization of two alternative promoters for integrase expression in the clc genomic island of Pseudomonas sp. strain B13. Mol. Microbiol. 49:93-104.

47. Smith, E.E., D.G. Buckley, Z. Wu, C. Saenphimmachak, L.R. Hoffman, D.A. D'Argenio, S.I. Miller, B.W. Ramsey, D.P. Speert, S.M. Moskowitz, J.L. Burns, R. Kaul, and M.V. Olson. 2006. Genetic adaptation by Pseudomonas aeruginosa to the airways of cystic fibrosis patients. Proc. Natl. Acad. Sci. U. S. A. 103:8487-8492.

48. Stanier, R.Y., N.J. Palleroni, and M. Doudoroff. 1966. The aerobic pseudomonads: a taxonomic study. J. Gen. Microbiol. 43:159-271.

49. Stover, C.K., X.Q. Pham, A.L. Erwin, S.D. Mizoguchi, P. Warrener, M.J. Hickey, F.S. Brinkman, W.O. Hufnagle, D.J. Kowalik, M. Lagrou, R.L. Garber, L. Goltry, E. Tolentino, S. Westbrock-Wadman, Y. Yuan, L.L. Brody, S.N. Coulter, K.R. Folger, A. Kas, K. Larbig, R. Lim, K. Smith, D. Spencer, G.K. Wong, Z. Wu, I.T. Paulsen, J. Reizer, M.H. Saier, R.E.W. Hancock, S. Lory, and M.V. Olson. 2000. Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature 406:959-964.

50. Tümmler, B. 2006. Clonal variations in Pseudomonas aeruginosa, p. 35-68. In J.-L. Ramos and R.C. Levesque (ed.), Pseudomonas, vol. 4. Springer, Heidelberg.


51. Ubeda, C., M.A. Tormo, C. Cucarella, P. Trotonda, T.J. Foster, I. Lasa, and J.R. Penades. 2003. Sip, an integrase protein with excision, circularization and integration activities, defines a new family of mobile Staphylococcus aureus pathogenicity islands. Mol. Microbiol. 49:193-210.

52. Weinel, C., K.E. Nelson, and B. Tümmler. 2002. Global features of the Pseudomonas putida KT2440 genome sequence. Environ. Microbiol. 4:809-818.


Figure Legends.

Figure 1. Schematic diagram of the positions of ORF-derived PCR products on the PAGI-2 (A) and pKLC102 (B) macroarrays. A: PAGI-2. ORF C47 is represented twice (C47a, C47b) by different PCR products. B: pKLC102. ORF CP103 is represented twice (CP103a, CP103b) and CP94 three times (CP94a, CP94b, CP94c) by different PCR products. Five (A) or ten (B) positive or negative control dots were spotted in the lower left corner.

Figure 2. Tetranucleotide usage of the four P. aeruginosa genomic islands pKLC102, PAPI-1, PAGI-2 and PAGI-3. Local oligonucleotide usage (OU) patterns were analyzed in 5 kb sliding windows with steps of 0.5 kb. Curves of the distance D:n0_4mer, the pattern skew PS:n0_4mer and the oligonucleotide usage variance OUV:n1_4mer are color-coded: blue for D, green for PS and brown for OUV. Protein-coding genes are shown as red bars, placed on either side of the abscissa according to their direction of transcription. The tetranucleotide usage of the genomic islands differs significantly from that of the whole chromosome: the median (interquartile range) values of local tetranucleotide patterns in the whole P. aeruginosa PAO1 chromosome are 13.9 (12.3-16.0) for D:n0_4mer, 21.4 (17.9-25.6) for PS:n0_4mer and 0.37 (0.32-0.43) for OUV:n1_4mer.
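The statistics in this legend are defined in refs. 37 and 38. To illustrate only the windowing scheme, the Python sketch below slides a 5 kb window in 0.5 kb steps and ranks the 256 tetranucleotides by observed/expected count. The expected counts come from the window's base composition, which is our assumed reading of "n0". The sketch then reports a pattern-skew-like value: the scaled rank distance between the direct strand and its reverse complement. The function names, the tie-breaking rule and the 0-100 scaling are assumptions. This is not the authors' implementation, and it need not reproduce the published D, PS or OUV values.

```python
from collections import Counter
from itertools import product

# Complement table for A, C, G, T; other symbols (e.g. N) are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# All 256 tetranucleotides in a fixed order.
WORDS = ["".join(p) for p in product("ACGT", repeat=4)]


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ou_ranks(seq: str) -> dict[str, int]:
    """Rank the 256 tetranucleotides by observed/expected count.

    Expected counts use the window's mononucleotide frequencies
    (zero-order expectation; an assumed reading of the 'n0' prefix).
    """
    n = len(seq)
    bases = Counter(seq)
    counts = Counter(seq[i:i + 4] for i in range(n - 3))
    total = sum(counts[w] for w in WORDS)  # words containing N etc. are ignored
    ratios = {}
    for w in WORDS:
        expected = total
        for b in w:
            expected *= bases[b] / n
        ratios[w] = counts[w] / expected if expected > 0 else 0.0
    # Rank 0 = most over-represented word; ties are broken alphabetically.
    order = sorted(WORDS, key=lambda w: (-ratios[w], w))
    return {w: rank for rank, w in enumerate(order)}


def rank_distance(r1: dict[str, int], r2: dict[str, int]) -> float:
    """Sum of absolute rank differences, scaled so that 100 = fully reversed ranking."""
    k = len(WORDS)
    max_sum = k * k // 2  # largest possible sum of |rank differences| for k items
    return 100.0 * sum(abs(r1[w] - r2[w]) for w in WORDS) / max_sum


def sliding_pattern_skew(seq: str, window: int = 5000, step: int = 500):
    """(start, skew) per window: direct-strand pattern vs. reverse-complement pattern."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        result.append((start, rank_distance(ou_ranks(win), ou_ranks(revcomp(win)))))
    return result
```

On a 20 kb sequence, for example, the default settings yield 31 windows, starting at positions 0, 500, ..., 15000.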

Figure 3. Combinatorial PCR analysis of integrated and episomal versions of the genomic islands PAPI-1 in P. aeruginosa strain PA14 and pKLC102 in P. aeruginosa SG17M. An aliquot of an exponentially growing culture was diluted into 100 mL of fresh medium to an optical density (OD578) of 0.2. Samples were then taken from the growing culture (from left to right) at OD578 0.9, 1.3, 2.0, 2.9, 4.0 and after 24 h (left panel) or at OD578 0.9, 1.3, 2.0, 4.0 and after 24 h (right panel). Bacteria were grown aerobically in 250 mL flasks in liquid LB medium at 37 °C with shaking at 250 rpm. Chromosome-integrated islands were detected by PCR products spanning the 5′ tRNA (il) or 3′ tRNA (ir) integration sites, using PA14- and PAPI-1-derived or SG17M- and pKLC102-derived primer sequences. Circularized episomal forms (ce) were identified by PCR products spanning the breakpoints in PAPI-1 or pKLC102. PA14 or SG17M chromosomes devoid of PAPI-1 or pKLC102 (fa) were detected by PCR products spanning the tRNALys gene adjacent to the PAO1 homolog PA4541. Kinetic PCR was performed with 50 ng of P. aeruginosa DNA in 50 µL of reaction mixture. Aliquots of 5 µL were withdrawn at the indicated cycles, separated by electrophoresis and stained with ethidium bromide.
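The four PCR products report on distinct molecular states of an island: il and ir detect the integrated form, ce the excised circular episome and fa the empty tRNALys locus. A minimal Python sketch of this readout follows. The enum and function names are hypothetical. Treating a single junction product (il or ir) as sufficient evidence of integration is our reading of the legend. Because one sample can contain several states at once, the function returns a set rather than a single state.

```python
from enum import Enum


class IslandState(Enum):
    INTEGRATED = "island integrated at the tRNA locus"
    EPISOMAL_CIRCLE = "excised circular episome"
    EMPTY_SITE = "chromosome devoid of the island"


def infer_states(products: set[str]) -> set[IslandState]:
    """Map detected PCR products (il, ir, ce, fa) to the island states they indicate."""
    unknown = products - {"il", "ir", "ce", "fa"}
    if unknown:
        raise ValueError(f"unknown PCR product(s): {sorted(unknown)}")
    states = set()
    # Either junction product (5' or 3' tRNA site) indicates the integrated form.
    if products & {"il", "ir"}:
        states.add(IslandState.INTEGRATED)
    if "ce" in products:
        states.add(IslandState.EPISOMAL_CIRCLE)
    if "fa" in products:
        states.add(IslandState.EMPTY_SITE)
    return states
```

For example, infer_states({"il", "ce", "fa"}) returns all three states. This corresponds to a culture in which integrated islands, excised circles and island-free chromosomes coexist.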

Figure 4. Examples of macroarray hybridization patterns of PAGI-2 (upper panel) and pKLC102 (lower panel) subtypes. PAGI-2 macroarrays: A: strain PAO (DSM1707) (negative control); B: strain C (positive control); C: strain 7 (subtype G1b); D: strain 3 (subtype G2a); E: strain 54 (subtype G2c); F: strain 63 (subtype G4). pKLC102 macroarrays: G: strain PAO (DSM1707) (negative control); H: strain SG17M (positive control); I: strain 6 (subtype K1c); J: strain 10 (subtype K3c); K: strain 36 (subtype K3d); L: strain 53 (subtype K4).

Figure 5. Summary of macroarray hybridization data of 31 strains positive for PAGI-2-type islands (A) and 50 strains positive for pKLC102-type islands (B) of P. aeruginosa. Colors indicate the percentage of island-positive strains with a hybridization signal for the respective ORF. Black: ≥ 96 % of strains positive; dark grey: 90-95 % positive; light grey: 50-89 % positive; white: < 50 % positive.
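The shading in Figure 5 can be reproduced from the rows of Tables 3 and 4. The sketch assumes that each "x" denotes a strain with a signal and that ambiguous "?" calls are not counted as positive. The Python sketch below uses hypothetical function names. The legend leaves values between 95 and 96 % and between 89 and 90 % unassigned; here they go to the lower class, although such values cannot arise with 31 or 50 strains.

```python
def percent_positive(row: str, n_strains: int) -> float:
    """Percentage of island-positive strains with a signal ('x') for one ORF row."""
    tokens = row.split()[1:]  # drop the ORF label, e.g. "C2"
    positives = sum(1 for t in tokens if t == "x")  # '?' is not counted as positive
    return 100.0 * positives / n_strains


def shade(percent: float) -> str:
    """Figure 5 shading class; gaps between the stated ranges fall to the lower class."""
    if percent >= 96:
        return "black"
    if percent >= 90:
        return "dark grey"
    if percent >= 50:
        return "light grey"
    return "white"
```

For example, the Table 3 row "C2 x x ? x x" gives 4 of 31 strains (about 12.9 %) and is shaded white.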

Figure 6. Relatedness of macroarray hybridization patterns of 55 PAGI-2- and/or pKLC102-positive P. aeruginosa strains. The unrooted tree is based on parsimony analysis (PHYLIP 3.66) of the hybridization data.
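PHYLIP's discrete-character parsimony programs (for example mix or pars) read a plain-text infile. Its first line gives the numbers of taxa and characters. Each following line holds one taxon: a name padded to 10 characters, then the character states, with "?" marking an unknown state. The Python sketch below writes macroarray calls in that layout, coding a signal as 1, no signal as 0 and an ambiguous call as ?. The function name, the use of "-" for an absent signal and the strain labels in the example are hypothetical placeholders, not data from this study.

```python
def to_phylip(calls: dict[str, list[str]]) -> str:
    """Write hybridization calls ('x', '-', '?') as a PHYLIP discrete-character infile."""
    code = {"x": "1", "-": "0", "?": "?"}  # signal, no signal, ambiguous
    lengths = {len(row) for row in calls.values()}
    if len(lengths) != 1:
        raise ValueError("all strains need the same, non-zero set of ORF calls")
    n_chars = lengths.pop()
    lines = [f"{len(calls)} {n_chars}"]
    for name, row in calls.items():
        if len(name) > 10:
            raise ValueError(f"PHYLIP names are limited to 10 characters: {name!r}")
        lines.append(name.ljust(10) + "".join(code[c] for c in row))
    return "\n".join(lines) + "\n"
```

For example, to_phylip({"strainA": ["x", "x", "-", "?"], "strainB": ["x", "-", "-", "x"]}) produces a header line "2 4" followed by the rows "strainA   110?" and "strainB   1001".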

Figure 7. Loss of PAGI-2-type islands in sequential P. aeruginosa airway isolates from patients with cystic fibrosis. Upper panel: PAGI-2 macroarray hybridization patterns of clone C strains SG1 (A) and SG3 (B), indicating the loss of PAGI-2 in the later isolate SG3 while another PAGI-2 subtype was retained. SG1 (strain C) was isolated from the patient's first P. aeruginosa-positive sputum specimen; SG3 is the sixth isolate, collected two years later. Lower panel: PAGI-2 macroarray hybridization patterns of clone C strains NN18 (C) and NN86 (D), indicating the loss of PAGI-2-type island(s) in strain NN86, which was isolated from the patient's last clone C-positive culture 17 years after the acquisition of clone C.

Figure 8. PAGI-2 macroarray hybridization patterns of the Cupriavidus strains C. campinensis AE2701 (A) and C. metallidurans CH79 (B).

Figure 9. Similarity of pKLC102-type genomic islands in proteobacteria based on the distance of oligonucleotide usage. The tetranucleotide distance D:n0_4mer was calculated between the genomic islands. The resulting matrix of D values was analyzed with the Fitch-Margoliash least-squares criterion, assuming a constant molecular clock, using the KITSCH program of the PHYLIP package (11) to order the genomic islands by their degree of evolutionary relatedness.
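The Fitch-Margoliash criterion scores a candidate tree by comparing observed distances D_ij with tree path lengths d_ij. Each squared deviation is divided by D_ij², giving the sum over pairs of (D_ij - d_ij)²/D_ij². Under a constant molecular clock, as in KITSCH, the tree is ultrametric, so d_ij is twice the height of the node that joins i and j. The Python sketch below only evaluates this score for a fixed three-taxon tree; it does not search tree space as KITSCH does. The function names are hypothetical, and the distances in the example are invented for illustration.

```python
from itertools import combinations


def fm_score(observed, fitted, power=2.0):
    """Weighted least-squares fit of tree distances to observed distances.

    Each squared deviation is divided by observed**power: power=2 is the
    Fitch-Margoliash weighting, power=0 is unweighted least squares.
    """
    score = 0.0
    for i, j in combinations(sorted(observed), 2):
        d_obs, d_fit = observed[i][j], fitted[i][j]
        if power > 0 and d_obs <= 0:
            raise ValueError(f"observed distance {i}-{j} must be positive")
        score += (d_obs - d_fit) ** 2 / d_obs ** power
    return score


def clock_tree_abc(h_ab, h_root):
    """Path lengths on the ultrametric tree ((A,B),C) with the given node heights."""
    d = {"A": {}, "B": {}, "C": {}}
    d["A"]["B"] = d["B"]["A"] = 2 * h_ab    # A and B join at height h_ab
    for x in ("A", "B"):
        d[x]["C"] = d["C"][x] = 2 * h_root  # C joins the others at the root
    return d
```

Take observed distances A-B = 4, A-C = 10 and B-C = 12, and the clock tree clock_tree_abc(2.0, 5.5). The Fitch-Margoliash score is then 0.01 + 1/144 (about 0.0169). With power=0 (unweighted least squares), the score is 2.0.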

Figure 10. Pattern skew of pKLC102-type genomic islands (squares) and their corresponding chromosomes (triangles). Pattern skew values (n0_4mer PS) are plotted against sequence length on a logarithmic scale. The grey shaded area depicts the 95% confidence interval of n0_4mer PS values in 155 completely sequenced bacterial chromosomes and 316 plasmids (37). The PS values of n0_4mer patterns of bacterial chromosomes are typically in the range of 1 to 8%. Outliers in the investigated panel are Haemophilus ducreyi 35000HP, with a PS value of about 9%, and Xylella fastidiosa 9a5c, with an extreme value of 24.3%. The n0_4mer PS values of pKLC102-type genomic islands exceed the 95% confidence interval. Genomic islands are identified by their host strain; the name of the island, if available, is given in parentheses: 1, Azoarcus sp. EbN1; 2, Erwinia carotovora subsp. atroseptica SCRI1043; 3, Haemophilus ducreyi 35000HP; 4, Haemophilus influenzae 86-028NP (ICEHin-like); 5, Haemophilus somnus 129PT; 6, Methylibium petroleiphilum PM1; 7, Photorhabdus luminescens TT01; 8, P. aeruginosa C (pKLC102); 9, P. aeruginosa C (PAGI-2); 10, P. aeruginosa PA14 (PAPI-1); 11, P. aeruginosa SG17M (PAGI-3); 12, Pseudomonas fluorescens Pf-5; 13, Pseudomonas syringae pv. syringae B728a; 14, Salmonella enterica subsp. enterica sv. Typhi CT18 (SPI-7); 15, Xylella fastidiosa 9a5c; 16, Yersinia enterocolitica 8081; 17, H. influenzae 1056.b (ICEHin 1056); 18, Neisseria gonorrhoeae MS11 (GGI); 19, Nitrosomonas eutropha C71; 20, Pseudomonas putida RR21 (clc-transposon); 21, P. syringae pv. phaseolicola 1302A (PPHG-1); 22, Yersinia pseudotuberculosis 32777 (YAPI).


Tables

Table 1. Description of strains

Pseudomonas aeruginosa collection

No. Name Isolation (place, year) Source SNP genotype*

1 ATCC 10145 Prague, Czech Republic (≤1960) unknown 46BA

2 ATCC 14886 Osaka, Japan (≤1958) soil EC2A (26; 55)

3 ATCC 15522 USA (≤1967) soil 481A

4 ATCC 15691 Melbourne, Australia (1952) burn wound 7D9A (13)

5 ATCC 21472 Japan (≤1973) oil field soil 3412 (6)

6 ATCC 21776 Japan (≤1974) soil 3412 (5)

7 ATCC 33348 Bonn, Germany (≤1957) human infection 2C1A (42; 54)

8 ATCC 33356 Heidelberg, Germany (≤1955) human faeces CD9E

9 ATCC 33364 (≤1978) human infection E42A

10 ATCC 33818 unknown Agaricus bisporus 6CA2

11 ATCC 33988 Ponca City (Ok.), USA fuel tank 6C22

12 63741 Hannover, Germany (1990) burn wound 3C52 (33; 62)

13 A 5670 Heidelberg, Germany (1992) wound 7D9A (4)

14 A 5803 Heidelberg, Germany (1992) trachea F429

15 AL 5846 Heidelberg, Germany (1992) wound D429 (39; 56)

16 2733/92 Copenhagen, Denmark (1992) CF-patient 3C2A

17 2813 A/92 Copenhagen, Denmark (1992) CF-patient 4012 (37)

18 BST1 Hannover, Germany (1985) CF-patient E469

19 KB1 Sarstedt, Germany (1985) CF-patient 059A


20 SS1 Lueneburg, Germany (1985) CF-patient 6D92 (60; 65)

21 MF6 Bremen, Germany (1987) CF-patient AC9A

22 PD1 Hannover, Germany (1985) CF-patient E59A

23 RN4 Oldenburg, Germany (1986) CF-patient D421 (41; 44; 51)

24 RP1 Hannover, Germany (1985) CF-patient 0C2E (38; 64)

25 Va 24437 Halle, Germany (1992) CF-patient 3C51

26 Va 26232 Halle, Germany (1992) CF-patient EC2A (2; 55)

27 Va 27081 Halle, Germany (1992) CF-patient 081E

28 Va 27260 Halle, Germany (1992) CF-patient 239A

29 DM Hamburg, Germany (1984) CF-patient E84A

30 Zw 30 Innsbruck, Austria (1997) CF-patient B420

31 Zw 31 Innsbruck, Austria (1997) CF-patient AC2E

32 Zw 41 Verona, Italy (1997) CF-patient 0192

33 Zw 43 Genova, Italy (1997) CF-patient 3C52 (12; 62)

34 Zw 49 Verona, Italy (1997) CF-patient A5AA

35 Zw 54 Milan, Italy (1997) CF-patient 6C12

36 Zw 64 Lund, Sweden (1997) CF-patient 279A

37 Zw 77 London, Great Britain (1997) CF-patient 4012 (17)

38 Zw 79 Galway, Ireland (1997) CF-patient 0C2E (24; 64)

39 Zw 81 London, Great Britain (1997) CF-patient D429 (15; 56)

40 Zw 83 London, Great Britain (1997) CF-patient 6E12 (46)

41 Zw 85 Aberdeen, Great Britain (1997) CF-patient D421 (23; 44; 51)

42 Zw 88 London, Great Britain (1997) CF-patient 2C1A (7; 54)

43 Zw 92 Marseille, France (1997) CF-patient EC22

44 Zw 98 The Hague, Netherlands (1997) CF-patient D421 (23; 41; 51)

45 Zw 102 Leuven, Belgium (1997) CF-patient 2E12

46 Zw 113 Rotterdam, Netherlands (1997) CF-patient 6E12 (40)

47 Zw 117 Vienna, Austria (1997) CF-patient 0812


48 Zw 119 Poznań, Poland (1997) CF-patient F469

49 SG1 (=C) Bueckeburg, Germany (1986) CF-patient C40A (50)

50 SG31 (=SG17M) Muelheim, Germany (1993) river C40A (49)

51 PT2 Muelheim, Germany (1992) water D421 (23; 41; 44)

52 PT6 Muelheim, Germany (1992) water 2992

53 PT12 Muelheim, Germany (1992) water F419

54 PT20 Muelheim, Germany (1992) water 2C1A (7; 42)

55 PT22 Muelheim, Germany (1992) water EC2A (2; 26)

56 PT36 Muelheim, Germany (1992) water D429 (15; 39)

57 641HD11 Muelheim, Germany (1992) water 249A

58 Gr 2052 Athens, Greece (1995) clinic 2C92 (59)

59 Gr 2057 Athens, Greece (1995) clinic 2C92 (58)

60 Gr 2248 Athens, Greece (1995) clinic 6D92 (20; 65)

61 PAO-DSM 1707 Melbourne, Australia (<1955) burn wound 0002

62 892 Hannover, Germany (1983) CF-patient 3C52 (12; 33)

63 PAK Japan (≤1960) unknown 55AA

64 HJ2 Cologne, Germany (1990) CF-patient 0C2E (24; 38)

65 G7 Stade, Germany (1986) CF-patient 6D92 (20; 60)

66 H2 unknown clinic (catheter) 241A

67 K9 Husum, Germany (1985) CF-patient 1BAE

68 DSM 288 Goettingen, Germany (1990) hygiene institute 0B92 (71)

69 DSM 939 USA (≤1981) animal room water bottle 049A

70 DSM 1128 USA (1980) ear infection EC38

71 DSM 1253 Stanford (Ca.), USA (≤1949) burn wound 0B92 (68)

* SNP genotype defined by 13 SNPs and 3 additional markers and given as a code of four hexadecimal digits (see supplementary Table S1 for a description); numbers in parentheses indicate strains sharing an identical genotype.
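Four hexadecimal digits encode exactly 16 bits, one per binary marker (13 SNPs plus 3 additional markers). The Python sketch below decodes such codes, counts the markers at which two genotypes differ, and groups strains that share a code, matching the strain numbers given in parentheses in Table 1. Which bit corresponds to which marker is defined in supplementary Table S1 and is not reproduced here. The most-significant-bit-first order in decode() is therefore an assumption, although the difference count does not depend on it. The function names are hypothetical.

```python
from collections import defaultdict


def decode(code: str) -> str:
    """Expand a four-digit hex genotype into a 16-character bit string (MSB first, assumed)."""
    if len(code) != 4:
        raise ValueError(f"expected four hex digits, got {code!r}")
    return format(int(code, 16), "016b")


def marker_differences(a: str, b: str) -> int:
    """Number of the 16 binary markers at which two genotype codes differ."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def shared_genotypes(strains: dict[int, str]) -> dict[str, list[int]]:
    """Group strain numbers by identical genotype code; singletons are omitted."""
    groups = defaultdict(list)
    for number, code in strains.items():
        groups[code.upper()].append(number)
    return {code: sorted(nums) for code, nums in groups.items() if len(nums) > 1}
```

For example, shared_genotypes({2: "EC2A", 5: "3412", 6: "3412", 26: "EC2A", 55: "EC2A", 61: "0002"}) returns {"EC2A": [2, 26, 55], "3412": [5, 6]}, matching the groupings Table 1 lists for these strains. Similarly, marker_differences("3C52", "3C51") is 2.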


Reference strains

P. aeruginosa PA14 USA (≤1995) burn wound D421

P. aeruginosa TB Hannover, Germany (1983) CF-patient 3C52

P. putida KT2440 Minoh City, Japan (1961) TOL-plasmid-cured derivative of soil isolate mt-2

Cupriavidus strains

C. metallidurans CH34 Liège, Belgium (1976) decantation tank, zinc factory

C. metallidurans CH42 Liège, Belgium (1976) zinc factory

C. metallidurans CH79 Liège, Belgium (1976) zinc factory

C. metallidurans KT01 Goettingen, Germany (≤1987) waste water

C. metallidurans KT02 Goettingen, Germany (1984) sewage treatment plant

C. metallidurans KT21 Goettingen, Germany (≤1987) sewage treatment plant

C. campinensis AE2700 Leadville (Co.), USA (≤2002) unknown

C. campinensis AE2701 Leadville (Co.), USA (≤2002) unknown


Table 2. ORFs of the phage and plasmid modules of the genomic island pKLC102

ORF no. or feature | Gene name | ORF/protein length (bp/aa) | Putative product | PAGI-2 homolog | Homolog product¹ | GenBank accession no.¹ | E value¹

CP1 | soj | 885/294 | chromosome partitioning-related protein | C108 | chromosome partitioning-related protein PA14_58910 (P. aeruginosa UCBPP-PA14) | YP_792889 | 4E-142

CP9 | dnaB | 1278/425 | replicative DNA helicase | - | replicative DNA helicase Paer2_01005575 (P. aeruginosa 2192) | ZP_00970992 | 0

CP16 | | 255/84 | DNA binding protein | - | hypothetical protein PA14_59060 (P. aeruginosa UCBPP-PA14) | YP_792903 | 1E-23

CP17² | | 1734/577 | ParB-like nuclease | C107 | hypothetical protein PA14_59070 (P. aeruginosa UCBPP-PA14) | YP_792904 | 0

CP18 | | 756/251 | conserved hypothetical protein | C106 | hypothetical protein PaerP_01000019 (P. aeruginosa PA7) | ZP_01297689 | 2E-118

oriV³ | | 1647 | | | | |

CP19 | | 729/242 | conserved hypothetical protein | C104 | hypothetical protein PA14_59130 (P. aeruginosa UCBPP-PA14) | YP_792909 | 2E-122

CP20² | | 549/182 | conserved hypothetical protein | C103 | hypothetical protein PA14_59140 (P. aeruginosa UCBPP-PA14) | YP_792910 | 2E-71

CP22 | ssb | 489/162 | single-stranded DNA binding protein | C102 | single-stranded DNA binding protein PaerC_01005119 (P. aeruginosa C3719) | ZP_00965569 | 2E-83

CP27 | topA | 1920/639 | topoisomerase I | C101 | DNA topoisomerase I (P. aeruginosa CF005) | AAR01278 | 0

CP33-CP42 | pilLNOPQRSUVM | 10643 | sex pilus biogenesis cluster | - | type IVB pilus proteins (P. aeruginosa UCBPP-PA14) | CP000438 | 0

CP56⁴ | | 2256/751 | helicase | C71 | DNA/RNA helicase Paer2_01005538 (P. aeruginosa 2192) | ZP_00971322 | 0

CP67⁵ | | 2232/743 | TraG-/TraD-like conjugation protein | C65 | hypothetical protein PaerP_01000052 (P. aeruginosa PA7) | ZP_01297722 | 0

CP102 | | 1920/639 | TraI-like conjugative relaxase | C36 | hypothetical protein EXA2 (P. aeruginosa 6077) | ABD94612 | 0

CP103a | xerC | 1284/427 | phage-like integrase | -⁶ | putative integrase EXA1a (P. aeruginosa 6077) | ABD94670 | 0

¹ Closest homolog according to PSI- and PHI-BLAST searches; copies of pKLC102 were not considered.

² Homologs of CP17 and CP20 in the clc element are involved in regulating the expression of the phage P4-type integrase (45, 46).

³ No oriV-like structure in PAGI-2.

⁴ The ORF contig CP46-CP56 is highly conserved in pKLC102- and PAGI-2-type islands.

⁵ The ORF contig CP64-CP68 is highly conserved in pKLC102- and PAGI-2-type islands.

⁶ In PAGI-2-type islands, a bacteriophage P4-type integrase (25, 45, 46) is encoded at this site.


Table 3. PAGI-2 macroarray hybridization patterns of island-positive strains

PAGI-2-positive strains (numbers as in Table 1)

ORF 3 7 9 14 15 16 21 22 23 24 25 26 29 33 35 45 46 48 49 50 52 53 54 55 56 60 62 63 64 67 70

C1 x x x x x x x x x ? x x x x x x x x x ? x ? x x ? x x x x x

C2 x x ? x x

C4 x x ? ? x ? x ? ? x x x x x ? x

C5 x x x x x

C6 x x x x x

C7 x x x x x

C8 x x x x x

C10 x x x x x

C12 x x x x x

C13 x x x x x

C14 x x x x x

C18 x x x x x

C20 x x x x x

C21 x x x x x

C22 x x x x x

C23 x x x x x

C25 x x x x x

C26 x x x x x

C27 x x x x x

C29 x x x x x

C30 x x x x x

C31 x x x x x

C32 x x x x x

C33 x x x x x

C34 x x x x x

C35 x x x x x

C36 x ? x ? ? x ? x x ? x ? ? x ? x ? x x ? ? ? ? x

C37 x x x x x x x x x x x x x x x x x x x ? x x ? x x x x x

C38 x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C39 x x x x x x x ? x x ? x x ? x x x x ? ? x x x x ? x ? x x x

C40 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C42 x x x x x x x x x x x x x x x x x x x x x x x x x x x ? x x x

C43 x x x x x x x x x x x x x x x x x x x x x x x x x x x ? x x x

C44 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C45 x x x x ? x x x x x x x x x x x x x x x x x x x x x x ? x x

C46 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C47 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C49 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C50 x x x x x x x x x x x x x x x x x x x x x x x x x x x ? x x x

C51 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C52 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C54 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C55 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C56 ? x x x x x x x x x x x x x x x x x x x x x x x x

C57 x x x x x x x x x x x x x ? x x x x x x x x x x

C58 x x x x x x x x x x x x x x x x x x x x x x x

C59 x x x x x x x x x x x x x x x x x x x x x x x

C61 x x x x x x x x x x x x x x x x x x x x x x x

C62 x x x x x x x x x x x x x x x x x x x x x x x

C63 x x x x x x x ? x x x x x ? ? x ? ? x x ? x x x

C64 x x x x x x x x x x x x x x x x x x x x x x x x x x x ? x x x

C65 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C66 ? x x x ? x x x x x x x x x x x x x x x x x ? x x ? x ? x x

C67 x x x x ? x x x x x x ? x ? x x x x x x x x ? x x ? x ? x x

C70 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C71 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C72 x x x x x x x x x ? x x x x x x x x x ? x x x x ? x x x x x

C73 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C74 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C75 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C76 x x ? x x x x x x x x x x x x x x x x x x x x x x x x x

C77 x x ? x x x x x x x x x x x x x x x x x x x x x x x x x

C78 x x x x x x x x x x x x x x ? ? x x x x x x x x ? x x x x x

C79 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C80 x x x x x x x x x x x x x x x x x x x x x x ? x x x x

C81 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C82 x x ? x x x x ? ? x x x x x x x x ? ? x x

C83 x x ? x x x x ? ? x x x x x x x x ? ? x x

C84 x x x ? x x x ? ? x x ? x x x x x x x x x x x x x x x ? x x x

C85 x x x x x x x x x x x x x x x x x x x x x x x x x ? x ? ? x x

C89 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C90 x x x x x x x x x x ? x x x x x x x x x x x x x x x x x x x x

C91 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C92 x x ? x x x x ? x x x x x x x x x x x ? x ? x x

C93 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C94 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C95 x x x x x x x x x x x x x x x x x x x ? x x x x x x x x ? x x

C96 x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C97 x x x x x x x x x x x x x x x x x x ? x x x x x x x x x x

C98 x x x x x x x x x x x x x x x x x x ? x x x x ? x x ? x x

C99 x x x x x x x ? x x x x x x x ? x x ? x x x x ? x x x ? x x

C100 ? x x x x x x x x x x x x x x x x x x x x x x x x x x

C101 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C102 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C103 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C104 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C105 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C106 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C107 x x x x ? x x x x x x x x x x x x x x x x x x x x x x x x x

C108 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

C110 x x x x x x x ? x x x x x x x x x x x x x x ? ? x x x

C111 x x x x x x x x x x x x x x x x x x x x x x ? ? x ? x x


Table 4. pKLC102 macroarray hybridization patterns of island-positive strains

pKLC- strains

ORF 2 3 4 5 6 7 9 10 12 13 14 15 16 19 20 21 22 23 24 25 28 29 31 33 35 36 37 39 41 44 45 46 49 50 51 52 53 54 55 56 58 59 60 62 65 66 68 69 70 71

CP1 x x x x x ? x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP2 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP3 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP4 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP5 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP6 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP7 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP8 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP9 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP10 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP11 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP12 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP17 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP18 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

oriV x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP19 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP20 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP21 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP22 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP25 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP27 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP28 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP30 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP32 x x x x x x x x x

CP33 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP34 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP37 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP39 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP40 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP41 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP42 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP43 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP44 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP46 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP47 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP48 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP49 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP50 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP51 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP52 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP53 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP54 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP55 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP56 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP57 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP58 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP59 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP60 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP62 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP63 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP64 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP65 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP66 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP67 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP68 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP69 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP70 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP73 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP74 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP75 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP76 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP77 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP78 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP79 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP80 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP81 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP83 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP84 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP85 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP86 x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP87 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP88 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP89 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP90 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP91 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP92 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP93 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP94 x x x x x x x x x x x x x

CP96 x x x x x x x x x x x x x x

CP97 x x x x x x x x x x x

CP98 x x x x x x x x x x x x x x

CP99 x x x x x x x x x x x x x x x x x x x

CP100 x x x x x x x x x x x x x x x x x x x

CP101 x x x x x x x x x x x x x x x

CP102 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x

CP103 x ? ? x x ? x x x x x x x x x x x x x x x x ? ? x ? x x x ? x x x x ? x x x x x x x x x x x
