
Pathogenic investigation of Staphylococcus aureus and Staphylococcus epidermis

Introduction to Bioinformatics 27011

Technical University of Denmark,
Center for Biological Sequence Analysis

Authors:
Susanne Schjørring, Alfredo Ramos, Helene Faustrup and Peter F. Hallin

Supervisor:
David W. Ussery

Lyngby, 14th of May 2002


Preface

This project was prepared as the concluding part of the course "Introduction to Bioinformatics" 27011 at the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark.

We would like to express our gratitude to Professor David Ussery, who supported us throughout the project. Furthermore, we would like to thank Peder Worning for his assistance in determining the point of origin, and the rest of the staff at CBS for their assistance during the project.

Lyngby, 14th of May 2002

Susanne Schjørring, [email protected]
Helene Faustrup, [email protected]
Alfredo Ramos, [email protected]
Peter Fischer Hallin, [email protected]


Abstract

Five strains of Staphylococcus aureus, N315, Mu50, MRSA, COL and NCTC8325 (of which N315 and Mu50 are annotated), and one strain of Staphylococcus epidermidis, RP62A, have been investigated for their pathogenic properties. Virulence genes from the known pathogen S. aureus N315 have been divided into six groups according to their pathogenic function. The majority of the investigated genes from N315 are conserved in all of the S. aureus strains, while S. epidermidis lacks many of these genes. Some of the genes within the three known pathogenicity islands of N315 are included in this study. Although most of the cluster genes are conserved as comparable islands, a few genes are found to be located elsewhere in the genome. Only the N315 and Mu50 strains are found to contain the gene tst, which is associated with toxic shock syndrome.

Custom genome atlases have been used to investigate characteristics near the virulence genes. Although only a few conclusions could be drawn from these atlases, some characteristic properties such as GC skew, AT content and repeats were shown to vary inside or around some of the investigated genes.


Table of contents

1 Introduction
2 Theory
  2.1 About Staphylococcus
  2.2 Pathogenicity
    2.2.1 Toxins
    2.2.2 Adhesins
  2.3 Antibiotics
  2.4 Resistance
  2.5 Horizontal transfer
    2.5.1 Mobile elements
    2.5.2 Gene transfer
3 Data selection
  3.1 Defining the investigation
  3.2 Finding the genes in the non-annotated genomes
  3.3 Sorting the BLAST results
4 Atlas study
  4.1 About the genome Atlas
    4.1.1 Repeats
    4.1.2 Base composition
    4.1.3 Structural parameters
5 Investigation of gene clusters
  5.1 The enterotoxic island
  5.2 The exotoxic island
  5.3 The toxic-shock syndrome toxin 1 island
  5.4 Investigation of resistance clusters
  5.5 Investigation of adhesins cluster
6 Conclusion
7 References
8 Appendix list


¹ BLAST is a widely used algorithm for aligning DNA and protein sequences.


1 Introduction

Infections caused by microorganisms are still a common problem in the developed part of the world. One of the most important and widespread pathogens in hospitals is Staphylococcus aureus [Kuroda et al. 2001]. The bacterium is normally associated with surgical wound infections but is also the cause of other serious diseases. Many strains of S. aureus are unusually virulent and resistant to common antibiotics. Another species in this genus is Staphylococcus epidermidis. This bacterium is not as virulent as S. aureus and is normally not pathogenic. In this project we have chosen five strains of S. aureus and one strain of S. epidermidis in order to compare their pathogenicity and their resistance to different antibiotics.

The main goal of our project is to screen the selected S. aureus and S. epidermidis strains for pathogenicity and resistance genes by BLAST alignment¹. In order to visualise the BLAST search results, genome atlases were constructed for each bacterial chromosome, giving an overview of the location of the genes of interest. Many features of the DNA can be included in the atlases, which can thereby illustrate the relation between the structure of the DNA and the location of the genes.


² Inflammation of the bone and bone marrow

³ Inflammation of the membranes surrounding the brain and spinal cord


2 Theory

2.1 About Staphylococcus

The genus Staphylococcus belongs to the group of aerobic and facultatively anaerobic, non-sporulating Gram-positive cocci. The bacteria are 0.5 to 1.5 µm in diameter, non-motile, and divide in several planes, forming grape-like clumps. Staphylococci produce acid from glucose through both aerobic and anaerobic metabolism and have no specific nutrient requirements. Staphylococci occur naturally as part of the normal bacterial flora of skin and mucous membranes. These bacteria strongly resist dehydration and are dispersed through the air and transferred by hand contact. There are about 20 different Staphylococcus species, of which the most important are S. aureus, S. epidermidis and S. saprophyticus. S. aureus and S. epidermidis are the most important in the human flora and are therefore the subject of this project. The genus contains common pathogens, including S. aureus. [Høiby et al. 1998], [Madigan et al. 2000]

S. aureus is a yellow-pigmented coccus that is catalase and coagulase positive. Both pathogenic and non-pathogenic strains of S. aureus are observed. S. aureus is known to play an important role in the following diseases: boils, pimples, respiratory infections (pneumonia), osteomyelitis², meningitis³, arthritis, scalded skin syndrome, toxic shock syndrome (TSS) and food poisoning. Some plasmids in S. aureus are known to carry heavy metal resistance. [Høiby et al. 1998], [Madigan et al. 2000]

S. epidermidis is a coagulase-negative white coccus that can be pathogenic in some cases but is most often harmless. Some S. epidermidis strains are capable of producing a biofilm of polysaccharides that protects the cells against the host immune system and antibiotic treatment. This is a problem when a foreign element is introduced into the host body, such as an artificial heart vessel or a catheter. An infection by S. epidermidis often lacks symptoms, however, due to the absence of toxins. [Høiby et al. 1998], [Madigan et al. 2000]

2.2 Pathogenicity

Pathogenicity is the ability of a microorganism to cause disease, and includes entry, colonisation and growth of the microorganism inside the host. Many microorganisms have little or no obvious effect on the host. Others, like S. aureus, can cause serious diseases. [Madigan et al. 2000], [Todar (2). 2002]

A microorganism must usually gain access to the host tissues and multiply before damage can be done. Most bacteria penetrate through wounds, mucous membranes or the intestinal epithelium. Access to the host begins with adherence of the microorganism to the epithelium. Some microorganisms synthesize a capsule, which is a polymer of two major peptidoglycans that serves as a coat closely surrounding the cell. The function of this layer is to protect the bacterium from the host defence system, that is, from macrophages and other phagocytic cells. The invasion by the microorganism starts at the site of adherence and may spread to other regions of the host through the blood and lymph systems. After adhering, the microorganism multiplies, a process called colonisation. [Madigan et al. 2000], [Todar (2). 2002]

During growth the microorganism parasitises the host by digesting the surrounding tissue to obtain nutrients. The host defence system often forms a fibrin clot around the site of invasion to prevent the microorganism from spreading through the body. In contrast to many other parasites, S. aureus promotes fibrin clotting by means of the enzyme coagulase, which protects it from the host defence system and results in a localised invasion. During growth the microorganism releases toxins, which result in cell damage and disease in the host. [Madigan et al. 2000], [Todar (2). 2002]

A few pathogenic bacteria do not invade the host but only produce toxins. In this project the exoenzymes, adhesins and toxins, collectively called virulence factors, which contribute to the ability of the microorganism to colonise and cause infection, will be examined.

2.2.1 Toxins

S. aureus is characterized by its production of a wide variety of extracellular substances. The versatility of S. aureus, both in its ability to adapt to the host and in its expression of toxins and enzymes, has certainly played an important role in its capacity to cause a variety of diseases. Among the many products secreted by S. aureus are exoenzymes and toxins that may contribute significantly to its virulence. To invade the host, S. aureus also produces adhesins and other exoenzymes that facilitate colonisation.

It can be difficult to distinguish products that should be considered virulence factors from those that are merely by-products of the bacterial life-style. These enzymes and toxins can be divided into two major categories: exoenzymes and toxins. The exoenzymes are "nutrient providers" such as lipases, nucleases and proteases, while the toxins are molecules directly involved in pathogenesis, such as exotoxins and enterotoxins. Some extracellular proteins, such as alpha-toxin (α-hemolysin), may belong in either category. A good approach for classifying virulence factors is given by Wassenaar [Wassenaar et al. 2001].

In order to classify the toxins produced by S. aureus, we have divided them according to the disease they cause (see table 1). We thus obtained exotoxins (which cause general disease in host tissues), including leukotoxins and hemolysins (which can cause destruction of blood cells and immunological diseases), TSS toxins (associated with toxic shock syndrome) and enterotoxins (related to food poisoning).


ORF     Gene name  Description

Exotoxins
SA0276  -          conserved hypothetical protein, similar to diarrheal toxin
SA0357  -          hypothetical protein, similar to exotoxin 2
SA0382  set6       exotoxin 6
SA0383  set7       exotoxin 7
SA0384  set8       exotoxin 8
SA0385  set9       exotoxin 9
SA0386  set10      exotoxin 10
SA0387  set11      exotoxin 11
SA0388  set12      exotoxin 12
SA0389  set13      exotoxin 13
SA0390  set14      exotoxin 14
SA0393  set15      exotoxin 15
SA1009  -          hypothetical protein, similar to exotoxin 1
SA1010  -          hypothetical protein, similar to exotoxin 4
SA1011  -          hypothetical protein, similar to exotoxin 3
SA0657  -          hypothetical protein, similar to hemolysin homologue
SA0780  -          hypothetical protein, similar to hemolysin
SA1007  -          Alpha-Hemolysin precursor
SA1430  -          hypothetical protein, similar to enterotoxin A precursor
SA1637  lukD       leukotoxin, LukD [Pathogenicity island SaPIn3]
SA1638  lukE       leukotoxin LukE
SA1813  -          hypothetical protein, similar to leukocidin chain lukM
SA1973  -          hypothetical protein, similar to hemolysin III
SA2207  hlgA       gamma-hemolysin chain II precursor
SA2208  hlgC       gamma-hemolysin component C
SA2209  hlgB       gamma-hemolysin component B
SAS065  hld        delta-hemolysin

TSS toxins
SA1819  tst        toxic shock syndrome toxin-1

Enterotoxins
SA1642  seg        extracellular enterotoxin type G precursor
SA1643  sen        enterotoxin SeN
SA1644  yent2      enterotoxin YENT2
SA1645  yent1      enterotoxin Yent1
SA1646  sei        extracellular enterotoxin type I precursor
SA1647  sem        enterotoxin SEM
SA1648  seo        enterotoxin SeO
SA1761  sep        enterotoxin P
SA1816  sel        extracellular enterotoxin L
SA1817  sec3       enterotoxin typeC3

Table 1: Toxins produced by Staphylococcus

Exotoxins

Leukocidins produced by S. aureus are cytotoxic to different leukocytes and macrophages and are also called leukocidal toxins or leukotoxins. These toxins are clearly distinct from various cytotoxins that are toxic not only to phagocytes but also to other cells. Leukocidins destroy leukocytes by rapid degradation of the cell membrane, resulting in cell lysis. Because they enable S. aureus to escape phagocytosis, leukocidins are important for pathogenicity. [Moss et al. 1995], [Madigan et al. 2000]

Hemolysins are another group of membrane-damaging molecules. They are well-known toxins that target blood cells but also other cells. Hemolysins damage the cytoplasmic membrane, causing cell lysis and hence cell death. The best-characterized and most potent membrane-damaging toxin of S. aureus is alpha-hemolysin (α-toxin). In humans, platelets and monocytes are particularly sensitive to α-toxin, which causes cell lysis and release of cytokines that trigger the production of inflammatory mediators. These events cause the symptoms of septic shock that occur during severe infections caused by S. aureus. [Todar (1). 2002], [Madigan et al. 2000]

β-hemolysin, or β-toxin, is a sphingomyelinase that damages membranes rich in sphingomyelin. The majority of human isolates of S. aureus do not express β-toxin. A lysogenic bacteriophage is known to encode the toxin. δ-toxin is a very small peptide toxin produced by most strains of S. aureus. It is also produced by S. epidermidis. The role of δ-toxin in disease is unknown. [Todar (1). 2002]


Toxic Shock Syndrome

Toxic shock syndrome (TSS) is an acute-onset multisystem illness characterized by fever, hypotension, dizziness, scarlatiniform rash, vomiting and occasionally death.

One major exotoxin has been associated with TSS. This is a superantigen called toxic shock syndrome toxin-1 (TSST-1), which is able to bind simultaneously to an MHC class II molecule and to a non-specific region of a T-cell receptor (see figure 1). Unlike regular antigens, superantigens are not processed by antigen-presenting cells. Superantigens can therefore activate a large number of T cells and in this way induce an abnormal amount of cytokines, which provoke reactions that include fever, widespread blood clotting and shock. [Glosdby 2000]

Figure 1: Superantigens and non-specific stimulation of T-cells [Todar 2002]

Toxic shock syndrome was first recognised in menstruating women and was associated with the use of tampons. Reports of nonmenstrual TSS cases became prevalent in 1985. Both menstrual and nonmenstrual TSS show the characteristic symptoms and localized S. aureus infections, but there are apparently some differences in the toxins produced by the isolates. Men and children can also suffer from TSS. [Moss et al. 1995]

Enterotoxins

S. aureus produces enterotoxins with superantigen properties, which can cause food poisoning when released into food. Ingested enterotoxins stimulate T cells located along the small intestine, causing a massive T-cell response. This leads to changes in permeability and a massive secretion of fluid into the intestine, causing diarrhea and vomiting. When expressed systemically, enterotoxins can also cause toxic shock syndrome. [Todar (1). 2002], [Madigan et al. 2000]

2.2.2 Adhesins

The attachment of S. aureus to a eukaryotic cell or tissue surface requires the participation of two factors: an adhesin on the bacterium and a receptor on the cell surface. Adhesins usually interact with their receptors through complementary and specific binding. Adhesins in S. aureus are cell-surface-bound proteins, and the receptors are usually a specific peptide or carbohydrate residue. S. aureus normally binds to the amino terminus of fibronectin. [Madigan et al. 2000], [Todar (2). 2002]


An overview of the virulence factors of S. aureus is illustrated in figure 2.

Figure 2: Major virulence factors in S. aureus [Todar (1). 2002]

2.3 Antibiotics

In nature, fungi have developed a way to fight their natural competitors and parasites, consisting of different sets of biomolecules that attack bacteria but are never harmful to the fungi themselves. This is due to the differences between prokaryotic and eukaryotic cells in molecular mechanisms and composition. These substances damage bacterial cells in distinct ways, either by inhibiting growth of the microorganism or by provoking lysis of the prokaryotic cell. Antibiotics have three major modes of action: impeding cell wall synthesis, decreasing protein production or blocking DNA replication. [Madigan et al. 2000], [Andersen. 1999]

The first antibiotic discovered was penicillin, by Alexander Fleming in 1928. Penicillin prevents the cross-linking between peptide side chains during peptidoglycan assembly, making it impossible for the bacterium to build its cell wall. Today a large number of antibiotics are available, some unmodified from their original synthesis by fungi and some semi-synthetic, derived from penicillin, tetracycline or chloramphenicol, and all are divided into groups by their function. Some antibiotics are called narrow-spectrum and some broad-spectrum antibiotics. Around the Second World War the production of penicillin for treatment of infections began, and its use has climbed since then, followed by the use of antibiotics as growth promoters. The next issue to arise was multiresistance. [Madigan et al. 2000], [Andersen. 1999]

2.4 Resistance

Resistance genes produce proteins that defend the organism against antibiotics. These genes encode degrading enzymes or pumping mechanisms, which make the organism resistant to different antibiotics. The genes can be located on the chromosome, on plasmids or on other mobile elements such as transposons, and are observed to spread from one organism to another through horizontal transfer.

The problem with antibiotic resistance and horizontal transfer is that the more resistant the strain is, the more difficult the treatment of an infection becomes. [Madigan et al. 2000], [Andersen. 1999]


2.5 Horizontal transfer

Virulence factors from a variety of bacteria are known to be encoded by plasmids or by mobile genetic elements such as transposons. In S. aureus, virulence properties such as coagulase, hemolysin and enterotoxin are thought to be plasmid linked. Multiple antibiotic resistance encoded by plasmids is also common in S. aureus. It is also possible for a virulence factor to be encoded both on the bacterial chromosome and on a virulence plasmid. [Madigan et al. 2000]

2.5.1 Mobile elements

Plasmids

Plasmids are genetic elements that replicate independently of the host chromosome. Plasmids have no extracellular form, and almost all plasmids are double-stranded DNA. They vary in size from approximately 1 kb to more than 1000 kb. A typical plasmid is less than 1/20 the size of the chromosome and has a supercoiled configuration. Most plasmids are circular, but many linear plasmids are known. [Madigan et al. 2000]

The copy number of a plasmid in the cell varies from 1-3 copies to over 100 copies. The copy number is controlled by genes on the plasmid and by interactions between the host and the plasmid. Most plasmids of Gram-positive bacteria such as S. aureus and S. epidermidis replicate by a rolling circle mechanism, which gives rise to a single-stranded intermediate. Some plasmids have the ability to integrate into the chromosome, and under such conditions their replication comes under the control of the chromosome. These plasmids are called episomes. [Madigan et al. 2000]

Not all genes present on plasmids have been characterised, but all plasmids carry genes that ensure their own replication. Some plasmids also carry genes necessary for conjugation. Furthermore, plasmids can carry a wide variety of genes encoding resistance or virulence. Genes for antibiotic production and for physiological properties such as degradation of unusual organic compounds can also be present on plasmids. The "housekeeping genes", which are essential to the host under all conditions, are located on the chromosome of the cell and not on plasmids. [Madigan et al. 2000]

Transposons

Transposons contain genes that are capable of moving from one location in the genome to another under certain conditions. The movement is facilitated by special genetic elements called transposable elements. Transposons can carry genes that confer important properties on the bacterium, such as resistance or virulence genes. Some transposons are conjugative: they can not only move between different locations in the genome but also transfer themselves from one bacterium to another. Conjugative transposons have mostly been found in Gram-positive bacteria. [Madigan et al. 2000]


2.5.2 Gene transfer

Microorganisms can receive plasmids and transposons in different ways, especially by conjugation and transformation but also from viruses through transduction. The conjugation, transformation and transduction processes are described below. [Madigan et al. 2000]

Conjugation

Conjugation transfers traits from one cell to another through cell-to-cell contact. Conjugation is a plasmid-encoded mechanism that results in the transfer of a copy of a plasmid (or part of the chromosome) to a new host. The mechanism involves a donor cell, which contains the conjugative plasmid, and a recipient cell, which does not. Transfer during conjugation is initiated by the synthesis of a sex pilus. The pilus makes specific contact with a receptor on the recipient and then pulls the two cells together. A copy of the plasmid is then transferred from the donor to the recipient. The genes encoding the synthesis of the sex pilus are contained in the tra region of the plasmid. [Madigan et al. 2000]

If an F-plasmid is integrated into the host chromosome, the chromosome, or more commonly part of it, can be transferred during conjugation by the same cell-to-cell procedure described above. The recipient can thereby obtain different traits from the donor, such as resistance or virulence genes. [Madigan et al. 2000]

Transformation

Genetic transformation is a process in which free DNA is incorporated into a recipient cell and brings about genetic change. The DNA can come from lysed cells, and the fragments that can be taken up are much smaller than the genome. Not all bacteria can take up DNA; those that can are called competent. Gram-positive bacteria such as S. aureus only take up single-stranded DNA, whereas others only take up double-stranded DNA. The DNA is bound to the cell surface and, if the cell only takes up single-stranded DNA, nucleases degrade one of the strands and the other is taken up. The DNA is then integrated into the genome of the cell by recombination. Through transformation the cell can thus gain properties such as resistance or virulence. [Madigan et al. 2000]

Transduction

In transduction, viruses transfer DNA from one cell to another. When the virus replicates its DNA in the host cell, a part of the host chromosome can by mistake be packaged into the virus particle. When the phage infects a new bacterium, the genes obtained from the previous cell can be integrated into the new cell's genome, giving the cell new traits. Transduction has been found to occur in a variety of bacteria, including S. aureus. [Madigan et al. 2000]


⁴ /home/ibiology1/dave/Bacteria/Staphylococcus/[aureus/epidermis]/Main


3 Data selection

3.1 Defining the investigation

Five strains of Staphylococcus aureus and one strain of Staphylococcus epidermidis have been investigated. In order to get an overview of which strains contain the different genes, six groups of protein-coding genes have been defined:

I. Exoenzymes
II. Toxins
III. Adhesins
IV. Presumed resistance (found by the gene function)
V. Hypothetical resistance (found by the gene function)
VI. Known resistance

Groups I, II and III were collected on the basis of previous studies [Kuroda et al. 2001]. In addition to the annotated strains N315 and Mu50, three other strains of S. aureus were included in our studies: MRSA, COL and NCTC8325. The S. epidermidis strain used was RP62A. All DNA sequences were obtained from the ibiology server at CBS⁴. All the selected genes are listed in table 2.

For each of these genes, the protein sequence was obtained from the NCBI homepage. This was done by downloading all known annotations for N315 and extracting only those of interest to us (groups I to VI).
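The extraction step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the project: it assumes the N315 annotations are available as FASTA records whose header line contains the locus tag (for example SA0382) as a separate token, and the function names, the example headers and the short sequences are all made up for the sketch.

```python
import re
from typing import Dict, Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_locus_tag(records: Iterable[Tuple[str, str]],
                        wanted: Set[str]) -> Dict[str, str]:
    """Keep only records whose header contains a wanted locus tag as a whole token."""
    selected = {}
    for header, seq in records:
        tokens = set(re.split(r"[\s|,;\[\]()]+", header))
        for tag in wanted & tokens:
            selected[tag] = seq
    return selected


# Hypothetical input: headers follow an assumed "locus_tag gene description"
# layout and the sequences are placeholders, not real protein sequences.
example = """>SA0382 set6 exotoxin 6
MKLT
AVA
>SA9999 hypothetical example record
MSEK
>SA1819 tst toxic shock syndrome toxin-1
MNKK
""".splitlines()

toxins = select_by_locus_tag(read_fasta(example), {"SA0382", "SA1819"})
```

Matching on whole tokens rather than substrings avoids selecting SA03820 when SA0382 is requested; the set of wanted tags would correspond to the ORF identifiers of groups I to VI.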

3.2 Finding the genes in the non-annotated genomes

The collected protein sequences were then used to perform a BLAST search of protein against DNA (using TBLASTN). The search was run locally on the ibiology server because of the large amount of data. The result of the alignment was a data file of 146,416 lines. A problem occurred when we tried to find an ideal E-value for discriminating between correct and wrong alignments: short genes (~150 amino acids) tend to give a relatively high E-value even for a seemingly good alignment, whereas poor alignments of longer genes (~1500 amino acids) can result in low E-values. The following approximation was therefore made: even if a good alignment of a short protein gave an acceptably low E-value (for example 10^-50), it was not used, since short proteins are less significant than longer ones. For our comparison of the six strains to be successful, the investigated proteins must be significant. By examination of the results, it was concluded that only high-quality alignments with an E-value below 10^-90 should be included in our study, and good but short alignments were therefore discarded.
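The cutoff and its effect on short proteins can be illustrated with a small Python sketch. It is not the script used in the project. It assumes the hits have already been reduced to (gene, strain, E-value) tuples, which is a simplified, hypothetical format rather than the raw BLAST output, and the E-values in the example are invented. The helper min_bit_score uses the standard relation E = m * n * 2^(-S') between E-value, query length m, database length n and bit score S'; the genome size of 2.8 Mb is a rough assumption, and effective-length corrections and the six-frame translation are ignored.

```python
import math
from collections import defaultdict

E_VALUE_CUTOFF = 1e-90  # cutoff stated in the text


def min_bit_score(cutoff, query_len, db_len):
    """Bit score S' above which E = m * n * 2**(-S') falls below the cutoff."""
    return math.log2(query_len * db_len / cutoff)


def presence_table(hits, cutoff=E_VALUE_CUTOFF):
    """Map each query gene to the set of strains with a hit below the cutoff."""
    table = defaultdict(set)
    for gene, strain, evalue in hits:
        if evalue < cutoff:
            table[gene].add(strain)
    return dict(table)


# Invented example hits: (gene, strain, E-value).
hits = [
    ("tst", "N315", 1e-120),
    ("tst", "Mu50", 1e-120),
    ("tst", "COL", 2e-5),     # weak hit, rejected
    ("hld", "N315", 1e-10),   # short protein: good alignment, but above the cutoff
]
table = presence_table(hits)

# Bit score needed by a ~150 and a ~1500 residue query (assumed 2.8 Mb genome).
short_need = min_bit_score(E_VALUE_CUTOFF, 150, 2_800_000)
long_need = min_bit_score(E_VALUE_CUTOFF, 1500, 2_800_000)
```

Under these assumptions the required bit score differs by only log2(10), about 3.3 bits, between the short and the long query, while the score an alignment can reach grows with its length. This is consistent with the observation in the text that short proteins rarely reach very low E-values even when the alignment is good.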


I - Exoenzymes
SA0022    hypothetical protein, similar to 5'-nucleotidase
plc       1-phosphatidylinositol phosphodiesterase precursor
coa       staphylocoagulase precursor
geh       glycerol ester hydrolase
SA0610    hypothetical protein, similar to lipase LipA
SA0743    hypothetical protein, similar to staphylocoagulase precursor
SA0746    staphylococcal nuclease
htrA      serine protease HtrA
sspB      cysteine protease precursor
nuc       thermonuclease
splF      serine protease SplF
splD      serine protease SplD
splC      serine protease SplC
splB      serine protease SplB
splA      serine protease SplA
SA1725    Staphopain, Cysteine Proteinase
sak       STAPHYLOKINASE PRECURSOR
hysA      hyaluronate lyase precursor
SA2323    conserved hypothetical protein
aur       zinc metalloproteinase aureolysin
lip       triacylglycerol lipase precursor

II - Toxins
SA0276    conserved hypothetical protein, similar to diarrheal toxin
SA0357    hypothetical protein, similar to exotoxin 2
set6      exotoxin 6
set7      exotoxin 7
set8      exotoxin 8
set9      exotoxin 9
set10     exotoxin 10
set11     exotoxin 11
set12     exotoxin 12
set13     exotoxin 13
set14     exotoxin 14
set15     exotoxin 15
SA0657    hypothetical protein, similar to hemolysin homologue
SA0780    hypothetical protein, similar to hemolysin
SA1007    Alpha-Hemolysin precursor
SA1009    hypothetical protein, similar to exotoxin 1
SA1010    hypothetical protein, similar to exotoxin 4
SA1011    hypothetical protein, similar to exotoxin 3
SA1430    hypothetical protein, similar to enterotoxin A precursor
lukD      leukotoxin, LukD [Pathogenicity island SaPIn3]
lukE      leukotoxin LukE
seg       extracellular enterotoxin type G precursor
sen       enterotoxin SeN
yent2     enterotoxin YENT2
yent1     enterotoxin Yent1
sei       extracellular enterotoxin type I precursor
sem       enterotoxin SEM
seo       enterotoxin SeO
sep       enterotoxin P
SA1812    hypothetical protein, similar to synergohymenotropic toxin precursor - Staphylococcus intermedius
SA1813    hypothetical protein, similar to leukocidin chain lukM precursor
sel       extracellular enterotoxin L
sec3      enterotoxin type C3
tst       toxic shock syndrome toxin-1
hld       delta-hemolysin
SA1973    hypothetical protein, similar to hemolysin III
hlgA      gamma-hemolysin chain II precursor
hlgC      gamma-hemolysin component C
hlgB      gamma-hemolysin component B

III - Adhesins
clfA      fibrinogen-binding protein A, clumping factor
clfB      Clumping factor B
sdrC      Ser-Asp rich fibrinogen-binding, bone sialoprotein-binding protein
sdrD      Ser-Asp rich fibrinogen-binding, bone sialoprotein-binding protein
sdrE      Ser-Asp rich fibrinogen-binding, bone sialoprotein-binding protein
SA0587    lipoprotein, Streptococcal adhesin PsaA homologue
ssp       extracellular ECM and plasma binding protein
SA0745    hypothetical protein, similar to extracellular matrix and plasma binding
SA1000    hypothetical protein, similar to fibrinogen-binding protein
SA1003    hypothetical protein, similar to fibrinogen-binding protein
SA1004    hypothetical protein, similar to fibrinogen-binding protein
ebhA      hypothetical protein, similar to streptococcal adhesin emb
ebhB      hypothetical protein, similar to streptococcal adhesin emb
ebpS      elastin binding protein
fnbB      fibronectin-binding protein homolog
fnb       fibronectin-binding protein homolog
icaA      intercellular adhesion protein A
icaD      intercellular adhesion protein D
icaB      intercellular adhesion protein B
icaC      intercellular adhesion protein C

IV - Presumed resistance
SA0878    toxic anion resistance protein homologue
bleO      bleomycin resistance protein
mecR1     methicillin resistance protein
mecI      methicillin resistance regulatory protein
bacA      bacitracin resistance protein (putative undecaprenol kinase) homologue
norA      quinolone resistance protein
llm       lipophilic protein affecting bacterial lysis rate and methicillin resistance level
fmt       Fmt, autolysis and methicillin resistant-related protein
fmtC      oxacillin resistance-related FmtC protein
femA      factor essential for expression of methicillin resistance
SA1580    multidrug resistance protein homolog
SA1591    arsenical resistance operon repressor homolog
fosB      fosfomycin resistance protein fofB - Staphylococcus sp. plasmid

V - Hypothetical resistance
SA2250    hypothetical protein, similar to antibiotic resistance protein
SA2142    hypothetical protein, similar to multidrug resistance protein
SA2203    hypothetical protein, similar to multidrug resistance protein
SA2222    hypothetical protein, similar to bicyclomycin resistance protein TcaB
SA2241    hypothetical protein, similar to chloramphenicol resistance protein
SA1970    hypothetical protein, similar to multidrug resistance protein
SA2056    hypothetical protein, similar to acriflavin resistance protein
SA1238    hypothetical protein, similar to tellurite resistance protein
SA1148    hypothetical protein, similar to aluminum resistance protein
SA0681    hypothetical protein, similar to multidrug resistance protein
SA0132    hypothetical protein, similar to tetracyclin resistance protein
SA0874    hypothetical protein, similar to multidrug resistance protein-related protein
SA0115    hypothetical protein, similar to multi-drug resistance efflux pump

VI - Known resistance
bleO      bleomycin resistance protein
mecA      penicillin binding protein 2 prime
ermA      rRNA methylase
SA1448    conserved hypothetical protein
ermA      rRNA methylase Erm(A)
crtN      squalene synthase
aadD      kanamycin nucleotidyltransferase
ant(9)    O-nucleotidyltransferase(9)
SAV0399   hypothetical protein

Table 2: The selection of genes used in the project


"Staphylococcus aureus N315","1_SA0022"," 1450 ","1","31006","33321","0.0"
"Staphylococcus aureus N315","1_SA0022"," 96.7 ","155","160436","161908","3e-20"
"Staphylococcus aureus N315","1_SA0022"," 66.7 ","151","890856","891857","4e-11"

Figure 3: Structure of the converted BLAST results: Species, Genecode, Score, Query start, Subject start, Subject end, E-value. Note that the genecode contains a number indicating which group the gene belongs to.

3.3 Sorting the BLAST results

The output from the BLAST search was a file with all the alignments. A Visual Basic program was constructed to convert the data to the format illustrated in figure 3, which could be easily read and compared.
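As an illustration of how records in the format of figure 3 can be read back for comparison, a minimal Python sketch is given here. It is not the Visual Basic program from the project; the field names and helper functions are assumptions, and only the three sample records are taken from figure 3.

```python
import csv
import io

# Field order as listed in the caption of figure 3; the names are our own
FIELDS = ["species", "genecode", "score", "query_start",
          "subject_start", "subject_end", "e_value"]

SAMPLE = '''"Staphylococcus aureus N315","1_SA0022"," 1450 ","1","31006","33321","0.0"
"Staphylococcus aureus N315","1_SA0022"," 96.7 ","155","160436","161908","3e-20"
"Staphylococcus aureus N315","1_SA0022"," 66.7 ","151","890856","891857","4e-11"
'''


def to_float(text):
    """Convert a BLAST number; BLAST may print 1e-106 as 'e-106'."""
    text = text.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


def parse_records(text):
    """Turn the quoted, comma-separated lines into typed dictionaries."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        raw = dict(zip(FIELDS, row))
        # The genecode is '<group number>_<gene name>'
        group, _, gene = raw["genecode"].partition("_")
        records.append({
            "species": raw["species"],
            "group": int(group),
            "gene": gene,
            "score": to_float(raw["score"]),
            "query_start": int(raw["query_start"]),
            "subject_start": int(raw["subject_start"]),
            "subject_end": int(raw["subject_end"]),
            "e_value": to_float(raw["e_value"]),
        })
    return records


records = parse_records(SAMPLE)
```

Splitting the genecode into group and gene name makes it simple to sort or count the hits per group and per strain afterwards.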

It should be noted that 12 of the 119 genes investigated were not found in N315 (which corresponds to a BLAST of the genome against itself). This indicates that some of the deletions in the BLAST results might have been too discriminative. Alignments of shorter proteins seemed to result in high E-values compared to alignments of longer proteins. Since many of the shorter alignments gave high E-values (above 10⁻⁹⁰), they were discarded in the data selection although the alignments gave high percent identities. An example of this is given in appendix F. This problem could have been solved by taking the percent identities into account. Unfortunately, time limitations made this impossible.
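A selection rule that takes the percent identities into account, as suggested above, could look like the following sketch. It was not implemented in the project; the identity and coverage thresholds are hypothetical values chosen only for illustration, and the first two test cases use the numbers from the alignments in appendix F.

```python
E_VALUE_CUTOFF = 1e-90   # cutoff used in the text
MIN_IDENTITY = 0.90      # hypothetical threshold, for illustration only
MIN_COVERAGE = 0.90      # hypothetical: aligned fraction of the query protein


def accept(e_value, identities, aligned_length, query_length):
    """Keep a hit that passes the E-value cutoff, or that is a nearly
    identical alignment covering most of the query protein."""
    if e_value <= E_VALUE_CUTOFF:
        return True
    identity = identities / aligned_length
    coverage = aligned_length / query_length
    return identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


fosb_kept = accept(3e-73, 127, 139, 139)    # short gene, 91% identities
rbsk_kept = accept(1e-106, 195, 299, 304)   # long gene, 65% identities
weak_kept = accept(3e-20, 40, 100, 150)     # made-up weak hit
```

Under these assumed thresholds the short fosB alignment would be rescued, while weak partial hits would still be discarded.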

A total of 7805 alignments were extracted from the BLAST results data file. Only 969 of these displayed E-values at or below 10⁻⁹⁰. Of these 969 alignments, 332 were double alignments and were therefore excluded. This was done manually, which is of course not the ideal solution. A better solution would have been to write a script that identifies the overlapping alignments and then discards them. The remaining 637 alignments form the basis for the analysis in the present work. An investigation was done showing which genes were located in each strain (see table 3; note the missing genes for N315 due to the discriminating data sort).
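The script mentioned above was never written. One possible version is sketched here, under the assumption that a "double alignment" means two hits in the same strain whose subject intervals overlap, and that the hit with the higher score should be kept. The gene names and coordinates in the example are made up.

```python
def overlaps(a, b):
    """True if two (start, end) intervals share at least one base.
    Start and end may be given in either order (reverse-strand hits)."""
    a_low, a_high = sorted(a)
    b_low, b_high = sorted(b)
    return a_low <= b_high and b_low <= a_high


def drop_overlapping(hits):
    """Keep the best-scoring hit of every group of overlapping hits.
    Each hit is a dict with species, gene, score, subject_start, subject_end."""
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        span = (hit["subject_start"], hit["subject_end"])
        clash = any(
            other["species"] == hit["species"]
            and overlaps(span, (other["subject_start"], other["subject_end"]))
            for other in kept
        )
        if not clash:
            kept.append(hit)
    return kept


# Made-up hits: geneA and geneB overlap in N315, geneB in COL stands alone
hits = [
    {"species": "N315", "gene": "geneA", "score": 400.0,
     "subject_start": 1000, "subject_end": 1700},
    {"species": "N315", "gene": "geneB", "score": 350.0,
     "subject_start": 1500, "subject_end": 2200},
    {"species": "N315", "gene": "geneC", "score": 380.0,
     "subject_start": 3000, "subject_end": 3700},
    {"species": "COL", "gene": "geneB", "score": 350.0,
     "subject_start": 1500, "subject_end": 2200},
]
kept = drop_overlapping(hits)
```

Sorting by score first means that the strongest alignment always wins, so the result does not depend on the order of the hits in the input file.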


(+ = gene found, - = gene not found)

Gene      Mu50  N315  COL   NCTC8325  MRSA  RP62A

Group I - Exoenzymes
SA0022    +     +     +     +         +     -
plc       +     +     +     +         +     -
coa       +     +     +     +         +     -
geh       +     +     +     +         +     +
SA0610    +     +     +     +         +     +
SA0743    +     +     +     +         +     -
SA0746    +     +     +     +         +     -
htrA      +     +     +     +         +     +
sspB      +     +     +     +         +     +
nuc       +     +     +     +         +     -
splF      +     +     +     +         +     -
splD      +     +     +     +         +     -
splC      +     +     +     +         +     -
splB      +     +     +     +         +     -
splA      +     +     +     +         -     -
SA1725    +     +     +     +         +     +
sak       +     +     -     +         +     -
hysA      +     +     +     +         +     -
SA2323    +     +     +     +         +     +
aur       +     +     +     +         +     +
lip       +     +     +     +         +     +

Group II - Toxins
SA0276    +     +     +     +         +     -
SA0357    +     +     +     +         -     -
set6      +     +     +     +         +     -
set7      +     +     +     +         +     -
set8      +     +     +     +         +     -
set9      +     +     +     +         +     -
set10     +     +     -     +         +     -
set11     +     +     -     +         +     -
set12     +     +     -     +         -     -
set13     +     +     +     +         +     -
set14     +     +     +     +         +     -
set15     +     +     +     +         -     -
SA0657    +     +     +     +         +     +
SA0780    +     +     +     +         +     +
SA1007    +     +     +     +         +     -
SA1009    +     +     +     +         +     -
SA1010    +     +     +     +         +     -
SA1011    +     +     +     +         +     -
SA1430    -     -     -     -         -     -
lukD      +     +     +     +         +     -
lukE      +     +     +     +         +     -
seg       +     +     -     -         +     -
sen       +     +     -     -         +     -
yent2     -     -     -     -         -     -
yent1     -     -     -     -         -     -
sei       +     +     +     -         +     -
sem       +     +     -     -         +     -
seo       +     +     -     -         +     -
sep       +     +     -     -         +     -
SA1812    +     +     +     +         +     -
SA1813    +     +     +     +         +     -
sel       +     +     -     -         -     -
sec3      +     +     +     -         -     -
tst       +     +     -     -         -     -
hld       -     -     -     -         -     -
SA1973    +     +     +     +         +     +
hlgA      +     +     +     +         +     -
hlgC      +     +     +     +         +     -
hlgB      +     +     +     +         +     -

Group III - Adhesins
clfA      +     +     +     +         +     -
clfB      +     +     +     +         +     -
sdrC      +     +     +     +         +     +
sdrD      +     +     +     +         +     +
sdrE      +     +     +     +         +     +
SA0587    +     +     +     +         +     +
ssp       +     +     +     +         +     -
SA0745    +     +     -     -         -     -
SA1000    -     -     -     -         -     -
SA1003    +     +     +     +         -     -
SA1004    -     -     -     -         -     -
ebhA      +     +     +     +         +     +
ebhB      +     +     +     +         +     +
ebpS      +     +     +     +         +     -
fnbB      +     +     +     +         +     -
fnb       +     +     +     +         +     -
icaA      +     +     +     +         +     +
icaD      -     -     -     -         -     -
icaB      +     +     +     +         +     +
icaC      +     +     +     +         +     +

Group IV - Presumed resistance
SA0878    +     +     +     +         +     +
mecR1     +     +     +     -         +     +
mecI      -     -     -     -         -     -
bacA      +     +     +     +         +     +
norA      +     +     +     +         +     +
llm       +     +     +     +         +     +
fmt       +     +     +     +         +     +
fmtC      +     +     +     +         +     +
femA      +     +     +     +         +     +
SA1580    +     +     +     +         +     +
SA1591    -     -     -     -         -     -
fosB      -     -     -     -         -     -

Group V - Hypothetical resistance
SA2250    +     +     +     +         +     +
SA2142    +     +     +     +         +     +
SA2203    +     +     +     +         +     +
SA2222    +     +     +     +         +     +
SA2241    +     +     +     +         +     +
SA1970    +     +     +     +         +     +
SA2056    +     +     +     +         +     +
SA1238    +     +     +     +         +     +
SA1148    +     +     +     +         +     +
SA0681    +     +     +     +         +     +
SA0132    +     +     +     +         +     -
SA0874    +     +     +     +         +     +
SA0115    +     +     +     +         +     -

Group VI - Known resistance
bleO      -     -     -     -         -     -
mecA      +     +     +     -         +     +
ermA      +     +     -     -         +     +
SA1448    +     +     +     +         +     +
ermA      +     +     -     -         +     +
crtN      +     +     +     +         +     -
aadD      +     +     -     -         +     -
ant9      +     +     -     -         +     +
SAV0399   -     -     -     -         -     -

Table 3: Selected BLAST results


A program was constructed (source code not included) which enabled us to get an overview of the gene locations in the six strains, based on the locations in the chromosome given by BLAST. This is shown in figure 4.

Gene locations in S. aureus COL (1 bp to 2,809,421 bp)
Gene locations in S. aureus MRSA (1 bp to 2,902,619 bp)
Gene locations in S. aureus Mu50 (1 bp to 2,878,134 bp)
Gene locations in S. aureus N315 (1 bp to 2,813,641 bp)
Gene locations in S. aureus NCTC8325 (1 bp to 2,814,733 bp)
Gene locations in S. epidermidis RP62A (1 bp to 2,619,000 bp)

Figure 4: Gene locations found by BLAST

It was discovered that NCTC8325 had its genes lined up in reverse order compared to the four other S. aureus strains examined. This indicates that the opposite strand has been sequenced. With the help of Peder Worning at CBS it was determined that the point of origin for NCTC8325 is located at 2,260,000 bp. For further information, see section 4.1.2. In figure 5 below, N315 is aligned against NCTC8325. In the illustration, all genes for NCTC8325 have been rotated and corrected relative to the point of origin.

Gene locations in S. aureus N315 (1 bp to 2,813,641 bp)
Gene locations in S. aureus NCTC8325 (1 bp to 2,619,000 bp)

Figure 5: Alignment of the opposite strand of NCTC8325

This indicates that NCTC8325 has many similarities with the other S. aureus strains investigated. In the later comparison of the generated atlases it must be taken into account that the NCTC8325 sequence is reversed and displayed with a false point of origin. This has not been corrected in any of the atlases shown in the present project due to time limitations.
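The rotation and reversal of the NCTC8325 coordinates can be illustrated with a small Python sketch. The exact convention used for figure 5 is not given in the report, so the formula below is an assumption: positions are measured from the point of origin, in the direction opposite to the one in the sequence file, on a circular chromosome. The genome length is the one shown in figure 4.

```python
GENOME_LENGTH = 2_814_733   # NCTC8325 length as given in figure 4
ORIGIN = 2_260_000          # point of origin reported in the text


def rotate_and_reverse(position, origin=ORIGIN, length=GENOME_LENGTH):
    """Map a position in the sequence file to a coordinate that starts
    at the origin and runs in the opposite direction (circular genome)."""
    return (origin - position) % length


at_origin = rotate_and_reverse(2_260_000)
just_before = rotate_and_reverse(2_259_000)
just_after = rotate_and_reverse(2_261_000)
```

A gene located 1000 bp before the origin in the file ends up 1000 bp after it in the corrected map, and a gene located just after the origin is moved to the far end of the map.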


4 Atlas study

4.1 About the genome atlas

In the investigation of pathogenicity in Staphylococcus, custom genome atlases were used. A genome atlas is a representation of three very important properties of DNA: repeats, structural parameters and parameters that are directly related to the base composition [Ussery, D.W., 2001]. In the atlases used, global direct repeats, global inverted repeats, position preference, stacking energy, GC-skew and AT content are shown. Each property is displayed as colour intensity on its own circle.

4.1.1 Repeats

Repeating sequences are those that are found at multiple locations in the genome (called global repeats) or within a limited region (called local repeats). How these searches are performed is not discussed in detail here, but the two problems differ considerably in their demand on computing power. Investigating global repeats would be very time consuming, although smart search algorithms might reduce the task. In the atlases shown in this report, repeats are based on BLAST searches of the genomes against themselves.

Global direct repeats

The global direct repeats are found by BLAST. Only 100 bp alignments are included, and the scale is connected to the E-value so that significant repeats are shown as intense colours [Ussery, D.W., 2001].

Global inverted repeats

The calculation of the inverted repeats is similar to that of the direct repeats, but only alignments in the opposite strand are included [Ussery, D.W., 2001]. To avoid noise on the atlas plots, the scale for both the global inverted repeats and the global direct repeats is limited to a range of 5 to 7.5.

4.1.2 Base composition

AT content

The AT content (percent AT) is of great interest, since the binding energy for adenine-thymine is lower than that observed for guanine-cytosine. This results in a lower melting temperature in AT-rich regions than in GC-rich regions. When working with primers and primer design, a rule of thumb gives the melting point in °C as:

Tm = 4·(C+G) + 2·(A+T)

This is only valid for short primers and is just an approximation. But it indicates that the melting temperature (hybridization energy) is correlated with the GC content.
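The rule of thumb above (often called the Wallace rule) is easy to evaluate for a given primer. The following Python sketch is only an illustration; the function name and the example primer are made up.

```python
def rule_of_thumb_tm(primer):
    """Approximate melting temperature in degrees Celsius of a short
    primer: 4 degrees per G or C plus 2 degrees per A or T."""
    sequence = primer.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    if gc + at != len(sequence):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 4 * gc + 2 * at


tm_mixed = rule_of_thumb_tm("GCGCATAT")    # 4 G/C and 4 A/T
tm_at_only = rule_of_thumb_tm("ATATATAT")  # 8 A/T
```

For two primers of the same length, the GC-rich one gets the higher estimate, which is the correlation referred to in the text.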

In addition AT rich regions exhibit a higher amount of intrinsic curvature and tend to be less flexible.


Figure 6: GC-skew (divided by the total C+G content) of E. coli. [Mrázek, J., 1998]

GC-skew

The GC-skew is the number of G's minus the number of C's in the same strand. This is calculated over a 10 kb window. A noticeable asymmetry in the GC-skew exists in many bacterial genomes [Mrázek, J., 1998], and this is indeed true for the Staphylococcus genomes. The variation in strand composition may be due to the different replication synthesis of the leading and lagging strands. With information about the GC-skew, it is possible to calculate the point of origin.
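A windowed GC-skew calculation can be sketched as follows. The normalisation by the G+C count follows the caption of figure 6. The report does not state how the point of origin was derived from the skew; the second function uses the minimum of the cumulative skew, a commonly used heuristic, and is included purely as an assumption. All function names are our own.

```python
def gc_skew(sequence, window=10_000):
    """Return (G - C) / (G + C) for consecutive windows of one strand."""
    sequence = sequence.upper()
    values = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


def estimate_origin(sequence, window=10_000):
    """Estimate the origin as the end of the window at which the
    cumulative skew reaches its minimum (assumed heuristic)."""
    total = 0.0
    lowest = float("inf")
    lowest_index = 0
    for index, value in enumerate(gc_skew(sequence, window)):
        total += value
        if total < lowest:
            lowest = total
            lowest_index = index
    return min((lowest_index + 1) * window, len(sequence))


# Toy sequence: a C-rich half followed by a G-rich half
toy = "C" * 20 + "G" * 20
toy_skew = gc_skew(toy, window=10)
toy_origin = estimate_origin(toy, window=10)
```

In the toy sequence the skew changes sign in the middle, and the estimated origin falls exactly at that switch.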

4.1.3 Structural parameters

Intrinsic curvature

The intrinsic curvature is a measure of the bend of the DNA strand. The algorithm used is the CURVATURE algorithm, which calculates roll, tilt and twist angles on a segment of DNA. The algorithm follows the path of the axis of the DNA segment, which is approximated by an arc with a given radius and angle. The intrinsic curvature is of great importance: it has been shown that E. coli promoters are more curved than coding sequences from the same genome [Ussery, D. W].

Stacking energy

The stacking energy gives a measure of how easily the helix will melt. This value is correlated with the AT content.

Position preference

This property is related to the flexibility of the DNA strand. There is a correlation between the rate of expression and the level of the position preference.


5 Investigation of gene clusters

The main atlases are shown in appendix B. Three pathogenicity islands have been found in the N315 strain, namely the exotoxic island, the enterotoxic island and the toxic-shock syndrome toxin 1 island. Some of the genes from these islands have been included in our investigation, and they have therefore been included in the BLAST search on the other five strains of Staphylococcus. Atlas zooms have been constructed for all six strains, which show to what extent they contain the three islands (appendix C). Furthermore, a resistance cluster and an adhesin cluster are investigated (see appendices D and E).

5.1 The enterotoxic island

The enterotoxic island consists of three clusters with genes coding for the following proteins:

1. Exoenzymes
2. Leukotoxins and haemolysins
3. Enterotoxins

Strain     Exoenzymes             Leukotoxins and haemolysins      Enterotoxins
N315       All genes conserved    All genes conserved              All genes conserved
Mu50       All genes conserved    All genes conserved              All genes conserved
COL        All genes conserved    All genes conserved              One gene located elsewhere in the genome
NCTC8325   All genes conserved    All genes conserved              None
MRSA       Most genes conserved   Located elsewhere in the genome  All genes conserved
RP62A      None                   None                             None

Table 4: Enterotoxic island

The zooms of the enterotoxic island are shown in appendix C1. It seems that the enterotoxic island is well conserved in all S. aureus strains, especially in N315 and Mu50. But there is no sign of the island in the RP62A strain (see appendix B6). The exoenzyme cluster displays a high degree of direct repeats and does not have any extreme values for the position preference, stacking energy, GC-skew or AT content. The cluster shows the same pattern in all five strains.

Most of the genes in the leukotoxin and haemolysin cluster have a large number of global inverted repeats. Compared to its surroundings, the cluster shows a decreased level of AT%, stacking energy and position preference.

All genes in the enterotoxin cluster are located in an AT-rich region, which also has a high stacking energy and position preference.

5.2 The exotoxic island

It has been shown previously by Kuroda which genes are located in the exotoxic island [Kuroda et al. 2001]. Only the genes from the exotoxic cluster are included in the present study, since the rest do not exhibit any specific pathogenic properties.

N315       All genes conserved
Mu50       All genes conserved
COL        Most genes conserved
NCTC8325   All genes conserved
MRSA       Most genes conserved
RP62A      None

Table 5: Exotoxic island


The atlas zooms for the exotoxic cluster are included in appendix C2. Appendix B6 also confirms that there is no sign of the exotoxic island in the RP62A strain.

The whole region around the exotoxic cluster has a negative GC-skew. The two genes set8 and set9 (which are well conserved) have a positive GC-skew. This is the case for the four strains N315, COL, Mu50 and MRSA. In NCTC8325 the pattern is exactly the opposite. This might be related to the fact that the NCTC8325 strain has been sequenced on the opposite strand. In and around the cluster, high levels of global direct repeats are present. This feature is especially pronounced inside the genes coding for set8 and set9.

5.3 The toxic-shock syndrome toxin 1 island

Since it has been shown previously by Kuroda which genes are located in the toxic-shock syndrome toxin 1 island, only a few genes (sel, sec3, tst) have been selected for this examination [Kuroda et al. 2001]. All three of the TSS island genes investigated are found in Mu50 and N315.

N315 All genes conserved

Mu50 All genes conserved

COL 1 gene conserved (sec3), but located elsewhere in the genome

NCTC8325 None

MRSA None

RP62A None

Table 6: TSS island

Only one gene of the TSS island is found to be present in COL, and none of the genes investigated was observed in NCTC8325, MRSA or RP62A. This indicates that only N315 and Mu50 would display TSS properties.

When looking at the region around the three islands, a striking resemblance is observed: at a distance of about 100 kb from each island, extreme values are found for the structural properties, base composition and repeating sequences. The AT content is very low, the GC-skew is more extreme than in the surroundings, the stacking energy is very low, the position preference is low and the levels of inverted and direct repeats are high. In S. epidermidis, these properties can be seen near the toxin gene SA1973.

5.4 Investigation of resistance clusters

It can be seen from table 3 that the multi-resistant strain (MRSA) does not contain all the resistance genes identified in Mu50 and N315. This suggests that MRSA might display fewer resistance properties. This may not be true, however: the MRSA genome has not yet been annotated, so resistance genes that occur only in MRSA could not be looked up from the N315 annotation.

Zooms which depict a resistance cluster have been made, including the gene coding for methicillin resistance.


⁵ Infection of the heart's inner lining (endocardium) or the heart valves


Strain     aadD       mecA       mecR1      ermA       ant9
N315       Conserved  Conserved  Conserved  Conserved  Conserved
Mu50       Conserved  Conserved  Conserved  Conserved  Conserved
COL        None       Conserved  Conserved  None       None
NCTC8325   None       None       None       None       None
MRSA       Conserved  Conserved  Conserved  Conserved  Conserved
RP62A      None       Conserved  Conserved  Conserved  Conserved

Table 7: Resistance clusters

This indicates that NCTC8325 has fewer resistance properties, and this can in fact be observed in figures 4 and 5.

Inside and around the ermA and ant9 genes, large regions of repeats are observed. In N315, intense direct and inverted repeats are seen, whereas RP62A displays strong inverted repeats. Mu50 has strong direct repeats.

5.5 Investigation of adhesins cluster

The two large clustered genes, ebhA and ebhB, and the clustered genes icaA, icaB and icaC are examined in order to show an atlas pattern of the adhesins group.

A large number of small direct repeats are observed for ebhA, whereas practically no repeats are found in ebhB. Very low stacking energy, very low AT content and low position preference are found for all strains in this region, except for RP62A. This could indicate that the ebhA and ebhB genes are not highly expressed in S. epidermidis. Other studies by Kuroda suggest that ebhA and ebhB are associated with bacterial endocarditis⁵.

It has been suggested by Kuroda that the ica genes are related to biofilm production, but no characteristic properties are observed in the region near this gene cluster. All three of these ica genes are found in each of the six strains [Kuroda et al. 2001].


6 Conclusion

The approach used to group and locate interesting genes seemed useful and gave an overview of which genes should be examined further. Using a program to draw the gene locations gave a graphical representation of the similarity of the six genomes. In addition, this helped us reveal that the opposite strand of NCTC8325 has been sequenced. With help from Peder Worning at CBS, the true point of origin has been calculated to be located at 2,260,000 bp.

A major drawback of this approach was the discrimination of the BLAST results. Some well-aligned genes might have been discarded, which might have led to difficulties in explaining some of the atlas features revealed. This was due to the limit chosen for the E-value.

It has been concluded that the N315 and Mu50 genomes are almost identical. MRSA lacks only a few of the virulence genes investigated, whereas COL and NCTC8325 in addition lack some of the resistance genes. S. epidermidis RP62A is the most divergent of the six strains: it lacks all of the pathogenicity islands studied, and only contains genes coding for three of the 39 toxins investigated. This explains its lower pathogenicity.

Only the N315 and Mu50 genomes contain the tst gene, which codes for toxic shock syndrome toxin 1.

Although some interesting properties of the S. aureus and S. epidermidis atlases have been emphasized, no evident atlas features were found to be related to the pathogenicity properties of the bacteria.

The most important feature of the pathogenicity islands is the extreme values located about 100 kb from the island, seen in the GC-skew, AT content, stacking energy and position preference.


7 References

Andersen S. Antibiotikaresistens. Gyldendal uddannelse, 1. Udgave. p. 38 - 42 (1999)

Frontpage: Staphylococcus. Scanning electron micrograph of Staphylococcus aureus, homepage: http://www.bact.wisc.edu/Bact330/lecturestaph (2002)

Goldsby, R. A., Kindt, T. J., Osborne, B. A., Kuby Immunology, W. H. Freeman and Company (2000)

Højby, N., et al., Basal og klinisk mikrobiologi. FADLs Forlag, 2. Udgave (1998)

Kuroda, M. et al., Whole genome sequencing of meticillin-resistant Staphylococcus aureus, The Lancet vol. 357 p. 1225-1240 (2001)

Landsman, D., Gabrielian, A. E., Bolshoy, A., Curved DNA in promoter sequences, homepage: http://www.bioinfo.de/isb/1999/01/0017/ (2002)

Madigan, M. T., Martinko, J. M. and Parker, J., Biology of microorganisms, 9th edition, Prentice Hall, New Jersey, USA. p. 306-308, 314-318, 319-321, 324-325, 503, 695, 753-759, 766, 775, 783, 786-788, 927-929. (2000)

Moss, J., Iglewski, B., Vaughan, M., Tu, A. T., Handbook of natural toxins, volumes 4 and 8. Marcel Dekker (1995)

Mrázek, J., Karlin, S., Strand compositional asymmetry in bacterial and large viral genomes. Proc. Natl. Acad. Sci. USA Vol. 95, pp. 3720-3725 (March 1998)

Skovgaard, M., Jensen, L. J., Friis, C., Stærfeldt, H. H., Worning, P., Brunak, S., Ussery, D. W., The Atlas visualisation of Genome-wide Information, Center for Biological Sequence Analysis, BioCentrum-DTU. Binder for course 27011, spring 2002. (2002)

Todar, K. (1) Bacteriology at UW-Madison, Staphylococci, homepage: http://www.people.virginia.edu/~zs9q/zsfig/toxins.html (2002)

Todar, K. (2) Bacteriology at UW-Madison, Colonization and Invasion, homepage: http://www.bact.wisc.edu/Bact330/lecturecolin

Ussery, D. W., An Introduction to Genome Atlases, homepage: http://www.cbs.dtu.dk/dave/MScourse/GenomeAtlas_intro.html (2001)

Wassenaar, T. M., Gaastra, W., Bacterial virulence: can we draw the line?, FEMS Microbiol. Letters 201 1-7 (2001)


8 Appendix list

A Program for BLAST data conversion

B Genome Atlas
B1: N315
B2: Mu50
B3: COL
B4: NCTC8325
B5: MRSA
B6: RP62A

C Zoom

C1 Enterotoxic island
C1-1: N315
C1-2: Mu50
C1-3: COL
C1-4: NCTC8325
C1-5: MRSA

C2 Exotoxic island
C2-1: N315
C2-2: Mu50
C2-3: COL
C2-4: NCTC8325
C2-5: MRSA

C3 TSS toxin 1 island
C3-1: N315
C3-2: Mu50

D Resistance clusters
D1: N315
D2: Mu50
D3: COL
D4: MRSA
D5: RP62A

E Adhesins clusters

E1: Hypothetical proteins ebhA and ebhB
E1-1: N315
E1-2: Mu50
E1-3: COL
E1-4: NCTC8325
E1-5: MRSA
E1-6: RP62A

E2: Intercellular proteins
E2-1: N315
E2-2: Mu50
E2-3: COL
E2-4: NCTC8325
E2-5: MRSA
E2-6: RP62A

F Short/long alignments


Appendix A


Appendix F

1) Example of a good alignment resulting in a high E-value. The relatively short fosB gene is aligned with 91% identities, giving an E-value above 10⁻⁹⁰ (10⁻⁷³).

Query= fosB (139 letters)

>Staphylococcus aureus N315
 Length = 2813641

 Score = 270 bits (682), Expect = 3e-73
 Identities = 127/139 (91%), Positives = 127/139 (91%)
 Frame = +3

Query: 1       MLKSINHICFSVRNLNDSIHFYRDIXXXXXXXXXXXXAYFELAGLWIALNEEKDIPRNEI 60
               MLKSINHICFSVRNLNDSIHFYRDI            AYFELAGLWIALNEEKDIPRNEI
Sbjct: 2389305 MLKSINHICFSVRNLNDSIHFYRDILLGKLLLTGKKTAYFELAGLWIALNEEKDIPRNEI 2389484

Query: 61      HFSYTHIAFTIDDSEFKYWHQRLKDNNVNILEGRVRDIRDRQSIYFTDPDGHKLELHTGT 120
               HFSYTHIAFTIDDSEFKYWHQRLKDNNVNILEGRVRDIRDRQSIYFTDPDGHKLELHTGT
Sbjct: 2389485 HFSYTHIAFTIDDSEFKYWHQRLKDNNVNILEGRVRDIRDRQSIYFTDPDGHKLELHTGT 2389664

Query: 121     LENRLNYYKEAKPHMTFYK 139
               LENRLNYYKEAKPHMTFYK
Sbjct: 2389665 LENRLNYYKEAKPHMTFYK 2389721

2) Example of a less good alignment resulting in a low E-value. The relatively long rbsK gene (not included in the pathogenic investigation) is aligned with only 65% identities, giving an E-value below 10⁻⁹⁰ (10⁻¹⁰⁶).

Query= rbsK (304 letters)

>Staphylococcus epidermidis RP62A
 Length = 2619000

 Score = 381 bits (968), Expect = e-106
 Identities = 195/299 (65%), Positives = 230/299 (76%)
 Frame = -2

Query: 3      NKVVILGSTNVDQFLTVERYAQPGETLHVEEAQKAFGGGKGANQAIATARMQADTTFITK 62
              NKV+++GSTNVD+FL V+R+ +PGETLH+ +AQK FGGGKGANQAIA +R+ ADTTFI+K
Sbjct: 173612 NKVIVIGSTNVDKFLNVKRFPKPGETLHINQAQKEFGGGKGANQAIAASRLAADTTFISK 173433

Query: 63     IGTDGVADFILEDFKAAHIDTSYIIKTTEAKTGQAFITVNAEGQNTIYVYGGANMTMTPE 122
              +G DG A+FILEDFK A I T YI+ +   +TGQAFITV+  GQNTI VYGGANMT++
Sbjct: 173432 VGKDGNANFILEDFKKAGIHTQYILTSESEETGQAFITVDEAGQNTILVYGGANMTLSAT 173253

Query: 123    DVINAKDAIINADFVVAQLEVPIPAIISAFEIAKAHGVTTVLNPAPAKALPNELLSLIDI 182
              DV  + DA I ADFVVAQLEVP  AI  AF+IA+   +TTVLNPAPA  LP  LL L DI
Sbjct: 173252 DVEMSVDAFIGADFVVAQLEVPFEAIEQAFKIARKQNITTVLNPAPAIELPKSLLELTDI 173073

Query: 183    IVPNETEAELLSGIKVTNEQSMKDNANYFLSLGIKTVLITLGKQGTYFATKNQSQHIEAY 242
              I+PNETEAELL+GI + NE  MK+ A YFL LGI  VLITLG+QGTY A + Q + I A
Sbjct: 173072 IIPNETEAELLTGISINNESDMKETATYFLDLGISAVLITLGEQGTYCAYQEQYKMIPAC 172893

Query: 243    KVNAIDTTAAGDTFIGAFVSRLNKSQDNLADAIDFGNKASSLTVQKHGAQASIPLLEEV 301
               V AIDTTAAGDTFIGAF+S LNK   NL  AI   N+ASSLTVQ+ GAQASIP  +EV
Sbjct: 172892 NVKAIDTTAAGDTFIGAFLSELNKDLSNLESAIRLANQASSLTVQRKGAQASIPTRKEV 172716