Determining kinship relationships using genetic markers, Tom Wenseleers, Laboratorium voor Entomologie, KULeuven

Determining kinship relationships using genetic markers

Tom Wenseleers, Laboratorium voor Entomologie

[email protected]

Lecture can be downloaded from bio.kuleuven.be/ento/wenseleers/twpub.htm#courses

GA75 'Moderne Onderzoeksmethoden in de Biologie' (Modern Research Methods in Biology), March 2006

Use of genetic markers to test hypotheses regarding the kinship relationships among specific individuals

Kinship analysis: estimating the genetic relatedness among individuals

Parentage analysis: identification of parents of specific offspring

Parentage and kinship analysis

Applications

EVOLUTIONARY STUDIES

what is the relatedness between pairs of individuals? test role of kin selection in the evolution of cooperative behaviour

how common are extra-pair fertilizations?

how common is intraspecific brood parasitism (egg dumping)?

BREEDING PROGRAMMES

design captive breeding programs for endangered species

assign F1 generation fish to a particular father in a mass-spawning experiment

HUMAN APPLICATIONS

parentage testing, forensic work at crime scenes

victim identification in mass disasters

What is Relatedness?

“The probability that a gene in one individual is an identical copy, by descent, of a gene in another individual”

Relatedness is a measure not of the absolute genetic similarity between two individuals, BUT of the degree to which this similarity exceeds the background similarity between individuals randomly drawn from the population.

Marker considerations

Which markers can be used?

best: nuclear, codominant markers

(allozymes) – low variability, not ideal

microsatellites: tandem repeats of 2-5 bp motifs; best choice (high resolution, little DNA required)

also: dominant markers, but analysis methods not as powerful

minisatellites

AFLP

RAPD

[Figure: banding patterns of a codominant marker for genotypes AA, AB and BB]

[Figure: a microsatellite array (~20-100 bp) of CA repeats between two non-repetitive flanking sequences; array length depends on repeat unit size and number of repeats]

Sequences of DNA consisting of repeats of 2-5 base pair motifs

almost any combination possible (e.g. CA, GA, GGGAA)

discovered in 1980s, see e.g. Tautz, Trick & Dover (1986)

Microsatellites = short tandem repeats

Microsatellites

present in genome (nucleus + mitochondria + chloroplasts) of all eukaryotes

easily amplified using PCR and separated on DNA sequencer

even highly degraded DNA can be used, e.g. single hair, faeces

highly variable, usually between 5 and 20 alleles

di-repeats most common; tri-, tetra- and penta-repeats rarer

human genome: 35,000 (CA)n repeats

wasps: 1% of 500bp fragments contain tri-repeat sats

most common in mammals and insects, in birds 10 x rarer

Microsatellites

primers to amplify sats:

- already developed for many organisms
- can sometimes be developed by searching Genbank for sat motifs
- cross-amplify using loci developed for other species: usually works within the same genus and sometimes within the same family
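Searching a downloaded sequence for sat motifs can be sketched with a regular expression. This is a minimal illustration, not the method of any particular tool: the function name `find_microsatellites` and the default threshold of 5 repeats are assumptions chosen for the example.

```python
import re

def find_microsatellites(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats of 2-5 bp motifs."""
    # Motif of 2-5 bases (lazy, so the shortest repeating unit is tried first),
    # followed by at least min_repeats - 1 further copies of that same motif.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Hypothetical test sequence: 13 CA repeats between short flanking sequences
example = "GGTT" + "CA" * 13 + "TTGG"
print(find_microsatellites(example))
```

Only perfect repeats are found; interrupted or compound repeats would need a more tolerant search.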

if no loci are available: isolate and sequence new ones

steps:
- isolate total DNA
- restriction digest
- ligate small fragments into plasmid or phage vector
- transform E. coli cells
- plate out colonies
- lift colonies onto filters
- hybridize with probe containing sat repeats
- pick & sequence positive clones
- design primers

if sats are rare and you need many loci: enrichment step

alternatively: have them developed commercially, e.g. Amplicon, ca. 10,000 € per 10 loci

PCR conditions: universal touchdown program usually works for all loci

good isolation protocol: http://www.uga.edu/srel/DNA_Lab/protocols.htm


This is the sister database for Molecular Ecology Notes, containing the details for reported loci (i.e., primer sequences, amplification conditions, polymorphism levels, cross-species amplification, and literature citations) in a searchable format. The database contains all Primer Note submissions to Molecular Ecology, as well as primer submissions to Molecular Ecology Notes. In the future, relevant submissions from other journals will be included, as it is hoped that this database will become the on-line resource for molecular markers developed for "non-commercial" and non-model species. The database may be searched using either the easy search page (searches based on family, genus or species names) or, in the future, an advanced search page, which will allow more flexible and detailed queries as the database grows. Authors whose data have been accepted for publication in the database should familiarize themselves with the database submission instructions and then use the database submission form to add their data. For your convenience, links to the search page, this page, and the Molecular Ecology Notes homepage at the Blackwell site are positioned at the top and bottom of each page. If you have any questions or comments about this site, please email the database/website administrator.


http://snook.bio.indiana.edu/MENotes/home.html

For most purposes a small bit of tissue can be boiled for 10 mins in 10% chelex resin, and this works fine as a template

cheap & easy

1. DNA extraction

genomic DNA + primers + Taq DNA polymerase + dNTPs (A, C, G, T) + buffer

2. PCR amplification

Process repeated 30-40 times

Polymerase Chain Reaction

after 36 cycles: 2^36 ≈ 68.7 billion copies
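The doubling arithmetic can be checked in a few lines. This sketch assumes perfect efficiency by default (every template is copied in every cycle), which real reactions do not reach; the function name and the `efficiency` parameter are illustrative.

```python
def pcr_copies(cycles, start_copies=1, efficiency=1.0):
    """Copies after `cycles` rounds; each round multiplies the pool by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles

# One template molecule, 36 perfect cycles: 2**36 copies
print(f"{pcr_copies(36):.3g}")
```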

3. Detection: Radioactive (P33) end-labelling

3. Detection: Fluorescent labelling – gel based sequencer

3. Detection: Fluorescent labelling – capillary sequencer

4 or 5 labels + 1 label for internal size standard

allows running up to 20 loci simultaneously

DNA minisatellites (“fingerprints”)

- tandem repeats of core sequences 15-30 bp in length (variable number tandem repeats, VNTRs)
- most minisats occur 10 or 20 times in the whole genome
- human genome: ca. 50,000 VNTRs
- detected using Southern blotting after restriction digest
- disadvantages: DNA quality and amount needed; scoring problems

DNA fingerprints can identify individuals and determine parentage

E.g., DNA fingerprints confirmed Dolly the sheep was cloned from an adult udder cell

Donor udder (U), cell culture from udder (C), Dolly’s blood cell DNA (D), and control sheep 1-12

AFLP Amplified Fragment Length Polymorphism

Mutations at restriction enzyme cutting sites result in fragment length polymorphism

Ligation of adapters to genomic restriction fragments

Selective PCR amplification with adapter-specific primers

Advantage

low development cost

Disadvantages

dominant marker

scoring

RAPD: Random Amplified Polymorphic DNA

Arbitrary primers, 8-10 bp long

Very little development needed

PCR amplification at low stringency (Ta 35-45 °C)

Variability: point mutations, insertions / deletions

Disadvantages: dominant marker, repeatability, scoring

The bottom line

Microsatellite markers are the best !

Kinship analysis

Measure of Relatedness: Queller and Goodnight estimator

$$R = \frac{\sum_x \sum_k \sum_l (P_y - P^*)}{\sum_x \sum_k \sum_l (P_x - P^*)}$$

R = relatedness between individuals x and y, where

Px = frequency within individual x of allele l at locus k (must be 0, 0.5 or 1.0 in diploid organisms)

Py = frequency of the same allele in the individuals to which x is compared

P* = frequency of the allele in the population at large (background allele frequency)

The sums run over individuals x, loci k and alleles l.

(Queller and Goodnight 1989)
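A minimal sketch of the estimator for a single pair of diploid individuals, following the definitions of Px, Py and P* above. The function name and data layout are assumptions for illustration. The sketch uses the supplied background frequencies as they are and does not apply any bias correction, so it should not be read as the exact procedure of the Relatedness program.

```python
def qg_relatedness(x, y, pop_freq):
    """Relatedness of y to x.

    x, y: dict mapping locus -> (allele, allele)
    pop_freq: dict mapping locus -> {allele: background frequency P*}
    """
    numerator = 0.0
    denominator = 0.0
    for locus, alleles_x in x.items():
        alleles_y = y[locus]
        for allele in alleles_x:                 # each allelic position l in x
            p_x = alleles_x.count(allele) / 2    # 0.5 or 1.0 in a diploid
            p_y = alleles_y.count(allele) / 2    # 0, 0.5 or 1.0
            p_star = pop_freq[locus][allele]
            numerator += p_y - p_star
            denominator += p_x - p_star
    if denominator == 0:
        return float("nan")                      # locus set carries no information
    return numerator / denominator

# Hypothetical single-locus example with four equally common alleles
freqs = {"loc1": {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}}
x = {"loc1": ("a", "b")}
print(qg_relatedness(x, {"loc1": ("a", "b")}, freqs))  # identical genotype
print(qg_relatedness(x, {"loc1": ("c", "d")}, freqs))  # no shared alleles
```

Single-locus pairwise values are very noisy and can be negative; in practice the sums run over many loci, and often over many pairs.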

Other estimators

Queller & Goodnight estimator: works with codominant markers; assumes loci are unlinked

Also other estimators, e.g. Ritland (1996), Lynch & Ritland (1999) for codominant markers

Reeve et al. (1992), Lynch & Milligan (1994), Wang (2004) for dominant markers

Different pros & cons in terms of how efficient & biased they are; see Van de Casteele et al. (2001), Wang (2004)

Programs:

Relatedness calculates average genetic relatedness among sets of individuals defined by demographic variables, either on average or by pairs. It finds standard errors and confidence intervals for significance testing using a jackknife resampling method.

Features in Relatedness 5.0:

*Data sets with up to 127 loci of 127 alleles each, and number of individuals limited only by computer memory.

*Up to 32 demographic variables with complete control over the order in which they are checked.

*Pairwise values of relatedness.

*95% confidence intervals as well as standard errors.

The distribution package includes the program, a manual in Microsoft Word 6.0 format, and a sample data set. (A copy of the manual in Word 5.1 format is available on request.)

All Goodnight Software programs are for Macintosh PPC computers only.


RELATEDNESS 5.0

http://www.gsoftnet.us/GSoft.html

Input data file

(1) Allele frequency block – you can also have the program calculate this

*Relatedness data file, population: Sample Data
*Saved 1/14/1998 10:12:28
*config Guide F-Delim Deme-Col ID-Col Grp 1-Col Grp 2-Col Demog-Col
*#config: T/ F1 T1 T2 F3 2@3
*Allele frequencies
locus1  Freq   locus2  Freq   locus3  Freq   locus4
d       0.234  e       0.155  b       0.266  b
b       0.191  c       0.241  c       0.321  e
e       0.194  b       0.161  a       0.215  c
c       0.254  d       0.309  e       0.143  a
a       0.128  a       0.134  d       0.055  d
end

Input data file

(2) Genotypes – you can also add demographic variables, e.g. nest

Ind ID  K-nest  color  sibship  locus1  locus2  locus3
1—1     1       red    f1       d/d     e/c     b/c
1—2     1       red    f1       d/b     c/e     b/c
1—3     1       red    f1       b/d     c/e     c/a
1—4     1       red    f1       d/d     c/e     c/b
1—5     1       red    f1       b/d     c/e     a/c
1—6     1       red    f2       d/d     b/c     e/c
1—7     1       red    f2       d/d     c/b     c/e
1—8     1       red    f2       d/d     c/b     c/e
1—9     1       red    f2       d/d     b/c     c/a
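If the allele frequency block is left out, the frequencies have to be counted from the genotypes. The following is a minimal sketch of that counting step, not the program's own code; the function name and the slash-separated genotype strings are assumptions modelled on the sample file above.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """genotypes: list of dicts mapping locus -> 'a/b'. Returns locus -> {allele: frequency}."""
    counts = {}
    for individual in genotypes:
        for locus, genotype in individual.items():
            counts.setdefault(locus, Counter()).update(genotype.split("/"))
    return {
        locus: {allele: n / sum(c.values()) for allele, n in c.items()}
        for locus, c in counts.items()
    }

# locus1 genotypes of the nine individuals in the sample genotype block
sample = [{"locus1": g} for g in
          ["d/d", "d/b", "b/d", "d/d", "b/d", "d/d", "d/d", "d/d", "d/d"]]
print(allele_frequencies(sample))
```

Counting within one nest of relatives, as in this toy example, gives a biased picture of the background frequencies; the population-wide sample is what should be counted.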

Analysis

1. Define Px and Py, i.e. define the sets of individuals you would like to calculate the relatedness between

e.g. Px: all individuals; Py: nest=X
relatedness is calculated between individuals of the same nest

Px: sex=female; Py: nest=X AND sex=female
relatedness is calculated between females of the same nest

2. Define whether to calculate pairwise and/or average relatedness

3. And how to calculate standard errors (by jackknifing over loci or over nests)
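Jackknifing over loci can be sketched as follows. This is the generic delete-one jackknife for a ratio estimate and is not claimed to match the program's internals; the function name and the per-locus numerator and denominator inputs are assumptions for illustration.

```python
import math

def jackknife_over_loci(numerators, denominators):
    """Delete-one-locus jackknife for a ratio estimate sum(num) / sum(den).

    Returns (jackknife estimate, standard error).
    """
    n = len(numerators)
    total_num, total_den = sum(numerators), sum(denominators)
    full = total_num / total_den
    pseudo = []
    for num, den in zip(numerators, denominators):
        leave_one_out = (total_num - num) / (total_den - den)
        pseudo.append(n * full - (n - 1) * leave_one_out)   # pseudovalue
    mean = sum(pseudo) / n
    variance = sum((p - mean) ** 2 for p in pseudo) / (n - 1)
    return mean, math.sqrt(variance / n)

# Hypothetical per-locus sums for two loci
print(jackknife_over_loci([0.4, 0.6], [1.0, 1.0]))
```

Jackknifing over nests works the same way, with one pseudovalue per nest instead of per locus.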

Results - example

Whole population relatedness results:
R: 0.5142   Nx: 341   Ny: 341
Jackknife:   By locus:   By colony:
Std. Err.:   0.0163      0.0394
95% Conf.:   0.0520      0.0839
Pseud.:      4           16

Relatedness by colony
Value:  R:      Nx,Ny:  J/loci:  C.I.:
MA1     0.7835  21,21   0.0808   0.2572
MA2     0.4106  21,21   0.0385   0.1225
MA3     0.3652  20,20   0.0907   0.2885
R1      0.3907  26,26   0.0602   0.1916
R3      0.5681  15,15   0.1031   0.3281
R5      0.3994  25,25   0.0707   0.2250
R6      0.3600  20,20   0.0291   0.0927
R7m1    0.6503  4,4     0.0372   0.1185
R7m2    0.4792  12,12   0.0612   0.1948
T1      0.2852  24,24   0.0621   0.1976
T2      0.4596  30,30   0.0841   0.2676
T3      0.6132  25,25   0.0259   0.0825
T4      0.5969  33,33   0.0828   0.2634
T5      0.4373  18,18   0.0405   0.1287
T7      0.6306  31,31   0.0460   0.1465
T8      0.7126  16,16   0.1060   0.3372

Red wasp Vespula rufa

Average relatedness among workers from the same nest = 0.51

Less than the value expected if they were full sisters (0.75)

Implies that the mother queen mates with an average of 1 / (2 × (0.51 - 0.25)) = 1.9 males

Wenseleers et al. Evolution 2005
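The mating-frequency calculation can be written out explicitly. This sketch assumes a haplodiploid colony headed by a single queen, where relatedness among workers is r = 0.25 + 0.5 / k for an effective number of mates k; the function name is illustrative.

```python
def effective_mating_frequency(r_workers):
    """Effective number of mates k, from worker-worker relatedness r = 0.25 + 0.5 / k."""
    if r_workers <= 0.25:
        raise ValueError("r must exceed 0.25 for a finite mating frequency")
    return 0.5 / (r_workers - 0.25)

print(round(effective_mating_frequency(0.75), 2))  # full sisters
print(round(effective_mating_frequency(0.51), 2))  # value estimated for Vespula rufa
```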

DNA fingerprinting example

Mueller et al. (1994) PNAS

What is the average relatedness among females in nests of the Halictid bee Augochlorella striata?

Used DNA fingerprinting – multilocus, dominant marker

Reeve et al. (1992): relatedness among individuals within a nest can be estimated as

R = (w-b) / (1-b)

where w = proportion of bands shared between individuals of the same nest

b = proportion of bands shared between individuals of different nests

band sharing = 2Nab/(Na+Nb), where Nab is the total number of bands shared by individuals a and b and Na and Nb are the total number of bands present in a and b

Results: R=0.78, not significantly different from full-sister relationship (0.75)
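The two formulas above translate directly into code. In this sketch each fingerprint is represented as a set of band positions, which is an assumption about how the bands were scored; the function names are illustrative.

```python
def band_sharing(bands_a, bands_b):
    """Band sharing 2 * Nab / (Na + Nb) for two sets of scored bands."""
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))

def relatedness_from_band_sharing(w, b):
    """R = (w - b) / (1 - b), with w within-nest and b between-nest band sharing."""
    return (w - b) / (1 - b)

# Hypothetical fingerprints with four bands each, two of them shared
print(band_sharing({1, 2, 3, 4}, {3, 4, 5, 6}))
print(relatedness_from_band_sharing(0.8, 0.2))
```

In a real analysis w and b would be averages over many within-nest and between-nest pairs.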

Interesting application: estimate heritabilities in natural populations

Thomas et al. (2000) Heredity

The heritability of a trait is usually determined using breeding experiments

But it can also be estimated in natural populations as the regression of the pairwise estimate of phenotypic similarity against r

Other kinship analysis programs

Relatedness 5.0
relatedness estimators: Q&G; pairwise + group-average
platform: Mac
pros: user interface; haploid + diploid
cons: Mac only
http://www.gsoftnet.us/GSoft.html

Identix
relatedness estimators: Q&G, L&R, Id; pairwise
platform: PC
pros: 3 different estimators
cons: flexibility
http://www.univ-lille1.fr/gepv/english/perso_pages_en/PagepersoVincent_c.htm

Spagedi
relatedness estimators: 6 estimators, one for dominant markers; pairwise
platform: PC
pros: use of spatial info, can use dominant markers (AFLP/RAPD/minisat.)
cons: flexibility
http://www.ulb.ac.be/sciences/ecoevol/spagedi.html

Delrious
relatedness estimators: L&R; pairwise
platform: PC, Mathematica
cons: user interface
http://www.zoo.utoronto.ca/stone/DELRIOUS/delrious.htm

Parentage analysis

Parentage Analysis: Exclusion

[Figure: genotypes of Female, Offspring, Male 1, Male 2 and Male 3. Annotations: "Unsampled male with paternal allele"; "Is not the offspring of Male 2"]

Question: which male is the father of a particular offspring?
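Exclusion with a known mother can be sketched as a simple compatibility check at each locus. This is an illustrative implementation with hypothetical function names and toy genotypes; it assumes codominant diploid loci and error-free typing, so a single mismatch excludes a male.

```python
def possible_paternal_alleles(offspring, mother):
    """Offspring alleles that could have come from the father, given the mother."""
    a, b = offspring
    candidates = set()
    if b in mother:
        candidates.add(a)   # b can be maternal, so a can be paternal
    if a in mother:
        candidates.add(b)   # a can be maternal, so b can be paternal
    return candidates

def is_excluded(male, offspring, mother):
    """True if the male lacks every possible paternal allele at one or more loci."""
    for locus in offspring:
        paternal = possible_paternal_alleles(offspring[locus], mother[locus])
        if not paternal & set(male[locus]):
            return True
    return False

# Hypothetical single-locus example
mother = {"L1": ("A", "B")}
offspring = {"L1": ("A", "C")}
print(is_excluded({"L1": ("C", "D")}, offspring, mother))  # carries C
print(is_excluded({"L1": ("A", "D")}, offspring, mother))  # lacks C
```

If the offspring shares no allele with the recorded mother at a locus, no paternal allele can be inferred and this sketch excludes every male, which usually signals a typing error or a wrong mother.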

Parentage Analysis: Exclusion Based on Compatibility of Genotypes Between Males and Females

[Figure: genotypes of Female 1, Offspring, Male 1, Male 2 and Female 2]

With no a priori knowledge F1/M1 or F2/M2 are equally likely sets of parents

Question: who are the parents of a particular individual?

Version 2.0, © Copyright Tristan Marshall 1998-2001

About CERVUS

CERVUS is a Windows 95-based program designed for large-scale parentage analysis using co-dominant loci. Analysis is broken down into three sequential stages. Using genotype data in text file format, the program can analyse allele frequencies, run appropriate simulations and carry out likelihood-based parentage analysis, testing the confidence of each parentage using the results of the simulation. Simulations may also be used to estimate the power of a series of loci for parentage analysis, using real or imaginary allele frequencies.

http://helios.bto.ed.ac.uk/evolgen/cervus/cervus.html

References

Marshall, TC, Slate, J, Kruuk, LEB & Pemberton, JM (1998) Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology 7(5): 639-655.

Slate J, Marshall TC & Pemberton JM (2000) A retrospective assessment of the accuracy of the paternity inference program CERVUS. Molecular Ecology 9(6): 801-808.


Use of Cervus

Uses likelihood methods to find the most likely parents. Useful when more than one possible parent remains non-excluded.

Cervus calculates the likelihood ratio, or Paternity Index (the likelihood that the candidate parent is the true parent divided by the likelihood that the candidate parent is not the true parent), and LOD scores (the log base e of the product of the likelihood ratios at each locus).

Delta (difference in LOD scores between the most likely parent and the second most likely parent) assesses the reliability of the assignment.

LOD score of 0 means that the candidate parent is equally likely as a random individual.

The most likely parent is the one with the most positive LOD score.
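The LOD and Delta definitions above can be sketched as follows. The per-locus likelihood ratios are taken as given inputs; how they are computed from genotypes and allele frequencies is not shown, and the function names and toy values are assumptions rather than Cervus code.

```python
import math

def lod_score(likelihood_ratios):
    """Natural log of the product of per-locus likelihood ratios (all must be > 0)."""
    return sum(math.log(lr) for lr in likelihood_ratios)

def rank_candidates(ratios_by_candidate):
    """Return (most likely candidate, its LOD, Delta to the second most likely)."""
    if len(ratios_by_candidate) < 2:
        raise ValueError("Delta needs at least two candidates in this sketch")
    lods = sorted(((lod_score(r), name) for name, r in ratios_by_candidate.items()),
                  reverse=True)
    (best_lod, best), (second_lod, _) = lods[0], lods[1]
    return best, best_lod, best_lod - second_lod

# Hypothetical likelihood ratios at two loci for two candidate fathers
print(rank_candidates({"M1": [2.0, 4.0], "M2": [2.0, 0.5]}))
```

In this toy example M2 has a LOD of 0, i.e. he is no more likely to be the father than a random individual.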

Statistical Power of loci for parentage analyses

Assessed via Probability of Exclusion: P(E) - probability of excluding a male who is not the genetic father of a given offspring

Calculated for each locus and then values pooled across all loci

Pi(E): for a given paternal allele, the probability that another male has that allele

Examples of Values for PE

Eight microsatellite loci cloned from Northern Watersnakes (Nerodia sipedon) (Prosser et al. 1999)

Locus P(E)

Nsµ2 0.65

Nsµ3 0.82

Nsµ4 0.64

Nsµ6 0.55

Nsµ9 0.79

Nsµ10 0.66

Nsµ110 0.78

Nsµ119 0.86

Overall > 0.999

Individual PiE [C] values: 0.99 - 0.9999999; mean = 0.999
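Pooling per-locus exclusion probabilities across loci can be sketched with the usual product rule, which assumes the loci are independent (unlinked): a non-father escapes exclusion overall only if he escapes it at every locus. The function name is illustrative; the input values are the per-locus P(E) figures from the watersnake table above.

```python
def combined_exclusion_probability(per_locus):
    """Overall P(E) = 1 - product over loci of (1 - P(E) at that locus)."""
    not_excluded = 1.0
    for p in per_locus:
        not_excluded *= 1.0 - p
    return 1.0 - not_excluded

watersnake = [0.65, 0.82, 0.64, 0.55, 0.79, 0.66, 0.78, 0.86]
overall = combined_exclusion_probability(watersnake)
print(f"{overall:.6f}")
```

The result agrees with the "Overall > 0.999" entry of the table.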

Typing Errors

Perfect data is usually not the reality.

A mismatch due to a typing error will exclude a true parent in a simple exclusion analysis.

In a likelihood analysis a single mismatch does not exclude a parent; it simply decreases the likelihood, and a true parent will probably still be identified.

Also good for other kinds of errors – null alleles and mutations.

Input files

Genotype – genotypes of all individuals

Allele frequencies

Offspring relationships to known parents and candidate parents

Example: Noninvasive paternity assignment in Gombe chimps

Constable et al. (2001) Mol. Ecol.

39 female and male chimps genotyped at 16 loci using faecal and hair samples

Then determined paternity of 14 offspring

Mother known, but not the father

Using Cervus, 13 out of 14 could be assigned to a particular father with a confidence of 99%, one could be assigned with a confidence of 95%

Positive relationship between male rank and reproductive success

No evidence of extra-group paternity

Parentage testing: settle disputes over who is the father of a child & is thus responsible for child support

Immigration cases: establishing that individuals are the true children/parents/siblings in cases of family reunification

Other application: parentage testing

DNA Diagnostics, Auckland

Parentage testing

Paternity index

The paternity index in this man's analysis shows that the DNA evidence is 25 million times more likely if he is the biological father than if he is not (odds 25 million:1)


Kinship 1.0

http://gsoft.smu.edu/GSoft.htm

runs on Mac

KinGroup 2.0

http://www.it.jcu.edu.au/kingroup/

JAVA, runs on PC + Mac

same functionality as Kinship

KF Goodnight, DC Queller (1999) Computer software for performing likelihood tests of pedigree relationship using genetic markers. Mol Ecol 8, 1231-1234

Use of Kinship

Uses likelihood methods to test hypotheses about kinship relationships, e.g. father-son (R=0.5) as opposed to unrelated (R=0)

Generates expected distributions of R values for given kin relationships given a specific data set. This yields confidence intervals for expected R values.

Can e.g. be used to group offspring in full-sib groups, i.e. sharing the same father, or allocate offspring to particular candidate parents

Example

Dierkes et al. (2005) Ecol. Lett.

Cooperatively breeding cichlid Neolamprologus pulcher

Young stay in the nest and help their parents rear more offspring

Relatedness 5.0 was used to estimate the relatedness between helpers and breeders

KinGroup was used to group individuals into full-sib groups and determine the timing of breeder replacements

Ability to accurately distinguish between classes of relatives requires > 20 moderately variable loci

Need lots of loci

Except in special situations…

e.g. haplodiploidy: greatly simplifies parentage assignment, since father is haploid

Example: Wenseleers et al. 2005, study of the red wasp Vespula rufa

Who produces the colony’s males, the queen or the workers?

If the queen is AB and mated to a C male: if the queen produces the males, half will be A and half will be B; if the workers produce all the males, half will carry the paternal C allele.

Workers, males and the mother queens were genotyped at 4 loci.

Results: 33 out of 342 males carried the paternal allele; the mean power to detect workers’ sons was 87%, so (33/342)/0.87 = 11% of the males were workers’ sons.
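The correction for detection power is a one-line calculation: the observed fraction of males carrying a paternal allele is divided by the probability that a worker's son is detectable as such. The function and parameter names below are illustrative.

```python
def worker_son_fraction(males_with_paternal_allele, males_sampled, detection_power):
    """Estimated fraction of males that are workers' sons, corrected for detection power."""
    observed = males_with_paternal_allele / males_sampled
    return observed / detection_power

# Figures from the Vespula rufa example: 33 of 342 males, 87% power
print(round(worker_son_fraction(33, 342, 0.87), 3))
```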

Frequency of extra-pair fertilizations: 47% of all nests had 1+ chick from EPF

EPFs made up an average of 21% of the male’s repr. success

Example of parentage analysis using DNA fingerprinting: Gibbs et al. 1990

Red-winged Blackbird population (Agelaius phoeniceus) in eastern Ontario

DNA fingerprints of Red-winged Blackbird families showing examples where the resident male is excluded as the parent of chicks found in nests on his territory. Arrows indicate bands (alleles) that exclude the resident male

Other parentage analysis programs

Famoz: calculates likelihoods of particular relationships, can also use sex-linked loci and dominant markers

http://www.pierroton.inra.fr/genetics/labo/Software/Famoz/

Gerud: estimates minimum number of sires for a family given one known parent, reconstructs parental genotypes

http://www.biology.gatech.edu/professors/labsites/joneslab/parentage.html

DNA view: forensics, paternity testing

http://dna-view.com/

Good reviews

Michael S. Blouin (2003) DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends in Ecology & Evolution 18: 503-511.

Adam G. Jones & William R. Ardren (2003) Methods of parentage analysis in natural populations. Molecular Ecology 12: 2511-2523.
