Determining kinship relationships using genetic markers
Tom Wenseleers
Laboratorium voor Entomologie
Lecture can be downloaded from bio.kuleuven.be/ento/wenseleers/twpub.htm#courses
GA75 'Moderne Onderzoeksmethoden in de Biologie' (Modern Research Methods in Biology), March 2006
Use of genetic markers to test hypotheses regarding the kinship relationships among specific individuals
Kinship analysis: estimating the genetic relatedness among individuals
Parentage analysis: identification of parents of specific offspring
Parentage and kinship analysis
Applications
EVOLUTIONARY STUDIES
what is the relatedness between pairs of individuals? test role of kin selection in the evolution of cooperative behaviour
how common are extra-pair fertilizations?
how common is intraspecific brood parasitism (egg dumping)?
BREEDING PROGRAMMES
design captive breeding programs for endangered species
assign F1 generation fish to a particular father in a mass-spawning experiment
HUMAN APPLICATIONS
parentage testing, forensic work at crime scenes
victim identification in mass disasters
What is Relatedness?
“The probability that a gene in one individual is an identical copy, by descent, of a gene in another individual”
Relatedness is a measure not of the absolute genetic similarity between two individuals, BUT of the degree to which this similarity exceeds the background similarity between individuals randomly drawn from the population.
Marker considerations
Which markers can be used?
best: nuclear, codominant markers
- (allozymes) – low variability, not ideal
- microsatellites – tandem repeats of 2-5 bp motifs; best choice (high resolution, little DNA required)
also: dominant markers, but analysis methods not as powerful
- minisatellites
- AFLP
- RAPD
AA AB BB
[Figure: a microsatellite array (~20-100 bp), here a run of CA repeats, between two non-repetitive flanking sequences; the array length depends on repeat unit size and number of repeats]
Sequences of DNA consisting of repeats of 2-5 base pair motifs
almost any combination possible (e.g. CA, GA, GGGAA)
discovered in 1980s, see e.g. Tautz, Trick & Dover (1986)
Microsatellites = short tandem repeats
Microsatellites
present in genome (nucleus + mitochondria + chloroplasts) of all eukaryotes
easily amplified using PCR and separated on DNA sequencer
even highly degraded DNA can be used, e.g. single hair, faeces
highly variable, usually between 5 and 20 alleles
di-repeats most common; tri-, tetra- and penta-repeats rarer
human genome: 35,000 (CA)n repeats
wasps: 1% of 500 bp fragments contain tri-repeat microsatellites
most common in mammals and insects, in birds 10 x rarer
Microsatellites
primers to amplify microsatellites:
- already developed for many organisms
- can sometimes be developed by searching GenBank for microsatellite motifs
- cross-amplify using loci developed for other species; usually works within the same genus and sometimes the same family
if no loci are available, isolate and sequence new ones
steps:
1. isolate total DNA
2. restriction digest
3. ligate small fragments into plasmid or phage vector
4. transform E. coli cells
5. plate out colonies
6. lift colonies onto filters
7. hybridize with probe containing microsatellite repeats
8. pick & sequence positive clones
9. design primers
if microsatellites are rare and you need many loci: enrichment step
alternatively: have them developed commercially, e.g. Amplicon, ca. 10,000 € / 10 loci
PCR conditions: universal touchdown program usually works for all loci
good isolation protocol: http://www.uga.edu/srel/DNA_Lab/protocols.htm
This is the sister database for Molecular Ecology Notes, containing the details for reported loci (i.e., primer sequences, amplification conditions, polymorphism levels, cross-species amplification, and literature citations) in a searchable format. The database contains all Primer Note submissions to Molecular Ecology, as well as primer submissions to Molecular Ecology Notes. In the future, relevant submissions from other journals will be included, as it is hoped that this database will become the on-line resource for molecular markers developed for "non-commercial" and non-model species. The database may be searched using either the easy search page (searches based on family, genus or species names) or, in the future, an advanced search page, which will allow more flexible and detailed queries as the database grows. Authors whose data have been accepted for publication in the database should familiarize themselves with the database submission instructions and then use the database submission form to add their data. For your convenience, links to the search page, this page, and the Molecular Ecology Notes homepage at the Blackwell site are positioned at the top and bottom of each page. If you have any questions or comments about this site, please email the database/website administrator.
http://snook.bio.indiana.edu/MENotes/home.html
For most purposes a small bit of tissue can be boiled for 10 mins in 10% chelex resin, and this works fine as a template
cheap & easy
1. DNA extraction
genomic DNA + primers + Taq DNA polymerase + dNTPs (A, C, G, T) + buffer
2. PCR amplification
Process repeated 30-40 times
Polymerase Chain Reaction
after 36 cycles: 2^36 ≈ 68.7 billion copies
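The copy number quoted here follows from assuming that every cycle doubles the template perfectly. A minimal sketch of that arithmetic is shown below; the function name and the `efficiency` parameter are illustrative assumptions, not part of the lecture.

```python
def pcr_copies(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Expected copy number after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling in every cycle (the assumption
    behind the 2^36 figure); real reactions are less efficient and plateau.
    """
    return start_copies * (1.0 + efficiency) ** cycles


copies = pcr_copies(36)
print(f"{copies:.3e} copies after 36 cycles")
```

Lowering `efficiency` (for example to 0.9) shows how quickly the yield drops below the idealised figure.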
3. Detection: Radioactive (33P) end-labelling
3. Detection: Fluorescent labelling – gel-based sequencer
3. Detection: Fluorescent labelling – capillary sequencer
4 or 5 labels + 1 label for internal size standard
allows running up to 20 loci simultaneously
DNA minisatellites (“fingerprints”)
- tandem repeats of core sequences 15-30 bp in length (variable number tandem repeats, VNTRs)
- most minisatellites occur 10 or 20 times in the whole genome
- human genome: ca. 50,000 VNTRs
- detected using Southern blotting after restriction digest
- disadvantages: DNA quality and amount needed; scoring problems
DNA fingerprints can identify individuals and determine parentage
E.g., DNA fingerprints confirmed Dolly the sheep was cloned from an adult udder cell
Donor udder (U), cell culture from udder (C), Dolly’s blood cell DNA (D), and control sheep 1-12
AFLP Amplified Fragment Length Polymorphism
Mutations at restriction enzyme cutting sites result in fragment length polymorphism
Ligation of adapters to genomic restriction fragments
Selective PCR amplification with adapter-specific primers
Advantage
low development cost
Disadvantages
dominant marker
scoring
RAPD: Random Amplified Polymorphic DNA
Arbitrary primers, 8-10 bp long
Very little development needed
PCR amplification at low stringency (Ta 35-45 °C)
Variability:
- Point mutations
- Insertions / deletions
Disadvantages: dominant marker, repeatability, scoring
The bottom line
Microsatellite markers are the best!
Kinship analysis
Measure of Relatedness: Queller and Goodnight estimator
$$R = \frac{\sum_x \sum_k \sum_l \left(P_y - P^*\right)}{\sum_x \sum_k \sum_l \left(P_x - P^*\right)}$$
R = relatedness between individuals x and y, where
Px = frequency within individual x of allele l at locus k (must be 0, 0.5 or 1.0 in diploid organisms)
Py = frequency of the same allele in the individuals to which x is compared
P* = frequency of the allele in the population at large (background allele frequency)
(Queller and Goodnight 1989)
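To make the definitions of Px, Py and P* concrete, here is a minimal sketch of the estimator for a single pair of diploid individuals, summing over loci and over the allele copies carried by x. The function name and data layout are assumptions made for illustration; the sketch omits the bias correction and group weighting that a program such as Relatedness may apply, and it computes the estimate in one direction only (x as the reference individual).

```python
def qg_relatedness(x, y, pop_freqs):
    """One-directional Queller & Goodnight estimate for individuals x and y.

    x, y      : lists with one (allele, allele) tuple per locus
    pop_freqs : list with one dict per locus mapping allele -> P*
    """
    numerator = 0.0
    denominator = 0.0
    for gx, gy, freqs in zip(x, y, pop_freqs):
        for allele in gx:                 # one term per allele copy carried by x
            p_x = gx.count(allele) / 2    # 0.5 or 1.0 in a diploid
            p_y = gy.count(allele) / 2    # 0, 0.5 or 1.0
            p_star = freqs[allele]        # background allele frequency
            numerator += p_y - p_star
            denominator += p_x - p_star
    # The denominator can be zero, e.g. for a heterozygote at a locus with
    # two equally frequent alleles; such cases need more loci to resolve.
    return numerator / denominator


freqs = [{"a": 0.2, "b": 0.3, "c": 0.5}]
x = [("a", "b")]
print(qg_relatedness(x, [("a", "b")], freqs))  # identical genotype
print(qg_relatedness(x, [("c", "c")], freqs))  # no alleles shared
```

With a single locus the estimate is extremely noisy and can fall outside the 0 to 1 range, which is why many variable loci are used in practice.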
Other estimators
Queller & Goodnight estimator works with codominant markers and assumes loci are unlinked
Also other estimators, e.g. Ritland (1996), Lynch & Ritland (1999) for codominant markers;
Reeve et al. (1992), Lynch & Milligan (1994), Wang (2004) for dominant markers
Different pros & cons in terms of how efficient & biased they are; see Van de Casteele et al. (2001), Wang (2004)
Programs:
Relatedness calculates average genetic relatedness among sets of individuals defined by demographic variables, either on average or by pairs. It finds standard errors and confidence intervals for significance testing using a jackknife resampling method.
Features in Relatedness 5.0:
*Data sets with up to 127 loci of 127 alleles each, and number of individuals limited only by computer memory.
*Up to 32 demographic variables with complete control over the order in which they are checked.
*Pairwise values of relatedness.
*95% confidence intervals as well as standard errors.
The distribution package includes the program, a manual in Microsoft Word 6.0 format, and a sample data set. (A copy of the manual in Word 5.1 format is available on request.)
All Goodnight Software programs are for Macintosh PPC computers only.
RELATEDNESS 5.0
http://www.gsoftnet.us/GSoft.html
Input data file
(1) Allele frequency block – you can also have the program calculate this
*Relatedness data file, population: Sample Data
*Saved 1/14/1998 10:12:28
*config Guide F-Delim Deme-Col ID-Col Grp 1-Col Grp 2-Col Demog-Col
*#config: T/ F1 T1 T2 F3 2@3
*Allele frequencies
locus1  Freq   locus2  Freq   locus3  Freq   locus4
d       0.234  e       0.155  b       0.266  b
b       0.191  c       0.241  c       0.321  e
e       0.194  b       0.161  a       0.215  c
c       0.254  d       0.309  e       0.143  a
a       0.128  a       0.134  d       0.055  d
end
Input data file
(2) Genotypes – you can also add demographic variables, e.g. nest
Ind ID  K-nest  color  sibship  locus1  locus2  locus3
1—1     1       red    f1       d/d     e/c     b/c
1—2     1       red    f1       d/b     c/e     b/c
1—3     1       red    f1       b/d     c/e     c/a
1—4     1       red    f1       d/d     c/e     c/b
1—5     1       red    f1       b/d     c/e     a/c
1—6     1       red    f2       d/d     b/c     e/c
1—7     1       red    f2       d/d     c/b     c/e
1—8     1       red    f2       d/d     c/b     c/e
1—9     1       red    f2       d/d     b/c     c/a
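When the allele frequency block is left for the program to calculate, the underlying step is a simple allele count over the genotype column of each locus. The sketch below illustrates that count using the locus1 genotypes of the nine individuals listed above; the function name is an assumption, and a single nest is of course not a proper population sample.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes written as 'a/b' strings."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype.split("/"))   # two allele copies per individual
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# locus1 column of the sample genotype block (individuals 1-1 to 1-9)
locus1 = ["d/d", "d/b", "b/d", "d/d", "b/d", "d/d", "d/d", "d/d", "d/d"]
freqs = allele_frequencies(locus1)
print(freqs)
```

Because nestmates are relatives, frequencies counted this way are biased towards the alleles of that family; background frequencies should come from many nests.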
Analysis
1. Define Px and Py, i.e. define the sets of individuals you would like to calculate the relatedness between
e.g. Px: all individuals; Py: nest=X
→ relatedness is calculated between individuals of the same nest
Px: sex=female; Py: nest=X AND sex=female
→ relatedness is calculated between females of the same nest
2. Define whether to calculate pairwise and/or average relatedness
3. And how to calculate standard errors (by jackknifing over loci or over nests)
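The jackknife in step 3 can be sketched as a generic leave-one-out procedure: drop one locus (or one nest) at a time, recompute the estimate, and derive a standard error from the resulting pseudovalues. The code below is an illustration under stated assumptions: each unit is represented by its (numerator, denominator) contribution to a ratio estimator such as R, and the function names are hypothetical. Whether Relatedness uses exactly this scheme is not stated in the lecture.

```python
import math


def ratio(terms):
    """Ratio estimator: summed numerators over summed denominators."""
    return sum(n for n, _ in terms) / sum(d for _, d in terms)


def jackknife(terms, estimator=ratio):
    """Return (jackknife estimate, standard error) by leaving out one unit at a time."""
    n = len(terms)
    full = estimator(terms)
    pseudovalues = [
        n * full - (n - 1) * estimator(terms[:i] + terms[i + 1:])
        for i in range(n)
    ]
    mean = sum(pseudovalues) / n
    variance = sum((p - mean) ** 2 for p in pseudovalues) / (n - 1)
    return mean, math.sqrt(variance / n)


# hypothetical per-locus (numerator, denominator) contributions
per_locus = [(1.0, 2.0), (1.0, 4.0)]
estimate, std_err = jackknife(per_locus)
print(estimate, std_err)
```

Jackknifing over nests works the same way, with one (numerator, denominator) pair per nest instead of per locus.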
Results - example
Whole population relatedness results:
R: 0.5142   Nx: 341   Ny: 341
Jackknife:   By locus:   By colony:
Std. Err.:   0.0163      0.0394
95% Conf.:   0.0520      0.0839
Pseud.:      4           16

Relatedness by colony
Value:  R:      Nx,Ny:  J/loci:  C.I.:
MA1     0.7835  21,21   0.0808   0.2572
MA2     0.4106  21,21   0.0385   0.1225
MA3     0.3652  20,20   0.0907   0.2885
R1      0.3907  26,26   0.0602   0.1916
R3      0.5681  15,15   0.1031   0.3281
R5      0.3994  25,25   0.0707   0.2250
R6      0.3600  20,20   0.0291   0.0927
R7m1    0.6503  4,4     0.0372   0.1185
R7m2    0.4792  12,12   0.0612   0.1948
T1      0.2852  24,24   0.0621   0.1976
T2      0.4596  30,30   0.0841   0.2676
T3      0.6132  25,25   0.0259   0.0825
T4      0.5969  33,33   0.0828   0.2634
T5      0.4373  18,18   0.0405   0.1287
T7      0.6306  31,31   0.0460   0.1465
T8      0.7126  16,16   0.1060   0.3372
Red wasp Vespula rufa
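Average relatedness among workers from the same nest = 0.51
Less than the value expected if they were full sisters (0.75)
With a single queen mated to k males, expected relatedness among workers is 0.25 + 0.5/k, so this implies that the mother queen mates with an effective average of 1/(2 × (0.51 - 0.25)) = 1.9 males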
Wenseleers et al. Evolution 2005
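The mating frequency quoted for Vespula rufa comes from inverting the haplodiploid expectation r = 0.25 + 0.5/k for workers of a single queen. A minimal sketch of that calculation follows; the function name is an illustrative assumption, and the result is an effective number of mates (it assumes one queen per nest and treats unequal paternity shares as fewer equivalent mates).

```python
def effective_mating_frequency(r_workers: float) -> float:
    """Effective number of mates k from worker relatedness r = 0.25 + 0.5 / k."""
    if r_workers <= 0.25:
        raise ValueError("relatedness must exceed 0.25 for a single-queen colony")
    return 0.5 / (r_workers - 0.25)


print(round(effective_mating_frequency(0.51), 1))  # value used in the lecture
```

A relatedness of 0.75 (full sisters) gives exactly one mate, which is the upper bound of the haplodiploid expectation.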
DNA fingerprinting example
Mueller et al. (1994) PNAS
What is the average relatedness among females in nests of the halictid bee Augochlorella striata?
Used DNA fingerprinting – multilocus, dominant marker
Reeve et al. (1992): relatedness among individuals within a nest can be estimated as
R = (w-b) / (1-b)
where w = proportion of bands shared between individuals of the same nest
b = proportion of bands shared between individuals of different nests
band sharing = 2Nab/(Na+Nb), where Nab is the total number of bands shared by individuals a and b and Na and Nb are the total number of bands present in a and b
Results: R=0.78, not significantly different from full-sister relationship (0.75)
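The two formulas above, band sharing and R = (w-b)/(1-b), translate directly into a few lines of code. In this sketch each fingerprint is represented as a set of band positions; the function names and the example band sets are hypothetical and chosen only for illustration.

```python
def band_sharing(bands_a: set, bands_b: set) -> float:
    """Band-sharing coefficient 2*Nab / (Na + Nb) for two fingerprints."""
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))


def relatedness_from_band_sharing(w: float, b: float) -> float:
    """R = (w - b) / (1 - b), with w within-nest and b between-nest band sharing."""
    return (w - b) / (1 - b)


# hypothetical fingerprints: sets of scored band positions
ind_a = {1, 2, 3, 4}
ind_b = {3, 4, 5, 6}
print(band_sharing(ind_a, ind_b))
print(relatedness_from_band_sharing(w=0.8, b=0.2))
```

In practice w and b are averages over many within-nest and between-nest pairs, not single comparisons.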
Interesting application: estimate heritabilities in natural populations
Thomas et al. (2000) Heredity
The heritability of a trait is usually determined using breeding experiments
But it can also be estimated in natural populations as the regression of the pairwise estimate of phenotypic similarity against r
Other kinship analysis programs
(for each program: relatedness estimators; platform; pros; cons)

Relatedness 5.0
estimators: Q&G, pairwise + group-average
platform: Mac
pros: user interface, haploid + diploid
cons: Mac only
http://www.gsoftnet.us/GSoft.html

Identix
estimators: Q&G, L&R, Id; pairwise
platform: PC
pros: 3 different estimators
cons: flexibility
http://www.univ-lille1.fr/gepv/english/perso_pages_en/PagepersoVincent_c.htm

Spagedi
estimators: 6 estimators, one for dominant markers; pairwise
platform: PC
pros: use of spatial info, can use dominant markers (AFLP/RAPD/minisat.)
cons: flexibility
http://www.ulb.ac.be/sciences/ecoevol/spagedi.html

Delrious
estimators: L&R, pairwise
platform: PC, Mathematica
cons: user interface
http://www.zoo.utoronto.ca/stone/DELRIOUS/delrious.htm
Parentage analysis
Parentage Analysis: Exclusion
[Figure: genotypes of Female, Offspring, Male 1, Male 2 and Male 3]
Unsampled male with paternal allele
Is not the offspring of Male 2
Question: which male is the father of a particular offspring?
Parentage Analysis: Exclusion Based on Compatibility of Genotypes Between Males and Females
[Figure: genotypes of Female 1, Offspring, Male 1, Male 2 and Female 2]
With no a priori knowledge F1/M1 or F2/M2 are equally likely sets of parents
Question: who are the parents of a particular individual?
Version 2.0 © Copyright Tristan Marshall 1998-2001
About CERVUS
CERVUS is a Windows 95-based program designed for large-scale parentage analysis using co-dominant loci. Analysis is broken down into three sequential stages. Using genotype data in text file format, the program can analyse allele frequencies, run appropriate simulations and carry out likelihood-based parentage analysis, testing the confidence of each parentage using the results of the simulation. Simulations may also be used to estimate the power of a series of loci for parentage analysis, using real or imaginary allele frequencies.
http://helios.bto.ed.ac.uk/evolgen/cervus/cervus.html
References
Marshall, TC, Slate, J, Kruuk, LEB & Pemberton, JM (1998) Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology 7(5): 639-655.
Slate J, Marshall TC & Pemberton JM (2000) A retrospective assessment of the accuracy of the paternity inference program CERVUS. Molecular Ecology 9(6): 801-808.
Use of Cervus
Uses likelihood methods to find the most likely parents. Useful when more than one possible parent remains non-excluded.
Cervus calculates the likelihood ratio, or Paternity Index (the likelihood that the candidate parent is the true parent divided by the likelihood that the candidate parent is not the true parent), and LOD scores (the log base e of the product of the likelihood ratios at each locus).
Delta (difference in LOD scores between the most likely parent and the second most likely parent) assesses the reliability of the assignment.
LOD score of 0 means that the candidate parent is equally likely as a random individual.
The most likely parent is the one with the most positive LOD score.
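The LOD score and Delta described above can be illustrated with a short sketch. It takes per-locus likelihood ratios as given (how Cervus derives them from genotypes and allele frequencies is not covered here) and the candidate names and ratio values are made up for illustration.

```python
import math


def lod_score(per_locus_ratios):
    """Natural log of the product of per-locus likelihood ratios."""
    return sum(math.log(ratio) for ratio in per_locus_ratios)


def delta(lods):
    """Difference in LOD between the most likely and second most likely candidate."""
    ranked = sorted(lods.values(), reverse=True)
    return ranked[0] - ranked[1]


# hypothetical likelihood ratios at three loci for two candidate fathers
candidates = {"male_A": [2.0, 1.5, 3.0], "male_B": [1.0, 0.5, 2.0]}
lods = {name: lod_score(ratios) for name, ratios in candidates.items()}
best = max(lods, key=lods.get)
print(best, lods, delta(lods))
```

Here male_B has a LOD of about 0, i.e. he is no more likely to be the father than a random individual, while Delta measures how clearly male_A stands out.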
Statistical Power of loci for parentage analyses
Assessed via Probability of Exclusion: P(E) - probability of excluding a male who is not the genetic father of a given offspring
Calculated for each locus and then values pooled across all loci
Pi(E): for a given paternal allele, the probability that another male has that allele
Examples of Values for PE
Eight microsatellite loci cloned from Northern Watersnakes (Nerodia sipedon) (Prosser et al. 1999)
Locus P(E)
Nsµ2 0.65
Nsµ3 0.82
Nsµ4 0.64
Nsµ6 0.55
Nsµ9 0.79
Nsµ10 0.66
Nsµ110 0.78
Nsµ119 0.86
Overall > 0.999
Individual PiE [C] values: 0.99 - 0.9999999; mean = 0.999
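One common way to pool per-locus exclusion probabilities, assuming the loci are independent, is to multiply the probabilities of non-exclusion: overall P(E) = 1 - Π(1 - P_k(E)). The sketch below applies this to the eight watersnake loci listed above; the pooling formula is the standard one under independence and is an assumption here, since the lecture does not spell it out.

```python
def combined_exclusion(per_locus):
    """Overall exclusion probability across independent loci."""
    not_excluded = 1.0
    for p in per_locus:
        not_excluded *= 1.0 - p   # chance a non-father escapes exclusion at this locus
    return 1.0 - not_excluded


# P(E) values for the eight Nerodia sipedon loci in the table above
watersnake_loci = [0.65, 0.82, 0.64, 0.55, 0.79, 0.66, 0.78, 0.86]
overall = combined_exclusion(watersnake_loci)
print(f"{overall:.6f}")
```

The result is consistent with the "Overall > 0.999" row of the table.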
Typing Errors
Perfect data is usually not the reality.
A mismatch due to a typing error will exclude a true parent in a simple exclusion analysis.
In a likelihood analysis a single mismatch does not exclude a parent; it simply decreases the likelihood, and a true parent will probably still be identified.
Also good for other kinds of errors – null alleles and mutations.
Input files
Genotype – genotypes of all individuals
Allele frequencies
Offspring relationships to known parents and candidate parents
Example: Noninvasive paternity assignment in Gombe chimps
Constable et al. (2001) Mol. Ecol.
39 female and male chimps genotyped at 16 loci using faecal and hair samples
Then determined paternity of 14 offspring
Mother known, but not the father
Using Cervus, 13 out of 14 could be assigned to a particular father with a confidence of 99%, one could be assigned with a confidence of 95%
Positive relationship between male rank and reproductive success
No evidence of extra-group paternity
Parentage testing: settle disputes over who is the father of a child & is thus responsible for child support
Immigration cases: establishing that individuals are the true children/parents/siblings in cases of family reunification
Other application: parentage testing
DNA Diagnostics, Auckland
Parentage testing
Paternity index
The paternity index in this man’s analysis shows that the DNA evidence is 25 million times more likely if he is the biological father than if he is not (odds 25 million:1)
Kinship 1.0
http://gsoft.smu.edu/GSoft.htm
runs on Mac

KinGroup 2.0
http://www.it.jcu.edu.au/kingroup/
Java, runs on PC + Mac
same functionality as Kinship
KF Goodnight, DC Queller (1999) Computer software for performing likelihood tests of pedigree relationship using genetic markers. Mol Ecol 8, 1231-1234
Use of Kinship
Uses likelihood methods to test hypotheses about kinship relationships, e.g. father-son (R=0.5) as opposed to unrelated (R=0)
Generates expected distributions of R values for given kin relationships given a specific data set. This yields confidence intervals for expected R values.
Can e.g. be used to group offspring in full-sib groups, i.e. sharing the same father, or allocate offspring to particular candidate parents
Example
Dierkes et al. (2005) Ecol. Lett.
Cooperatively breeding cichlid Neolamprologus pulcher
Young stay in the nest and help their parents rear more offspring
Relatedness 5.0 was used to estimate the relatedness between helpers and breeders
KinGroup was used to group individuals into full-sib groups and determine the timing of breeder replacements
Ability to accurately distinguish between classes of relatives requires > 20 moderately variable loci
Need lots of loci
Except in special situations…
e.g. haplodiploidy: greatly simplifies parentage assignment, since father is haploid
Example: Wenseleers et al. 2005, study of the red wasp Vespula rufa
Who produces the colony’s males, the queen or the workers?
If the queen is AB and mated to a C male: if the queen produces the males, then half will be A and half will be B; if the workers produce all the males, then half will carry the paternal C allele
Workers, males and the mother queens were genotyped at 4 loci
Results: 33 out of 342 males carried the paternal allele; mean power to detect workers’ sons was 87%, so (33/342)/0.87 = 11% of the males were workers’ sons
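The estimate of worker-produced males corrects the observed fraction of males carrying a paternal allele for the fact that not every worker's son can be detected. A minimal sketch of that correction, with a hypothetical function name:

```python
def corrected_fraction(n_detected: int, n_total: int, detection_power: float) -> float:
    """Observed fraction of detected cases divided by the power to detect them."""
    return (n_detected / n_total) / detection_power


workers_sons = corrected_fraction(n_detected=33, n_total=342, detection_power=0.87)
print(f"{workers_sons:.1%} of males estimated to be workers' sons")
```

Without the correction the raw fraction 33/342 (about 9.6%) would underestimate worker reproduction.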
Frequency of extra-pair fertilizations: 47% of all nests had 1+ chick from EPF
EPFs made up an average of 21% of the male’s reproductive success
Example of parentage analysis using DNA fingerprinting: Gibbs et al. 1990
Red-winged Blackbird population (Agelaius phoeniceus) in eastern Ontario
DNA fingerprints of Red-winged Blackbird families showing examples where the resident male is excluded as the parent of chicks found in nests on his territory. Arrows indicate bands (alleles) that exclude the resident male
Other parentage analysis programs
Famoz: calculates likelihoods of particular relationships, can also use sex-linked loci and dominant markers
http://www.pierroton.inra.fr/genetics/labo/Software/Famoz/
Gerud: estimates minimum number of sires for a family given one known parent, reconstructs parental genotypes
http://www.biology.gatech.edu/professors/labsites/joneslab/parentage.html
DNA view: forensics, paternity testing
http://dna-view.com/
Good reviews
Michael S. Blouin (2003) DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends in Ecology & Evolution 18: 503-511.
Adam G. Jones & William R. Ardren (2003) Methods of parentage analysis in natural populations. Molecular Ecology 12: 2511-2523.