Computational Molecular Biology
Group Testing – Pooling Designs
My T. [email protected]
Group Testing (GT)
Definition: Given n items with at most d positive ones, identify all positive ones using the minimum number of tests
Each test is on a subset of items
Positive test outcome: there exists a positive item in the subset
Example 1 – Sequential Method
[Figure: items 1 to 9 are tested in successive subsets; the groups shown are 1-9, then 1-5, then 4-5]
Example 2 – Non-adaptive Method
      p4  p5  p6
p1     1   2   3
p2     4   5   6
p3     7   8   9
Non-adaptive group testing is called pooling design in biology
Sequential and Non-adaptive
Sequential GT needs fewer tests, but takes longer.
Non-adaptive GT needs more tests, but takes less time.
In molecular biology, non-adaptive GT is usually used. Why?
Because…
The same library is screened with many different probes. It is expensive to prepare a pool for the first time. Once a pool is prepared, it can be screened many times with different probes.
Screening one pool at a time is expensive. Screening pools in parallel with the same probe is cheaper.
There are constraints on pool sizes. If a pool contains too many different clones, then positive pools can become too dilute and could be mislabeled as negative pools.
Pooling Designs
Problem Definition: Given a set of n clones with at most d positive clones, identify all positive clones with the minimum number of tests
Pool: a subset of clones
Positive pool: a pool that contains at least one positive clone
Clones = Items
Relation to Pooling Designs
                 clones
           c1  c2  …  cj  …  cn       V
      p1    0   0  …   0  …   0       0
      p2    0   1  …   0  …   0       1
pools  .
      pi    0   0  …   1  …   0       1
       .
      pt    0   0  …   0  …   0       0
M[i, j] = 1 iff the ith pool contains the jth clone
Testing: M (t × n) → V (t × 1)
Decoding Algorithm: Given M and V, identify all positive clones
Observation
                  clones
      c1 c2 c3 c4 c5 c6 c7 c8 c9
p1     1  1  1  0  0  0  0  0  0
p2     0  0  0  1  1  1  0  0  0
p3     0  0  0  0  0  0  1  1  1
p4     1  0  0  1  0  0  1  0  0
p5     0  1  0  0  1  0  0  1  0
p6     0  0  1  0  0  1  0  0  1
Observation: All columns are distinct.
To identify up to d positives, all unions of up to d columns should be distinct!
Union of d columns: Boolean sum of these d columns
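The distinct-unions condition can be checked by brute force on small matrices. The following is a minimal sketch, not part of the original slides: the function names and the `GRID` matrix (the 3 × 3 grid design of Example 2, with clones indexed from 0) are illustrative assumptions.

```python
from itertools import combinations

def column_union(M, cols):
    """Boolean sum of the chosen columns, returned as a tuple of 0/1 per pool."""
    return tuple(int(any(row[c] for c in cols)) for row in M)

def unions_distinct_up_to(M, d):
    """True if all unions of 1..d columns of M are pairwise distinct."""
    n = len(M[0])
    seen = set()
    for size in range(1, d + 1):
        for cols in combinations(range(n), size):
            u = column_union(M, cols)
            if u in seen:
                return False  # two different clone sets give the same outcome
            seen.add(u)
    return True

# Assumed example: the 3 x 3 grid design (3 row pools, then 3 column pools).
GRID = [[1 if c // 3 == r else 0 for c in range(9)] for r in range(3)] + \
       [[1 if c % 3 == r else 0 for c in range(9)] for r in range(3)]
```

On this grid the check passes for d = 1 and fails for d = 2, because clones {1, 5} and {2, 4} hit exactly the same four pools.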
Challenges
Challenge 1: How to construct the binary matrix M such that the unions of any d columns are distinct
Challenge 2: How to design a decoding algorithm with efficient time complexity [O(tn)]
d-separable Matrix
(rows: pools, columns: clones)
      c1  c2  c3  …  cj  …  cn
p1     0   0   0  …   0  …   0
p2     0   1   0  …   0  …   0
p3     1   0   0  …   0  …   0
       0   0   1  …   0  …   0
 .
pi     0   0   0  …   1  …   0
 .
pt     0   0   0  …   0  …   0
All unions of d columns are distinct.
d-separable Matrix
(rows: pools, columns: clones)
      c1  c2  c3  …  cj  …  cn
p1     0   0   0  …   0  …   0
p2     0   1   0  …   0  …   0
p3     1   0   0  …   0  …   0
       0   0   1  …   0  …   0
 .
pi     0   0   0  …   1  …   0
 .
pt     0   0   0  …   0  …   0
All unions of up to d columns are distinct.
Decoding: O(n^d)
d-disjunct Matrix
Definition: A t × n binary matrix M is a d-disjunct matrix (d < t) if the union of any d columns does not contain any other column
Example: a 2-disjunct matrix
      1 0 0
M =   0 1 0
      0 0 1
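The definition can be tested directly on small matrices. This is a brute-force sketch under the definition above; `is_d_disjunct` is a hypothetical helper name, and the cost grows quickly with n and d, so it is only meant for toy examples.

```python
from itertools import combinations

def is_d_disjunct(M, d):
    """True if no column of M is contained in the union of any d other columns."""
    t, n = len(M), len(M[0])
    # Represent each column by the set of pools (row indices) that contain it.
    support = [{i for i in range(t) if M[i][j]} for j in range(n)]
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            covered = set().union(*(support[c] for c in group))
            if support[j] <= covered:  # column j is contained in the union
                return False
    return True

IDENTITY_3 = [[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
```

For the 3 × 3 identity matrix of the example, `is_d_disjunct(IDENTITY_3, 2)` holds, while a matrix such as `[[1, 1], [0, 1]]` is not even 1-disjunct because its first column is contained in the second.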
d-disjunct Matrix (cont)
A d-disjunct matrix can efficiently identify up to d positive clones. Why?
Theorem 1: All unions of d distinct columns are distinct (thus d-disjunct implies d-separable)
Theorem 2: The number of clones not in negative pools is always at most d
Corollary 1: The tests with negative outcomes determine all negative clones
Decoding time complexity: O(tn)
Proof of Theorem 2
Note that an item does not appear in any negative pool iff its corresponding column is contained in the union of the (at most d) positive columns
Therefore, the number of items not appearing in any negative pool is more than d only if there is at least one negative item whose column is contained in the union of the positive columns
But M is d-disjunct, so no such item exists; hence Theorem 2 follows
Decoding Algorithm
Input: d-disjunct matrix M and output vector V
Output: All positive clones
for each clone c in n clones
    if c is in a negative pool
        remove c
return remaining clones

Example:
      c1 c2 c3 c4 c5 c6    V
p1     1  1  1  0  0  0    1
p2     1  0  0  1  1  0    0
p3     0  1  0  1  0  1    0
p4     0  0  1  0  1  1    1
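The decoding loop above translates almost line for line into code. The sketch below is one possible rendering, assuming M is a list of 0/1 rows and V a list of 0/1 outcomes; the names `decode`, `M_EXAMPLE` and `V_EXAMPLE` are illustrative, and clones are indexed from 0 (so c3 is index 2).

```python
def decode(M, V):
    """Return the indices of clones that appear in no negative pool."""
    n = len(M[0])
    candidates = set(range(n))
    for row, outcome in zip(M, V):
        if outcome == 0:
            # Every clone in a negative pool must be negative: remove it.
            candidates -= {j for j in range(n) if row[j] == 1}
    return sorted(candidates)

# The 4 x 6 example from the slide, with its outcome vector.
M_EXAMPLE = [[1, 1, 1, 0, 0, 0],
             [1, 0, 0, 1, 1, 0],
             [0, 1, 0, 1, 0, 1],
             [0, 0, 1, 0, 1, 1]]
V_EXAMPLE = [1, 0, 0, 1]
```

Here pools p2 and p3 are negative, which removes c1, c2, c4, c5 and c6, so `decode(M_EXAMPLE, V_EXAMPLE)` leaves only c3. Each matrix entry is visited once, which matches the O(tn) bound.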
Fields
Field: any set of elements that satisfies the field axioms for both addition and multiplication and is a commutative division algebra
Eg: Complex, Rational, Real
Finite Fields
Finite Field: a field with a finite field order, i.e., a finite number of elements
The order of a finite field is always a prime or a prime power (power of a prime)
Eg: 16 = 2^4 is a prime power, whereas 6 and 15 are not
Eg: in GF(5), 4 + 3 = 7 is reduced to 2 modulo 5
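A short sketch of these two facts, with hypothetical helper names. It assumes the order p is prime, since plain integer arithmetic modulo q only gives a field in that case; prime-power orders such as 16 need polynomial arithmetic, which is not shown here.

```python
def is_prime_power(q):
    """True if q = p^m for some prime p and integer m >= 1."""
    if q < 2:
        return False
    p = 2
    while p * p <= q:
        if q % p == 0:
            # p is the smallest prime factor; q is a prime power iff it is the only one.
            while q % p == 0:
                q //= p
            return q == 1
        p += 1
    return True  # no factor found, so q itself is prime

def gf_add(a, b, p):
    """Addition in GF(p), p prime."""
    return (a + b) % p

def gf_mul(a, b, p):
    """Multiplication in GF(p), p prime."""
    return (a * b) % p

def gf_inv(a, p):
    """Multiplicative inverse of a nonzero element of GF(p), via Fermat's little theorem."""
    return pow(a, p - 2, p)
```

For instance, `gf_add(4, 3, 5)` reduces 7 to 2, as in the GF(5) example above.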
How to construct a d-disjunct matrix
Consider a finite field GF(q). Choose s, q, k satisfying: n ≤ q^k and kd ≤ s ≤ q
Step 1: Construct matrix A (s × n) as follows:
for x from 0 to s - 1
    for each polynomial pj of degree k
        A[x, pj] = pj(x)

           p1   p2      …   pj      …   pn
      0
      1
A =   …
      x         p2(x)       pj(x)
      …
      s-1
Algorithm (cont)
Step 2: Construct matrix B (t × n) from A (s × n) as follows:
for x from 0 to s - 1
    for y from 0 to q - 1
        for each polynomial pj of degree k
            if A[x, pj] == y
                B[(x, y), pj] = 1
            else
                B[(x, y), pj] = 0

                 p1   p2   …   pj   …   pn
      (0,0)
      (0,1)
B =   …
      (x,y)           0        1
      …
      (s-1,q-1)
(in row (x,y): the entry is 0 where p2(x) ≠ y and 1 where pj(x) = y)
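Steps 1 and 2 can be sketched together for small parameters. The code below is an illustration under two assumptions that the slides do not state explicitly: q is prime (so arithmetic modulo q is the field arithmetic), and the columns are the q^k polynomials of degree less than k, which is what makes n ≤ q^k achievable. All function names are hypothetical.

```python
from itertools import combinations, product

def build_disjunct_matrix(q, k, s):
    """Return (B, rows, polys): B has one row per pair (x, y), one column per polynomial."""
    # Assumption: a polynomial is a tuple of k coefficients (c0, ..., c_{k-1}) over GF(q).
    polys = list(product(range(q), repeat=k))

    def evaluate(coeffs, x):
        return sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q

    rows = [(x, y) for x in range(s) for y in range(q)]
    # B[(x, y), p] = 1 exactly when p(x) = y, i.e. when A[x, p] = y.
    B = [[1 if evaluate(p, x) == y else 0 for p in polys] for (x, y) in rows]
    return B, rows, polys

def is_d_disjunct(M, d):
    """Brute-force check that no column lies in the union of d other columns."""
    t, n = len(M), len(M[0])
    support = [{i for i in range(t) if M[i][j]} for j in range(n)]
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            if support[j] <= set().union(*(support[c] for c in group)):
                return False
    return True
```

With q = 5, k = 2, s = 4 this gives a 20 × 25 matrix that the brute-force check confirms to be 2-disjunct, consistent with kd ≤ s ≤ q for d = 2. With q = 3, k = 2, s = 2 it gives 6 pools for 9 clones, the same size as the grid design of Example 2.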
Algorithm Analysis
Theorem 3 (Correctness): If kd ≤ s ≤ q, then B (t × n) is d-disjunct.
Theorem 4: The number of tests t obtained from this algorithm is t = qs = O(q^2), where:
$$q = (2 + o(1)) \frac{d \log_2 n}{\log_2 (d \log_2 n)}$$
Errors in Experiments
False negative: the pool contains at least one positive clone but returns a negative outcome
False positive: the pool contains only negative clones but returns a positive outcome
An e-Error Correcting Model
Definition: Assume that there are at most e errors in testing. All positive clones can still be identified
Hamming distance: the Hamming distance of two column vectors is the number of different components between them
e-error-correcting: A matrix is said to be e-error-correcting if the Hamming distance of any two unions of d columns is at least 2e + 1
(d,e)-disjunct Matrix
Definition: A t × n binary matrix M is (d, e)-disjunct if for any one column j and any other d columns j1, j2, …, jd, there exist e + 1 rows i0, i1, …, ie such that M[iu, j] = 1 and M[iu, jv] = 0 for u = 0, 1, …, e and v = 1, 2, …, d
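The definition amounts to counting, for each column j and each choice of d other columns, the rows where j has a 1 and all the others have a 0. A brute-force sketch for small matrices follows; `is_de_disjunct` and `DOUBLED_IDENTITY` are illustrative names, not from the slides.

```python
from itertools import combinations

def is_de_disjunct(M, d, e):
    """True if every column has at least e + 1 rows not covered by any d other columns."""
    t, n = len(M), len(M[0])
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, d):
            private_rows = sum(
                1 for i in range(t)
                if M[i][j] == 1 and all(M[i][c] == 0 for c in group)
            )
            if private_rows < e + 1:
                return False
    return True

# Assumed toy example: the 3 x 3 identity matrix with every pool tested twice.
DOUBLED_IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
                    [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

With e = 0 this reduces to ordinary d-disjunctness. Repeating every pool of the identity matrix gives each column two private rows, so the doubled matrix is (1, 1)-disjunct but not (1, 2)-disjunct.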
E-error Correcting
Theorem 5: For every (d,k)-disjunct matrix, the Hamming distance between any two unions of d columns is at least 2k + 2
Theorem 6
Theorem 6: Suppose testing is based on a (d,e)-disjunct matrix. If the number of errors is at most e, then the number of negative pools containing a positive item is always smaller than the number of negative pools containing a negative item
Proof of Theorem 6
Let i be a positive item and j be a negative item. Suppose the number of negative pools containing i is m. Every pool containing i should be positive, so these m pools must have received errors. Hence, at most e - m errors turn a negative outcome into a positive outcome. Moreover, if no error exists, the number of negative pools containing j is at least e + 1, because the matrix is (d,e)-disjunct. Hence the number of negative pools containing j is at least (e + 1) - (e - m) = m + 1 > m
Decoding in e-error-correcting
Corollary: From Theorem 6, we see that to decode positives from testing based on a (d,e)-disjunct matrix, we only need to compute the number of negative pools containing each item and select the d items with the smallest counts. This runs in time O(nt)
Decoding Algorithm with e Errors
T = empty set
for each clone ci (i = 1…n)
    t(ci) = # negative pools containing ci
    T = T ∪ {t(ci)}
end for
Let Td = set of d smallest t(ci) in T
return ci if t(ci) in Td
Time complexity: O(tn)
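A minimal sketch of this error-tolerant decoder, assuming exactly d positive clones and at most e erroneous outcomes, so that Theorem 6 separates positives from negatives without ties. The function name and the doubled-identity test matrix are assumptions for illustration; clones are indexed from 0.

```python
def decode_with_errors(M, V, d):
    """Return the d clone indices that appear in the fewest negative pools."""
    n = len(M[0])
    # t(ci): number of negative pools containing clone i.
    counts = [
        sum(1 for row, outcome in zip(M, V) if outcome == 0 and row[j] == 1)
        for j in range(n)
    ]
    ranked = sorted(range(n), key=lambda j: counts[j])
    return sorted(ranked[:d])

# Assumed (1, 1)-disjunct toy matrix: each pool of the 3 x 3 identity tested twice.
M_TOY = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
         [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

If clone 1 is the only positive, the error-free outcome is `[0, 1, 0, 0, 1, 0]`. Flipping any single outcome, for example a false negative giving `[0, 0, 0, 0, 1, 0]`, still leaves clone 1 with the smallest count. Computing the counts takes O(tn); the sort is an implementation convenience and adds O(n log n).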