8/2/2019 Genomic signatures for metagenomic data analysis
Metagenomics and Binning | Genomic Signatures for Binning | Experiments

Genomic Signatures for Metagenomic Data Analysis: Exploiting the Reverse Complementarity of Tetranucleotides

Fabio Gori (1), Dimitrios Mavroedis (1), Mike S. M. Jetten (2), Elena Marchiori (1)

(1) Radboud University Nijmegen, Institute for Computing and Information Sciences, The Netherlands
(2) Radboud University Nijmegen, Department of Microbiology, The Netherlands

Hong Kong University, 12 September 2011
gori@science.ru.nl

Table of Contents

Metagenomics and Binning
Genomic Signatures for Binning
Experiments

What is Metagenomics?

Metagenomics: the study of microbial communities by analysing their genetic material

Why?
99% of microbes cannot be studied in laboratories
Understand interactions between organisms

How? DNA Sequencing Technology

Environmental sample → DNAs → small-insert library cloning → sequences such as TACCACAGATATCAG...

A metagenomic dataset is made of these DNA sequences.

What kind of data? A meta... jigsaw puzzle

Fragments of DNAs
Pieces are similar
Original pictures are unknown
Missing pieces

An interesting problem: Metagenomic Binning

Clustering together sequences sampled from the same genome

Metagenomic binning

Clustering together sequences sampled from the same genome (unsupervised approach)

Sequences over {A, C, G, T} → $\rho$ → $\mathbb{R}^n$ → Clustering

In this study: focus on the map $\rho$ from sequences to $\mathbb{R}^n$
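
What the signature map $\rho$ should do

$\rho$ maps the sequences $s$, $z$, $r$ into $\mathbb{R}^n$, where $s$ and $z$ come from the same genome and $r$ comes from a different one.

$\rho$ needs to be a genomic signature [Karlin et al., Trends in Genetics, 1995]:
$\rho(s) \approx \rho(z)$
$\rho(s) \neq \rho(r)$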

Typical signatures $\rho$ used in binning

$T(s)$ := frequencies of the $4^k$ sequences of length $k$ ($k$-mers).
Usually $k = 4$, giving $4^k = 256$ features: $T(s) \in \mathbb{N}^{256}$
[Mohammed et al., Bioinformatics, 2011], [Diaz et al., BMC Bioinformatics, 2009],
[Chan et al., J. Biomed. Biotech., 2008], [Teeling et al., Environ. Microb., 2004]

Example:
$s$ = AGCATGCAGCATATGTGGAGCA
$T(s)$ = (#AAAA = 0, ..., #AGCA = 3, ..., #ATAT = 1, ..., #GCAT = 2, ...)
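
The example counts above can be reproduced with a minimal Python sketch. The function name `tetranucleotide_counts` is an illustrative choice, not taken from the talk, and the sketch assumes overlapping windows and ignores any window containing a base outside A, C, G, T.

```python
from itertools import product


def tetranucleotide_counts(seq, k=4):
    """Count overlapping k-mers of seq over all 4**k possible k-mers."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
    return counts


s = "AGCATGCAGCATATGTGGAGCA"
T = tetranucleotide_counts(s)
print(len(T), T["AAAA"], T["AGCA"], T["ATAT"], T["GCAT"])  # 256 0 3 1 2
```

The dictionary always has 256 entries for k = 4, so every sequence is mapped to a vector of the same length regardless of which tetranucleotides it contains.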

Metacluster and signature Rank [B. Yang et al., ACM-BCB, 2010]

Spearman footrule distance between $s$ and $z$ = Manhattan distance between $\mathrm{Rank}(s)$ and $\mathrm{Rank}(z)$

Symmetrized rank signature Rank, with values in $S_{136}$ (rankings of 136 elements):
$\mathrm{Rank}(s)$ := ranking induced by sorting the elements of $S(s)$.
For instance, if $S(s) = (7, 0, 3)$ then $\mathrm{Rank}(s) = (1, 3, 2)$.

Symmetrized signature $S$, with values in $\mathbb{N}^{136}$:
$S_i(s) = \#w_i + \#w_i^C, \quad i = 1, \dots, 136$
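
A minimal Python sketch of the symmetrized signature, the induced ranking, and the Spearman footrule distance follows. It rests on several assumptions that the slide does not state: $w^C$ is taken to be the reverse complement of $w$, a palindromic tetranucleotide (equal to its own reverse complement) is counted once, ties in the ranking are broken by position, and all function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(w):
    return w.translate(COMPLEMENT)[::-1]


def symmetric_classes(k=4):
    """One representative per class {w, reverse complement of w}."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [w for w in kmers if w <= reverse_complement(w)]


def symmetrized_signature(seq, k=4):
    counts = {}
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        counts[w] = counts.get(w, 0) + 1
    signature = []
    for w in symmetric_classes(k):
        rc = reverse_complement(w)
        total = counts.get(w, 0)
        if rc != w:  # assumption: palindromes are counted once
            total += counts.get(rc, 0)
        signature.append(total)
    return signature


def rank_vector(values):
    """Rank 1 goes to the largest value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def footrule(u, v):
    """Manhattan distance between the two rank vectors."""
    return sum(abs(a - b) for a, b in zip(rank_vector(u), rank_vector(v)))


print(len(symmetric_classes()))   # 136
print(rank_vector([7, 0, 3]))     # [1, 3, 2]
print(footrule([7, 0, 3], [0, 7, 3]))  # 4
```

The 136 classes arise because the 256 tetranucleotides contain 16 palindromes and 240 non-palindromes that pair up into 120 classes.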

What is missing?

Signatures $\rho$ used in binning:
Not designed as signatures for metagenomic data
No thorough comparative analysis

In this study

1. Introduce new genomic signatures for binning
2. Test and compare the performance of new and known signatures
3. ... and signature combinations (extra)
4. Relation between taxonomic divergence and signature dissimilarity (extra)

TEST: is $\rho$ a signature on metagenomic data? For $s$, $z$ from the same genome and $r$ from a different genome:
$\rho(s) \approx \rho(z)$
$\rho(s) \neq \rho(r)$

Special requirements for metagenomics

A genomic signature needs to:
Work on sequences of 1,000 bp (the standard test uses 10,000 bp)
Not rely on the source genome
Be strand independent

Sequences can be sampled from both strands

$s$ = AGCATGCAGCATATGTGGAGCA
      TCGTACGTCGTATACACCTCGT = $s^C$

We want: $\rho(s) = \rho(s^C)$
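
The following Python sketch illustrates why plain tetranucleotide counts are not strand independent. The slide shows the complementary strand aligned under $s$; read in its own 5' to 3' direction, that strand is the reverse complement of $s$, which is what a sequencer would report. The function names are illustrative, not from the talk.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    return complement(seq)[::-1]


def kmer_counts(seq, k=4):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


s = "AGCATGCAGCATATGTGGAGCA"
print(complement(s))  # TCGTACGTCGTATACACCTCGT

forward = kmer_counts(s)
reverse = kmer_counts(reverse_complement(s))

# The raw count vectors differ between the two strands ...
print(forward == reverse)  # False

# ... but each k-mer w on one strand appears as its reverse complement on the other.
print(all(reverse[reverse_complement(w)] == c for w, c in forward.items()))  # True
```

Because the counts of $w$ and of its reverse complement simply swap between strands, any feature that treats the two symmetrically, such as their sum, takes the same value on both strands.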

Tested signatures:
Signatures exploit frequencies of subsequences (length = 4)
3 known signatures, $S$ not used in metagenomics
6 new strand-independent signatures

Data:
1,284 prokaryotic genomes (NCBI)
Sequence length: 1,000 bp [B. Yang et al., ACM-BCB, 2010]
Max output of 454 GS FLX+ System

Dissimilarity measured with signature distance (Manhattan):
$d(\rho(s), \rho(z)) := \|\rho(s) - \rho(z)\|_1 = \sum_{i=1}^{n} |\rho_i(s) - \rho_i(z)|$
[Mohammed et al., Bioinformatics, 2011], [Mrázek et al., Mol. Biol. Evol., 2009],
[Bohlin et al., Scientific World Journal, 2011], [Karlin et al., Annu. Rev. Genet., 1998]
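
As a small illustration of the signature distance, the Manhattan distance between two feature vectors can be sketched in Python as below. The function name and the toy vectors are assumptions made for the example.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences between two signatures."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same number of features")
    return sum(abs(a - b) for a, b in zip(u, v))


print(manhattan([7, 0, 3], [5, 1, 3]))  # 3
```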

Performance evaluation

WITHIN-genome distance: $d(\rho(s_h), \rho(s_i))$ for sequences $s_h$, $s_i$ sampled from the same genome

How we compare: ROC curve

For each distance threshold $t$:
$\mathrm{Sensitivity}(t) = \frac{\#\{\text{between-distances} > t\}}{\#\{\text{between-distances}\}}$
$\mathrm{Specificity}(t) = \frac{\#\{\text{within-distances} \le t\}}{\#\{\text{within-distances}\}}$
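
The two ratios can be computed directly from lists of distance values. The sketch below uses made-up numbers purely for illustration; the function names are not from the talk.

```python
def sensitivity(between, t):
    """Fraction of between-genome distances strictly above the threshold."""
    return sum(d > t for d in between) / len(between)


def specificity(within, t):
    """Fraction of within-genome distances at or below the threshold."""
    return sum(d <= t for d in within) / len(within)


within = [1.0, 2.0, 2.5, 4.0]    # toy within-genome distances
between = [2.0, 3.0, 5.0, 6.0]   # toy between-genome distances
t = 2.5
print(sensitivity(between, t), specificity(within, t))  # 0.75 0.75
```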

How we compare: ROC curve

Signature $a$ is always better than signature $b$ if and only if the ROC curve of $a$ lies above the ROC curve of $b$.

Alternative index: Area Under the Curve (AUC)
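
One way to obtain an AUC value from within- and between-genome distances is sketched below in Python. It assumes the ROC curve plots sensitivity against 1 - specificity over all observed thresholds and integrates with the trapezoidal rule; the talk does not say how its AUC values were computed, so this is only an illustration with hypothetical names and toy data.

```python
def roc_points(within, between):
    """ROC points (1 - specificity, sensitivity) over all observed thresholds."""
    points = [(1.0, 1.0)]  # threshold below every distance
    for t in sorted(set(within) | set(between)):
        false_positive_rate = sum(d > t for d in within) / len(within)
        true_positive_rate = sum(d > t for d in between) / len(between)
        points.append((false_positive_rate, true_positive_rate))
    return sorted(points)


def auc(points):
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Perfectly separated toy distances give the maximum AUC.
print(auc(roc_points([1.0, 2.0], [3.0, 4.0])))  # 1.0
```

An AUC of 0.5 corresponds to a signature whose within- and between-genome distances are indistinguishable, which gives a scale for reading the results table.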

Results

Table: Comparison of genomic signatures

Signature   AUC     Features
S           0.912   136
max         0.900   120
T           0.884   256
min         0.881   120
I           0.851   16
Rank        0.794   136
Ratio 1     0.707   120
Ratio 2     0.686   120
JS          0.573   120

Conclusion

What we did
First comparative test of signatures for metagenomics
1,284 prokaryotic genomes studied
New signatures designed for metagenomics

Results
Support for some known signatures ($S$ better than $T$ but not used)
New signatures: comparable results with fewer features

Future work
Test on shorter sequences (150-500 bp) - preliminary results
Analyze performance at other taxonomic levels (family, genus, ...)

Thank you!
Questions?
gori@science.ru.nl

Within-genome distance values: derivation

For each genome (1,284): take 10,000 sequences, compute the $10{,}000^2$ distances (all pairs), and take the mean.

Between-genome distance values: derivation

For 8,000 genome pairs: take 10,000 sequence pairs, compute the distances, and take the mean.
1,000 genome pairs for each level of taxonomic diversity, randomly selected.
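
The two derivations can be sketched in Python as follows. Everything here is an assumption made for illustration: fragments are drawn uniformly at random, the signature is the plain tetranucleotide count vector, the within-genome mean is taken over unordered pairs of distinct fragments (the slide counts $10{,}000^2$ pairs), and the two periodic "genomes" are toy strings chosen so that the outcome is easy to check by hand.

```python
import random
from collections import Counter
from itertools import combinations


def signature(seq, k=4):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan(a, b):
    return sum(abs(a[w] - b[w]) for w in set(a) | set(b))


def sample_fragments(genome, n, length, rng):
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[i:i + length] for i in starts]


def mean_within_distance(genome, n, length, rng):
    """Mean distance over all unordered pairs of n fragments of one genome."""
    sigs = [signature(f) for f in sample_fragments(genome, n, length, rng)]
    dists = [manhattan(a, b) for a, b in combinations(sigs, 2)]
    return sum(dists) / len(dists)


def mean_between_distance(genome_a, genome_b, n_pairs, length, rng):
    """Mean distance over n_pairs fragment pairs, one fragment per genome."""
    frags_a = sample_fragments(genome_a, n_pairs, length, rng)
    frags_b = sample_fragments(genome_b, n_pairs, length, rng)
    dists = [manhattan(signature(a), signature(b)) for a, b in zip(frags_a, frags_b)]
    return sum(dists) / len(dists)


rng = random.Random(0)
toy_a = "ACGT" * 500   # toy genome, not real data
toy_b = "AAGG" * 500   # toy genome sharing no tetranucleotide with toy_a
within_mean = mean_within_distance(toy_a, n=20, length=100, rng=rng)
between_mean = mean_between_distance(toy_a, toy_b, n_pairs=20, length=100, rng=rng)
print(within_mean <= 2, between_mean)  # True 194.0
```

In the toy case each 100 bp fragment contains 97 tetranucleotides, fragments of the same genome differ by at most 2, and fragments of the two genomes share nothing, so the between-genome mean is 97 + 97 = 194.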

Taxonomic diversity

Two genomes $g_i$, $g_j$ have taxonomic diversity at rank $r$ if and only if the Lowest Common Ancestor (LCA) of $g_i$ and $g_j$ is at rank $r$.

[Figure: taxonomy tree over genomes $g_1, \dots, g_{12}$ with an LCA marked]
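
The definition can be illustrated with a short Python sketch that finds the rank of the lowest common ancestor of two lineages. The list of ranks and the lineages are assumptions chosen for the example; the talk does not list the taxonomic levels it uses.

```python
# Hypothetical rank list, ordered from most general to most specific.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lca_rank(lineage_a, lineage_b):
    """Deepest rank at which the two lineages still agree, or None."""
    shared = None
    for rank, a, b in zip(RANKS, lineage_a, lineage_b):
        if a != b:
            break
        shared = rank
    return shared


# Illustrative lineages, written from domain down to species.
e_coli = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
          "Enterobacteriaceae", "Escherichia", "Escherichia coli"]
s_enterica = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
              "Enterobacteriaceae", "Salmonella", "Salmonella enterica"]
b_subtilis = ["Bacteria", "Firmicutes", "Bacilli", "Bacillales",
              "Bacillaceae", "Bacillus", "Bacillus subtilis"]

print(lca_rank(e_coli, s_enterica))  # family
print(lca_rank(e_coli, b_subtilis))  # domain
```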

Taxonomic divergence and signature distance

For each signature:
For each pair of ranks $(r_1, r_2)$:
Check that:
Distance distribution $r_1$