ddn.com © 2016 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others.
Any statements or representations around future events are subject to change.
Big Data for the Life Sciences:
Data Access (Performance,
Flexibility, Security and Privacy)
2016/06
DataDirect Networks, Inc.
James Coomer [email protected]
Trends in High-Performance Storage
►Continued pressure to improve economics
►Managing scale and new data paradigms
►Managing performance for diverse workloads
►Global data distribution
Life Sciences & Storage: Trends
►Continued pressure to improve economics
►Managing scale and new data paradigms
►Global data distribution
►Applying security
►Managing performance for diverse workloads
The life sciences are a challenge
for storage today
►Continued pressure to improve economics
►Managing scale and new data paradigms
►Managing performance for diverse workloads
►Global data distribution
►Applying security
About Security
Some context
►Security has not been a major priority for HPC storage systems
•Designs were oriented toward performance and scalability
•No direct connection to the Internet, and secured physical access
►But today, HPC storage is NO LONGER limited to scratch space and users' directories
•A single cluster is used for several types of workload
•Dedicated hardware is unsatisfactory in terms of cost/performance
•Secured data must be accessible and visible ONLY to authorized persons
►Per-user/per-node authentication
►Audit capability
►Data encryption
Data Isolation with Docker
►Virtualization in the life sciences provides isolation capabilities (multi-tenancy)
[Diagram: COMPUTE nodes, each running several containers]
Data Isolation with Docker
►The Lustre file system provides good performance, but breaks the isolation
►All containers would see all data, restricted only by POSIX permissions/ACLs
[Diagram: containers on COMPUTE nodes all mount the same LUSTRE namespace: / containing /fast, /work and /scratch, with per-user directories /jen, /eva and /bob]
Data Isolation with Docker
►Use of a "Kerberized" Linux to authenticate Docker containers to the file system
[Diagram: containers on COMPUTE nodes, a KERBEROS SERVER, and the LUSTRE namespace (/ containing /fast, /work and /scratch, with per-user directories /jen, /eva and /bob). Sequence: 1. credentials(A); 2. credA; 3. mount(credA); 4. /filesetA]
Data Isolation with Docker
►Two key elements for isolating the data:
►The notion of a nodemap, which associates filesets with NIDs (Lustre containers)
►The use of subdirectory mounts
[Diagram: containers on COMPUTE nodes, a KERBEROS SERVER, and the LUSTRE namespace (/ containing /fast, /work and /scratch, with per-user directories /jen, /eva and /bob), with a nodemap in front of the namespace. Sequence: 1. credentials(A); 2. credA; 3. mount(credA); 4. /filesetA]
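The nodemap and subdirectory-mount idea can be illustrated with a toy model. This is a minimal sketch in Python, not Lustre code: the `Nodemap` and `ToyFilesystem` classes, the NID strings and the fileset paths are hypothetical names chosen for illustration. It only shows how associating a client NID with a fileset confines the paths that client can reach.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class Nodemap:
    """Toy model: associates client NIDs with one fileset (a subdirectory root)."""
    name: str
    fileset: str                           # hypothetical, e.g. "/work/eva"
    nids: set = field(default_factory=set)


class ToyFilesystem:
    """Hypothetical stand-in for the server side; not a Lustre API."""

    def __init__(self, nodemaps):
        # Index nodemaps by NID so a lookup needs only the client's identity.
        self._by_nid = {nid: nm for nm in nodemaps for nid in nm.nids}

    def resolve(self, nid, path):
        """Translate a client-visible path into a path in the full namespace."""
        nodemap = self._by_nid.get(nid)
        if nodemap is None:
            raise PermissionError(f"no nodemap for NID {nid}")
        # Normalize against "/" first so ".." cannot climb above the
        # client's root, then graft the result under the fileset.
        client_path = posixpath.normpath(posixpath.join("/", path))
        return posixpath.normpath(nodemap.fileset + client_path)


if __name__ == "__main__":
    fs = ToyFilesystem([
        Nodemap("tenantA", "/work/eva", {"10.0.0.1@tcp"}),
        Nodemap("tenantB", "/work/bob", {"10.0.0.2@tcp"}),
    ])
    print(fs.resolve("10.0.0.1@tcp", "/data/sample.bam"))
    print(fs.resolve("10.0.0.2@tcp", "../../eva"))
```

In this model a container whose NID belongs to `tenantB` cannot name anything outside `/work/bob`, whatever path it asks for. That is the isolation property the slide attributes to nodemaps combined with subdirectory mounts.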
Performance of Lustre Clients in
Containers
The life sciences are a challenge
for storage today
How do we meet this challenge?
Managing performance for diverse workloads
I/O Performance in the Life Sciences
►Small I/O
►Small files
►mmap() I/O
•Often a consequence of using Java
•GPFS can prove suboptimal under these conditions:
"The structure of the Linux kernel requires GPFS to perform
additional synchronization operations in order to
maintain mmap semantics. These operations hurt
performance."
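For readers unfamiliar with the pattern, the following is a minimal sketch of what mmap() I/O looks like from the application side, using Python's standard `mmap` module. The helper name `read_records`, the file contents and the offsets are assumptions made for illustration; the sketch makes no claim about how any particular life-sciences tool uses mmap().

```python
import mmap
import os
import tempfile


def read_records(path, offsets, size):
    """Read `size` bytes at each offset through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing the mapping touches pages in memory; the kernel
            # fetches them from the file system on demand (page faults)
            # instead of the application issuing explicit read() calls.
            return [mm[off:off + size] for off in offsets]


if __name__ == "__main__":
    # Build a 1 MiB scratch file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4096)
        path = tmp.name
    try:
        print(read_records(path, [0, 4096, 1_000_000], 4))
    finally:
        os.remove(path)
```

Because the file system only sees page-sized faults at scattered offsets, a mapping like this behaves as a stream of small random reads, which is why it appears alongside small I/O and small files in the list above.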
Example of mmap() I/O
[Chart: mmap() Read Performance (1MB Block Size), comparing lustre-1.8.9, lustre-2.5.2 and the DDN branch; vertical axis from 0 to 500]
[Chart: mmap() Read Performance (32K to 1 MB Block Size), comparing Lustre-1.8.9 and the DDN branch at 32K, 128K, 512K and 1024K; vertical axis from 0 to 500]
Sanger Institute Benchmarks
[Chart: Lustre 2.5 Client Performance, Human Genetics samtools workflow; runtime in hours (axis from 0 to 50) for client versions 2.5.1 and 2.5.23-ddntest15]
►Samtools runs 20% faster thanks to DDN's Lustre optimizations
Lustre Metadata Performance
MDS Scalability (Single Directory)
[Chart: Lustre Metadata Scalability (Unique) (32 clients, 1024 mount points); ops/sec (axis from 0 to 250000) against Number of Threads (16, 32, 64, 128, 256, 512, 1024). Series: File Creation, File Removal, Dir Creation, Dir Removal, File Creation(DNE), File Removal(DNE). Chart annotations: "100% by DNE", "50% of File creation", "multiple mount points", "Directory Creation"]
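To make the ops/sec metric concrete, here is a minimal sketch of a single-directory create/remove measurement in Python. It is an assumption-laden illustration: the function name `metadata_rates` is hypothetical, the run is single-process against a local temporary directory, and it is not the benchmark tool or configuration behind the chart.

```python
import os
import tempfile
import time


def metadata_rates(directory, count):
    """Create then remove `count` empty files in one directory; return ops/sec."""
    names = [os.path.join(directory, f"f{i:08d}") for i in range(count)]

    start = time.perf_counter()
    for name in names:
        # O_CREAT | O_EXCL makes every call a genuine create operation.
        os.close(os.open(name, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644))
    create_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        os.unlink(name)
    remove_seconds = time.perf_counter() - start

    return {"create": count / create_seconds, "remove": count / remove_seconds}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        rates = metadata_rates(scratch, 10_000)
        print(f"file creation: {rates['create']:.0f} ops/sec")
        print(f"file removal:  {rates['remove']:.0f} ops/sec")
```

Pointing `directory` at a path on a parallel file system and running many copies concurrently from several clients would approximate the shared-directory workload the chart describes, since every operation then contends for the same parent directory.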
The Small I/O Problem
www.ellexus.com
[Chart: number of reads (in millions) per 20-second interval (axis from 0 to 35) against time in minutes (0 to 150). Series: 1B reads, 2B-4KB reads]
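The cost of tiny reads is easy to see by counting system calls. The sketch below, in Python, reads the same file once with 1-byte requests and once with 4 KB requests; the function name `count_reads` and the 64 KiB file size are assumptions chosen for illustration, and the sketch is unrelated to the tool that produced the chart.

```python
import os
import tempfile


def count_reads(path, request_size):
    """Read a whole file with fixed-size requests; return the number of read() calls that returned data."""
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)  # O_BINARY exists only on Windows
    fd = os.open(path, flags)
    calls = 0
    try:
        # os.read maps directly onto the read() system call, so each loop
        # iteration is one request that the file system has to service.
        while os.read(fd, request_size):
            calls += 1
    finally:
        os.close(fd)
    return calls


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 65536)
        path = tmp.name
    try:
        print("1 B requests: ", count_reads(path, 1))
        print("4 KB requests:", count_reads(path, 4096))
    finally:
        os.remove(path)
```

With 1-byte requests the number of calls equals the file size in bytes, so an application reading this way generates requests by the million, which is the behaviour the chart records.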
Deploying SSDs beneath the PFS is not
a satisfactory answer
[Chart: Parallel Filesystem on IME Demo Cluster, 14x1U servers with SSDs (50GB/s available); average IOR throughput in MB/s (axis from 0 to 2000) against IO request size in KB (1, 10, 100, 1000)]
DDN | IME Application I/O Workflow
[Diagram: COMPUTE nodes, a tier of IME SERVERs, and the PARALLEL FILESYSTEM]
►Inserts an additional tier of distributed NVM between the applications and the PFS
►Client-server architecture
►Transparent to users
►Transparent to applications
►Reorganizes the data before it is flushed to the PFS
Log Structuring in IME
Consider two different application I/O patterns (write)
"Burst buffer blocks" (BBBs) are the buffers
generated at the client level
Note that the contents of the BBBs may or may not be
aligned.
The same storage method is used for
both BBBs, despite the qualitative difference
in their contents
[Diagram: two write patterns, labeled Sequential and Non-sequential]
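The log-structured idea can be shown with a toy buffer. This is a minimal sketch in Python under stated assumptions, not IME's implementation: the `LogBuffer` class is hypothetical, writes are assumed not to overlap, and "flush" simply returns the extents that would be sent to the PFS.

```python
class LogBuffer:
    """Toy log-structured buffer: fragments are appended in arrival order."""

    def __init__(self):
        self._log = []  # list of (file_offset, data) records

    def write(self, offset, data):
        # The same append path serves sequential and non-sequential
        # patterns; the buffer never seeks to place a fragment.
        self._log.append((offset, bytes(data)))

    def flush(self):
        """Return coalesced (offset, data) extents ordered by file offset."""
        extents = []
        for offset, data in sorted(self._log, key=lambda record: record[0]):
            if extents and extents[-1][0] + len(extents[-1][1]) == offset:
                # This fragment starts where the previous extent ends: merge.
                last_offset, last_data = extents[-1]
                extents[-1] = (last_offset, last_data + data)
            else:
                extents.append((offset, data))
        self._log.clear()
        return extents


if __name__ == "__main__":
    buf = LogBuffer()
    for offset in (8, 0, 12, 4):          # non-sequential arrival order
        buf.write(offset, b"%04d" % offset)
    print(buf.flush())
```

Whatever the arrival order, the buffer accepts each fragment at the same cost, and the reordering happens once at flush time. That mirrors the two points made on these slides: one storage method for both patterns, and reorganization of the data before it reaches the PFS.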
IME vs. Parallel File Systems: Performance for a Single Shared File (SSF)
102-0081
東京都千代田区四番町6-2
東急番町ビル 8F
TEL:03-3261-9101
FAX:03-3261-9140
company/datadirect-networks
@ddn_limitless
Thank You!
Keep in touch with us