Jorge Bouças, Bioinformatics Core Facility, MPI-AGE, Köln. Sunday 11 December 2016
Actionable data in life sciences
Performance
request for data analysis → reply with results (time)
• background / scientific question
• metadata collection
• data transfer
• data analysis
• validation
• data transfer
• No build test
• No integration test
• Tailor-made validation
Goal: structured, in place, actionable, 24/7
Performance
☑ Network
☑ Storage
☑ CPUs
☑ Memory
☑ Software
☑ Algorithms
☐ Human
“Only 8.3 percent of the positions for computer scientists can be filled without difficulty.”
http://www.golem.de
[Figure: Data Science Venn diagram. Circles: Computer Science, Math & Statistics, Subject Matter Expertise / biology. Overlaps: Machine Learning, Trad. Software, Trad. Research. Centre: Unicorn.]
Copyright 2014 by Steven Geringer Raleigh, NC. Permission is granted to use, distribute, or modify this image, provided that this copyright remains intact
“… It appears that the development of effective human cooperation and the development of man-computer symbiosis are "chicken-and-egg" problems. It will take unusual human teamwork to set up a truly workable man-computer partnership, and it will take man-computer partnerships to engender and facilitate the human cooperation. …if the required solutions are not ready, it would not be good to wait for them.”
Licklider JCR, Clark WE, On-line man-computer communication, Proceedings of the May 1-3, 1962, Spring Joint Computer Conference
“On-line man-computer communication”
HPC
git
datashare
Berlin
Garching
Köln
Data sources: TAPE, in-house, www
Transfer tools:
• curl / wget, md5sum
• bit -g
• rsync
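The curl / wget plus md5sum step can be sketched in Python as follows. This is only an illustration of checksum verification after a download, not code taken from bit; the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """True if the file's MD5 matches the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks matters here because sequencing files are often far larger than available memory.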
• results: 8kb .. 8gb
• private link, 21d public link
• write upload log on wiki with permalinks
• push code
• bit -i <myfile.txt> -m <code and data message>
• https://to.data → customer
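A minimal sketch of what one upload log line on the wiki could look like. It assumes that "21d" on the slide is the lifetime of the shared link; the log layout, the function name and the example link are hypothetical and not bit's actual output.

```python
from datetime import date, timedelta

LINK_LIFETIME_DAYS = 21  # assumption: the "21d" on the slide is the link lifetime


def upload_log_entry(filename, link, message, uploaded_on=None):
    """Format one wiki log line: upload date, file, link, link expiry, message."""
    uploaded_on = uploaded_on or date.today()
    expires_on = uploaded_on + timedelta(days=LINK_LIFETIME_DAYS)
    return "* {} | {} | {} | link expires {} | {}".format(
        uploaded_on.isoformat(), filename, link, expires_on.isoformat(), message
    )


if __name__ == "__main__":
    # Hypothetical link; https://to.data is the placeholder used on the slide.
    print(upload_log_entry("myfile.txt", "https://to.data/abc",
                           "code and data message", date(2016, 12, 11)))
```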
Binding of Results & Code
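One way to picture the binding is a small manifest that ties a code revision to the checksums of the result files produced with it. This is a sketch under assumptions: the manifest layout and function names are hypothetical, and bit may record this link differently.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in chunks so large results do not fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(commit_hash, message, paths):
    """Tie one code revision to the exact result files produced with it."""
    return {
        "commit": commit_hash,    # git revision of the analysis code
        "message": message,       # the shared code-and-data message
        "files": {os.path.basename(p): file_md5(p) for p in sorted(paths)},
    }


def manifest_json(manifest):
    """Serialise with sorted keys so identical manifests produce identical text."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

With such a record, any delivered file can be traced back to the commit that generated it, and any commit to the files it produced.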
• > 30 projects / 3 analysts
• 1 project: > 1000 GB data, > 1000 files, > 1000 lines of code (with dependencies)
• > 10-40 change actions
bit --start <DP_project_name>
bit -i <myfile.txt> -m <code and data message>
bit -c <folder_to_create>
bit -g <folder_or_file_to_download>
HPC HPC2
bit --sync <folder_or_file_to_sync> --sync_to <Uname@HPC2>
bit --sync <folder_or_file_to_sync> --sync_from <Uname@HPC2>
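The two sync directions could be mapped onto rsync roughly as below. This is a hedged sketch, not bit's implementation: it assumes the same relative path layout on both machines, and `rsync_command` is a hypothetical helper that only builds the argument list.

```python
import posixpath


def rsync_command(path, remote, direction):
    """Build an rsync call that mirrors `path` to or from `remote` (user@host).

    Assumption: `path` is relative and has the same layout on both sides
    (relative to the working directory locally, to the home directory remotely).
    """
    path = path.rstrip("/")
    parent = posixpath.dirname(path) or "."
    if direction == "to":
        source, target = path, "{}:{}/".format(remote, parent)
    elif direction == "from":
        source, target = "{}:{}".format(remote, path), parent + "/"
    else:
        raise ValueError("direction must be 'to' or 'from'")
    # -a keeps permissions and timestamps; --partial keeps interrupted transfers
    return ["rsync", "-a", "--partial", source, target]


if __name__ == "__main__":
    print(" ".join(rsync_command("project/results", "Uname@HPC2", "to")))
```

The returned list can be passed to `subprocess.run` without a shell, which avoids quoting problems with unusual file names.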
HPC git
bit --adduser
Garching HPC
Köln HPC
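To see how the commands shown on these slides fit together, here is a sketch of a command-line parser that accepts the same flags. It is not bit's source code: only the flag names come from the slides, while the `dest` names, help texts and the program name `bit-sketch` are assumptions.

```python
import argparse


def build_parser():
    """Parser accepting the flags shown on the slides (illustrative only)."""
    parser = argparse.ArgumentParser(prog="bit-sketch", allow_abbrev=False)
    parser.add_argument("--start", metavar="DP_project_name",
                        help="start a new project")
    parser.add_argument("-i", dest="input", nargs="+", metavar="FILE",
                        help="files to upload")
    parser.add_argument("-m", dest="message",
                        help="one message shared by the code and data records")
    parser.add_argument("-c", dest="create", metavar="FOLDER",
                        help="folder to create")
    parser.add_argument("-g", dest="get", nargs="+", metavar="PATH",
                        help="folders or files to download")
    parser.add_argument("--sync", nargs="+", metavar="PATH",
                        help="folders or files to sync between HPC systems")
    # A sync runs in exactly one direction, so the two targets exclude each other.
    direction = parser.add_mutually_exclusive_group()
    direction.add_argument("--sync_to", metavar="USER@HOST")
    direction.add_argument("--sync_from", metavar="USER@HOST")
    parser.add_argument("--adduser", action="store_true",
                        help="give another user access to the project")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-i", "myfile.txt", "-m", "code and data message"]))
```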
user1, user2, user3: pull code (Garching HPC, Köln HPC)
github.com/owncloud/pyocclient
datashare (Garching): results 8kb .. 8gb, private link, 21d public link
REST API
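A minimal sketch of the upload-and-share step with pyocclient. The `put_file`, `share_file_with_link` and `get_link` calls follow the usage shown in the pyocclient README; the helper name, the server URL and the credentials are hypothetical, and this is not how bit itself is written.

```python
import os
import posixpath


def upload_and_share(client, local_path, remote_dir):
    """Upload one file and return a share link for it.

    `client` is expected to be a logged-in owncloud.Client from pyocclient;
    `remote_dir` is assumed to exist on the server already.
    """
    remote_path = posixpath.join(remote_dir, os.path.basename(local_path))
    client.put_file(remote_path, local_path)          # upload over WebDAV
    share = client.share_file_with_link(remote_path)  # create a link share
    return share.get_link()


# Usage (requires `pip install pyocclient` and a reachable ownCloud server):
#   import owncloud
#   client = owncloud.Client("https://datashare.example.org")  # hypothetical URL
#   client.login("user", "password")
#   print(upload_and_share(client, "myfile.txt", "results"))
```

Passing the client in, rather than creating it inside the function, keeps the credentials out of the helper and makes it easy to test against a stand-in object.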
Why?
ownCloud: temporary HTTP link >> download. Simplicity.
Github:
“With statement-by-statement compiling and testing and with computer-aided book-keeping and program integration, a few very talented men may be able to handle in weeks programming tasks that ordinarily require many people and many months.”
Licklider JCR, Clark WE, On-line man-computer communication, Proceedings of the May 1-3, 1962, Spring Joint Computer Conference
ownCloud + Github: data & metadata management
Front-end
“Back-end”
register
http://www.mpcdf.mpg.de/userspace/forms/onlineregistrationform
Sys. Admin. (MPI-AGE)
Github (MPI-MOLGEN)
user
Performance
request for data analysis → reply with results (time): bit
[b]ermuda [i]nformation [t]riangle
github.com/mpg-age-bioinformatics/AGEpy