Data Processing with Ruby Brian Chapados http://chapados.org SDRuby April 3, 2008

Ruby for Data Processing


DESCRIPTION

Brian Chapados talks about using Ruby and Rake to build a simple workflow that coordinates external processes. Watch a video at http://www.bestechvideos.com/2008/07/01/sd-rb-episode-048-ruby-for-data-processing


Page 1

Data Processing with Ruby

Brian Chapados, http://chapados.org

SDRuby, April 3, 2008

Page 2

>Archaeoglobus PCNA
MIDVIMTGELLKTVTRAIVALVSEARIHFLEKGLHSRAVDPANVAMVIVDIPKDSFEVYNIDEEKTIGVDMDRIFDISKSISTKDLVELIVEDESTLKVKFGSVEYKVALIDPSAIRKEPRIPELELPAKIVMDAGEFKKAIAAADKISDQVIFRSDKEGFRIEAKGDVDSIVFHMTETELIEFNGGEARSMFSVDYLKEFCKVAGSGDLLTIHLGTNYPVRLVFELVGGRAKVEYILAPRIESE

Understanding Proteins

sequence: 1-D linear chain

structure: 3-D after folding
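
The sequence on this slide is a FASTA record: a header line starting with `>` followed by one-letter residue codes, which files often wrap over several lines. A minimal Ruby sketch of reading such records; the helper name `parse_fasta` is hypothetical, and the example uses only the first 34 residues of the sequence above, wrapped across two lines.

```ruby
# Parse FASTA text into an array of [header, sequence] pairs.
# Hypothetical helper for illustration only.
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      # A new record begins; the header is everything after ">"
      records << [line[1..-1].strip, "".dup]
    elsif records.any?
      # Wrapped sequence lines are concatenated onto the current record
      records.last[1] << line
    end
  end
  records
end

fasta = <<~FASTA
  >Archaeoglobus PCNA
  MIDVIMTGELLKTVTRAIVA
  LVSEARIHFLEKGL
FASTA

header, seq = parse_fasta(fasta).first
puts header      # prints: Archaeoglobus PCNA
puts seq.length  # prints: 34
```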

Page 3

Hard to do structures with several components

Page 4

X-ray scattering

C. Trame, personal communication.

Sousa et al. 2000. Cell 103: 633-643.

Page 5

Raw Data

Distance distribution function of particle

R           P(R)        ERROR
0.0000E+00  0.0000E+00  0.0000E+00
0.5000E+00  0.3157E-02  0.0000E+00
0.1000E+01  0.6069E-02  0.0000E+00
0.1500E+01  0.8740E-02  0.0000E+00
0.2000E+01  0.1118E-01  0.0000E+00
0.2500E+01  0.1339E-01  0.0000E+00
0.3000E+01  0.1538E-01  0.0000E+00
0.3500E+01  0.1718E-01  0.0000E+00
0.4000E+01  0.1879E-01  0.0000E+00
0.4500E+01  0.2023E-01  0.0000E+00
0.5000E+01  0.2153E-01  0.0000E+00
0.5500E+01  0.2269E-01  0.0000E+00
0.6000E+01  0.2374E-01  0.0000E+00
0.6500E+01  0.2471E-01  0.0000E+00
0.7000E+01  0.2560E-01  0.0000E+00
0.7500E+01  0.2645E-01  0.0000E+00
0.8000E+01  0.2727E-01  0.0000E+00
0.8500E+01  0.2809E-01  0.0000E+00
0.9000E+01  0.2891E-01  0.0000E+00
0.9500E+01  0.2976E-01  0.0000E+00
0.1000E+02  0.3065E-01  0.0000E+00
0.1050E+02  0.3160E-01  0.0000E+00
0.1100E+02  0.3261E-01  0.0000E+00
0.1150E+02  0.3370E-01  0.0000E+00
0.1200E+02  0.3487E-01  0.0000E+00
0.1250E+02  0.3613E-01  0.0000E+00
0.1300E+02  0.3747E-01  0.0000E+00
0.1350E+02  0.3890E-01  0.0000E+00
0.1400E+02  0.4041E-01  0.0000E+00
0.1450E+02  0.4201E-01  0.0000E+00
0.1500E+02  0.4367E-01  0.0000E+00
0.1550E+02  0.4539E-01  0.0000E+00
0.1600E+02  0.4717E-01  0.0000E+00
0.1650E+02  0.4899E-01  0.0000E+00
0.1700E+02  0.5083E-01  0.0000E+00
0.1750E+02  0.5268E-01  0.0000E+00
0.1800E+02  0.5453E-01  0.0000E+00
0.1850E+02  0.5636E-01  0.0000E+00
0.1900E+02  0.5815E-01  0.0000E+00
0.1950E+02  0.5989E-01  0.0000E+00
0.2000E+02  0.6157E-01  0.0000E+00
0.2050E+02  0.6317E-01  0.0000E+00
0.2100E+02  0.6467E-01  0.0000E+00
0.2150E+02  0.6607E-01  0.0000E+00
0.2200E+02  0.6735E-01  0.0000E+00
0.2250E+02  0.6851E-01  0.0000E+00
0.2300E+02  0.6954E-01  0.0000E+00
0.2350E+02  0.7043E-01  0.0000E+00
0.2400E+02  0.7118E-01  0.0000E+00
0.2450E+02  0.7179E-01  0.0000E+00
0.2500E+02  0.7225E-01  0.0000E+00
0.2550E+02  0.7258E-01  0.0000E+00
0.2600E+02  0.7277E-01  0.0000E+00
0.2650E+02  0.7283E-01  0.0000E+00
0.2700E+02  0.7277E-01  0.0000E+00
0.2750E+02  0.7259E-01  0.0000E+00
0.2800E+02  0.7231E-01  0.0000E+00
0.2850E+02  0.7194E-01  0.0000E+00
0.2900E+02  0.7149E-01  0.0000E+00
0.2950E+02  0.7096E-01  0.0000E+00
0.3000E+02  0.7038E-01  0.0000E+00
0.3050E+02  0.6975E-01  0.0000E+00
0.3100E+02  0.6909E-01  0.0000E+00
0.3150E+02  0.6840E-01  0.0000E+00
0.3200E+02  0.6770E-01  0.0000E+00
0.3250E+02  0.6700E-01  0.0000E+00
0.3300E+02  0.6630E-01  0.0000E+00
0.3350E+02  0.6561E-01  0.0000E+00
0.3400E+02  0.6494E-01  0.0000E+00
0.3450E+02  0.6429E-01  0.0000E+00
0.3500E+02  0.6366E-01  0.0000E+00
0.3550E+02  0.6304E-01  0.0000E+00
0.3600E+02  0.6245E-01  0.0000E+00
0.3650E+02  0.6186E-01  0.0000E+00
0.3700E+02  0.6128E-01  0.0000E+00
0.3750E+02  0.6070E-01  0.0000E+00
0.3800E+02  0.6010E-01  0.0000E+00
0.3850E+02  0.5948E-01  0.0000E+00
0.3900E+02  0.5881E-01  0.0000E+00
0.3950E+02  0.5810E-01  0.0000E+00
0.4000E+02  0.5731E-01  0.0000E+00
0.4050E+02  0.5643E-01  0.0000E+00
0.4100E+02  0.5545E-01  0.0000E+00
0.4150E+02  0.5434E-01  0.0000E+00
0.4200E+02  0.5309E-01  0.0000E+00
0.4250E+02  0.5168E-01  0.0000E+00
0.4300E+02  0.5008E-01  0.0000E+00
0.4350E+02  0.4828E-01  0.0000E+00
0.4400E+02  0.4627E-01  0.0000E+00
0.4450E+02  0.4401E-01  0.0000E+00
0.4500E+02  0.4151E-01  0.0000E+00
0.4550E+02  0.3874E-01  0.0000E+00
0.4600E+02  0.3568E-01  0.0000E+00
0.4650E+02  0.3234E-01  0.0000E+00
0.4700E+02  0.2869E-01  0.0000E+00
0.4750E+02  0.2472E-01  0.0000E+00
0.4800E+02  0.2044E-01  0.0000E+00
0.4850E+02  0.1583E-01  0.0000E+00
0.4900E+02  0.1088E-01  0.0000E+00
0.4950E+02  0.5608E-02  0.0000E+00
0.5000E+02  0.0000E+00  0.0000E+00

Reciprocal space: Rg = 20.97, I(0) = 0.2953E+02

Real space: Rg = 20.94 ± 0.026, I(0) = 0.2953E+02 ± 0.2278E+00
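
The R / P(R) / ERROR table is plain whitespace-separated text, and real-space values like those above follow from standard integrals of the distance distribution: I(0) = 4π ∫ P(r) dr and Rg² = ∫ r² P(r) dr / (2 ∫ P(r) dr). A minimal Ruby sketch, assuming the three-column block is available as text on its own; the integrals use the trapezoidal rule as a simple stand-in, not the fitting program's own estimator, so results need not match the reported values digit for digit. The method names `read_pr`, `trapz`, and `real_space_params` are hypothetical.

```ruby
# Read whitespace-separated R / P(R) / ERROR rows into three arrays.
# Lines that are not exactly three numbers (e.g. the header) are skipped.
# Float() accepts Fortran-style values such as "0.3157E-02".
def read_pr(text)
  r, pr, err = [], [], []
  text.each_line do |line|
    fields = line.split
    next unless fields.size == 3
    values = fields.map { |f| Float(f, exception: false) }
    next if values.include?(nil)
    r << values[0]
    pr << values[1]
    err << values[2]
  end
  [r, pr, err]
end

# Trapezoidal integral of y sampled at the points x (same-length arrays).
def trapz(x, y)
  (1...x.size).sum { |i| (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0 }
end

# Real-space estimates from P(r):
#   I(0) = 4*pi * integral of P(r) dr
#   Rg^2 = (integral of r^2 P(r) dr) / (2 * integral of P(r) dr)
def real_space_params(r, pr)
  norm = trapz(r, pr)
  second_moment = trapz(r, r.zip(pr).map { |ri, pv| ri * ri * pv })
  rg = Math.sqrt(second_moment / (2.0 * norm))
  i0 = 4.0 * Math::PI * norm
  [rg, i0]
end

table = <<~TXT
  R           P(R)        ERROR
  0.0000E+00  0.0000E+00  0.0000E+00
  0.5000E+00  0.3157E-02  0.0000E+00
  0.1000E+01  0.6069E-02  0.0000E+00
TXT
r, pr, _err = read_pr(table)
p r  # => [0.0, 0.5, 1.0]
```

With the full table loaded, `real_space_params(r, pr)` returns `[rg, i0]`; on only the first three rows shown here the result would be meaningless.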

Page 6

Existing Software

Svergun group @ EMBL: http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html

“interactive” interfaces: not easily scriptable

Works well, but...

requires running each program multiple times

no, really... you have to see it to believe it
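
Prompt-driven programs can often still be scripted by writing the expected answers to their standard input. A minimal sketch with Ruby's standard `Open3.capture3`, assuming the program reads one answer per line; the helper `run_interactive` is hypothetical, `cat` stands in for a real analysis program, and the actual prompts of the EMBL programs are not reproduced here.

```ruby
require "open3"

# Feed a list of answers to a prompt-driven program, one per line,
# and return its standard output. Raises if the program exits non-zero.
# Hypothetical helper for illustration only.
def run_interactive(cmd, answers)
  input = answers.join("\n") + "\n"
  out, err, status = Open3.capture3(*cmd, stdin_data: input)
  raise "#{cmd.first} failed (#{status.exitstatus}): #{err}" unless status.success?
  out
end

# Stand-in program: `cat` simply echoes the answers back.
puts run_interactive(["cat"], ["pr.dat", "0.5", "yes"])
```

Running the same answer list many times, with different parameters, is what turns "run each program multiple times" into a loop.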

Page 7

Help from Ruby

We want to use Linux clusters with hundreds of CPUs

Ruby

wrap external programs: write shell scripts to run external programs

Rake

define relationships between inputs/outputs of different programs

launch external programs after dependencies are satisfied
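
Rake's `file` tasks and rules express these input/output relationships directly: an output is rebuilt only when it is missing or older than its inputs, and `multitask` lets Rake run independent prerequisites in parallel threads on the local machine. A minimal sketch using the Rake DSL from a plain Ruby script so it runs standalone; the `.dat`/`.out` file names are hypothetical and `cp` stands in for a real analysis program. In a Rakefile, the `extend` line and the explicit `invoke` would be replaced by running `rake all`.

```ruby
require "rake"
require "tmpdir"

# Make the Rake DSL (rule, multitask, sh) available in a plain script;
# inside a Rakefile this is already the case.
extend Rake::DSL

# Each "<name>.out" is built from "<name>.dat". Rake only runs the action
# when the .out file is missing or older than its .dat input.
# `cp` is a stand-in for a real analysis program.
rule ".out" => ".dat" do |t|
  sh "cp #{t.source} #{t.name}", verbose: false
end

# Aggregate target: multitask builds its prerequisites in parallel threads.
multitask all: %w[a.out b.out]

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    %w[a b].each { |name| File.write("#{name}.dat", "data for #{name}\n") }
    Rake::Task[:all].invoke
    puts Dir["*.out"].sort.inspect # => ["a.out", "b.out"]
  end
end
```

Spreading the same task graph across the nodes of a cluster would need a job scheduler or similar on top of this; the sketch only covers the local dependency logic.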

Page 8

Do more with Ruby

Define input parameters in a script: quick and dirty...

Define common tasks in a library: more robust...

Evolve towards a micro-framework

Ruby API for running commands

More sophisticated information processing
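
One way such a "Ruby API for running commands" might start is a small class that turns a parameter Hash into an argument list and runs the program without a shell. A minimal sketch; the class `Command`, the `--key value` flag convention, and the program name `analyze` are assumptions for illustration, since each real program has its own argument conventions.

```ruby
require "open3"

# Hypothetical command wrapper: parameters live in a Hash (the "input
# parameters in a script"), and the class builds the argument list and
# runs the program, failing loudly on a non-zero exit status.
class Command
  attr_reader :program, :params

  def initialize(program, params = {})
    @program = program
    @params = params
  end

  # {input: "a.dat", steps: 10} becomes ["--input", "a.dat", "--steps", "10"]
  def argv
    [program] + params.flat_map { |key, value| ["--#{key}", value.to_s] }
  end

  # Multiple arguments are passed straight to the program, so no shell
  # quoting is needed.
  def run
    out, err, status = Open3.capture3(*argv)
    raise "#{program} exited with #{status.exitstatus}: #{err}" unless status.success?
    out
  end
end

cmd = Command.new("analyze", input: "a.dat", steps: 10)
p cmd.argv # => ["analyze", "--input", "a.dat", "--steps", "10"]
```

Keeping parameters as data rather than as pre-built shell strings makes it easy to generate many runs from one script and to log exactly what was executed.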

Page 9

Acknowledgements

Lab (Scripps Research Institute)

John Tainer, Scott Williams, Chris Putnam

Data Collection

Beamline 12.3.1

The Advanced Light Source (ALS, LBNL)

Funding: NIH, DOE, NCI