1 The API Uri Laserson | @laserson | [email protected] 21 May 2014

APIs and Synthetic Biology

DESCRIPTION

An introduction to the API concept in engineering and why it is useful, particularly for working with genomics data. The talk closes with an analogy between APIs and synthetic biology, and with how evolution enables encapsulation.

Page 1: APIs and Synthetic Biology

1

The API

Uri Laserson | @laserson | [email protected] May 2014

Page 2: APIs and Synthetic Biology

2

The API, or how to make your computational collaborators love you

Uri Laserson | @laserson | [email protected] May 2014

Page 3: APIs and Synthetic Biology

3

The API, or how to make your computational collaborators love you, and also some perspectives on engineering biology and immunology

Uri Laserson | @laserson | [email protected] May 2014

Page 4: APIs and Synthetic Biology

4

Page 5: APIs and Synthetic Biology

5

NCBI Sequence Read Archive (SRA)

Today…1.14 petabytes

One year ago…609 terabytes

Page 6: APIs and Synthetic Biology

For every “-ome” there’s a “-seq”

Genome: DNA-seq
Transcriptome: RNA-seq, FRT-seq, NET-seq
Methylome: Bisulfite-seq
Immunome: Immune-seq
Proteome: PhIP-seq, Bind-n-seq

Page 7: APIs and Synthetic Biology

7

Crappy academic code

    counts_dict = {}
    for chain in vdj.parse_VDJXML(inhandle):
        try:
            counts_dict[chain.junction] += 1
        except KeyError:
            counts_dict[chain.junction] = 1

    for count in counts_dict.itervalues():
        print >>outhandle, np.int_(count)

Page 8: APIs and Synthetic Biology

8

Crappy academic code

    counts_dict = {}
    for chain in vdj.parse_VDJXML(inhandle):
        try:
            counts_dict[chain.junction] += 1
        except KeyError:
            counts_dict[chain.junction] = 1

    for count in counts_dict.itervalues():
        print >>outhandle, np.int_(count)

vs.

    SELECT count(*) FROM antibodies GROUP BY junction
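The contrast on this slide can be sketched in runnable form. This is a minimal illustration, not the author's code: the `junctions` list is made-up sample data standing in for the `chain.junction` values, and the `antibodies` table is created in an in-memory SQLite database only for the demonstration.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data standing in for chain.junction values.
junctions = ["TGTGCAAAA", "TGTGCGAGG", "TGTGCAAAA", "TGTGCAAAA", "TGTGCGAGG", "TGTACTACT"]

# The hand-rolled try/except counting loop collapses to one line with Counter.
counts = Counter(junctions)

# The same aggregation expressed declaratively in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE antibodies (junction TEXT)")
conn.executemany("INSERT INTO antibodies (junction) VALUES (?)", [(j,) for j in junctions])
sql_counts = dict(conn.execute("SELECT junction, count(*) FROM antibodies GROUP BY junction"))
conn.close()
```

Both routes produce the same per-junction counts; the SQL version states what is wanted and leaves the how to the database.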

Page 9: APIs and Synthetic Biology

9

What is an API?

Page 10: APIs and Synthetic Biology

10

What is an API?

• Application Programming Interface
• Contract (between machines)
• Specifications for:
  1. Procedures and methods
  2. Data structures/messages
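The two halves of such a contract can be sketched in Python. All names here (`Receptor`, `ReceptorStore`, `InMemoryStore`, `count_receptors`) are hypothetical and chosen only for illustration: the dataclass plays the role of the data structure/message, and the `Protocol` lists the procedures any implementation has to provide.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Receptor:
    """Data structure (message) half of the contract."""
    name: str
    junction: str


class ReceptorStore(Protocol):
    """Procedure half of the contract: what any implementation must offer."""

    def add(self, receptor: Receptor) -> None: ...
    def get(self, name: str) -> Receptor: ...
    def names(self) -> List[str]: ...


class InMemoryStore:
    """One possible implementation; callers rely only on ReceptorStore."""

    def __init__(self) -> None:
        self._items: Dict[str, Receptor] = {}

    def add(self, receptor: Receptor) -> None:
        self._items[receptor.name] = receptor

    def get(self, name: str) -> Receptor:
        return self._items[name]

    def names(self) -> List[str]:
        return sorted(self._items)


def count_receptors(store: ReceptorStore) -> int:
    """Client code written against the interface, not the implementation."""
    return len(store.names())


store = InMemoryStore()
store.add(Receptor(name="HG2DXMN01CY8UH", junction="GCGAGGGGCC"))
```

Because `count_receptors` depends only on the protocol, the in-memory store could be swapped for a database-backed one without touching the caller.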

Page 11: APIs and Synthetic Biology

11

Stripe API

Page 12: APIs and Synthetic Biology

12

Stripe API

Page 13: APIs and Synthetic Biology

13

Java API

    public interface List<E> {
        int size();
        boolean isEmpty();
        boolean contains(Object o);
        boolean add(E e);
        void add(int index, E element);
        boolean remove(Object o);
    }

Page 14: APIs and Synthetic Biology

14

Python DB API v2.0 (PEP 249)

http://legacy.python.org/dev/peps/pep-0249/
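As a rough illustration of what PEP 249 standardizes, the sketch below uses the standard library's `sqlite3` module, which follows the DB API. The `antibodies` table, its rows, and the helper `count_by_junction` are assumptions made up for this example. The helper touches only the connection and cursor methods the PEP defines, so in principle it could accept a connection from another compliant driver, although the placeholder style for parameters (`paramstyle`) differs between drivers.

```python
import sqlite3


def count_by_junction(conn):
    """Uses only PEP 249 calls: cursor(), execute(), fetchall(), close()."""
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT junction, count(*) FROM antibodies "
            "GROUP BY junction ORDER BY junction"
        )
        return cur.fetchall()
    finally:
        cur.close()


# Module-level attributes required by the PEP.
api_level = sqlite3.apilevel      # DB API version implemented by the module
param_style = sqlite3.paramstyle  # placeholder style; "qmark" means "?"

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE antibodies (junction TEXT)")
cur.executemany("INSERT INTO antibodies VALUES (?)", [("AAA",), ("CCC",), ("AAA",)])
conn.commit()
rows = count_by_junction(conn)
conn.close()
```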

Page 15: APIs and Synthetic Biology

15

Why use an API?

• Encapsulation/interfaces/abstraction
• Loose-coupling of components
• Reusable services
• Service-oriented architecture

Page 16: APIs and Synthetic Biology

16

LinkedIn’s Loose Coupling Architecture

Page 17: APIs and Synthetic Biology

17

LinkedIn’s Loose Coupling Architecture

Page 18: APIs and Synthetic Biology

18

(If This Then That) Stitching APIs together

https://ifttt.com/recipes#popular

Page 19: APIs and Synthetic Biology

19

Page 20: APIs and Synthetic Biology

20

IMGT

Page 21: APIs and Synthetic Biology

21

IMGT “Spec”

http://www.imgt.org/IMGTScientificChart/

Page 22: APIs and Synthetic Biology

22

IMGT’s API is an FTP site

Page 23: APIs and Synthetic Biology

23

IMGT does not have an API

    def __initVQUESTform(self):
        # get form
        request = urllib2.Request(
            'http://imgt.cines.fr/IMGT_vquest/vquest?livret=0&Option=humanIg')
        response = urllib2.urlopen(request)
        forms = ClientForm.ParseResponse(
            response,
            form_parser_class=ClientForm.XHTMLCompatibleFormParser,
            backwards_compat=False)
        response.close()
        form = forms[0]

        # fill out base part of form - Synthesis view with no extra options - TEXT
        form['l01p01c03'] = ['inline']
        form['l01p01c07'] = ['2. Synthesis']
        form['l01p01c05'] = ['TEXT']  # may need to be 'TEXT'
        form['l01p01c09'] = ['60']
        form['l01p01c35'] = ['F+ORF+ in-frame P']
        form['l01p01c36'] = ['0']
        form['l01p01c40'] = ['1']  # ['1'] for searching with indels
        form['l01p01c25'] = ['default']
        ...

Page 24: APIs and Synthetic Biology

24

Haussler and genomics services

Page 25: APIs and Synthetic Biology

25

Google Genomics API

Page 26: APIs and Synthetic Biology

26

Google Genomics API

Page 27: APIs and Synthetic Biology

27

Flask/Bottle web server example

    @route("/receptor/<id>")
    def lookup_receptor(id):
        # get the raw read

    @route("/sample/<sample_id>")
    def sample_summary(sample_id):
        # impl for getting sample information; can return:
        # * summary of repertoire information
        #   (num reads, VDJ distribution, etc.)
        # * demographic info

    @route("/sample/<sample_id>/common_junctions")
    def common_junctions(sample_id):
        # impl for getting the most common CDR3s
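To make the routing idea concrete without depending on a web framework, here is a standard-library-only sketch. The `route` decorator below is a toy stand-in written for this example, not the Bottle or Flask decorator, and `SAMPLES`, `ROUTES`, and `handle` are likewise hypothetical. The handler bodies are one guess at what the commented stubs might return.

```python
import json
import re
from collections import Counter

# Hypothetical in-memory data standing in for a real receptor database.
SAMPLES = {"s1": ["TGTGCAAAA", "TGTGCGAGG", "TGTGCAAAA"]}

ROUTES = []


def route(pattern):
    """Toy stand-in for a framework's @route decorator (not Bottle/Flask itself)."""
    # Turn "/sample/<sample_id>" into a regex with one named group per placeholder.
    regex = re.compile("^" + re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", pattern) + "$")

    def register(func):
        ROUTES.append((regex, func))
        return func

    return register


@route("/sample/<sample_id>")
def sample_summary(sample_id):
    reads = SAMPLES[sample_id]
    return {"sample_id": sample_id, "num_reads": len(reads)}


@route("/sample/<sample_id>/common_junctions")
def common_junctions(sample_id):
    return {"common_junctions": Counter(SAMPLES[sample_id]).most_common(1)}


def handle(path):
    """Dispatch a request path to the first matching handler; return JSON text."""
    for regex, func in ROUTES:
        match = regex.match(path)
        if match:
            return json.dumps(func(**match.groupdict()))
    return json.dumps({"error": "not found"})
```

Calling `handle("/sample/s1")` returns a JSON document that any client can consume without knowing how the sample data is stored.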

Page 28: APIs and Synthetic Biology

28

Genomics ETL has converged on standards

biochemistry => .fastq => short read alignment => .bam => genotype calling => .vcf => analysis

Page 29: APIs and Synthetic Biology

29

VCF

    ##fileformat=VCFv4.1
    ##fileDate=20090805
    ##source=myImputationProgramV3.1
    ##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
    ##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
    ##phasing=partial
    ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
    ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
    ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
    ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
    ##FILTER=<ID=q10,Description="Quality below 10">
    ##FILTER=<ID=s50,Description="Less than 50% of samples have data">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
    ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
    #CHR POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
    20 14370 rs605 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
    20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
    20 1110696 rs604 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.6 GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
    20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
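Part of what makes VCF usable as a shared format is that a data line can be decoded with very little code. The sketch below is a simplified illustration, not a complete VCF parser: `parse_vcf_line` is a hypothetical helper that splits on any whitespace (real files are tab-delimited), leaves INFO and genotype values as strings, and does not handle a missing QUAL value.

```python
def parse_vcf_line(line, sample_names):
    """Parse one VCF data line into a dict (sketch, not a full parser)."""
    fields = line.split()
    chrom, pos, vid, ref, alt, qual, filt, info, fmt = fields[:9]

    # INFO is ';'-separated; entries without '=' are flags.
    info_dict = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True

    # Each sample column is ':'-separated and keyed by the FORMAT column.
    keys = fmt.split(":")
    samples = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {
        "chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
        "alt": alt.split(","), "qual": float(qual), "filter": filt,
        "info": info_dict, "samples": samples,
    }


record = parse_vcf_line(
    "20 14370 rs605 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ "
    "0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.",
    ["NA00001", "NA00002", "NA00003"],
)
```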

Page 30: APIs and Synthetic Biology

30

What about immune data?

biochemistry => .fastq => short read alignment => .bam => genotype calling => .vcf => analysis

immune receptor alignment => .???

Page 31: APIs and Synthetic Biology

31

Multiple models for same types: VDJFasta

    sub new {
        my ($class) = @_;
        my $self = {};
        $self->{filename} = "";
        $self->{headers} = [];
        $self->{sequence} = [];
        $self->{germline} = [];
        $self->{nseqs} = 0;
        $self->{mids} = {};

        $self->{accVsegQstart} = {}; # example: 124
        $self->{accVsegQend} = {}; # example: 417
        $self->{accJsegQstart} = {};
        $self->{accJsegQend} = {};
        $self->{accDsegQstart} = {};

Page 32: APIs and Synthetic Biology

32

Multiple models for same types: vdj

    class ImmuneChain(SeqRecord):
        def cdr3(self):
            return len(self.junction)

        def num_mutations(self):
            aln = self.letter_annotations['alignment']
            return aln.count('S') + aln.count('I')

        def v(self):
            return self.__getattribute__('V-REGION') \
                       .qualifiers['allele'][0]

        def v_seq(self):
            return self.__getattribute__('V-REGION') \
                       .extract(self.seq.tostring())

Page 33: APIs and Synthetic Biology

33

Interoperability/services depend on being able to communicate data

Page 34: APIs and Synthetic Biology

34

CSV

    9 CCTG_PRCONS=IGHC1_R1_IGM unproductive Homsap IGHV5-51*01 F, or Homsap IGHV5-51*03 F Homsap IGHJ4*02 F Homsap
    12 GGGG_PRCONS=IGHC3_R1_IGA productive Homsap IGHV3-11*01 F Homsap IGHJ1*01 F Homsap IGHD2-2*03 F .......
    13 CTTC_PRCONS=IGHC5_R1_IGG unproductive Homsap IGHV1-2*02 F Homsap IGHJ5*02 F Homsap IGHD5-18*01 F .......
    18 ACTT_PRCONS=IGHC3_R1_IGA productive Homsap IGKV3-15*01 F, or Homsap IGKV3D-15*01 F or Homsap IGKV3D-15*02 P Homsap
    20 GGAC_PRCONS=IGHC5_R1_IGG productive Homsap IGHV4-61*02 F Homsap IGHJ4*02 F Homsap IGHD1-26*01 F .......
    25 TCGT_PRCONS=IGHC2_R1_IGD productive Homsap IGHV3-23*01 F, or Homsap IGHV3-23*04 F or Homsap IGHV3-23D*01 F Homsap
    26 GGTG_PRCONS=IGHC5_R1_IGG productive Homsap IGHV4-34*01 F, or Homsap IGHV4-34*02 F or Homsap IGHV4-34*08 F Homsap
    28 GTGA_PRCONS=IGHC5_R1_IGG productive Homsap IGHV1-46*01 F, or Homsap IGHV1-46*02 F or Homsap IGHV1-46*03 F Homsap
    31 ACCC_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-9*01 F, or Homsap IGHV3-9*02 F Homsap IGHJ3*02 F Homsap
    36 GCAA_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-9*01 F, or Homsap IGHV3-9*02 F Homsap IGHJ2*01 F Homsap
    39 GCAA_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-7*01 F Homsap IGHJ6*02 F Homsap IGHD1-7*01 F .......
    40 GGGT_PRCONS=IGHC1_R1_IGM productive Homsap IGHV4-34*01 F, or Homsap IGHV4-34*02 F or Homsap IGHV4-34*08 F Homsap
    42 TAGG_PRCONS=IGHC5_R1_IGG productive Homsap IGHV4-39*01 F, or Homsap IGHV4-39*05 F Homsap IGHJ4*02 F Homsap
    47 CAAA_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-15*01 F, or Homsap IGHV3-15*02 F Homsap IGHJ6*02 F Homsap
    48 AGAA_PRCONS=IGHC5_R1_IGG unproductive Homsap IGHV3-30*04 F, or Homsap IGHV3-30-3*01 F or Homsap IGHV3-30-3*02 F or Ho
    52 GCAG_PRCONS=IGHC1_R1_IGM productive Homsap IGHV3-23*01 F, or Homsap IGHV3-23*04 F or Homsap IGHV3-23D*01 F Homsap
    53 AACC_PRCONS=IGHC3_R1_IGA productive Homsap IGHV3-30*02 F Homsap IGHJ4*02 F Homsap IGHD5-18*01 F .......

Page 35: APIs and Synthetic Biology

35

XML

    <ImmuneChain>
      <c>IGHD</c>
      <barcode>RL014</barcode>
      <j_start_idx>389</j_start_idx>
      <seq>TTGTGGCTATTTTAAA ... CTCGGACT</seq>
      <descr>003699_0091_0140</descr>
      <tag>coding</tag>
      <clone>IGHV3-43_IGHJ4|387</clone>
      <j>IGHJ4*02</j>
      <v_end_idx>314</v_end_idx>
      <v>IGHV3-43*01</v>
      <junction>TGTGCAAAAGATAATCT ... TCTTTGACTACTGG</junction>
      <d>IGHD5-24*01</d>
    </ImmuneChain>
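A record like this can be read with Python's standard `xml.etree.ElementTree`. The snippet is only a sketch: `xml_text` is an abbreviated copy of the record with the elided sequence fields left out, and the choice of which fields to convert to integers is an assumption the consumer has to make, because XML carries every value as text.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record above; elided sequence fields are omitted.
xml_text = """
<ImmuneChain>
  <barcode>RL014</barcode>
  <v>IGHV3-43*01</v>
  <d>IGHD5-24*01</d>
  <j>IGHJ4*02</j>
  <v_end_idx>314</v_end_idx>
  <j_start_idx>389</j_start_idx>
</ImmuneChain>
""".strip()

root = ET.fromstring(xml_text)

# Every value arrives as a string, so types must be restored by hand.
chain = {child.tag: child.text for child in root}
chain["v_end_idx"] = int(chain["v_end_idx"])
chain["j_start_idx"] = int(chain["j_start_idx"])
```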

Page 36: APIs and Synthetic Biology

36

JSON

    {
      "v": "IGHV4-39*02",
      "seq": "CCTATCCCCCTGTGTGCCTT ... CTCCACCAAG",
      "num_mutations": 43,
      "name": "HG2DXMN01CY8UH",
      "letter_annotations": {
        "alignment": "..............S....S....3333333333333333........S.."
      },
      "junction_nt": "GCGAGGGGCCGATGGGACTTTTATTACATGGACGTC",
      "j": "IGHJ6*03",
      "annotations": {
        "usearch_90_cluster": "6277",
        "experiment_date": "20120119",
        "donor": "17517",
        "sample_type": "memory_B_cells",
        "source": "SeqWright",
        "tags": ["revcomp", "coding"],
        "taxonomy": []
      },
      "d": "IGHD3-10*01",
      "features": [
        {
          "strand": 1,
          "type": "V-REGION",
          "location": [51, 356],
          "qualifiers": {
            "CDR_length": ["[10.7.2]"],
            "codon_start": ["1"],
            "gene": ["IGHV4-39"],
            "allele": ["IGHV4-39*02"]
          }
        },
        ...
      ]
    }
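One practical appeal of JSON is that nested records map directly onto native data structures. The sketch below loads an abbreviated version of the record (elided fields are simply left out) with the standard `json` module; the variable names are illustrative only.

```python
import json

# Abbreviated version of the record above; elided fields are omitted.
text = """
{
  "v": "IGHV4-39*02",
  "j": "IGHJ6*03",
  "num_mutations": 43,
  "junction_nt": "GCGAGGGGCCGATGGGACTTTTATTACATGGACGTC",
  "annotations": {"tags": ["revcomp", "coding"], "donor": "17517"},
  "features": [{"type": "V-REGION", "location": [51, 356], "strand": 1}]
}
"""

chain = json.loads(text)

# Numbers, lists, and nested objects come back already typed.
v_start, v_end = chain["features"][0]["location"]

# Serializing and parsing again yields an equal structure.
round_tripped = json.loads(json.dumps(chain))
```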

http://www.json.org/

Page 37: APIs and Synthetic Biology

37

JSON

    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000000" }, "annotations" : { "D-REGION" : "IGHD3-10*01", "accessions" : "HG2DXMN01CY8
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000001" }, "annotations" : { "D-REGION" : "IGHD3-9*01", "accessions" : "HG2DXMN01A3VH
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000002" }, "annotations" : { "D-REGION" : "IGHD3-10*01", "accessions" : "HG2DXMN01BC6
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000003" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01DYU
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000004" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01A8F
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000005" }, "annotations" : { "D-REGION" : "IGHD3-9*01", "accessions" : "HG2DXMN01BDI2
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000006" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01BS2
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000007" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01DLL
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000008" }, "annotations" : { "D-REGION" : "IGHD6-25*01", "accessions" : "HG2DXMN01BLF
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c6172308000009" }, "annotations" : { "D-REGION" : "IGHD3-3*01", "accessions" : "HG2DXMN01D4TL
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000a" }, "annotations" : { "D-REGION" : "IGHD3-10*01", "accessions" : "HG2DXMN01BU6
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000b" }, "annotations" : { "D-REGION" : "IGHD2-2*03", "accessions" : "HG2DXMN01BIMG
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000c" }, "annotations" : { "D-REGION" : "IGHD3-3*01", "accessions" : "HG2DXMN01BM9Z
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000d" }, "annotations" : { "D-REGION" : "IGHD2-2*03", "accessions" : "HG2DXMN01BH9Q
    { "__SeqRecord__" : true, "_id" : { "$oid" : "4f1f5525e7c617230800000e" }, "annotations" : { "D-REGION" : "IGHD6-19*01", "accessions" : "HG2DXMN01BR3

Page 38: APIs and Synthetic Biology

38

Binary formats

• Protobuf, Thrift, or Avro
• Flexible data model
  • All common primitive types (e.g. int, double, string)
  • Support nested types, including arrays and maps
• Efficient binary encoding
• Code generation for many languages (binary compatible)
• Support for schema evolution
• Support IDL for data types and services
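To give a feel for the "efficient binary encoding" point, the toy comparison below packs three integer fields with Python's standard `struct` module and compares the size with the JSON text for the same record. This is deliberately not Protobuf, Thrift, or Avro: the fixed layout in `SCHEMA` is a hypothetical stand-in with no field tags and no support for schema evolution, which is what those frameworks add on top.

```python
import json
import struct

# Hypothetical fixed schema: three unsigned 16-bit integers, little-endian.
FIELDS = ("v_end_idx", "j_start_idx", "num_mutations")
SCHEMA = struct.Struct("<HHH")

record = {"v_end_idx": 314, "j_start_idx": 389, "num_mutations": 43}

# Binary encoding: field names live in the schema, not in every message.
binary = SCHEMA.pack(*(record[name] for name in FIELDS))

# Text encoding: field names and digits are repeated in every message.
text = json.dumps(record).encode("utf-8")

# Decoding needs the same schema on the receiving side.
decoded = dict(zip(FIELDS, SCHEMA.unpack(binary)))
```

The binary form occupies six bytes, while the JSON form repeats the field names in every record; the price is that both sides must agree on the schema ahead of time.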

Page 39: APIs and Synthetic Biology

39

Thrift example: Twitter

    service Twitter {
      void ping();
      bool postTweet(1:Tweet tweet);
      TweetSearchResult searchTweets(1:string query);
    }

    struct Tweet {
      1: required i32 userId;
      2: required string userName;
      3: required string text;
      4: optional Location loc;
      16: optional string language = "english"
    }
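For readers who do not know Thrift's IDL, the `Tweet` struct corresponds roughly to the Python dataclass below. This is a hand-written analogy, not the code Thrift generates: the snake_case field names are a stylistic choice, and the contents of `Location` are invented because the slide does not define that type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    # Hypothetical fields; Location is not defined on the slide.
    latitude: float
    longitude: float


@dataclass
class Tweet:
    user_id: int                     # 1: required i32 userId
    user_name: str                   # 2: required string userName
    text: str                        # 3: required string text
    loc: Optional[Location] = None   # 4: optional Location loc
    language: str = "english"        # 16: optional string language = "english"


tweet = Tweet(user_id=1, user_name="laserson", text="hello")
```

Required fields must be supplied at construction time, while optional ones fall back to a default, mirroring the `required`/`optional` markers in the IDL.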

Page 40: APIs and Synthetic Biology

40

Thrift example: Immune receptor

    cd ~/repos/kiwi
    thrift --gen java kiwi-format/src/main/resources/thrift/kiwi.thrift
    thrift --gen py:new_style kiwi-format/src/main/resources/thrift/kiwi.thrift

See: https://github.com/laserson/kiwi

Page 41: APIs and Synthetic Biology

41

Questions?

Page 42: APIs and Synthetic Biology

42

Biological parts specifications

• Library of parts with well-characterized input-output characteristics

• In total, similar to API spec

Canton, Nat. Biotech. 26: 787 (2008)

Page 43: APIs and Synthetic Biology

43

Engineering signaling pathways at inputs/outputs

Lim, Nat. Rev. Mol. Cell 11: 393 (2010)

Page 44: APIs and Synthetic Biology

44

Bottom-up genetic circuit design

Brophy, Nature Meth. 11: 508 (2014)

Page 45: APIs and Synthetic Biology

45

Bottom-up genetic circuit design

Brophy, Nature Meth. 11: 508 (2014)

Page 46: APIs and Synthetic Biology

46

Predict composability of genetic elements

Kosuri, PNAS 110: 14024 (2013)

• 114 promoters x 111 RBS

“…rather than relying on prediction or standardization, we can screen synthetic libraries for desired behavior.”

Page 47: APIs and Synthetic Biology

47

ZFN => TALEN => CRISPR/Cas

ZFN end: least addressable, most expensive to create
CRISPR/Cas end: most addressable, cheapest to create

Page 48: APIs and Synthetic Biology

48

Addressability for precision nanoscale engineering

Douglas, NAR 37: 5001 (2009)

Page 49: APIs and Synthetic Biology

49

Addressability for precision nanoscale engineering

Douglas, Nature 459: 414 (2009)

Page 50: APIs and Synthetic Biology

50

Evolution for encapsulation: an evolved electronic thermometer

http://www.genetic-programming.com/hc/thermometer.html

Page 51: APIs and Synthetic Biology

51

Lycopene synthesis optimization

Wang, Nature 460: 894 (2009)

Page 52: APIs and Synthetic Biology

52

Evolutionary encapsulation for signaling pathway engineering

Peisajovich, Science 328: 368 (2010)

Page 53: APIs and Synthetic Biology

53

Evolutionary encapsulation for signaling pathway engineering

Peisajovich, Science 328: 368 (2010)

Page 54: APIs and Synthetic Biology

54

Genetic isolation with Re.coli

Lajoie, Science 342: 357 (2013)

Page 55: APIs and Synthetic Biology
Page 56: APIs and Synthetic Biology

So far, we discussed antibody-only data analysis

Page 57: APIs and Synthetic Biology

Antigen-only data generation

Page 58: APIs and Synthetic Biology

Larman, Nat. Biotech. 29: 535 (2011)

Ben Larman

Steve Elledge

Agilent OLS array

Page 59: APIs and Synthetic Biology

59

Phage immunoprecipitation sequencing (PhIP-seq)

Page 60: APIs and Synthetic Biology

60

PhIP-seq proof-of-principle

[Figure: Patient A Replica 1 vs. Patient A Replica 2; axis scale: log10(-log10 P-value); labeled points: SAPK4, NOVA1, TGIF2LX]

Page 61: APIs and Synthetic Biology

61

‘Forward vaccinology’

Page 62: APIs and Synthetic Biology

62

‘Reverse vaccinology’

Page 63: APIs and Synthetic Biology

63

‘Immunization without vaccination’

Page 64: APIs and Synthetic Biology

64

Encapsulation for cancer immunotherapy through TMG processing

Tran, Science 344: 641 (2014)

Page 65: APIs and Synthetic Biology

65

Other examples?

Page 66: APIs and Synthetic Biology

66

Conclusions

• The API perspective helps organize and communicate data

• Use sane file formats if possible:
  • JSON for lightweight work
  • Thrift/Avro for heavyweight serialization/communication
• Decouple data modeling from implementation details
• Biological engineering: what abstractions are available?
• Evolution as nature’s encapsulator

Page 67: APIs and Synthetic Biology

67