Lives of the Scientist
Genetic Basis of Differentiation
Events in time and space . . . driven by patterned gene expression
Genetic Basis of Differentiation
Figure: Nostoc (labels: NH3, N2)
Figure: environmental signal (NH3) → histidine kinase → response regulator → developmental response, with the kinase and the regulator each marked P (phosphorylated). How? (? ? ?; label: NpR3010)
AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCCAAGCTTTGATTCTTTCCTCTGTTAAATCGGATTGATTATCTTCATCAAGGGCAAGACCTACAAATTTACCATCACGAACAGCTTTAGACTCACTGAATTCATAACCTTCTGTAGGCCAATAGCCAACTGTTTCACCACCATTTTCTGAAATTTTTTCCTCTAGAATACCGAGGGCATCTTGAAATGTATCAGGATAACCAACCTGGTCTCCAGGAGCAAAATAAGCAACTTTTTTGCCGATGAAGTCAATGTTATCTAACTCATCATAAAAATTTTCCCAATCACTTTGCAATTCTCCAACATTCCAGGTAGGACAACCAACAACGATATAATCGTAGTTATTGAAATCACTTGGTTCAGCTTGTGAAATATCATATAAAGTTACAACACTATCACCACCAAACTCCTTCTGAATTATTTCTGATTCAGTTTGGGTATTGCCTGTTTGAGTACCAAAAAATAAACCAATATTAGACATTTTTACTCCTTTTATGTATTTGCAAAATTATTTCAATTAAAATATTTAGTAATAATTAATTGTTAGCTAGCTAATAATTAAATTTTTATTACAATCATTGTAAAAGGCATTGAAAAAGTAAATAAAAATTTTTATTCTACGTTATTTCAAAAATATTTACTTACATATACTTAACCTTTATAGTGATGTAATATACTCTAATTCCTATTTTACTTATAAATACCATCTCAGCTTAATGTAACGAATTTTTCTGTTTATCTTTAAATACAAAAAATTCAACAAAACTACAGAAAATTAATCTTAATAACACAAAACAAGTATCAATCTGTAATACAACTAAGCTTAAATAAATTAATAGAAAGCTTCATCTATCTAATAGGTTGAGAATAGTTTATGTCTAATGACATAAATTCATTCGTGTTGATTTCATTTGGGTATATTCATCTGATTTAGGATTTACTCCATTAAGTTTGTACTCATCAATGCCCGCCTGTTGGTATCCACAATTCTCATACAGTGCGCGAGCAAAGTAATCAATCGTTCGTCGCCATATCTAACTTTGAGTCAAACAAACCAGTTGGATTACCAACCCTCAACTAATCGCTTCTTTAAGGCGAGCGATCGCACATTTAACTGTTGGTTGTCACAAGAGAACTAATACTACAGCAGTATATTTAACAACTAAGGGTGGTTCAACTTTCGCTGCGACTCCTCCAACGCGCTGAAATACACAGGACTGATGCGATCGCAAACTCTTTGACTAAATTCCATACATTATCATGACCATCTCCCAAACAAACAAGTGGGTTAACCAGATGCTGACTATTAACATCCCCTGAGTTCGGAGTTGTAGGTCTATTTGACTGGTTCAAAGCGATGATGGAACGGCTTTGTTGCATGAATTAAAAAAAGACACACCATCACCTACTTCTAGGATAGACACATCAAACGTCCCACCGCCTAAGTCAAATACCAAGATAATTTCGTTAGTTTTCTTGTCAAGTCCGTAAGCGAGGGCCGCCGCCGTGGGCTAGTTGATAATTCGCAGAACTTTAATCCCGGCAATTCTACTGGCATCTTTGGTAGCCTGCCGTTGAGAGTCATTGAAATAGGCAGGGGTGGTAATTACCGCTTGCCTCACTGGTTCCCCCAGATATGTGCTGGCATCATCTATCAGCTTGCGGACTACCTCATACCATTTCACGAAAAACCTGATACACATGTAAACTCTGAAACCCTTGCTGTATCAAAGTTTTGTAATTACGAATTACGAATTACGAATTGATATCAGCCGAGATTTCTTCGGGTGAAAATTCCTTGTTCAGAGCGGGACAGTGTAGCTTGACATTGCCATTACTGTCACGTACCACTTTGTAAGTAACTTGTTTTGCCTCTTGCGTAACTTCATCATACCTGCGCCCGATGAACCGCTTCACAGAATAAAAAGTGTTTTCTGGGTTCATTACACCCTGGCGCTT
Histidine Kinase
NpR3010 (Nostoc punctiforme)
Genes Functionally Related to His Kinase
Anabaena PCC 7120
Trichodesmium
Synechocystis PCC 6803
. . . (13 total)
Find similar genes | Blast
Conserved
>npun_22dec03_Contig1_revised_geneNpR3010
MWHIQDSIITLSNHNQYLTFYKNQVKNPERFCRNVNQFDSQIDFVSCDIL
ELKDGRFFEQYSKPLRLAEEIIGTVWSFRDITESQQAKEENRRIIQQEKQ
LAEDRAYFTSMIFHEFRNPLNIISYSTSLLKRHSHHWSEEKKLQCLQNLQ
TAVEQINQFTDEVLIIESVEAGKLQYELKPIDLNLFCREVLAEMSLYTKG
ASQFLLFQNK*
>npun_22dec03_Contig1_revised_geneNpR3008
LSPYLEACCLRISASVSYQRAAEDIEYLTGVEVSKSVQQRLVHRQNFELP
QVESTVEELSVDGGNIRIRTIKGQVCDWKGYKATCLHEKQAIAASFQENS
LVIDWVKSQSIAPILTCLGDGHDGIWNIVRDFAPEHQRREVLDWFHLMEN
LHKIGGSNQRLNQAKILLWQGKVDDAIAVFADCQLKQAFNFCTYLEKHRH
RIVNYQYYQAEQICSIGSGAIESTVKQIDRRTKISGAQWKSDNVPQVLAQ
RQSLSQWINLCSLNKNWDAPMKSSVERLSDYPVAR*
A new family of proteins?! A type of transposase?
transposase
...ATTTCTCTAGAAAGGCTGAAGGGGGGACAAGCACCCGAAAGCCTTTGTGCT...
...TAAAGAGATCTTTCCGACTTCCCCCCTGTTCGTGGGCTTTCGGAAACACGA...
...ATACAGTCAGCTTTATAGGCTTCATGTCGCCCCTTCAGCTAGAAAGGTACATA...
...TATGTCAGTCGAAATATCCGAAGTACAGCGGGGAAGTCGATCTTTCCATGTAT...
TRANSPOSON
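The end sequences above can be checked with a few lines of code. This Python sketch (standard library only) assumes the sequences shown are the top and bottom strands of the two ends of the element; the variable names and the left/right assignment are ours. It confirms the strand pairing by complementing base by base, then compares one end with the reverse complement of the other, the usual way to look for terminal inverted repeats. With these sequences the two share the 10-base stretch GCTGAAGGGG; whether that marks a real inverted repeat is a biological question, not something the code settles.

```python
# Sketch: check strand pairing and look for a shared inverted-repeat stretch
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same left-to-right direction."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement read 5'->3' on the opposite strand."""
    return complement(seq)[::-1]


def longest_shared_substring(a: str, b: str) -> str:
    """Longest substring found in both a and b (dynamic programming, O(len(a)*len(b)))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Sequences copied from the slide; calling them "left" and "right" ends is an assumption
left_top = "ATTTCTCTAGAAAGGCTGAAGGGGGGACAAGCACCCGAAAGCCTTTGTGCT"
left_bottom = "TAAAGAGATCTTTCCGACTTCCCCCCTGTTCGTGGGCTTTCGGAAACACGA"
right_top = "ATACAGTCAGCTTTATAGGCTTCATGTCGCCCCTTCAGCTAGAAAGGTACATA"
right_bottom = "TATGTCAGTCGAAATATCCGAAGTACAGCGGGGAAGTCGATCTTTCCATGTAT"

# An inverted repeat at the two ends shows up as a stretch shared by
# one end and the reverse complement of the other end
shared = longest_shared_substring(left_top, reverse_complement(right_top))
print(shared)
```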
Is NpR3008 a transposase?
Observation
* Photos courtesy of www.webshots.com and Peter Smallwood
Filters: Information reducers
Squirrel filter
Molecular filter
TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT
CTCCGTAAAC CTCTAAC...
Sequence filter
How do Biologists use Bioinformation?
What genes are in my organism?
TCTACTTATA TTCAATCCAC AGGGCTACACAAGAGTCTGT TGAATGAACA CATACATGGTTTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG ATGAATGACT GAACGAACGA TTGAATGAAA
Gene finder: Interpolated Markov model
Predicted genes: conform to standard model
Candidate genes: challenge accepted beliefs
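The slide names an interpolated Markov model as the engine of the gene finder. As a much simpler stand-in, the Python sketch below scores a sequence with a single fixed-order Markov chain trained on known coding sequence; an interpolated model blends several such orders, and a real gene finder also handles reading frames and compares against a non-coding model. The training and test strings are made up. What it illustrates for the argument here is that a sequence scores well only if it resembles the training data, which is why predicted genes conform to the standard model.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable

Model = Dict[str, Dict[str, int]]


def train_markov(seqs: Iterable[str], k: int) -> Model:
    """Count how often each base follows each k-base context in the training sequences."""
    counts: Model = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    return counts


def log_prob(seq: str, model: Model, k: int, pseudo: float = 1.0) -> float:
    """Log-probability of seq under the k-th order chain, smoothed over the 4 bases."""
    total = 0.0
    for i in range(k, len(seq)):
        context = model.get(seq[i - k:i], {})   # unseen context -> uniform after smoothing
        n = sum(context.values())
        total += math.log((context.get(seq[i], 0) + pseudo) / (n + 4 * pseudo))
    return total


# Toy training data (hypothetical): a strongly periodic "coding" sequence
model = train_markov(["ATGATGATGATG"], k=2)
similar = log_prob("ATGATG", model, k=2)     # resembles the training data
different = log_prob("ACGTAC", model, k=2)   # unlike anything seen in training
print(similar, different)
```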
Filters are powerful
globin
Highly filtered output
• Easy to grasp
• High-level insights
Filters Constrain New Discovery
Unfiltered output
• Confusing
• Basic insights
Filters are tempting
Globin
The Death of Science
Current State of Affairs
1. Need high-level filters
2. Need access to raw phenomena
AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCCAAGCTTTGATTCTTTCCTCTGTTAAATCGGATTGATTATCTTCATCAAGGGCAAGACCTACAAATTTACCATCACGAACAGCTTTAGACTCACTGAATTCATAACCTTCTGTAGGCCAATAGCCAACTGTTTCACCACCATTTTCTGAAATTTTTTCCTCTAGAATACCGCAACACTATCACCACCAAACTCCTTCTGAATTATTTCTGATTCAGTTTGGGTATTGCCTGTTTGAGTACCAAAAAATAAACCAATATTAGAC
3. Need ability to build new tools
ASSIGN K12-set FROM Gene-finder (K12-DNA)
ASSIGN O157-set FROM Gene-finder (O157-DNA)
CONSIDER EACH protein IN O157-set
    WHEN Constituent-of (K12-set, protein) = FALSE
        COLLECT protein
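The pseudocode above asks for the proteins of E. coli O157 that have no counterpart in E. coli K12. A minimal Python rendering might look like this; the gene-finder step is left out and replaced by hypothetical, made-up protein fragments, and Constituent-of is approximated by exact set membership, whereas a real comparison would need a similarity search such as BLAST.

```python
from typing import Callable, Iterable, List, Set


def strain_specific(query_proteins: Iterable[str],
                    reference_proteins: Set[str],
                    constituent_of: Callable[[Set[str], str], bool]) -> List[str]:
    """COLLECT each query protein WHEN it is not a constituent of the reference set."""
    return [p for p in query_proteins if not constituent_of(reference_proteins, p)]


def exact_member(protein_set: Set[str], protein: str) -> bool:
    """Stand-in for Constituent-of: exact set membership (no similarity search)."""
    return protein in protein_set


# Hypothetical gene-finder output: made-up protein fragments
k12_set = {"MKTAYIAKQR", "MSDNELQRLA"}
o157_set = ["MKTAYIAKQR", "MKKLLIAAVG", "MSDNELQRLA"]
print(strain_specific(o157_set, k12_set, exact_member))
```

Passing the membership test in as a function keeps the CONSIDER/WHEN/COLLECT structure separate from the choice of comparison, so a similarity-based test could be swapped in later.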
Current State of Affairs
We need…
Biologists . . .
. . . and Programmers
Need biologist programmers
AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCCAAGCTTTGATTCTTTCCTCTGTTAAATCGGATTGATTATCTTCATCAAGGGCAAGACCTACAAATTTACCATCACGAACAGCTTTGARYGACTCACTGAATTCLARATAACCTTCTGTAGGCCASONATAGCCAACTGTTTCACCACCATTTTCTGAAATTTTTTCCTCT
TATTCAAAATGAATTATATCGGTAACTTTAGTACAGAAAATGACGTTAAGAATATCTGCAACTTTAAACCTGAATGATATTATTATTGGCGGGCCTCCATGCCAGGGATTTAGTATTGCTGGGCCAGCCCAAAEALAVGIASTCCTAAAGATCCTAGAAATGGTTTAGAATTTTCATCAACTTTGCACAATGGATAAAATTTCTTGAACCTAAAGCGTTTGTCATGGAAAACGTGAATTCAAAAGGATTGCTATCAAGGAAAAATGCAGAAGGTTTTAAAGTTATAGATATTATTAAGAAAACATTTGGAATTCGAGAACTTGGTTATTTTGTCGAAGTATGGGTTTTAAATGCTGCGGAATATGGCATTCCGCAAATTAGAGAACGGAATTCGATTTTTATTGTTGGCAATAAAAAAGGTAAAGTACTAGGTATTCCTAAAAAAACACATTCTCTGCAATTTTTAAGAATTCGATTTAAATAGGTCTCAATTATCGATCTTCGATGAT
ATGAGTATTATACCTGCACTAACTTTGTGGGACGCAATATCAGACTTACGAATTCGACAGAACTTAATGCGCGTGAAGGAAGTGAAGAGCAACCCTATCATTTAAAACCTCAAAATACTTATCAGACTTGGGCTAGAAATGGTAGTGGAATTCGATACGCTTTACAATCATGTTGCAATGGAACATTCTGACCGTTTAGTAGAACGTTTCCGGCATATAAAATGGGGTGAATCCAGTTCGGATGTATCTAAAGAAGAATTCGACATGGAGCTAGACGACGT
AGTGGTAATGGTGAATTATCAAACAAATCATATGATCAGAATAATCGCCGTTTAAATCCTCATAAACCGGAATTCGAATTCTCACACTATTGCTGCGTCATTCTATGCTAATTTTGTCCATCCTTTTCAACATCGAAATTTAACAGCCCGTGAAGGAGCTAGAATCCAATCTTTTCCAGATAACTATAGATTTTTTGGAAAAGAATTCGAATTCAAACTGTCGTATCTCATAAACTATTGCATCGAGAAGAAAGATTTGATGAAAAATTTCTTTGTCAATATAATCAAATCGGTAATGCTGTACCCCCTCTTCTCGCTAAAGTAATTGCACATCATCTTCTAGAGAAATTAGGAATTCGAATTCAGTTATGCCAACAACTGATAGAAATCCTCTAGTGCATGGATCAAATCTTGAACAAAAAGAGAATCATCGTACAAAATACAGAGATACTGAAAGCAGGACTTTCCTTAGAGAAATCAGAACTGAATATGACAAATGGCATAAAGCAAATATGAACCTGGAATTCGAATTCGAGTTGGACCAAAATCAGAAATTACTGACCA
AGATGATTCAATTATTACTCAAAGAGTGGAACTTCTCACTAAATATAAAGATTTTTTAGATCAGCAGCATTATGCAGAAAAATTTGATTCAAGATCCAACCTTCATTCTAGTGTTTTAGAGACCATTTATAAAGTAAATCTTTAGACGACTAGACGACGTAGCGAATTCGAATTCGAATTCATAATACGAGTCATAACGGCATATATG
GCAGCCTCACTCATTTCTGGGAGACGCTCATAATCCTTACTGAGACGACGGTACTGGTTTAACCAGCCAAATGTTCTTTCTACTACCCACCGTTTGGGCAAAACCTGAAATTCTTGATTAGTACGCCGGATTACCTCAACATGAGCTTGAATCATCAGCCAAACAGAGAGCGCAAATTTATCACCGTCATAGCCGGAATCAACCCAGATGACTTGAATTCGAATTCGAATTCGAACAACTTTTTCCAGTAATTCTGGACGCTCTTCTAACAGTTCCATCAAAGTATAGGCGGCAAGTAATCTTTCTCCAGCATTTGCTTCACTTACAACCACTTTTAACAAAAGTCCCAGACTATCAACCAAAGTTTGCCGCTTTCGTCCTTTTACCTTCTTGCCACCATCAAAACCGTACACATCCCCCTTTTTTCAGTCGTTTTTACCGACTGGCTGTCTGCCGCGATCGCCGTGGGTTGAGTTGACTTCCCCATTTTTTGACGAACTTGATCGCGCAAAGTATGATTCATTTCAGTTGAACTAGGAGGAAAATCCCCTGGAAGCATATCCCACTGAATTCGAATTCGAATTCGAATTCGAATTCGACAACCTGTTTTCAGATGGTAGTAGATAGCGTTGCATACTTCTCGCATATCAGTTGTTCGGGGATGCCCACCGCATTTAGCGGGTGGAATCAAAGGAGCTAAAATTGCCCATTCTGAGTCATTAAGGTCTGTAGAATAAGACTTTCGTCTCATTGTTTCCTATGTAAATACACTCTACAAACAGTATCTTATCGCTGCCTTTTTATCTTAGCTCTCCTTTAGATTTACTTTATAAATAGCCTCTTAGAAGAATTTCTTTATTATTTATTTAAAGATTTAGTACAAGATTTCGGGCAGAACGCTCTTATTGGTAAGTCACACACGTTCAAAGATATTTTCTTCGTACCACCAAAATATTCTGAAATGCTCAAGCGACCTTATGCGCGAATTGAGAGAAAAGATCATGATTTCGTAATTGGTGCAACTGTTCAAGCATCGCTTGAAGCAGCACCTCCTCCAGAACAAAACCATGCTTGAGGGATCTTCACGCGCAGCAGAGGATTTAAAAGCGAGAAATCCTAACAGTTT
ATACCTTGTGGTTATGGAATGGATAAAACTGACCAATGATGTAAATTTACGAAAATATAAAGTTGATCAAATTTATGTACTACGTCAGCAAAAAAATACTGATAGAGAGTTTAGGTATGAGTCAACTTACATAAAAAAT
Why hasn’t this happened?
Part of a bioinformatics program written in C
int c;
if (pcInFile == NULL) pfInFile = stdin;
else pfInFile = fopen(pcInFile, "r");
if (pfInFile == NULL) { fprintf( stderr, "ERROR opening %s\n", pcInFile ); exit(1); }
pfOutFile = fopen( pcOutFile, "w" );
if (pfOutFile == NULL) { fprintf( stderr, "ERROR opening %s\n", pcOutFile ); exit(1); }
/* deal with first '>' in file; skip it if the file is empty */
if ((c = fgetc(pfInFile)) != EOF) fputc( c, pfOutFile );
/* alternate identifier lines and sequences until either step fails */
while (processIdentifier( pfInFile, pfOutFile ) && processSequence( pfInFile, pfOutFile ))
    ;
if (pfInFile != stdin) fclose( pfInFile );   /* do not close standard input */
fclose( pfOutFile );
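The C fragment walks a FASTA file one identifier line and one sequence at a time. For comparison, the same record-by-record reading can be sketched in Python; read_fasta is our own helper, not a library function (libraries such as Biopython ship tested FASTA parsers).

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA text, joining wrapped sequence lines."""
    identifier: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if identifier is not None:    # finish the previous record
                yield identifier, "".join(chunks)
            identifier, chunks = line[1:], []
        else:
            chunks.append(line)
    if identifier is not None:            # the last record has no following '>'
        yield identifier, "".join(chunks)


example = ">NpR3010 fragment\nMWHIQDSIIT\nLSNHNQYLTF\n>second\nACGT\n"
records = list(read_fasta(io.StringIO(example)))
print(records)
```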
Part of a bioinformatics program written in Perl
sub match_positions {
    my $pattern;
    local $_;
    ($pattern, $_) = @_;
    my @results;
    local our $matchStart;   # package variable, so the (?{ }) block below can set it
    my $instrumentedPattern = qr/(?{ $matchStart = pos() })$pattern/;
    while (/$instrumentedPattern/g) {
        my $nextStart = pos();
        push @results, "[$matchStart..$nextStart)";
        pos() = $matchStart+1;   # resume one past this match's start, so overlapping matches are found
    }
    return @results;
}
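For readers who do not use Perl, the sketch below restates match_positions in Python. Instead of instrumenting the pattern with a code block, it searches again from one position past the start of each match, which is what the Perl loop achieves by resetting pos(); results use the same half-open "[start..end)" notation with 0-based offsets. Python and Perl regular expressions differ in some details, so not every pattern carries over unchanged.

```python
import re
from typing import List


def match_positions(pattern: str, text: str) -> List[str]:
    """Every match of pattern in text, overlapping ones included, as '[start..end)' (0-based)."""
    regex = re.compile(pattern)
    results: List[str] = []
    pos = 0
    while pos <= len(text):              # <= so an empty match at the very end is still tried
        m = regex.search(text, pos)
        if m is None:
            break
        results.append(f"[{m.start()}..{m.end()})")
        pos = m.start() + 1              # like the Perl loop: restart one past this match's start
    return results


print(match_positions("ATA", "ATATA"))
print(match_positions("GAATTC", "GAATTCGAATTC"))
```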
Biologists will not come to programming
Programming must come to biologists
BioLingua
Genetic Basis of Differentiation
NpR3010
Figure: RR HK (RR is HK-upstream) and HK RR (RR is HK-downstream)
BioLingua
<1>> (GENES-DESCRIBED-BY "response regulator" IN Npun)
:: (#$Npun.NpF0304 #$Npun.NpR0355 #$Npun.NpR0450 #$Npun.NpF0484 #$Npun.NpR0589 #$Npun.NpF0832 #$Npun.NpF0906 #$Npun.NpR0956 #$Npun.NpF1084 #$Npun.NpF1085 #$Npun.NpR1109 #$Npun.NpF1184 #$Npun.NpF1278 #$Npun.NpR1450 #$Npun.NpF1453 #$Npun.NpF1516 #$Npun.NpR1633 #$Npun.NpR1678 #$Npun.NpR1683 #$Npun.NpR1688 #$Npun.NpF1776 #$Npun.NpR1779 #$Npun.NpF1800 #$Npun.NpR1903 #$Npun.NpR2091 #$Npun.NpF2162 #$Npun.NpR2263 #$Npun.NpF2346 #$Npun.NpF2364 #$Npun.NpR2420 #$Npun.NpR2902 #$Npun.NpF2972 #$Npun.NpR3053 #$Npun.NpF3084 #$Npun.NpR3197 #$Npun.NpR3241 #$Npun.NpF3659 #$Npun.NpF3676 #$Npun.NpR3733 #$Npun.NpF3829 #$Npun.NpR3907 #$Npun.NpR3959 #$Npun.NpF3972 #$Npun.NpR4101 #$Npun.NpR4160 #$Npun.NpR4165 #$Npun.NpF4214 #$Npun.NpR4435 #$Npun.NpF4460 #$Npun.NpR4503 #$Npun.NpR4743 #$Npun.NpR4768 #$Npun.NpF4909 #$Npun.NpR5015 #$Npun.NpF5034 #$Npun.NpF5044 #$Npun.NpR5135 #$Npun.NpR5136 #$Npun.NpR5316 #$Npun.NpF5361 #$Npun.NpF5636 #$Npun.NpF5682 #$Npun.NpF5759 #$Npun.NpF5763 #$Npun.NpF5788 #$Npun.NpR6014 #$Npun.NpR6015 #$Npun.NpR6228 #$Npun.NpF6321 #$Npun.NpR6360 #$Npun.NpF6363 #$Npun.pNpAF075 #$Npun.pNpBR039 #$Npun.pNpBF139 #$Npun.pNpBF146 #$Npun.pNpBR169 #$Npun.pNpBR170 #$Npun.pNpBF205 #$Npun.pNpEF003)
<2>> (GENE-UPSTREAM-OF NpF0304)
:: #$Npun.NpF0303
<3>> (GENES-UPSTREAM-OF (RESULT 1))
:: (#$Npun.NpF0303 #$Npun.NpF0356 #$Npun.NpF0451 #$Npun.NpF0483 #$Npun.NpR0590 #$Npun.NpF0831 #$Npun.NpF0905 #$Npun.NpF0957 #$Npun.NpR1083 #$Npun.NpF1084 #$Npun.NpR1110 #$Npun.NpF1183 #$Npun.NpF1277 #$Npun.NpR1451 #$Npun.NpR1452 #$Npun.NpR1515 #$Npun.NpF1634 #$Npun.NpR1679 #$Npun.NpF1684 #$Npun.NpR1689 #$Npun.NpF1775 #$Npun.NpF1780 #$Npun.NpF1799 #$Npun.NpR1904 #$Npun.NpR2092 #$Npun.NpF2161 #$Npun.NpR2264 #$Npun.NpR2345 #$Npun.NpF2363 #$Npun.NpR2421 #$Npun.NpR2903 #$Npun.NpR2971 #$Npun.NpR3054 #$Npun.NpR3083 #$Npun.NpR3198 #$Npun.NpF3242 #$Npun.NpR3658 #$Npun.NpF3675 #$Npun.NpR3734 #$Npun.NpR3828 #$Npun.NpF3908 #$Npun.NpR3960 #$Npun.NpF3971 #$Npun.NpF4102 #$Npun.NpR4161 #$Npun.NpF4166 #$Npun.NpR4213 #$Npun.NpR4436 #$Npun.NpF4459 #$Npun.NpR4504 #$Npun.NpR4744 #$Npun.NpR4769 #$Npun.NpR4908 #$Npun.NpF5016 #$Npun.NpF5033 #$Npun.NpF5043 #$Npun.NpR5136 #$Npun.NpF5137 #$Npun.NpF5317 #$Npun.NpF5360 #$Npun.NpR5635 #$Npun.NpF5681 #$Npun.NpF5758 #$Npun.NpR5762 #$Npun.NpR5787 #$Npun.NpR6015 #$Npun.NpR6016 #$Npun.NpR6229 #$Npun.NpR6320 #$Npun.NpF6361 #$Npun.NpF6362 #$Npun.pNpAF074 #$Npun.pNpBR040 #$Npun.pNpBF138 #$Npun.pNpBF145 #$Npun.pNpBR170 #$Npun.pNpBR171 #$Npun.pNpBR204 #$Npun.pNpER002)
<4>> (DESCRIPTIONS-OF *)
:: ("two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25531611|pir||AD2200 two- "unknown protein [Nostoc sp. PCC 7120] gi|25534386|pir||AH1981 hypothetical protein alr1403 "tmRNA-binding protein [Nostoc sp. PCC 7120] gi|22096164|sp|Q8YM70|SSRP_ANASP SsrA-binding protein "GTP-binding protein era homolog" "unknown protein [Nostoc sp. PCC 7120] gi|25533156|pir||AF2229 hypothetical protein asr3389 "ORF_ID:tlr0160~similar to ferredoxin [Thermosynechococcus elongatus BP-1] "hypothetical protein [Nostoc sp. PCC 7120] gi|25367067|pir||AH2295 hypothetical protein alr3919 "two-component hybrid sensor and regulator [Nostoc sp. PCC 7120] gi|25532444|pir||AE2276 two- "hypothetical protein [Nostoc sp. PCC 7120] gi|25358966|pir||AG2158 hypothetical protein alr2822 "two-component response regulator [Nostoc sp. PCC 7120] gi|25533086|pir||AF2158 two-component "probable two-component sensor histidine kinase [Gloeobacter violaceus] gi|35214672|dbj|BAC92039.1| "phytochrome-like protein [Tolypothrix sp. PCC 7601]" "two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25530471|pir||AC1860 two-component NIL NIL NIL "hypothetical protein [Nostoc sp. PCC 7120] gi|25535333|pir||AI2179 hypothetical protein all2992 NIL "unknown protein [Nostoc sp. PCC 7120] gi|25535440|pir||AI2275 hypothetical protein alr3760 "transcriptional regulator [Nostoc sp. PCC 7120] gi|25302898|pir||AB2544 transcription regulator "similar to two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25531791|pir||AD2385 "putative gluconolactonase precursor [Sinorhizobium meliloti] gi|25369832|pir||G95274 probable "similar to two-component sensor histidine kinase [Nostoc sp. PCC 7120] gi|25531791|pir||AD2385 "hypothetical protein [Nostoc sp. PCC 7120] gi|25530521|pir||AC1903 hypothetical protein asr0773 . . .

BioLingua
<5>> (DEFINE RR-class AS (GENES-DESCRIBED-BY "response regulator" IN Npun) DISPLAY off)
:: "List of length 79 suppressed"
<6>> (DEFINE HK-class AS (GENES-DESCRIBED-BY "histidine kinase" IN Npun) DISPLAY off)
:: "List of length 89 suppressed"
<7>> (DEFINE HK-upstream AS (GENES-UPSTREAM-OF HK-class) DISPLAY off)
:: "List of length 89 suppressed"
<8>> (DEFINE HK-downstream AS (GENES-DOWNSTREAM-OF HK-class) DISPLAY off)
:: "List of length 89 suppressed"
<9>> (DEFINE HK-adjacent AS (UNION-OF (HK-upstream HK-downstream)) DISPLAY off)
:: "List of length 178 suppressed"
<10>> (INTERSECTION-OF (HK-adjacent RR-class))
:: 22 elements in INTERSECTION> (#$Npun.pNpBF205 #$Npun.pNpBF139 #$Npun.NpR6228 #$Npun.NpR5316 #$Npun.NpF4214 #$Npun.NpF3676 #$Npun.NpF3084 #$Npun.NpR3053 #$Npun.NpR1779 #$Npun.NpR0589 #$Npun.NpF0304 #$Npun.NpR1109 #$Npun.NpF1278 #$Npun.NpF1776 #$Npun.NpF1800 #$Npun.NpR2420 #$Npun.NpR2902 #$Npun.NpR3197 #$Npun.NpR4503 #$Npun.NpF5763 #$Npun.NpF6363 #$Npun.pNpBF146)
<11>> (DEFINE RR-candidates AS (SET-DIFFERENCE RR-class (RESULT 10)) DISPLAY off)
:: "List of length 57 suppressed"
<12>>
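The session above sets aside the response regulators that sit next to a histidine kinase gene, leaving the other 57 as candidates. The same set logic can be sketched in Python. Here "upstream" and "downstream" are simplified to the previous and next gene in chromosome order, ignoring strand, and genes are classified by a substring match on their descriptions. The gene order, the descriptions and the name orphanRR are made up for illustration; only the identifiers NpR0302 to NpR0306 are borrowed from the NpF0304 neighborhood.

```python
from typing import Dict, List, Set, Tuple


def adjacent_genes(order: List[str], targets: Set[str]) -> Set[str]:
    """Genes immediately before or after any target gene in chromosome order."""
    neighbors: Set[str] = set()
    for i, gene in enumerate(order):
        if gene in targets:
            if i > 0:
                neighbors.add(order[i - 1])
            if i + 1 < len(order):
                neighbors.add(order[i + 1])
    return neighbors


def classify_regulators(order: List[str],
                        descriptions: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Split response regulators into those next to a histidine kinase and the rest."""
    rr_class = {g for g in order if "response regulator" in descriptions.get(g, "")}
    hk_class = {g for g in order if "histidine kinase" in descriptions.get(g, "")}
    paired = adjacent_genes(order, hk_class) & rr_class   # like INTERSECTION-OF
    candidates = rr_class - paired                         # like SET-DIFFERENCE
    return paired, candidates


# Illustrative data only
order = ["NpR0302", "NpF0303", "NpF0304", "NpF0305", "NpR0306", "orphanRR"]
descriptions = {
    "NpF0303": "two-component sensor histidine kinase",
    "NpF0304": "two-component response regulator",
    "orphanRR": "two-component response regulator",
}
paired, candidates = classify_regulators(order, descriptions)
print(sorted(paired), sorted(candidates))
```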
BioLingua
<12>> (CONTEXT-OF NpF0304)
:: (<- #$Npun.NpR0302 potassium-dependent ATPase sub) 523 (-> #$Npun.NpF0303 two-component sensor histidine) 85 (-> #$Npun.NpF0304 two-component response regulat) 473 (-> #$Npun.NpF0305 hypothetical protein glr0895 [) 85 (<- #$Npun.NpR0306 primosomal protein N' [Nostoc )
> (#$Npun.NpR0302 #$Npun.NpF0303 #$Npun.NpF0304 #$Npun.NpF0305 #$Npun.NpR0306)
<13>> (ALL-ORTHOLOGS-OF *)
:: ((#$S7942.sef0159 #$Npun.NpR0302 #$Gvi.glr0573 #$A29413.Av?3368 #$A7120.all3154) (#$S6803.sll1590 #$Npun.NpF0303 #$Gvi.gll0572 #$A29413.Av?1247 #$A7120.alr3155) (#$S6803.sll1592 #$P9313.PMT1405 #$Npun.NpF0304 #$Gvi.gll0571 #$A29413.Av?1248 #$A7120.alr3156) (#$Tery.Te?7017 #$Npun.NpF0305 #$Cwat.Cw?3050) (#$Tery.Te?2243 #$TeBP1.tll0415 #$S6803.sll0270 #$S8102.SynW1782 #$S7942.sef1895 #$PRO1375.Pro0497 #$P9313.PMT1271 #$PMED4.PMM0497 #$Npun.NpR0306 #$Gvi.gll0025 #$Cwat.Cw?3016 #$A29413.Av?5206 #$A7120.all4248))
<14>>
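In the CONTEXT-OF output, the arrows appear to give each gene's strand (NpF genes point right, NpR genes left) and the numbers between genes appear to be the spacing between them in base pairs; that reading is our assumption, not documented BioLingua behavior. A Python sketch of producing such a listing from gene coordinates follows; the Gene record, the gene names and the coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Gene:
    name: str
    start: int    # leftmost base, 1-based
    end: int      # rightmost base, inclusive
    strand: str   # "+" or "-"


def context_of(genes: List[Gene], name: str, radius: int = 2) -> List[Union[str, int]]:
    """Genes within `radius` positions of `name`, with strand arrows and the gap (bp) between neighbors."""
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == name)  # StopIteration if absent
    window = ordered[max(0, idx - radius): idx + radius + 1]
    listing: List[Union[str, int]] = []
    for i, g in enumerate(window):
        if i > 0:
            listing.append(g.start - window[i - 1].end - 1)  # negative means the genes overlap
        listing.append(("->" if g.strand == "+" else "<-") + " " + g.name)
    return listing


# Hypothetical coordinates, not the real Nostoc genes
genes = [Gene("A", 1, 100, "-"), Gene("B", 201, 300, "+"), Gene("C", 311, 400, "+"),
         Gene("D", 451, 500, "+"), Gene("E", 506, 600, "-")]
print(context_of(genes, "C"))
```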
BioLingua
<14>> (DEFINE extended-NpR3008 AS (SEQUENCE-OF NpR3008 FROM -700 TO-END +700) DISPLAY off)
:: "Results suppressed"
<15>> (BLAST extended-NpR3008 Npun)
::       Query    Q-Start  Q-End  Subject             S-Start  S-End    E-value  %ID
    1.   "Seq 1"     1     2258   #$Npun.chromosome   3706846  3704589  0.0      100.0
    2.   "Seq 1"   293     1511   #$Npun.chromosome   4008429  4009647  0.0      100.0
    3.   "Seq 1"   293     1512   #$Npun.chromosome   7932036  7930817  0.0      99.92
    4.   "Seq 1"   293     1510   #$Npun.chromosome   4228111  4229328  0.0      99.92
    5.   "Seq 1"   293     1510   #$Npun.chromosome   3971285  3972502  0.0      99.92
    6.   "Seq 1"   293     1510   #$Npun.chromosome   4027833  4029050  0.0      99.75
    7.   "Seq 1"   293     1511   #$Npun.chromosome   2121987  2123204  0.0      99.67
    8.   "Seq 1"   293     1510   #$Npun.chromosome   2136737  2135521  0.0      99.67
    9.   "Seq 1"   397     1510   #$Npun.chromosome   2030748  2031861  0.0      99.64
   10.   "Seq 1"  1537     2258   #$Npun.pNpB           42015    42737  4.6d-83  80.5
   11.   "Seq 1"  1331     1420   #$Npun.chromosome   8036134  8036045  1.8d-8   83.33
   12.   "Seq 1"  1319     1385   #$Npun.chromosome   5915424  5915358  2.7d-4   83.58
   13.   "Seq 1"  1319     1385   #$Npun.chromosome   2577387  2577453  2.7d-4   83.58
> (#$Temp27 #$Temp28 #$Temp29 #$Temp30 #$Temp31 #$Temp32 #$Temp33 #$Temp34 #$Temp35 #$Temp36 #$Temp37 #$Temp38 #$Temp39)
<16>> (FOR-EACH hit IN * AS (subj S-start) = (GET-ELEMENTS (subject Subject-start) FROM hit) AS start = (- S-start 15) AS end = (+ S-start 40) AS left-end = (SEQUENCE-OF subj FROM start TO end) COLLECT left-end)
:: > ("TACGCTCTATCTTCAGCAAGTTGTTTTTCTTGCTGTATAATTCGGCGATTCTCTTC" "AAAGAAACGCTAGAGGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA" "AAACTGGGATGCACCCCTTATTAATGCTCTTTGGAGTCAATACTAATTTTGCCAAA" "TACCTTTGTGATAGGGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA" "AAATTAGTTTATTATGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA" "CACCGATTCACTAATGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA" "ACTATTGTAGAGACTGGGTGCATCCCAGTTTTTATTATTCCAAAACAAATAAATAA" . . .
<17>> (ALIGNMENT-OF * LINE-LENGTH 60 SEGMENT-LENGTH 60)
:: Seq 4    1 TACCTTTGT-GATAGGGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA---------------
   Seq 7    1 -ACTATTGTAGAGACTGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA---------------
   Seq 2    1 -AAAGAAACGCTAGAGGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA---------------
   Seq 5    1 AAATTAGTTTATTA-TGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA---------------
   Seq 6    1 -CACCGATTCACTAATGGGTGCATCCCAGTTTTTATTAT--TCCAAAACAAATAAATAA---------------
   Seq 8    1 ----------AAACTGGGATGCA-CCCAGTCTCTACAATAGTTCTAGA-GAACACATAACGTAAATAC------
   Seq 3    1 ----------AAACTGGGATGCACCCC--TTATTAATGCTCTTTGGAGTCAATAC-TAATTTTGCCAAA-----
   Seq 9    1 -----------CATTGTCGCCCCTTGAAGTCATCAAGAC-----TAGGTGTATCAATGACTCCTGAAGAAGA--
   Seq 12   1 ------------------GTTCAGCTTGGTAATAGCTGTAGTTAATAATGCGAGAGCGATGTTTTTCGAGATAA
   Seq 1    1 ---------TACGCTCTATCTTCAGCAAGTTGTTTTTCT--TGCTGTATAATTCGGCGATTCTCTTC-------
   Seq 10   1 --------------GGTCGGGAAATTGCGAGATTATTCAGTGGCGAAGTAGTGGGAGAACTACCATTGAT----
   Seq 11   1 ------------TTGAACAAATTTGTTCGTGGAAATGGTAATTGGAAATTTGCTGCGGAATGCGGTGA------
   Seq 13   1 ------------ATTATTAACTACAGCTATTACCAAGCTGAACAACTGTGTTCTATTGGTTCTGGTTC------
   consensus 1
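The DEFINE and FOR-EACH steps in this session come down to coordinate arithmetic: widen the gene by 700 bases on each side, then, for each BLAST hit, take the subject sequence from 15 bases before S-Start to 40 bases after it. With inclusive 1-based coordinates that window is 56 bases, matching the length of the strings the FOR-EACH returned. The Python sketch below assumes hits arrive as (subject sequence, S-Start) pairs; the helper names and the toy subject sequence are ours. Like the BioLingua command, it ignores strand, so for hits where S-Start exceeds S-End one would probably also want the reverse complement.

```python
from typing import List, Tuple


def extend_region(start: int, end: int, margin: int) -> Tuple[int, int]:
    """Widen a 1-based inclusive region by `margin` bases on each side, clipped at position 1."""
    return max(1, start - margin), end + margin


def subsequence(seq: str, start: int, end: int) -> str:
    """Bases start..end of seq, 1-based and inclusive; out-of-range ends are clipped."""
    return seq[max(start, 1) - 1:end]


def left_ends(hits: List[Tuple[str, int]], before: int = 15, after: int = 40) -> List[str]:
    """For each (subject sequence, S-Start) hit, take `before` bases upstream through `after` downstream."""
    return [subsequence(subj, s_start - before, s_start + after) for subj, s_start in hits]


# Toy subject sequence (hypothetical): 100 A's followed by 80 bases of ACGT repeats
subject = "A" * 100 + "ACGT" * 20
print(extend_region(1000, 2000, 700))
print(left_ends([(subject, 101)]))
```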
Genetic Basis of Differentiation
Figure (labels: NH3, N2)
Nostoc + Anabaena
Not Synechocystis, Trichodesmium,…
BioLingua
<18>> (DEFINE diff-cb AS (Npun Avar A7120) DISPLAY off)
:: "List of length 3 suppressed"
<19>> (DEFINE non-diff-cb AS (REMOVE-FROM-SET *loaded-organisms* diff-cb) DISPLAY off)
:: "List of length 10 suppressed"
<20>> (DEFINE diff-cb-specific AS (COMMON-ORTHOLOGS-OF diff-cb NOT-IN non-diff-cb) DISPLAY off)
:: "List of length 661 suppressed"
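The last query keeps ortholog groups shared by the three differentiating cyanobacteria (Npun, Avar, A7120) and absent from the other loaded genomes. The Python sketch below assumes each ortholog group is represented as the set of organisms that contribute a member, and reads COMMON-ORTHOLOGS-OF ... NOT-IN as "present in every organism of the first set and in none of the second"; both are assumptions about BioLingua's semantics. The organism list is shortened and the groups are made up.

```python
from typing import Dict, Set


def lineage_specific(groups: Dict[str, Set[str]],
                     in_set: Set[str], out_set: Set[str]) -> Set[str]:
    """Ortholog groups with members in every organism of in_set and in none of out_set."""
    return {gid for gid, orgs in groups.items()
            if in_set <= orgs and not (orgs & out_set)}


loaded_organisms = {"Npun", "Avar", "A7120", "S6803", "Tery", "Gvi"}
diff_cb = {"Npun", "Avar", "A7120"}
non_diff_cb = loaded_organisms - diff_cb      # like REMOVE-FROM-SET

# Hypothetical ortholog groups: group id -> organisms with a member
groups = {
    "group1": {"Npun", "Avar", "A7120"},
    "group2": {"Npun", "Avar", "A7120", "S6803"},
    "group3": {"Npun", "A7120"},
}
print(lineage_specific(groups, diff_cb, non_diff_cb))
```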
BioLingua
• Provides knowledge in accessible form
• Provides tools accessed in common way
• Provides results that can be manipulated
• Provides a programming language that speaks to biologists
The Death of Science
Credits
West Coast
- Jeff Shrager
- JP Massar
- Mike Travers
VCU
- Austin Hess
- James Mastros
- Sarah Cousins
- Yue Zhao
BioLingua: http://ramsites.net/~biolingua/help
Jeff Elhai: Center for the Study of Biological Complexity, Virginia Commonwealth University
Phone: 828-0794
E-mail: [email protected]