Class 38: Fixed Points and Biological Computing
David Evans, http://www.cs.virginia.edu/~evans
CS200: Computer Science, University of Virginia Computer Science



Page 2: David Evans cs.virginia/~evans

24 April 2002 CS 200 Spring 2002 2

Menu

• Making Recursive Definitions without define
• Computing with DNA
• How Biology Programs

Page 3: David Evans cs.virginia/~evans

Lambda Calculus

term ::= variable | term term | (term) | λ variable . term

α-reduction (renaming): λy. M →α λv. (M [y ↦ v]), where v does not occur in M.

β-reduction (substitution): (λx. M) N →β M [x ↦ N]
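The substitution rule can be tried out with a small Python sketch. The class names (Var, Lam, App) and the helpers substitute and beta_step are hypothetical names chosen for this illustration, not part of the lecture. The sketch assumes any needed renaming (α-reduction) has already been done, so it does not guard against variable capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def substitute(term, name, value):
    """Replace free occurrences of `name` in `term` with `value`.

    Assumption: bound names in `term` do not clash with free names in
    `value` (renaming was done beforehand), so capture cannot happen.
    """
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:
            # `name` is rebound by this lambda; nothing inside is free.
            return term
        return Lam(term.param, substitute(term.body, name, value))
    return App(substitute(term.fn, name, value),
               substitute(term.arg, name, value))


def beta_step(term):
    """One beta-reduction at the root: (λx. M) N becomes M[x ↦ N]."""
    if isinstance(term, App) and isinstance(term.fn, Lam):
        return substitute(term.fn.body, term.fn.param, term.arg)
    return term


# (λx. x x) z reduces to z z
self_apply = Lam("x", App(Var("x"), Var("x")))
reduced = beta_step(App(self_apply, Var("z")))
print(reduced)
```

Only the outermost redex is reduced here; a full evaluator would also need a reduction strategy for nested redexes.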

Page 4: David Evans cs.virginia/~evans

Lambda Calculus is a Universal Computer

[Figure: Turing machine with an infinite tape (z z z z ...) and a finite state machine with states 1 (Start), 2 (look for "(") and HALT; transition labels: ), X, L; #, 1, -; ), #, R; (, #, L; (, X, R; #, 0, -]

• Read/Write Infinite Tape: Mutable Lists
• Finite State Machine: Numbers to keep track of state
• Processing: Way of making decisions (if); Way to keep going

We have this, but we cheated by using define to make recursive definitions!

Page 5: David Evans cs.virginia/~evans

Fixed Point Theorem

• The fixed point of a function f is a value x such that f(x) = x
• If we can find the fixed point of our Turing Machine simulator, then we have something that keeps going until it halts!

fixed-point TM input → result of running TM on input

Page 6: David Evans cs.virginia/~evans

All Lambda Calculus Terms have Fixed Points!

• For any Lambda Calculus term F, there exists a Lambda Calculus Term X such that FX = X
• Proof: Let W = λx.F(xx) and X = WW.
  X = WW = (λx.F(xx))W → F(WW) = FX

We can make F a parameter!

Page 7: David Evans cs.virginia/~evans

Why of Y?

• Y is λf. WW:
  Y = λf. (λx. f (xx)) (λx. f (xx))
• Y calculates a fixed point of any lambda term!
• Hence: we don’t need define to do recursion!
• Works in Scheme too - check the “lecture” from the Adventure Game
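A minimal sketch of recursion without a recursive definition, written in Python rather than Scheme. Because Python evaluates arguments before calling a function, the sketch uses the eta-expanded variant of Y (often called Z), since the plain form would recurse forever. The names Z, fact_step and self_fn are assumptions made for this illustration.

```python
# Z is Y with each self-application x(x) wrapped in a lambda, so the
# self-application is delayed until the recursive call is actually made.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A non-recursive "step" for factorial: it never refers to its own name.
# The function to use for the recursive case arrives as `self_fn`.
fact_step = lambda self_fn: lambda n: 1 if n == 0 else n * self_fn(n - 1)

# The fixed point of fact_step is the factorial function.
factorial = Z(fact_step)

print(factorial(5))  # 120
```

Applying fact_step to factorial gives back a function that behaves like factorial, which is what it means for factorial to be a fixed point of fact_step.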

Page 8: David Evans cs.virginia/~evans

Lambda Calculus is Turing Universal!

• All you need is beta-reduction and you can compute anything
• This is just one way of representing numbers, if, etc. – many others are possible
• Integers, booleans, if, while, +, *, =, <, classes, define, inheritance, etc. are for wimps! Real programmers only use λ.

Page 9: David Evans cs.virginia/~evans

Models of Computation

• Mechanical: Turing Machine
• Symbolic: Lambda Calculus
• Next: Biological

Page 10: David Evans cs.virginia/~evans

Computing with DNA

Leonard Adleman (Mathematical Consultant for Sneakers), 1995

Page 11: David Evans cs.virginia/~evans

DNA

• Sequence of nucleotides: adenine (A), guanine (G), cytosine (C), and thymine (T)
• Two strands, A must attach to T and G must attach to C

[Figure: DNA strands with labeled bases A, G, T, C, C]

Page 12: David Evans cs.virginia/~evans

Hamiltonian Path Problem

• Input: a graph, start vertex and end vertex
• Output: either a path from start to end that touches each vertex in the graph exactly once, or false indicating no such path exists

[Figure: graph with vertices CHO, RIC, IAD, BWI; start: CHO, end: BWI]

Hamiltonian Path is NP-Complete

Page 13: David Evans cs.virginia/~evans

Encoding The Graph

• Make up two random 4-nucleotide sequences for each city:
  CHO: CHO1 = ACTT  CHO2 = gcag
  RIC: RIC1 = TCGG  RIC2 = actg
  IAD: IAD1 = GGCT  IAD2 = atgt
  BWI: BWI1 = GATC  BWI2 = tcca
• If there is a link between two cities (A→B), create a nucleotide sequence: A2B1
  CHO→RIC: gcagTCGG
  RIC→CHO: actgACTT

Based on Fred Hapgood’s notes on Adleman’s talk: http://www.mitre.org/research/nanotech/hapgood_on_dna.html
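The A2B1 rule for link strands can be written out as a short Python sketch. The sequences are the ones given on the slide; the names CITY_HALVES and link_strand are hypothetical and used only for this illustration.

```python
# City sequences from the slide: (first half, second half).
CITY_HALVES = {
    "CHO": ("ACTT", "gcag"),
    "RIC": ("TCGG", "actg"),
    "IAD": ("GGCT", "atgt"),
    "BWI": ("GATC", "tcca"),
}


def link_strand(src, dst):
    """Link src→dst: second half of src followed by first half of dst."""
    return CITY_HALVES[src][1] + CITY_HALVES[dst][0]


print(link_strand("CHO", "RIC"))  # gcagTCGG
print(link_strand("RIC", "CHO"))  # actgACTT
```

Because a link carries the tail of one city and the head of the next, the direction of the link is preserved in the strand.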

Page 14: David Evans cs.virginia/~evans

Encoding The Problem

• Each city nucleotide sequence binds with its complement (A ↔ T, G ↔ C):
  CHO: ACTTgcag  CHO’: TGAAcgtc
  RIC: TCGGactg  RIC’: AGCCtgac
  IAD: GGCTatgt  IAD’: CCGAtaca
  BWI: GATCtcca  BWI’: CTAGaggt
• Mix up all the link and complement DNA strands – they will bind to show a path!
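The complement strands follow mechanically from the pairing rule. Here is a minimal Python sketch that pairs bases position by position, as the slide does, and ignores strand direction. The names COMPLEMENT, complement and CITY_STRANDS are assumptions for this illustration.

```python
# Base-by-base pairing: A <-> T and G <-> C, keeping the slide's
# upper/lower case convention for the two halves of each city.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(strand):
    """Return the strand that binds to `strand`, position by position."""
    return strand.translate(COMPLEMENT)


CITY_STRANDS = {
    "CHO": "ACTTgcag",
    "RIC": "TCGGactg",
    "IAD": "GGCTatgt",
    "BWI": "GATCtcca",
}

for city, strand in CITY_STRANDS.items():
    print(city + "'", complement(strand))
```

Taking the complement twice returns the original strand, which is a quick sanity check on the pairing table.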

Page 15: David Evans cs.virginia/~evans

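Path Binding

[Figure: graph with vertices CHO, RIC, IAD, BWI]

City strands:
  CHO: ACTTgcag
  RIC: TCGGactg
  IAD: GGCTatgt
  BWI: GATCtcca

Complement and link strands bound along the path CHO→IAD→RIC→BWI:
  CHO’: TGAAcgtc
  CHO→IAD: gcagGGCT
  IAD’: CCGAtaca
  IAD→RIC: atgtTCGG
  RIC’: AGCCtgac
  RIC→BWI: actgGATC
  BWI’: CTAGaggt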

Page 16: David Evans cs.virginia/~evans

Getting the Solution

• Extract DNA strands starting with CHO and ending with BWI
  – Easy way is to remove all strands that do not start with CHO, and then remove all strands that do not end with BWI
• Measure remaining strands to find ones with the right weight (7 * 8 nucleotides)
• Read the sequence from one of these strands
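The generate-then-filter idea can be simulated in Python, with lists of city names standing in for DNA strands. This is a sketch under stated assumptions: the edge set below contains only the links that appear on the slides (the full graph is not recoverable from the text), the names EDGES, CITIES, all_walks and hamiltonian_paths are hypothetical, and the final check that each city appears exactly once is added here so that the result matches the problem definition.

```python
# Links that appear on the slides; the complete graph is an assumption.
EDGES = {
    ("CHO", "RIC"), ("RIC", "CHO"), ("CHO", "IAD"),
    ("IAD", "RIC"), ("RIC", "BWI"),
}
CITIES = ["CHO", "RIC", "IAD", "BWI"]


def all_walks(max_cities):
    """Every walk with up to max_cities cities.

    Stands in for the strands that form when link and complement
    strands are mixed: any chain of connected links can appear.
    """
    walks = [[city] for city in CITIES]
    frontier = walks
    for _ in range(max_cities - 1):
        frontier = [
            walk + [dst]
            for walk in frontier
            for (src, dst) in sorted(EDGES)
            if src == walk[-1]
        ]
        walks = walks + frontier
    return walks


def hamiltonian_paths(start, end):
    n = len(CITIES)
    strands = all_walks(n + 1)                       # includes too-long strands
    strands = [w for w in strands if w[0] == start]  # keep: starts at start
    strands = [w for w in strands if w[-1] == end]   # keep: ends at end
    strands = [w for w in strands if len(w) == n]    # keep: right "weight"
    return [w for w in strands if len(set(w)) == n]  # each city exactly once


print(hamiltonian_paths("CHO", "BWI"))
```

The simulation enumerates walks one at a time, whereas the DNA mixture forms all of them in parallel; the filtering steps are the same in spirit.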

Page 17: David Evans cs.virginia/~evans

Why don’t we solve NP-Complete problems this way?

• Speed: shaking up the DNA strands does 10^14 operations per second ($400M supercomputer does 10^10)
• Memory: we can store information in DNA at 1 bit per cubic nanometer
• How much DNA would you need?
  – Volume of DNA needed grows exponentially with input size
  – To solve ~45 vertices, you need ~20M gallons

Page 18: David Evans cs.virginia/~evans

DNA-Enhanced PC

Page 19: David Evans cs.virginia/~evans

How does Nature program?

Page 20: David Evans cs.virginia/~evans

How Big is the Make-a-Human Program?

• 3 Billion Base Pairs
  – Each nucleotide is 2 bits (4 possibilities)
  – 3 B pairs * 1 byte/4 pairs = 750 MB

1 CD ~ 650 MB
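The size estimate is plain arithmetic, spelled out here as a Python sketch. It assumes 1 MB = 10^6 bytes, which is what makes the slide's figure come out to 750; the variable names are chosen for this illustration.

```python
base_pairs = 3_000_000_000
bits_per_base = 2            # 4 possible nucleotides = 2 bits each
bytes_total = base_pairs * bits_per_base // 8
megabytes = bytes_total / 1_000_000   # assumes 1 MB = 10^6 bytes

print(megabytes)             # 750.0
print(megabytes / 650)       # number of 650 MB CDs, a bit more than one
```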

Page 21: David Evans cs.virginia/~evans

Encoding is Redundant

• DNA encodes proteins
• Every sequence of 3 base pairs encodes one of 20 amino acids (or stop codon)
  – 21 possible codons, but 4^3 = 64 possible values
  – So, really only 750MB * (21/64) ~ 250 MB
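The redundancy estimate can be checked with a short Python sketch. The first figure reproduces the slide's 21/64 scaling. The second is an alternative estimate, added here for comparison and not taken from the slide: it counts the bits needed to pick one of 21 meanings, log2(21), against the 6 bits a codon occupies. Variable names are assumptions for this illustration.

```python
import math

raw_mb = 750
codon_values = 4 ** 3        # 64 possible 3-base sequences
meanings = 21                # 20 amino acids plus stop

# The slide's estimate: scale by the fraction of distinct meanings.
slide_estimate = raw_mb * meanings / codon_values

# Alternative (an assumption of this sketch): scale by bits of
# information per codon, log2(21) out of log2(64) = 6 bits.
bits_estimate = raw_mb * math.log2(meanings) / math.log2(codon_values)

print(round(slide_estimate))   # 246
print(round(bits_estimate))
```

Both estimates agree with the slide's point that the raw 750 MB overstates the information content; they differ in how much.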

Page 22: David Evans cs.virginia/~evans

People are almost all the Same

• Genetic code for 2 humans differs in only 2.1 million bases
  – 4 million bits = 0.5 MB

Page 23: David Evans cs.virginia/~evans

How big is .5 MB?

• 1/3 of a floppy disk

• <1% of Windows 2000

• ~22 times the size of the PS6 adventure game code

Page 24: David Evans cs.virginia/~evans

Is DNA Really a Programming Language?

Page 25: David Evans cs.virginia/~evans

Nerdy Linguist’s Definition

A description of pairs (S, M), where S stands for sound, or any kind of surface forms, and M stands for meaning. A theory of language must specify the properties of S and M, and how they are related.

Page 26: David Evans cs.virginia/~evans

Programming Language (Definition from Lecture 1)

A description of pairs (S, M), where S stands for sound, or any kind of surface forms, and M stands for meaning, intended to be read and written by humans and processed by machines.

Page 27: David Evans cs.virginia/~evans

Stuff Programming Languages are Made Of

• Primitives: codons (sequence of 3 nucleotides that encodes an amino acid)
• Means of Combination: ?? Morphogenesis? Not well understood (by anyone).
• Means of Abstraction:
  – DNA itself – separate proteins from their encoding
  – Genes – group DNA by function (sort of)
  – Chromosomes – package Genes together
  – Organisms – packages for reproducing Genes

This is where most of the expressiveness comes from!

Page 28: David Evans cs.virginia/~evans

Biology is (becoming) a subfield of Computer Science

• Biological mechanisms are mostly understood (proteomics still has a way to go)

• What is not understood is how those are combined to create meaning

Page 29: David Evans cs.virginia/~evans

Charge

• Noon (now): President Casteen’s State of the University in Old Cabell Hall
  – Extra credit question: “Given that Computer Science is the most liberal art, how come UVa College students are not able to major in Computer Science?”
• Friday: review
  – Chance to ask questions about anything you want