Application of Unstructured Learning in Computational Biology

Tony C Smith
Department of Computer Science
University of Waikato
tcs@cs.waikato.ac.nz

Unstructured learning in computational biology Tony C Smith

Computability

Before computers were built, mathematicians knew what they could do:

• arithmetic (e.g. missile trajectories)
• search (e.g. keys for secret codes)
• sort (e.g. census information)
• … anything with a mathematical algorithm

Artificial Intelligence

Computers do things only human brains can otherwise do

[Diagram, built up over three slides, with labels: expert, expert system, learning system]

Machine learning

What is machine learning?

• creating computer programs that get better with experience
• learn how to make expert judgments
• discover previously hidden, potentially useful information (data mining)

How does it work?

• user provides learning system with examples of concept to be learned
• induction algorithm infers a characteristic model of the examples
• model is used to predict whether or not future novel instances are also examples, and it does this very consistently, and very, very quickly!

Structured learning

Mushroom Data

Weight   Damage   Dirt    Firmness   Quality
heavy    high     mild    hard       poor
heavy    high     mild    soft       poor
normal   high     mild    hard       good
light    medium   mild    hard       good
light    clear    clean   hard       good
normal   clear    clean   soft       poor
heavy    medium   mild    hard       poor
. . .

[Figure: decision tree induced from the mushroom data. Root node: weight (branches: heavy, light, normal); internal nodes: dirt (branches: mild, clean) and firmness (branches: hard, soft); leaves: good or poor]
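The induction step can be sketched with the seven mushroom rows listed above. This is a minimal ID3-style sketch, not necessarily the algorithm that produced the tree on the slide. The function names (entropy, split, build_tree, classify) are illustrative assumptions, and only the rows visible on the slide are used.

```python
import math
from collections import Counter

ATTRIBUTES = ["weight", "damage", "dirt", "firmness"]

# The seven rows visible on the slide; the last column is the quality label.
ROWS = [
    ("heavy", "high", "mild", "hard", "poor"),
    ("heavy", "high", "mild", "soft", "poor"),
    ("normal", "high", "mild", "hard", "good"),
    ("light", "medium", "mild", "hard", "good"),
    ("light", "clear", "clean", "hard", "good"),
    ("normal", "clear", "clean", "soft", "poor"),
    ("heavy", "medium", "mild", "hard", "poor"),
]


def entropy(rows):
    """Shannon entropy (in bits) of the quality labels in rows."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def split(rows, index):
    """Group rows by the value they take in column index."""
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row)
    return groups


def build_tree(rows, attribute_indices):
    """Return a label (leaf) or a tuple (attribute name, {value: subtree})."""
    labels = Counter(row[-1] for row in rows)
    if len(labels) == 1 or not attribute_indices:
        return labels.most_common(1)[0][0]

    def remaining_entropy(index):
        groups = split(rows, index).values()
        return sum(len(g) / len(rows) * entropy(g) for g in groups)

    # Lowest remaining entropy is the same as highest information gain.
    best = min(attribute_indices, key=remaining_entropy)
    rest = [i for i in attribute_indices if i != best]
    branches = {value: build_tree(group, rest)
                for value, group in split(rows, best).items()}
    return (ATTRIBUTES[best], branches)


def classify(tree, instance):
    """Follow the tree for a dict of attribute values; None if a value is unseen."""
    while isinstance(tree, tuple):
        name, branches = tree
        tree = branches.get(instance[name])
    return tree


TREE = build_tree(ROWS, list(range(len(ATTRIBUTES))))
```

On these seven rows the sketch chooses weight as the root attribute, as in the tree on the slide. Ties lower in the tree are broken by attribute order, so the subtrees need not match the slide.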

Unstructured learning

• data does not have fixed fields with specific values
• examples: images, continuous signals, expression data, text
• learning proceeds by correlating the presence or absence of any and all salient attributes

Document Classification

• given examples of documents covering some topic, learn a semantic model that can recognize whether or not other documents are relevant
• prioritize them: i.e. quantify “how relevant” documents are to the topic
• not limited to keywords (nor is it misled by them)
• adapt to the user’s needs (ephemeral or long-term)
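One way to picture "quantify how relevant" is a bag-of-words naive Bayes scorer. This is a minimal sketch under stated assumptions, not the system shown in the demo: the class name RelevanceModel, the tokenizer, and the idea of supplying both relevant and non-relevant example documents are all illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


class RelevanceModel:
    """Naive Bayes over word counts with add-one smoothing.

    Assumes at least one example document in each of the two classes.
    """

    def __init__(self, relevant_docs, other_docs):
        self.doc_totals = {True: len(relevant_docs), False: len(other_docs)}
        self.counts = {True: Counter(), False: Counter()}
        for label, docs in ((True, relevant_docs), (False, other_docs)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        self.vocabulary = set(self.counts[True]) | set(self.counts[False])

    def _log_likelihood(self, words, label):
        counts = self.counts[label]
        denominator = sum(counts.values()) + len(self.vocabulary)
        # Words never seen in training carry no evidence and are skipped.
        return sum(math.log((counts[w] + 1) / denominator)
                   for w in words if w in self.vocabulary)

    def score(self, text):
        """Log-odds that text is relevant; higher means more relevant."""
        words = tokenize(text)
        prior = math.log(self.doc_totals[True] / self.doc_totals[False])
        return (prior
                + self._log_likelihood(words, True)
                - self._log_likelihood(words, False))

    def rank(self, documents):
        """Return documents ordered from most to least relevant."""
        return sorted(documents, key=self.score, reverse=True)
```

Because the score is a continuous log-odds value rather than a yes/no answer, sorting by it gives the prioritized list the slide describes. Retraining on new examples is how such a model would adapt to a changed user need.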

Document classification demo

Bioinformatics

Finding genes

Determining gene roles

Determining protein functions

• Empirical tests
• Sequence similarity comparison
• Literature
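As a rough illustration of sequence similarity comparison, two sequences can be compared by the short substrings (k-mers) they share. This is a simplified, alignment-free stand-in rather than the method the talk refers to; the function names and the default k=3 are assumptions made for the sketch.

```python
def kmers(sequence, k=3):
    """Return the set of all length-k substrings of sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity of the two k-mer sets: 0.0 (disjoint) to 1.0 (identical sets)."""
    set_a, set_b = kmers(a, k), kmers(b, k)
    if not set_a and not set_b:
        return 0.0  # both sequences are shorter than k
    return len(set_a & set_b) / len(set_a | set_b)
```

A protein of unknown function that scores highly against a protein of known function is a candidate for sharing that function, which is the reasoning behind the bullet above.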

GO-KDS demo

Amino Acid

[Figure: amino acid structure, with labels: amino group, carboxyl group, R group]

Amino Acid

[Figure: example amino acids: glycine, tyrosine]

DNA encodes amino acids
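The encoding can be made concrete with the standard genetic code, in which each triplet of bases (a codon) specifies one amino acid or a stop signal. The sketch below is a minimal illustration: it reads the coding strand of DNA directly (skipping the intermediate RNA step), uses one-letter amino acid codes, and the function name translate is an assumption.

```python
BASES = "TCAG"

# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate coding-strand DNA into one-letter amino acid codes.

    Reading starts at the first base, stops at the first stop codon, and
    ignores any trailing bases that do not fill a codon. Characters other
    than A, C, G and T raise a KeyError.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, translate("ATGGGTTAT") returns "MGY": methionine, followed by the glycine (G) and tyrosine (Y) shown on the earlier slide.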

Rasmol demo

Biotechnology

• Biologists know proteins, computer scientists know machine learning
• Together, they can find out a lot of hidden information about genes and proteins
• Biotechnology is a multi-billion dollar industry
• Biotechnology is one of the best-funded areas of scientific research