Application of Unstructured Learning in Computational Biology
Tony C Smith
Department of Computer Science
University of Waikato
[email protected]
Unstructured learning in computational biology Tony C Smith
Computability
Before computers were built, mathematicians knew what they could do
• arithmetic (e.g. missile trajectories)
• search (e.g. keys for secret codes)
• sort (e.g. census information)
• … anything with a mathematical algorithm
Artificial Intelligence
Computers do things only human brains can otherwise do
[Figure: diagram built up over three slides, with the labels "expert", "expert system" and "learning system"]
Machine learning
What is machine learning?
• creating computer programs that get better with experience
• learn how to make expert judgments
• discover previously hidden, potentially useful information (data mining)
How does it work?
• user provides learning system with examples of concept to be learned
• induction algorithm infers a characteristic model of the examples
• model is used to predict whether or not future novel instances are also examples, and it does this very consistently, and very, very quickly!
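The three steps above (examples in, model induced, model used for prediction) can be sketched with one of the simplest induction algorithms, a one-rule learner. This is only an illustration, not the method used in the talk: the function names `induce_one_rule` and `predict`, and the toy examples with their `colour`, `size` and `label` fields, are all hypothetical.

```python
from collections import Counter, defaultdict

def induce_one_rule(examples, label_key):
    """Induce a model: choose the single attribute whose
    value -> majority-label rule makes the fewest training errors."""
    attributes = [k for k in examples[0] if k != label_key]
    best = None
    for attr in attributes:
        # Count how often each label occurs for each value of this attribute.
        counts = defaultdict(Counter)
        for ex in examples:
            counts[ex[attr]][ex[label_key]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for ex in examples if rule[ex[attr]] != ex[label_key])
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    # Fall back to the overall majority label for attribute values never seen.
    default = Counter(ex[label_key] for ex in examples).most_common(1)[0][0]
    return attr, rule, default

def predict(model, instance):
    """Use the induced model on a novel instance."""
    attr, rule, default = model
    return rule.get(instance[attr], default)

# Step 1: the user provides examples of the concept (hypothetical toy data).
examples = [
    {"colour": "red", "size": "small", "label": "yes"},
    {"colour": "red", "size": "large", "label": "yes"},
    {"colour": "green", "size": "small", "label": "no"},
    {"colour": "green", "size": "large", "label": "no"},
]
# Step 2: the induction algorithm infers a model of the examples.
model = induce_one_rule(examples, "label")
# Step 3: the model predicts the label of a novel instance.
print(model[0], predict(model, {"colour": "green", "size": "small"}))
```

Once the model is induced, each prediction is a single dictionary lookup, which is why a learned model can judge new instances consistently and quickly.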
Structured learning
Mushroom Data

Weight   Damage   Dirt    Firmness   Quality
heavy    high     mild    hard       poor
heavy    high     mild    soft       poor
normal   high     mild    hard       good
light    medium   mild    hard       good
light    clear    clean   hard       good
normal   clear    clean   soft       poor
heavy    medium   mild    hard       poor
. . .

[Figure: decision tree for the mushroom data. Root node: weight (branches heavy, light, normal); internal nodes: dirt (branches mild, clean) and firmness (branches hard, soft); leaves: good or poor]
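A decision tree like the one on this slide can be induced from fixed-field data by repeatedly splitting on the most informative attribute. The sketch below uses ID3-style information gain, which is an assumption: the slides do not say which algorithm produced the tree. It uses only the seven rows visible in the table, and the names `build_tree` and `classify` are hypothetical.

```python
import math
from collections import Counter

ATTRIBUTES = ["weight", "damage", "dirt", "firmness"]
ROWS = [
    ("heavy", "high", "mild", "hard", "poor"),
    ("heavy", "high", "mild", "soft", "poor"),
    ("normal", "high", "mild", "hard", "good"),
    ("light", "medium", "mild", "hard", "good"),
    ("light", "clear", "clean", "hard", "good"),
    ("normal", "clear", "clean", "soft", "poor"),
    ("heavy", "medium", "mild", "hard", "poor"),
]
DATA = [dict(zip(ATTRIBUTES + ["quality"], row)) for row in ROWS]

def entropy(rows):
    """Shannon entropy (in bits) of the quality labels in rows."""
    counts = Counter(r["quality"] for r in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attr):
    """Reduction in entropy obtained by splitting rows on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def build_tree(rows, attrs):
    """Return a leaf label, or an (attribute, {value: subtree}) pair."""
    labels = Counter(r["quality"] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches)

def classify(tree, instance):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[instance[attr]]
    return tree

tree = build_tree(DATA, ATTRIBUTES)
print(tree)
```

On these seven rows the root split is on weight, as in the slide's figure. Below the root, several attributes separate the remaining rows equally well, so the subtrees depend on tie-breaking and need not match the figure. `classify` raises a `KeyError` for an attribute value that never appeared in training; a fuller implementation would fall back to a majority label.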
Unstructured learning
• data does not have fixed fields with specific values
• examples: images, continuous signals, expression data, text
• learning proceeds by correlating the presence or absence of any and all salient attributes
Document Classification
• given examples of documents covering some topic, learn a semantic model that can recognize whether or not other documents are relevant
• prioritize them: i.e. quantify “how relevant” documents are to the topic
• not limited to keywords (nor is it misled by them)
• adapt to the user’s needs (ephemeral or long-term)
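One way to picture learning from the presence or absence of attributes is a Bernoulli naive Bayes classifier over words, which scores a document by every word in the vocabulary, whether it occurs or not. This is a minimal sketch under stated assumptions: it is not the model behind the demo, treating each word as an attribute is a simplification, and the functions `tokens`, `train` and `relevance` and all four training documents are hypothetical.

```python
import math
import re

def tokens(text):
    """The set of lowercase words present in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def train(docs):
    """docs: list of (text, is_relevant) pairs, with both classes represented."""
    vocab = set().union(*(tokens(text) for text, _ in docs))
    model = {"vocab": vocab, "prior": {}, "p_word": {}}
    for label in (True, False):
        group = [tokens(text) for text, y in docs if y == label]
        model["prior"][label] = len(group) / len(docs)
        # Probability that a word is present in a document of this class,
        # with add-one smoothing so no probability is exactly 0 or 1.
        model["p_word"][label] = {
            w: (sum(w in d for d in group) + 1) / (len(group) + 2)
            for w in vocab
        }
    return model

def relevance(model, text):
    """Log-odds that the document is relevant; higher means more relevant."""
    present = tokens(text)
    score = math.log(model["prior"][True]) - math.log(model["prior"][False])
    for w in model["vocab"]:
        p_rel = model["p_word"][True][w]
        p_irr = model["p_word"][False][w]
        if w in present:
            score += math.log(p_rel) - math.log(p_irr)
        else:
            # Absence of a word is evidence too.
            score += math.log(1 - p_rel) - math.log(1 - p_irr)
    return score

# Hypothetical training examples: two on-topic, two off-topic documents.
training = [
    ("protein kinase binds ATP in the cell", True),
    ("gene expression regulates protein synthesis", True),
    ("the football match ended in a draw", False),
    ("stock prices fell in early trading", False),
]
model = train(training)

candidates = ["football prices fell", "kinase protein regulates gene"]
ranked = sorted(candidates, key=lambda t: relevance(model, t), reverse=True)
print(ranked)
```

Sorting by the score gives the prioritization described above: documents are ordered by "how relevant" they are rather than just accepted or rejected. Retraining on a different set of examples adapts the model to a different need.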
Document classification demo
Bioinformatics
Finding genes
Determining gene roles
Determining protein functions
• Empirical tests
• Sequence similarity comparison
• Literature
GO-KDS demo
Amino Acid
[Figure: structure of an amino acid, labelled with its amino group, carboxyl group and R group]
Amino Acid
[Figure: two example amino acids, glycine and tyrosine]
DNA encodes amino acids
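The encoding can be made concrete with the standard genetic code, in which each triplet of DNA bases (a codon) stands for one amino acid or a stop signal. The sketch below builds the standard codon table and translates a coding sequence into one-letter amino acid codes. The function name `translate` and the example sequence are hypothetical, and the sketch assumes the sequence is already in the correct reading frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# ATG = methionine (M), GGT = glycine (G), TAT = tyrosine (Y), TAA = stop
print(translate("ATGGGTTATTAA"))  # prints MGY
```

The example sequence was chosen so that it encodes glycine and tyrosine, the two amino acids shown on the earlier slide.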
RasMol demo
Biotechnology
• Biologists know proteins, computer scientists know machine learning
• Together, they can find out a lot of hidden information about genes and proteins
• Biotechnology is a multi-billion dollar industry
• Biotechnology is one of the best funded areas of scientific research