Connecting Advice Taking and Big Data
Randy Goebel
Alberta Innovates Centre for Machine Learning
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada
rgoebel@ualberta.ca
National Institute of Informatics, Lecture One, February 12, 2013
All I know in four lectures
• Connecting Advice Taking and Big Data
  • Tuesday, February 12, 2013, 13:30 - 15:00
• Structured inference and incomplete information
  • Wednesday, February 20, 2013, 13:30 - 15:00
• Natural Language Processing: Compressing Data to Models
  • Tuesday, February 26, 2013, 13:30 - 15:00
• Hypothesis Management with Symbols and Pictures
  • Friday, March 1, 2013, 13:30 - 15:00
Lecture preparation began …
• When I read Minsky’s (1968) edited volume in September of 1970
• Evans, Quillian, Raphael, Bobrow, Black, McCarthy, Minsky
Outline
• Preserving the goal of McCarthy’s Advice Taker
• The challenges of big data
• Necessary components for creating and managing knowledge at scale
• The role of multi-disciplinary collaboration in building multi-scale knowledge models
• Prospective
The Advice Taker (1958)
“The basic program will draw immediate conclusions from a list of premises. These conclusions will be either declarative or imperative sentences. When an imperative sentence is deduced the program takes a corresponding action. These actions may include printing sentences, moving sentences on lists, and reinitiating the basic deduction process on these lists.”
J. McCarthy, “Programs with common sense,” http://www-formal.stanford.edu/jmc/mcc59/mcc59.html
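The deduction loop described in the quotation can be sketched in a few lines. This is only an illustrative toy, not McCarthy's program: sentences are plain strings, the rule format, the "do " prefix for imperative sentences, and the names `advice_taker`, `rules` and `actions` are all assumptions made for the example.

```python
def advice_taker(premises, rules, act):
    """Forward-chain over `rules` until no new sentence can be deduced.

    `rules` is a list of (conditions, conclusion) pairs. A conclusion that
    starts with "do " is treated as an imperative sentence: it is recorded
    and the corresponding action is handed to `act`.
    """
    known = set(premises)
    changed = True
    while changed:                      # re-initiate deduction while progress is made
        changed = False
        for conditions, conclusion in rules:
            if set(conditions) <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
                if conclusion.startswith("do "):
                    act(conclusion[3:])  # imperative sentence: take the action
    return known


# Hypothetical premises and rules, written as uninterpreted strings.
rules = [
    (("at(I, desk)", "at(desk, home)"), "at(I, home)"),
    (("at(I, home)", "want(at(I, airport))"), "do go(home, airport)"),
]
actions = []
known = advice_taker(
    ["at(I, desk)", "at(desk, home)", "want(at(I, airport))"],
    rules,
    actions.append,
)
print(actions)
```

Because the rules are data rather than program text, advice can be given by adding or removing entries in `rules` without touching `advice_taker` itself.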
The Advice Taker (1958)
“The main difference … is that in the previous programs the formal system was the subject matter but the heuristics were all embodied in the program. In this program the procedures will be described as much as possible in the language itself and, in particular, the heuristics are all so described.”
J. McCarthy, “Programs with common sense,” http://www-formal.stanford.edu/jmc/mcc59/mcc59.html
Uniform formal representation
model/landscape: c1, c2, …, cn
objective functions: o1, o2, …, on
heuristic knowledge: h1, h2, …, hn
• Model/landscape sentences describe the search space
• Objective function sentences describe the solution space
• Heuristic knowledge sentences describe constraints on search
Changing the Model
Before:
model/landscape: c1, c2, …, cn
objective functions: o1, o2, …, on
heuristic knowledge: h1, h2, …, hn
After ± cnew (BR):
model/landscape: c1, cnew, …, cn
objective functions: o1, o2, …, on
heuristic knowledge: h1, h2, …, hn
Changing the Objectives
Before:
model/landscape: c1, c2, …, cn
objective functions: o1, o2, …, on
heuristic knowledge: h1, h2, …, hn
After ± onew (BR):
model/landscape: c1, c2, …, cn
objective functions: o1, onew, …, on
heuristic knowledge: h1, h2, …, hn
Changing the Search
Before:
model/landscape: c1, c2, …, cn
objective functions: o1, o2, …, on
heuristic knowledge: h1, h2, …, hn
After ± hnew (BR):
model/landscape: c1, c2, …, cn
objective functions: o1, o2, …, on
heuristic knowledge: h1, hnew, …, hn
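The three update slides share one pattern: a single group of sentences changes while the other two stay fixed. A minimal sketch of that pattern follows, assuming that "BR" abbreviates belief revision. The class `KnowledgeBase` and its `revise` method are hypothetical names, and the update is a plain set operation with no consistency checking, so it is much weaker than a real belief revision operator.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    model: set = field(default_factory=set)       # c1..cn: describe the search space
    objectives: set = field(default_factory=set)  # o1..on: describe the solution space
    heuristics: set = field(default_factory=set)  # h1..hn: constrain the search

    def revise(self, part, add=(), remove=()):
        """Return a new knowledge base with one part updated (the "±" step)."""
        parts = {
            "model": set(self.model),
            "objectives": set(self.objectives),
            "heuristics": set(self.heuristics),
        }
        parts[part] = (parts[part] - set(remove)) | set(add)
        return KnowledgeBase(**parts)


kb = KnowledgeBase({"c1", "c2"}, {"o1", "o2"}, {"h1", "h2"})
kb_new_model = kb.revise("model", add={"c_new"}, remove={"c2"})
kb_new_search = kb.revise("heuristics", add={"h_new"}, remove={"h2"})
```

Returning a new object rather than mutating in place keeps the "before" and "after" states side by side, as on the slides.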
The Advice Taker revisited (2003)
• Advice taken:
  • Logical representation languages
  • “Common sense informatic situation”
  • Explicit representation of actions, prove sequence of actions achieves goal, “should(X)”
• Advice not yet taken:
  • Formal generalization (learning abstractions)
  • Domain-dependent logical heuristics
J. McCarthy, “The Advice Taker revisited,” http://www-formal.stanford.edu/jmc/slides/advice03/advice03-sli.pdf
Outline
• Preserving the goal of McCarthy’s Advice Taker
• The challenges of big data
• Necessary components for creating and managing knowledge at scale
• The role of multi-disciplinary collaboration in building multi-scale knowledge models
• Prospective
The challenges of big data
• “External” challenges
  • the Square Kilometre Array
  • the DNA transistor
  • the volume & diversity of environmental monitoring
• “Internal” responses
  • Physical symbol systems (Newell), knowledge representation hypothesis (Smith)
  • enCYClopedia building
  • semantic web
  • crowd sourcing
  • Watson
External challenges
• the Square Kilometre Array
• the DNA transistor
• the volume & diversity of environmental monitoring
Square Kilometre Array
• 3 clusters of sensing technologies (radio dishes, dense and sparse aperture arrays)
• Distributed over 3,000km
• Effectively a single square kilometre multi-spectral sensor
• Final processed data output: 10Gb/second
http://en.wikipedia.org/wiki/Square_Kilometre_Array
http://www.skatelescope.org/the-technology/
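To give a sense of scale, the quoted output rate can be turned into a daily volume. This is a back-of-the-envelope sketch only: it assumes the rate is sustained around the clock, keeps the unit "Gb" exactly as written on the slide, and uses illustrative variable names.

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400 seconds

ska_gb_per_second = 10                   # "Final processed data output" from the slide
ska_gb_per_day = ska_gb_per_second * SECONDS_PER_DAY

print(f"{ska_gb_per_day:,} Gb/day")      # prints "864,000 Gb/day"
```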
3 Generations of technology
• Three generations of genomic “sequencing”
  • Sanger size-separation imaging
  • Second generation sequencing (SGS)
  • Single molecule sequencing (SMS)
Human Molecular Genetics, 2010, Vol. 19, Review Issue 2
Single Molecule Sequencing
• IBM’s DNA transistor technology
  • reads individual bases of ssDNA molecules as they pass through a narrow aperture based on the unique electronic signature of each individual nucleotide.
• Gold bands represent metal and gray bands dielectric layers of the transistor.
Human Molecular Genetics, 2010, Vol. 19, Review Issue 2
Typical data output
• Epoch Life Sciences
  • SGS (Ion Torrent machines)
  • http://www.epochlifescience.com/Service/Next_Gen_Seq.aspx
  • 10Mb in about 24 hours
• BGI (Shenzhen)
  • 128 Illumina HiSeq 2000 sequencers
  • 10,000 human genomes/year
  • 25Gb/day*128 = 3,200Gb/day
• IBM single molecule sequencing (SMS)
  • “…Speed, read length and low cost are again the chief advantages of this type of approach. In fact, the speed of sequencing could be very dramatically increased with this approach, given the theoretical limit has been computed to be 500 000 000 bases read per transistor per second.”
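The arithmetic behind these figures can be checked directly. The sketch below only restates numbers from the slide; the variable names are illustrative, and the two results are in different units (Gb as written for BGI, bases for the IBM limit), so they are not directly comparable.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# BGI (Shenzhen): figures from the slide
sequencers = 128
gb_per_sequencer_per_day = 25
bgi_gb_per_day = sequencers * gb_per_sequencer_per_day

# IBM SMS: quoted theoretical limit for a single transistor
bases_per_transistor_per_second = 500_000_000
bases_per_transistor_per_day = bases_per_transistor_per_second * SECONDS_PER_DAY

print(f"BGI: {bgi_gb_per_day:,} Gb/day")
print(f"SMS limit: {bases_per_transistor_per_day:,} bases/day per transistor")
```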
Environmental Modeling
• Carbon flux nets
• Biodiversity data capture
• Toxicology modeling and remediation
Fluxnet
• Data impression every 30 minutes from about 500 towers
http://daac.ornl.gov/FLUXNET/fluxnet.shtml
Oilsands metagenomics
• Exploit natural processes of oilsands microbial diversity
http://www.bio.ucalgary.ca/contact/faculty/voordouw/Uncovering%20the%20Microbial%20diversity%20of%20the%20Alberta%20Oil%20Sands%20theough%20Metagenomics.pdf
Environmental Modeling Data
“The data generated by sensors currently exceeds, twofold, the world’s capacity to store digital data [8], requiring selective capture, compression and summarization.”
- Porter et al., Cell, Trends in Ecology and Evolution, February 2012, Vol. 27, No. 2
Internal responses
• Physical symbol systems (Newell), knowledge representation hypothesis (Smith)
• enCYClopedia building
• semantic web
• crowd sourcing
• Watson
The Knowledge Representation Hypothesis
Any mechanically embodied intelligent process will be comprised of structural ingredients that
a) we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and
b) independent of such external semantic attribution, play a formal but causal and essential role in engendering the behaviour that manifests that knowledge.
– Brian C. Smith (1982)
CYC
• Elkan, Greiner (1993), Review of “Building large knowledge-based systems: representation and inference in the CYC Project” Lenat & Guha
  http://www.sciencedirect.com/science/article/pii/000437029390092P
• CYC separates “epistemological” and “heuristic” knowledge (cf. Advice Taker)
• “CYC 4.0 released June 2012, has 239,000 concepts and 2,093,000 facts and can be browsed on the OpenCyc website”
Semantic “Layer Cake”
• disciplined bottom up construction of semantics
• based on foundation of engineered layers
http://www.w3.org/2004/Talks/0412-RDF-functions/slide4-0.html
Semantic “Layer Cake”
• The knowledge representation hypothesis applies at least to these three levels
Web 3.0: Convergence of Web 2.0 and the Semantic Web, Wahlster & Dengel, Deutsche Telekom Laboratories, June 2006
SPARQL and RDF databases
• Resource Description Framework (RDF)
• SPARQL (SPARQL Protocol and RDF Query Language)
• “a SPARQL endpoint is mostly conceived as a machine-friendly interface towards a knowledge base”
  http://semantic.org/wiki/SPARQL_endpoint
• RDF databases, e.g., Allegro Triple graph (more than 10^12 RDF triples)
  http://www.w3.org/wiki/LargeTripleStores
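An RDF database stores subject, predicate, object triples, and a query is a pattern with variables in some positions. The sketch below mimics that idea with a tiny in-memory set in plain Python. It is not an RDF library or a SPARQL engine; the `ex:` identifiers, the sample triples and the `match` helper are all made up for illustration.

```python
# Hypothetical triples; a real store would hold IRIs and literals.
triples = {
    ("ex:SKA", "rdf:type", "ex:Telescope"),
    ("ex:SKA", "ex:outputRate", "10Gb/second"),
    ("ex:Fluxnet", "rdf:type", "ex:SensorNetwork"),
}


def match(pattern, store):
    """Yield variable bindings for one triple pattern.

    Terms starting with "?" are variables; every other term must match exactly.
    """
    for triple in sorted(store):
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Roughly analogous to: SELECT ?s WHERE { ?s rdf:type ex:Telescope }
telescopes = [b["?s"] for b in match(("?s", "rdf:type", "ex:Telescope"), triples)]
print(telescopes)
```

A linear scan like this is fine for three triples; stores at the scale quoted above depend on indexing, which the sketch does not attempt.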
SPARQL and RDF databases
• One trillion of these:
Crowdsourcing: ESP
• Luis von Ahn: “some people play 60-70 hours a week…”
http://www.espgame.com
Crowdsourcing: Captcha
• Improving accuracy of optical character recognition
http://www.espgame.com
Watson: Jeopardy!
• Knowledge-based probabilistic abduction
• Knowledge accumulated by NLP information extraction from sources like Wikipedia
http://www-03.ibm.com/innovation/us/watson/
Watson: beyond Jeopardy!
• Knowledge-based probabilistic abductive differential diagnosis
• Knowledge accumulated by NLP information extraction, e.g., reading New England Journal of Medicine
http://www-03.ibm.com/innovation/us/watson/
Watson: reading journals
Dear James, …
Attached is our output of "http://content.nejm.org/cgi/content/full/352/14/1474"
the format is:
a sentence
chunk of the sentence by any punctuation
0] CID => the entry in UMLS \t system's score for this entry.
1] CID => the entry in UMLS \t system's score for this entry.
...
One Surprise after Another.
One Surprise after Another
In this Journal feature, information about a real patient is presented in stages (boldface type) to an expert clinician, who responds to the information, sharing his or her reasoning with the reader (regular type).
In this Journal feature information about a real patient is presented in stages
0] C0280209 => melanoma stage -124.47169494628906
boldface type
0] C0268405 => ah type amyloidosis -111.1513671875
to an expert clinician who responds to the information sharing his or her reasoning with the reader regular type
0] C0268405 => ah type amyloidosis -108.95413970947266
…
Outline
• Preserving the goal of McCarthy’s Advice Taker
• The challenges of big data
• Necessary components for creating and managing knowledge at scale
• The role of multi-disciplinary collaboration in building multi-scale knowledge models
• Prospective
Knowledge at scale
• Significant knowledge stores cannot be directly engineered
• Creation and curation of knowledge stores requires continuous incremental machine learning
• The challenge of automatically building multi-scale “models” depends on “collaborative naming”
Modeling and Data Abstraction
…100011001011010010110101010101010101…
…, 24, 36, 19, 54, …
KR at scale principles
• “The world is its own best representation, and everything else is an approximation.”
• “All models (theories) are wrong; some are useful.”
  http://en.wikiquote.org/wiki/George_E._P._Box
Vocabulary & Modeling Challenges
• Determination of suitable vocabularies and their role in structured models is the general problem of domain modeling
  • An AI-complete problem
  • We have information theoretic control on artificial domains
  • Not clear if information theoretic control is consistent with natural constraints on natural domains (the general question about biological systems)
Granularity
• Exploiting abstraction
• Self-similarity, power laws, scale free structures
• Multi-scale representations
• Learning, Deep learning
Planning in a hierarchy of abstractions
• Abstrips created a hierarchy of abstractions based on “criticality” assignments
http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA269528&Location=U2&doc=GetTRDoc.pdf
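The criticality idea can be sketched as a filter on operator preconditions: at abstraction level k, only preconditions with criticality at least k are considered, so higher levels see a simpler problem. The example below is a minimal illustration, not the Abstrips planner. The predicate names, the criticality values and the helper `abstract_preconditions` are hypothetical.

```python
def abstract_preconditions(preconditions, criticality, level):
    """Keep only the preconditions whose criticality is at least `level`."""
    return {p for p in preconditions if criticality[p] >= level}


# Hypothetical criticality assignment for one operator's preconditions.
criticality = {
    "in_room(robot, r1)": 3,   # hard to achieve: kept at every level
    "door_open(d1)": 2,
    "next_to(robot, d1)": 1,   # a detail: only visible at the lowest level
}
preconditions = set(criticality)

# Plan at the most abstract level first, then refine downwards.
levels = {k: abstract_preconditions(preconditions, criticality, k) for k in (3, 2, 1)}
for k in (3, 2, 1):
    print(k, sorted(levels[k]))
```

Each lower level restores preconditions that the level above ignored, so a plan found at level 3 serves as a skeleton to be refined at levels 2 and 1.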
Compression Vocabularies
Primary Structure (sequence of amino acids)
  MVKQIESKTA FQEALDAAGD KLVVVDFSAT WCGPCKMIKP FFHSLSEKYS
  NVIFLEVDVD DCQDVASECE VKCMPTFQFF KKGQKVGEFS GANKEKLEAT
  INELV
Secondary Structure (alpha Helix, Beta strand, random Coil)
  CBBBBCCHHH HHHHHHHCCC CBBBBBBBCC CCHHHHHHHH HHHHHHHHCC
  CBBBBBBBCC CCHHHHHHCC CCCCCBBBBB BCCBBBBBBB CCCHHHHHHH
  HHHCC
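One way to see why the three-letter secondary structure vocabulary is a compression of the sequence is that it consists of long runs, which a simple run-length encoding shortens considerably. Run-length encoding is used here purely as an illustration and is not a method attributed to the lecture; the function names are made up for the example.

```python
from itertools import groupby

# Secondary structure string from the slide
# (H = alpha helix, B = beta strand, C = random coil), with spaces removed.
SECONDARY = "".join("""
CBBBBCCHHH HHHHHHHCCC CBBBBBBBCC CCHHHHHHHH HHHHHHHHCC
CBBBBBBBCC CCHHHHHHCC CCCCCBBBBB BCCBBBBBBB CCCHHHHHHH
HHHCC
""".split())


def run_length_encode(s):
    """Collapse each run of identical symbols into a (symbol, length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(s)]


def run_length_decode(runs):
    """Invert run_length_encode."""
    return "".join(symbol * length for symbol, length in runs)


runs = run_length_encode(SECONDARY)
print(len(SECONDARY), "symbols ->", len(runs), "runs")
```

The same encoding applied to the primary structure would gain very little, since the 20-letter amino acid sequence has few repeated neighbours.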
Self-similarity, like fractals
• Hierarchical construction with similar components
• Vocabulary (both physical and linguistic) is a limiting factor
http://gurneyjourney.blogspot.jp/2009/08/self-similarity-in-fractals.html
Scale Free Structures
• Replicated structure at multiple scales
• Graph theoretic characterization within pursuit of modeling natural phenomena (e.g., molecular dynamics)
http://en.wikipedia.org/wiki/Scale-free_network
Natural Granularity and Power Laws
• Vocabulary and transformation levels
• Scale invariant structures and modeling
http://en.wikipedia.org/wiki/Power_law
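Scale invariance has a simple numerical signature: for a power law f(x) = c * x^(-alpha), rescaling x by a factor k multiplies f by k^(-alpha) regardless of x, and the function is a straight line of slope -alpha on log-log axes. The sketch below checks both properties; the constants c = 2.0 and alpha = 1.5 are arbitrary illustrative values.

```python
import math


def power_law(x, c=2.0, alpha=1.5):
    """f(x) = c * x**(-alpha), with illustrative constants."""
    return c * x ** (-alpha)


def loglog_slope(f, x1, x2):
    """Slope of f between x1 and x2 when both axes are logarithmic."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))


slope = loglog_slope(power_law, 1.0, 100.0)

# Rescaling x by 10 changes f by the same factor at every scale.
ratios = [power_law(10 * x) / power_law(x) for x in (1.0, 7.0, 300.0)]

print(round(slope, 6), [round(r, 6) for r in ratios])
```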
Multi-scale Representation
• E.g., multi-scale representation of biological structure
• Inference about components organized as
  • Reduction/aggregation
  • “Localized” cellular activity
http://homepage.ntlworld.com/l.dupuy/Manual/architecture.htm
Learning Architectures
• Provide a framework to help
  • Identify and capture predictively rich data streams
  • Apply machine learning methods to create predictors
  • Provide evaluation methods to iterate and improve
Learning Architecture
• Which data?
• How much data?
• Which learning method?
• How to evaluate?
• How to do it incrementally?
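One common way to answer "how to evaluate" and "how to do it incrementally" together is test-then-train (prequential) evaluation: each item in the stream is predicted before the learner is updated with it. The sketch below is an assumed illustration rather than the architecture used at AICML. The learner is a deliberately trivial running mean, and the four targets are borrowed from the "Modeling and Data Abstraction" slide as a toy stream.

```python
def prequential(stream, predict, update, state):
    """Test-then-train: score each prediction, then learn from the item."""
    total_squared_error = 0.0
    n = 0
    for x, y in stream:
        total_squared_error += (predict(state, x) - y) ** 2
        state = update(state, x, y)
        n += 1
    return state, total_squared_error / max(n, 1)


# Hypothetical learner: the running mean of the targets seen so far (ignores x).
def predict_mean(state, x):
    count, mean = state
    return mean


def update_mean(state, x, y):
    count, mean = state
    count += 1
    return count, mean + (y - mean) / count   # incremental mean update


stream = [(None, 24.0), (None, 36.0), (None, 19.0), (None, 54.0)]
state, mse = prequential(stream, predict_mean, update_mean, (0, 0.0))
print(state, mse)
```

Swapping in a different `predict`/`update` pair changes the learning method without changing the evaluation loop, which keeps the slide's questions independent of one another.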
RLAI within AICML
http://rlai.cs.ualberta.ca/RLAI/asimpleplan.html
Reinforcement Learning Architectures
R. Sutton, Reinforcement Learning Architectures
Deep Learning
• Physiological motivation
• Semantics based only on inducing complex functions
• N-level networks without semantic rationalization for N
http://videolectures.net/okt09_bengio_ldhr/
Popular trend in neural ML, Y. Bengio (Montreal), G. Hinton (Toronto)
Deep Learning
• Information theoretically sound
• Sophisticated adjustments to neural net training (e.g., restricted Boltzmann machines, clever application of Gibbs sampling).
http://videolectures.net/okt09_bengio_ldhr/
Outline
• Preserving the goal of McCarthy’s Advice Taker
• The challenges of big data
• Necessary components for creating and managing knowledge at scale
• The role of multi-disciplinary collaboration in building multi-scale knowledge models
• Prospective
Explain and Explore
• We need tools for both explanatory and exploratory analytics
• “Volume, variety, velocity”
  Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, Zikopoulos et al., McGraw-Hill, 2012
The Systems of Systems Biology
• data capture & integration
  • machine learning (classification, clustering, model induction, …)
  • integration of instrumentation data and biological knowledge (sequencing, NMR, cell lineage, …)
• modeling & simulation
  • scientific reasoning: hypothesize models, predict consequences, revise
  • integration of multiple levels of granularity
• prediction, design, intervention
  • causal identification
  • intervention design (anti-viral/bacterial, gene therapy, …)
Protein Folding Hypotheses
• Foldit is a protein folding game for generating plausible protein folding hypotheses
• Initiated by David Baker at the U of Washington, Seattle
• Baker, David Salesin, Zoran Popović developed the game
http://en.wikipedia.org/wiki/Foldit
Atlases of Cyberspaces
• Bill Cheswick, Hal Burch (formerly of Bell Labs)
• http://research.lumeta.com/ches/map/gallery/index.html
• “Internet has a diameter of about 10,000 pookies”
  • (A pookie is an arbitrary unit of distance in the space in which the maps are laid out.)
Modeling the universe
• What should you do with 10Gb/sec of data?
  • Model our galaxy (i.e., the Milky Way galaxy)
  • Model universal processes (e.g., gravity, black holes)
http://www.apple.com/science/insidetheimage/hurt_stolovy/ http://www.spitzer.caltech.edu/images/1540-ssc2006-02a-A-Cauldron-of-Stars-at-the-Galaxy-s-Center
Spiral galaxies
http://en.wikipedia.org/wiki/Square_Kilometre_Array http://www.skatelescope.org/the-technology/
Disease causation
• Distinguish
  • viral from genomic causation
  • heredity influence with haplotype maps
http://www.nature.com/nature/journal/v437/n7063/fig_tab/nature04226_F15.html
Haplotype maps
http://dienekes.blogspot.com/2011/01/x-linked-haplotype-of-neandertal-origin.html
Genetic disease causation
http://www.familytreedna.com/public/I1d/default.aspx?section=results
Metagenetic disease causation
http://www.environmentmagazine.org/Archives/Back%20Issues/March-April%202010/made-in-china-full.html
Metagenetic disease causation
World J Gastroenterol 2006 January 7; 12(1): 17-20 World Journal of Gastroenterology ISSN 1007-9327, also http://www.wjgnet.com/1007-9327/12/17.pdf
Sustainable energy
• Genome Alberta’s $11M project on oil sands metagenomics
Ondov et al. BMC Bioinformatics 2011, 12:385
Food security
• E.g., genomic support for mutagenesis for crop improvement
Journal of Experimental Botany, Vol. 60, No. 10, pp. 2817–2825, 2009
Prospective
• AI people are knowledge facilitators: they are multidisciplinary scientific collaborators
• The discipline of nurturing knowledge creation and use at scale may not be possible without new views of “individual” and crowd sourcing
• Measures of incremental local improvements are ok, but systematic acceleration of scientific progress is the ultimate measure
… for graduate students
• Ensure you understand the formal methods of computer science, especially discrete mathematics, probability, and logic
• Remain aware of the power of multiple perspectives, including the power of collaboration
• Whether you seek theoretical or practical contributions, always consider their impact on the overall science of informatics
Start and end are fine …
Courtesy of Sidney Harris, www.sciencecartoonsplus.com