Slides for my keynote at the KEESOS workshop (http://ir.ii.uam.es/keesos2011/), CAEPIA 2011
The Data Era: Production, Consumption, Challenges
Miriam Fernández
8th November, CAEPIA 2011
Website: http://people.kmi.open.ac.uk/miriam/about/
Twitter: @miri_fs
Slide_share: http://www.slideshare.net/miriamfs
What is … ?
How do humans infer knowledge?
Alejandro
in Chicago!
Syntactic interpretation
Semantic interpretation
A picture!
How do machines infer knowledge?
Syntactic interpretation
Semantic interpretation
A picture!
The Challenge
• We need to find the way in which machines will interpret and extract knowledge for us!
The Data Era
• The 2011 Digital Universe Study: Extracting Value from Chaos (IDC)
–We have entered the Zettabyte era (a zettabyte is a trillion gigabytes or a billion terabytes)
–The rate of information growth appears to be exceeding Moore’s Law
http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
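The unit equivalence quoted for the zettabyte can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where a gigabyte is 10^9 bytes; the constant names are illustrative and not taken from the IDC study.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: SI, not binary, prefixes).
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

# How many smaller units fit into one zettabyte.
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE   # 10**12, a trillion
terabytes_per_zettabyte = ZETTABYTE // TERABYTE   # 10**9, a billion

print(f"{gigabytes_per_zettabyte:,} GB per ZB")
print(f"{terabytes_per_zettabyte:,} TB per ZB")
```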
Big Value from Data
• Big Data: The next frontier for innovation, competition and productivity (McKinsey)
–$300 billion potential annual value to US health care
–€250 billion potential annual value to Europe’s public sector administration
http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf
IBM City Forward
The Smarter Cities Challenge is a competitive grant program
awarding $50 million worth of IBM expertise over the next three
years to 100 cities around the globe. It is designed to address the
wide range of challenges facing cities today.
Consumption
• We need to provide efficient ways to consume data in order to extract the value out of it, the knowledge
–Syntactic approaches (visual analytics)
  • The data is collected, centralized and analysed
  • Visualizations for humans to extract knowledge
–Semantic approaches
  • The information is distributed / interlinked
  • Semantic structures are added to the data so that machines can better understand it
Syntactic approaches
• Some examples
–Gapminder
–IBM Many Eyes
–Google Public Data Explorer
–Google Correlate
–Google Ngram Viewer
  • What is the most popular hair colour in the literature?
Google N-Gram Viewer
Semantic approaches
• The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation
Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
The SW vision
• Use semantic structures (ontologies) to represent data. Provide machines with the ability to interpret and extract knowledge
Adding Structure
• Two paths towards the SW vision
–Metadata embedded in HTML
  • Microformats
  • RDFa
  • Microdata
–Linked Data
  • Put the data online in a standard, web-enabled representation (RDF)
  • Make the data Web addressable (URIs)
Metadata in HTML
• An example
Knowledge Media Institute
Walton Hall
Milton Keynes
MK7 6AA
<div class="vcard">
<div class="fn org">Knowledge Media Institute</div>
<div class="adr">
<div class="street-address">Walton Hall</div>
<div>
<span class="locality">Milton Keynes</span>,
<span class="postal-code">MK7 6AA</span>
</div>
<div class="country-name">United Kingdom</div>
</div>
</div>
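To illustrate how a machine could read the hCard markup above, here is a minimal sketch using only Python's standard `html.parser` module. The `HCardParser` class and the set of property names it looks for are assumptions made for this example; a real microformats parser handles many more cases.

```python
from html.parser import HTMLParser

# The hCard markup from the slide, copied verbatim.
HCARD = """
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
"""


class HCardParser(HTMLParser):
    """Hypothetical helper: collects the text of elements tagged with hCard classes."""

    PROPERTIES = {"fn", "org", "street-address", "locality",
                  "postal-code", "country-name"}

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open element: its matching property names
        self.fields = {}   # property name -> extracted text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append([c for c in classes if c in self.PROPERTIES])

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        # Text belongs to the innermost open element only.
        for prop in self.stack[-1]:
            self.fields[prop] = text


parser = HCardParser()
parser.feed(HCARD)
fields = parser.fields
for name, value in fields.items():
    print(f"{name}: {value}")
```

The class names act as the shared vocabulary: the program needs no knowledge of the page layout, only of the property names.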
Metadata in HTML
• Schema.org
Semantically enhanced Information Retrieval:
an ontology-based approach
http://people.kmi.open.ac.uk/miriam/about/
Metadata in HTML
• The Open Graph protocol
Linked Data
Linking Open Data cloud diagram (2007, 2008, 2009, 2010),
by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/
Linked Data
• An example
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbterm: <http://dbpedia.org/property/> .
dbpedia:Amsterdam
    dbterm:officialName "Amsterdam" ;
    dbterm:longd "4" ;
    dbterm:longm "53" ;
    dbterm:longs "32" ; …
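As a small illustration of what a machine can do with the Amsterdam triples above, the sketch below holds them as plain tuples and derives a decimal longitude. It assumes that `longd`, `longm` and `longs` are the degrees, minutes and seconds of an eastern longitude, which the property names suggest but the slide does not state; the helper functions are hypothetical.

```python
# Triples from the slide, held as plain (subject, predicate, object) tuples.
TRIPLES = [
    ("dbpedia:Amsterdam", "dbterm:officialName", "Amsterdam"),
    ("dbpedia:Amsterdam", "dbterm:longd", "4"),
    ("dbpedia:Amsterdam", "dbterm:longm", "53"),
    ("dbpedia:Amsterdam", "dbterm:longs", "32"),
]


def properties_of(subject):
    """Return a predicate -> object mapping for one subject."""
    return {p: o for s, p, o in TRIPLES if s == subject}


def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


props = properties_of("dbpedia:Amsterdam")
# Assumption: the three values are degrees, minutes and seconds east.
longitude = dms_to_decimal(
    int(props["dbterm:longd"]),
    int(props["dbterm:longm"]),
    int(props["dbterm:longs"]),
)
print(props["dbterm:officialName"], round(longitude, 4))
```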
http://data.semanticweb.org/person/miriam-fernandez/rdf
<ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
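The RDF/XML fragment above can be read into triples with Python's standard `xml.etree.ElementTree`. The slide shows only part of the document, so this sketch wraps the fragment in an `rdf:RDF` root and closes the open element; the binding of the `ns1` prefix to the FOAF namespace is an assumption, since the slide does not show the namespace declarations.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"  # assumed binding for the ns1 prefix

# The fragment from the slide, completed with an assumed root and closing tags.
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ns1="{FOAF_NS}">
  <ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </ns1:Person>
</rdf:RDF>"""

root = ET.fromstring(DOCUMENT)
triples = []
for node in root:
    subject = node.get(f"{{{RDF_NS}}}about")
    for child in node:
        target = child.get(f"{{{RDF_NS}}}resource")
        if target is not None:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            predicate = child.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, target))

for triple in triples:
    print(triple)
```

This handles only resource-valued property elements; a full RDF/XML parser would also emit the type implied by the `ns1:Person` element itself and handle literals.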
Open Government
• Data.gov
• Data.gov.uk
• Many others…
Research Funding Explorer
BBC
• Programs
• Music
• Artist
• World Cup
Who won it? ;)
Open University
OU data sources:
• ORO
• Archive of Course Material
• Library’s Catalogue of Digital Content
• OpenLearn Content
• A/V Material, Podcasts, iTunesU
• Data from Research Outputs
External datasets:
• BBC
• DBpedia
• DBLP
• RAE
• geonames
• data.gov.uk
Currently: OU public data sit in different systems, which makes them hard for users to discover, obtain and integrate.
Exposed as linked data, our data interlink with each other and with the external world, and become part of the “global data space” on the Web.
Data.open.ac.uk
The Value
• Recognized as a critical step forward for the HE sector in the UK
–Favors transparency and reuse of data, both externally and internally
–Reduces the cost of dealing with our own public data
–Enables new kinds of applications and makes those that are already feasible more cost-effective
The Value
• Linking educational material across universities
http://smartproducts1.kmi.open.ac.uk/web-linkeduniversities/index.htm
The Value
• Exploring research communities
The Value
• And many others…
Conclusions
• We have reached the Data Era
–Production: there is currently more than a Zettabyte of information in the digital world, and it is growing fast
–Consumption: syntactic and semantic approaches have emerged to extract the value (the knowledge) out of the data
–Challenges: Provide machines with the capabilities to extract the knowledge for us!
Conclusions
• Many more challenges ahead…
–Different formats (text vs. multimedia)
–Different dynamics (time / location)
–Different provenance
–Different topics (heterogeneous)
–Distributed, massive, streaming data
–Varying quality
–…
THX!
• Any ideas to make me rich? ☺
• Slide_share: http://www.slideshare.net/miriamfs
• Website: http://people.kmi.open.ac.uk/miriam/about/
• Twitter: @miri_fs
Thanks to Fouad Zablith and Mathieu d'Aquin ☺ for sharing with me some of their slides and
for their valuable comments on this presentation