BBC Dynamic Semantic Publishing. The transformational technology strategy that the BBC Future Media & Technology department is using to evolve from a relational content model and static publishing framework to a fully dynamic semantic publishing (DSP) architecture. Supporting BBC World Cup 2010, BBC Sport and BBC Olympics 2012 online:
http://www.bbc.co.uk/worldcup/
http://news.bbc.co.uk/sport/
http://www.bbc.co.uk/2012/
Journalism BBC MMIX
BBC Dynamic Semantic Publishing [DSP]
Jem Rayfield : Senior Technical Architect
BBC Future Media and Technology
Outline
BBC News Online
BBC World Cup 2010
BBC Sport 2011
BBC Olympics 2012
Radio since 1922, TV since 1930, Web since 1994
http://bbc.co.uk/news
online
BBC News [Static Publishing]
Static News Architecture
BBC CPS/CMS
Asset Authoring
BBC CPS/CMS
Index Authoring
Static News: The Good
1) Simple
2) Scales cheaply
3) Difficult to break [bad rendering logic etc.]
4) Handles high load
Static News: The Bad
1) Relational taxonomic meta model
2) Static! Inflexible! SSI!
3) Document publishing
4) Content not reusable
5) Content not repurposable
6) Difficult to personalize
7) Publication per output
BBC World Cup 2010
http://bbc.co.uk/worldcup
World Cup 2010
1. 32 teams, 8 groups, 736 players: 776 pages
2. Fixtures & Results, Groups & Teams pages
3. Too many web pages for too few journalists
4. Improve the publishing system to help achieve all of this
Page Per Player
http://news.bbc.co.uk/sport/football/world_cup_2010/groups_and_teams/team/england/wayne_rooney
Page Per Team
Page Per Group
Semantic publishing
USER EXPERIENCE
ONTOLOGY
TRIPLE STORE
Rationale
• Automated content publishing
• Huge increase in content breadth (number of manageable pages)
• Content re-use and re-purposing, increasing reach
• Simplified content management
• Journalist headcount reduction
• Multi-dimensional entry points and semantic navigation
• Improved user experience with high levels of user engagement
• Dynamic, state (time|event) and semantic driven page layout
• Personalized content
• Open data and APIs
Dynamic Semantic Architecture [DSP]
API Stack
Highly Scalable Clustered BigOWLIM
• Horizontally scalable
• No single point of failure
• Fault tolerant
Plenty of Caching
Extendable Domain Driven Asset Tagging
Open Ontology/Dataset reuse: Event | Geonames | FOAF | etc.
World Cup ontology
Graffiti: Suggest -> Tag [Player]
Graffiti: Suggest -> Tag [Location] (Geonames)
Tag player
Infer team
Infer competition
Happy Journalist
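The tag-then-infer idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory set of triples stands in for the triple store, the identifiers and the `playsFor` and `about` predicates are hypothetical, and the simple loop stands in for the OWL reasoning the real system delegates to BigOWLIM.

```python
# Hypothetical predicates and identifiers; the slides do not show the real
# World Cup ontology terms used for tagging.
ABOUT = "about"              # asset -> thing the journalist tagged it with
PLAYS_FOR = "playsFor"       # player -> team
COMPETES_IN = "competesIn"   # team -> competition

facts = {
    ("wayne_rooney", PLAYS_FOR, "england"),
    ("england", COMPETES_IN, "world_cup_2010"),
    ("story_123", ABOUT, "wayne_rooney"),  # the only tag applied by hand
}


def infer_tags(triples):
    """Forward-chain: if an asset is about X and X playsFor/competesIn Y,
    the asset is also about Y. Repeat until nothing new is derived."""
    triples = set(triples)
    while True:
        derived = set()
        for asset, predicate, thing in triples:
            if predicate != ABOUT:
                continue
            for subject, link, target in triples:
                if subject == thing and link in (PLAYS_FOR, COMPETES_IN):
                    derived.add((asset, ABOUT, target))
        if derived <= triples:
            return triples
        triples |= derived


def tags_for(asset, triples):
    """All things an asset is about, hand-tagged or inferred."""
    return sorted(o for s, p, o in infer_tags(triples) if s == asset and p == ABOUT)


print(tags_for("story_123", facts))
# ['england', 'wayne_rooney', 'world_cup_2010']
```

One manual tag on the story is enough for it to surface on the player, team and competition pages, which is the point the slide makes.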
World Cup statistics
• 750+ dynamic aggregations/pages (Player, Squad, Group, etc.)
• Average unique page requests a day : 2 million +
• Average BigOWLIM SPARQL queries a day : 1 million
• 100s of RDF statement updates/inserts per minute, including sports statistics, with full OWL reasoning and associated inference
• Multi data center fully resilient, clustered 6 node triple store
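To put the daily figures in perspective, they can be converted to average per-second rates. The sketch below uses only the two daily averages quoted above; it says nothing about peak load, which would be higher during matches.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily averages quoted on the slide ("2 million +" is treated as a lower bound).
page_requests_per_day = 2_000_000
sparql_queries_per_day = 1_000_000


def per_second(count_per_day):
    """Average rate per second for a daily total."""
    return count_per_day / SECONDS_PER_DAY


print(round(per_second(page_requests_per_day), 1))     # 23.1
print(round(per_second(sparql_queries_per_day), 1))    # 11.6
print(sparql_queries_per_day / page_requests_per_day)  # 0.5
```

On these averages there is at most one SPARQL query for every two page requests, which fits the caching slide earlier in the deck.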
BBC Sport Online Refresh
http://bbc.co.uk/sport
Sport Refresh : Stealth Infra upgrade [DSP]
http://bbc.co.uk/sport1/hi/football/teams/c/chelsea
REST API
Content negotiation: RDF/JSON, RDF/XML, Turtle
Publicly accessible (with SSL cert)
GET /sport/football/teams/<TEAM>
Accept: application/rdf+json
GET /sport/football/<COMPETITION>
Accept: application/rdf+xml
GET /assets/<ASSET>
Accept: text/rdf+n3
Etc.
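A client selects the serialization purely through the Accept header. The following sketch builds such a request with the Python standard library, without sending it. The base URL is taken from the example requests later in the deck; the `FORMATS` table and the `build_request` helper are illustrative names, not part of the BBC API, and the client certificate the API requires is left out.

```python
from urllib.request import Request

# Base URL as it appears in the example requests later in the deck.
BASE = "https://api.live.bbc.co.uk/dsp"

# Media types listed on the slide, keyed by a short name chosen for this sketch.
FORMATS = {
    "json": "application/rdf+json",
    "xml": "application/rdf+xml",
    "n3": "text/rdf+n3",
}


def build_request(path, fmt):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    if fmt not in FORMATS:
        raise ValueError("unsupported format: " + fmt)
    return Request(BASE + path, headers={"Accept": FORMATS[fmt]}, method="GET")


req = build_request("/sport/football/teams/chelsea", "json")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Sending the request would additionally need an SSL context loaded with the client certificate, which is outside the scope of this sketch.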
GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea
Accept: text/rdf+n3

<http://www.chelseafc.com/>
    domain:documentType <http://www.bbc.co.uk/things/document-types/homepage> ,
        <http://www.bbc.co.uk/things/document-types/external> .

<http://www.bbc.co.uk/sport/football/teams/chelsea>
    domain:documentType <http://www.bbc.co.uk/things/document-types/bbc-document> ,
        <http://www.bbc.co.uk/things/document-types/homepage> .

<http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id>
    a sport:CompetitiveSportingOrganisation ;
    domain:canonicalName "Chelsea"^^<xsd:string> ;
    domain:document <http://www.chelseafc.com/> ,
        <http://www.bbc.co.uk/sport/football/teams/chelsea> ;
    domain:externalId <http://dbpedia.org/resource/Chelsea_F.C.> ,
        <urn:sports-stats:137316635> ;
    domain:name "Chelsea" ;
    domain:shortName "Chelsea"^^<xsd:string> ;
    sport:competesIn <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id> .

<http://dbpedia.org/resource/Chelsea_F.C.>
    domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/dbpedia> .

<urn:sports-stats:137316635>
    domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/bbc-sport-stats> .

<http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id>
    domain:canonicalName "Premier League"^^<xsd:string> ;
    domain:externalId <urn:sports-stats:118996114> ;
    sport:competitionType <http://www.bbc.co.uk/things/competition-types/domestic-league> .
GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea
Accept: application/rdf+json

{
  "http:\/\/www.chelseafc.com\/":{
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/documentType":[
      { "value":"http:\/\/www.bbc.co.uk\/things\/document-types\/homepage", "type":"uri" },
      { "value":"http:\/\/www.bbc.co.uk\/things\/document-types\/external", "type":"uri" }
    ]
  },
  "http:\/\/www.bbc.co.uk\/things\/2acacd19-6609-1840-9c2b-b0820c50d281#id":{
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/externalId":[
      { "value":"http:\/\/dbpedia.org\/resource\/Chelsea_F.C.", "type":"uri" },
      { "value":"urn:sports-stats:137316635", "type":"uri" }
    ],
    "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type":[
      { "value":"http:\/\/www.bbc.co.uk\/ontologies\/sport\/CompetitiveSportingOrganisation", "type":"uri" }
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/name":[
      { "value":"Chelsea", "type":"literal" }
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/sport\/competesIn":[
      { "value":"http:\/\/www.bbc.co.uk\/things\/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id", "type":"uri" }
    ],
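The RDF/JSON layout above is a plain nested mapping of subject to predicate to a list of objects, so it can be read with any JSON parser. The sketch below works on an abbreviated, well-formed excerpt of that response; the helper names `subjects_of_type` and `first_value` are made up for this example.

```python
import json

# Abbreviated excerpt of the RDF/JSON response shown above, closed off so that
# it parses on its own.
RDF_JSON = r'''
{
  "http:\/\/www.bbc.co.uk\/things\/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
    "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [
      {"value": "http:\/\/www.bbc.co.uk\/ontologies\/sport\/CompetitiveSportingOrganisation", "type": "uri"}
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/name": [
      {"value": "Chelsea", "type": "literal"}
    ]
  }
}
'''

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPORT = "http://www.bbc.co.uk/ontologies/sport/"
DOMAIN = "http://www.bbc.co.uk/ontologies/domain/"


def subjects_of_type(graph, type_uri):
    """Subjects that carry an rdf:type statement with the given class URI."""
    return [
        subject
        for subject, predicates in graph.items()
        if any(obj["value"] == type_uri for obj in predicates.get(RDF_TYPE, []))
    ]


def first_value(graph, subject, predicate):
    """First object value for subject/predicate, or None if there is none."""
    objects = graph.get(subject, {}).get(predicate, [])
    return objects[0]["value"] if objects else None


graph = json.loads(RDF_JSON)  # the JSON parser turns "\/" back into "/"
teams = subjects_of_type(graph, SPORT + "CompetitiveSportingOrganisation")
names = [first_value(graph, team, DOMAIN + "name") for team in teams]
print(names)  # ['Chelsea']
```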
PHP -> EasyRDF -> API
PHP render layer consumes RDF from the REST API via EasyRDF (http://www.aelius.com/njh/easyrdf/)
EasyRDF is an open PHP library (primary committer: Nicholas Humfrey, BBC)
// Request options: client certificate plus the RDF/JSON Accept header.
protected function getOptions() {
    return array(
        "config" => array("usecert" => true),
        "headers" => array(
            "Accept" => "application/rdf+json",
            "X-Expect" => "http://www.bbc.co.uk/things/platforms/hiweb"
        )
    );
}

// Fetch the team resource and load the response body into an EasyRDF graph.
$options = $this->getOptions();
$response = $this->get("https://api.test.bbc.co.uk/dsp/sport/football/teams/chelsea", $options);
$this->data = new EasyRdf_Graph("http://www.bbc.co.uk", $response->getBody());
// Assumes the "sport" prefix has already been registered with EasyRDF.
$teams = $this->data->allOfType("sport:CompetitiveSportingOrganisation");
But… “Our website is the API”
http://www.bbc.co.uk/programmes/
Programme “The Carpenters’ Story”
HTML => http://www.bbc.co.uk/programmes/b011rf7f
RDF => http://www.bbc.co.uk/programmes/b007cllb.rdf
Sport .rdf coming soon…
Augment architecture with a Content Store
1. Atomic content assets stored in MarkLogic XML store
2. XML content queryable via XQuery
3. Content Assets searchable
4. Sports statistics searchable/queryable via XQuery
5. Ontological queries via SPARQL on BigOWLIM, asset queries via XQuery on MarkLogic
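The split of responsibilities in the list above can be sketched as a two-step lookup: ask the semantic store which assets are about a thing, then fetch those assets from the content store. Everything in this sketch is a stand-in: the set and dict replace BigOWLIM and MarkLogic, the Python functions replace the SPARQL and XQuery calls, and the asset ids, predicate and headlines are invented sample data.

```python
import xml.etree.ElementTree as ET

# Stand-in for the triple store: (asset id, predicate, thing) annotations.
ANNOTATIONS = {
    ("asset-1", "about", "chelsea"),
    ("asset-2", "about", "chelsea"),
    ("asset-3", "about", "arsenal"),
}

# Stand-in for the XML content store: asset id -> XML document.
CONTENT = {
    "asset-1": "<story><headline>Chelsea win</headline></story>",
    "asset-2": "<story><headline>Chelsea sign striker</headline></story>",
    "asset-3": "<story><headline>Arsenal draw</headline></story>",
}


def assets_about(thing):
    """Step 1 (SPARQL in the real stack): which assets are tagged with a thing?"""
    return sorted(a for a, predicate, t in ANNOTATIONS if predicate == "about" and t == thing)


def headline(asset_id):
    """Step 2 (XQuery in the real stack): pull a field out of one XML asset."""
    return ET.fromstring(CONTENT[asset_id]).findtext("headline")


def aggregate(thing):
    """Headlines for a dynamic aggregation page about one thing."""
    return [headline(asset_id) for asset_id in assets_about(thing)]


print(aggregate("chelsea"))  # ['Chelsea win', 'Chelsea sign striker']
```

Keeping only identifiers and annotations in the semantic store, and the asset bodies in the XML store, lets each store answer the kind of query it is suited to.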
API Stack
Ontology aware NLP
GATE + Ontotext
Euro 2012
Dynamic semantic aggregation pages for
8 Venues
4 Groups
16 Teams
336 Players
Olympics 2012 http://www.bbc.co.uk/2012/
Olympics 2012 – The requirements
1. Page per athlete [10,000+], page per country [200+], page per discipline [400-500], page per venue: a lot of output…
2. Almost real time statistics and live event pages
3. Time coded, metadata annotated, on demand video, 58,000 hours of content
1. Far too many web pages for far too few journalists
2. DSP annotation architecture to automate content aggregation
Open Sport Ontology
BBC Sport: http://www.bbc.co.uk/ontologies/sport
More BBC Open Ontologies
Programmes : http://www.bbc.co.uk/ontologies/programmes
Wildlife : http://www.bbc.co.uk/ontologies/wildlife/
Platform future…
• Entire BBC Sport site re-engineered and domain modeled using an RDF framework
• Geospatial (GeoSPARQL) powered news aggregations, e.g. stories about London or Berlin
• News event and time based asset aggregations
• Additional domain modeling and extensions (business, wildlife, programmes etc.)
• Replicated triple store to facilitate a public facing BBC SPARQL endpoint and API
• SportML and BBC Sport ontology mapping
Questions?