Stephen Wang (http://stephenwang.com), Alivenotdead.com CTO. mongoDB Beijing presentation (March 3, 2011): from Rotten Tomatoes to alivenotdead.com to alive.cn, an explanation of how the entertainment database evolved at each stage. The current version is a multilingual global entertainment database built on linked open data and mongoDB.
Building a super database from linked data
Stephen Wang 王傳仁, me@stephenwang.com
March 3, 2011
Who is this NOT for?
Building a large database from a tiny team
Organizing the world's information
Information innovation
Who IS this for?
About Rotten Tomatoes
Co-founder, CTO
Popular movie review web site
Aggregated reviews, comprehensive film database
The Stone Age
Static HTML templates
Editors read articles and pull quotations
Only cover the newest movies
~1000 films
Modern Times
Shift to LAMP
License long-tail database
Automated spiders, early UGC via critics
Use homegrown CMS for additional content
(How I felt maintaining Rotten Tomatoes' overloaded database servers)
The Result
8 million unique visitors/month
Lean startup: 25x traffic with 7 staff
Great site for film lovers (including Steve Jobs)
About alivenotdead.com
Co-founder, CTO
Social network (SNS) for artists, started with Daniel Wu 吴彦祖
Started with six artists; now 1,600 artists and 600K registered users
Also powers official web sites:
李连杰: JetLi.com
成龙: JackieChan.com
莫文蔚: KarenMok.com
Our LAMP stack: not the best setup for...
Newsfeeds...
Viral loop analysis...
Multivariate testing...
The Problem?!?
Scalability issues with real-time data, but no traffic from public, long-tail content
About alive.cn
A better entertainment database
Providing the long-tail content
Still a part of alivenotdead.com
Still in alpha
Features
Comprehensive info for celebrities, films, music, and TV
Searchable, structured data
Multilingual: English, Chinese, Japanese
Aggregated social media from inside/outside China
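The multilingual feature implies that one topic carries names in several languages. A minimal sketch, assuming a hypothetical document shape where names are keyed by language code and a hypothetical fallback order (en, zh, ja) is used when the requested language is missing; neither is taken from the actual alive.cn schema.

```python
# Hypothetical topic document; field names are illustrative assumptions,
# not the site's actual schema.
topic = {
    "_id": "jackie-chan",
    "type": "celebrity",
    "name": {"en": "Jackie Chan", "zh": "成龙"},
}

# Assumed fallback order when a name is missing in the requested language.
FALLBACK_ORDER = ("en", "zh", "ja")


def display_name(doc, lang):
    """Return the name in `lang`, else the first name found in FALLBACK_ORDER, else None."""
    names = doc.get("name", {})
    if lang in names:
        return names[lang]
    for code in FALLBACK_ORDER:
        if code in names:
            return names[code]
    return None


print(display_name(topic, "zh"))  # 成龙
print(display_name(topic, "ja"))  # no Japanese name stored, falls back to: Jackie Chan
```

Keeping all translations inside one document means a single read returns every language, which fits a document store better than a separate translations table joined per request.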
Why use mongoDB?
Flexible schema for different data sources
Dozens of other sources...
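To illustrate what a flexible schema buys when dozens of sources disagree on fields, here is a minimal sketch, assuming a hypothetical merge policy: each source's record is kept verbatim under `sources.<name>`, and the first value seen for each field is promoted to the top level. The function name, the layout, and the "first source wins" precedence are assumptions for illustration, not the author's method.

```python
def merge_sources(topic_id, records):
    """Merge per-source records (dicts with arbitrary fields) into one document.

    Hypothetical policy: every source keeps its own fields under
    doc["sources"][name], so no source has to match another's schema, and the
    first value seen for each field (in dict order) is copied to the top level.
    """
    doc = {"_id": topic_id, "sources": {}}
    for source, fields in records.items():
        doc["sources"][source] = dict(fields)
        for key, value in fields.items():
            doc.setdefault(key, value)  # never overwrites an existing field
    return doc


film = merge_sources(
    "kung-fu-hustle",
    {
        "freebase": {"name": "Kung Fu Hustle", "release_year": 2004},
        "dbpedia": {"name": "Kung Fu Hustle", "director": "Stephen Chow"},
    },
)
print(sorted(film))  # ['_id', 'director', 'name', 'release_year', 'sources']
```

The two sources contribute different fields, yet the result is still one ordinary document that a driver call such as PyMongo's `insert_one` could store without any schema migration.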
Why use mongoDB?
Scalable big data
500,000 translations
Next challenge:
Aggregating and storing the social media firehose
2 million+ topics covered
Why use mongoDB?
Crossing the border...
alive.tom.com in Tianjin
Alivenotdead.com in Hong Kong
Use replica sets and eventual consistency to work around frequent cross-border network issues
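One way to lay out such a replica set is sketched below, assuming hypothetical hostnames and a hypothetical topology of two Hong Kong members plus one Tianjin member; the deck does not give the real layout. The function only builds the config document in the shape accepted by `rs.initiate()` or the `replSetInitiate` command.

```python
def replica_set_config(name, home_hosts, remote_hosts):
    """Build a replica set config document.

    home_hosts: members in the main data center, eligible to become primary.
    remote_hosts: members across the border, given priority 0 so a flaky
    link never makes them primary; they still vote, replicate
    asynchronously, and can serve possibly stale reads locally.
    """
    members = []
    for host in home_hosts:
        members.append({"_id": len(members), "host": host, "priority": 1})
    for host in remote_hosts:
        members.append({"_id": len(members), "host": host, "priority": 0})
    return {"_id": name, "members": members}


# Hypothetical hosts: two in Hong Kong, one in Tianjin.
config = replica_set_config(
    "rs0",
    ["hk1.example.com:27017", "hk2.example.com:27017"],
    ["tj1.example.com:27017"],
)
for m in config["members"]:
    print(m["_id"], m["host"], m["priority"])
```

With three voting members, the Hong Kong side holds two of three votes, so when the cross-border link drops it still has a majority and keeps a primary, while the Tianjin member keeps serving the data it last replicated. Local reads there would use a `secondaryPreferred` or `nearest` read preference, accepting eventual consistency.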
Using Linked Open Data
Freebase:
Wikipedia as structured data
Creative Commons license
Multiple CC sources
Organized taxonomy
Acquired by Google
No Chinese/Japanese yet!
Using Linked Open Data
DBpedia:
Wikipedia as structured data
Creative Commons license
Only Wikipedia
Messy taxonomy
Chinese/Japanese topic translations, but requires an English topic link
Using Linked Open Data
Use Freebase's organized taxonomy and broad data
Expand DBpedia to Chinese-only topics
Apply the same methodology across Chinese wiki sources
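The DBpedia limitation above is that a Chinese page only joins a topic through an English link. A minimal sketch of the expansion step, assuming hypothetical inputs: English topics keyed by title (standing in for Freebase/DBpedia data) and Chinese wiki pages that may carry an `en_link` field. Linked pages are merged into the English topic; unlinked pages become new Chinese-only topics instead of being dropped. The names `link_topics`, `en_link`, `zh_only`, and the `zh:` key prefix are illustrative assumptions.

```python
import copy


def link_topics(english_topics, chinese_pages):
    """Join Chinese pages to English topics via interlanguage links.

    english_topics: dict of English title -> topic dict (left unmodified).
    chinese_pages: list of dicts with "title" and an optional "en_link".
    A page whose en_link matches an English topic adds its title as the
    topic's "zh" name; any other page becomes a Chinese-only topic.
    """
    topics = copy.deepcopy(english_topics)
    for page in chinese_pages:
        en_title = page.get("en_link")
        if en_title in topics:
            topics[en_title].setdefault("names", {})["zh"] = page["title"]
        else:
            topics["zh:" + page["title"]] = {"names": {"zh": page["title"]}, "zh_only": True}
    return topics


english = {"Jackie Chan": {"names": {"en": "Jackie Chan"}}}
pages = [
    {"title": "成龙", "en_link": "Jackie Chan"},
    {"title": "仅中文条目"},  # hypothetical page with no English link
]
linked = link_topics(english, pages)
print(linked["Jackie Chan"]["names"])  # {'en': 'Jackie Chan', 'zh': '成龙'}
print(linked["zh:仅中文条目"])  # {'names': {'zh': '仅中文条目'}, 'zh_only': True}
```

The same function could be run once per Chinese wiki source, which is one reading of "the same methodology across Chinese wiki sources".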
The Future
Developer API
Topic extraction
Real-time trends across languages
Other verticals
Already 10x more data than Rotten Tomatoes...
The complete sum of information from across the web...
Information not constrained by language...
We're hiring PHP engineers! Send your CV to me@stephenwang.com
My blog: http://stephenwang.com