
Case Study: RFA Migration

How I migrated 208,566 news stories from Bricolage to Plone.

Alex Clark • http://aclark.net

March 12, 2008 • Plone Symposium East


Who Am I?

• Plone Consultant
  – Non-profits in DC

• Foundation Member

• Zope/Python Users Group of DC (ZPUGDC) Events Organizer

• “UNIX guy”, sysadmin, Bachelor of Science in Computer Science; not really a programmer.


What is this?

• An example of a “successful” migration, YMMV (your mileage may vary).

• Inspiration-a-palooza! If I can do it, anyone can.

• An opportunity to learn from my mistakes.
  – Analyses at the end.

• XXX: News ‘story’, not ‘news item’ ;-)
  – i.e. the rfasite product’s ‘story’ content type, not Plone’s default ‘news item’ content type.

• Medium- to large-size migration.


What this is not

• Plone vs. Bricolage.

• How to: <your migration>.

• Best practice (OK, maybe some best practice).


Radio Free Asia

• RFA is a private, nonprofit corporation that broadcasts news and information in nine native Asian languages to listeners who do not have access to full and free news media. The purpose of RFA is to provide a forum for a variety of opinions and voices from within these Asian countries.

• Our Web site adds a global dimension to this objective. If you have comments, questions or suggestions, please contact us…


Before


After

• Not yet! ;-)


Pre-migration decisions

i.e. how do we get the data out of the old site?

• Relational database “content”?
  – No one understood the Bricolage data model.

• HTTP?
  – I didn’t want to crawl the website.

• “Baked” content on the filesystem.
  – Provided the clearest migration path.
  – find /var/www/rfa -name index.html
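For illustration, here is a minimal Python 3 sketch of that find command using os.walk, the tool the talk names a few slides later. It is not the original migration code; only the /var/www/rfa root comes from the slide, and the function name is hypothetical.

```python
import os


def find_baked_stories(root):
    """Yield every index.html under root, like `find <root> -name index.html`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place (top-down walk) makes the traversal order stable.
        dirnames.sort()
        if "index.html" in filenames:
            yield os.path.join(dirpath, "index.html")
```

Usage: for path in find_baked_stories("/var/www/rfa"): ... visits each baked story page once, in a repeatable order, which makes a long migration easier to restart and compare between runs.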


Zopectl run, then what?

• Need a way to structure the migration of 10 different language services.
  – e.g. zopectl run mandarin.py

• Need to ‘walk’ the file system.
  – i.e. how do we find the stories?

• Need a way to parse the HTML on the file system.
  – i.e. we can’t shove the entire index.html into the body via setText().

• Need to do Unicode conversions.
  – e.g. from Big5, euc_kr, gb2312, and ascii to Unicode.
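A minimal sketch of the Unicode step in Python 3, where decoding is raw.decode(encoding) (the 2008-era Python 2 equivalent was unicode(raw, encoding)). The service-to-encoding table is a hypothetical assignment for illustration; only the encoding names come from the slide.

```python
# Hypothetical mapping: which service used which encoding is an assumption;
# only the codec names (ascii, gb2312, big5, euc_kr) come from the slide.
SERVICE_ENCODINGS = {
    "english": "ascii",
    "mandarin": "gb2312",
    "cantonese": "big5",
    "korean": "euc_kr",
}


def to_unicode(raw, service):
    """Decode the raw bytes of one baked page into a Python str."""
    encoding = SERVICE_ENCODINGS[service]
    # The default errors="strict" raises UnicodeDecodeError instead of
    # silently mangling text, so bad files can be logged and revisited.
    return raw.decode(encoding)
```

Keeping the strict error mode is a deliberate choice: a page that does not decode is a page worth looking at, not one to push through with replacement characters.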


Zopectl run, then what?

• Use Stepper, a framework for performing asynchronous tasks: http://www.simplistix.co.uk/software/zope/stepper

• Use os.walk: http://docs.python.org/lib/os-file-dir.html (in particular cb2_examples/cb2_2_16_sol_1.py)

• Use sgmllib for HTML parsing: http://docs.python.org/lib/module-sgmllib.html (in particular diveintopython-5.4/py/BaseHTMLProcessor.py)

• Use Python’s standard encodings for Unicode conversions: http://docs.python.org/lib/standard-encodings.html
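sgmllib no longer exists in Python 3, so this sketch swaps in the standard library's html.parser.HTMLParser instead. In the spirit of Dive Into Python's BaseHTMLProcessor, it re-emits the markup it sees, but only inside one story container, so the result is something you could pass to setText() rather than the whole index.html. The container id "storytext" is a hypothetical assumption about the Bricolage templates.

```python
import html
from html.parser import HTMLParser


class StoryParser(HTMLParser):
    """Pull the <title> text and the inner HTML of one story container.

    Assumption (hypothetical markup): the story body is wrapped in
    <div id="storytext">; the real Bricolage templates may differ.
    """

    def __init__(self, body_id="storytext"):
        super().__init__(convert_charrefs=True)
        self.body_id = body_id
        self.title = ""
        self.body_parts = []
        self._in_title = False
        self._depth = 0  # open <div>s in the story container, incl. itself

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "div":
                self._depth += 1
            # Re-emit the start tag exactly as written in the source page.
            self.body_parts.append(self.get_starttag_text())
        elif tag == "title":
            self._in_title = True
        elif tag == "div" and dict(attrs).get("id") == self.body_id:
            self._depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep them, but emit no end tag.
        if self._depth:
            self.body_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._depth:
            if tag == "div":
                self._depth -= 1
                if not self._depth:
                    return  # closing tag of the container itself
            self.body_parts.append("</%s>" % tag)
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._depth:
            # Character references were decoded, so escape the text again.
            self.body_parts.append(html.escape(data, quote=False))


def extract_story(page):
    """Return (title, body_html) for one baked index.html page."""
    parser = StoryParser()
    parser.feed(page)
    parser.close()
    return parser.title.strip(), "".join(parser.body_parts).strip()
```

Navigation, headers and footers outside the container are simply dropped, which is the point: only the story itself ends up in the new content object.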


Stepper Basics

• Allows you to break your migration into pieces.

• Commits transactions for you.

• zopectl run run.py site-object steps-or-chains
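The slides do not show Stepper's API, so rather than guess at it, here is a framework-free sketch of the two behaviours listed above: running named steps in order and committing in batches. run_steps, its arguments and the batch size are all hypothetical; under zopectl run, commit would typically be transaction.commit.

```python
def run_steps(site, step_names, steps, commit, batch_size=100):
    """Run each named step in order, committing every batch_size objects.

    Hypothetical stand-in, not Stepper's API: `steps` maps a step name to a
    generator function that takes the site and yields each object it
    creates or changes; `commit` ends the current transaction.
    """
    for name in step_names:
        done = 0
        for _obj in steps[name](site):
            done += 1
            if done % batch_size == 0:
                commit()  # keep each transaction small
        commit()  # commit whatever is left over from this step
```

Committing in batches means a failure part-way through a 200,000-story run loses at most one batch of work rather than everything since the start.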


Stepper config.py


Basic Results

• The ‘create’ step creates the site structure based on a list of categories defined in categories.py.

• The ‘migrate’ step walks the file system looking for index.html files, then:
  – extracts the contents;
  – invokes the factory in the context of the category to create the new object;
  – calls mutators to insert content into fields, e.g. obj.setTitle(title_extracted).
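A sketch of the ‘create’ and ‘migrate’ steps, runnable outside Zope thanks to two stand-in classes. invokeFactory, setTitle and setText mirror the Plone calls mentioned on the slides; FakeFolder, FakeStory, CATEGORIES and the function names are hypothetical, and in a real site the ‘story’ type would come from the rfasite product.

```python
# Hypothetical category list; the real one lived in categories.py.
CATEGORIES = ["news", "features", "commentary"]


class FakeStory:
    """Stand-in for the 'story' content type: just fields and mutators."""

    def __init__(self, object_id):
        self.id = object_id
        self.title = ""
        self.text = ""

    def setTitle(self, value):
        self.title = value

    def setText(self, value):
        self.text = value


class FakeFolder(dict):
    """Stand-in for a Plone folder: invokeFactory adds a child, returns its id."""

    def invokeFactory(self, type_name, id):
        self[id] = FakeStory(id) if type_name == "story" else FakeFolder()
        return id


def create_step(service_folder, categories=CATEGORIES):
    """'create' step: add one folder per category, skipping existing ones."""
    for name in categories:
        if name not in service_folder:
            service_folder.invokeFactory("Folder", name)


def migrate_story(category_folder, story_id, title, body_html):
    """'migrate' step for one story: create it in its category, fill fields."""
    new_id = category_folder.invokeFactory("story", story_id)
    story = category_folder[new_id]
    story.setTitle(title)
    story.setText(body_html)
    return story
```

Skipping categories that already exist makes the ‘create’ step safe to re-run, which matters when the migration is repeated many times against fresh sites.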


Intermediate Results (How to: Promise Too Much)

• Slug-i-fication: turning
  – /english/news/symposium_talks_rfa/2008/03/12/index.html into
  – /english/news/20080312-symposium_talks_rfa.html

• Change “category” names, e.g. from
  – /english/news to
  – /english/exciting_news.

• Import audio and image files from the file system.
  – Insert into story fields and/or story folders (stories are folderish).
  – Featured audio or image vs. inline audio or image.
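The two path transformations above are simple enough to sketch directly. This assumes every baked story lives at <category>/<slug>/<yyyy>/<mm>/<dd>/index.html, as in the slide's example; the function names are hypothetical and the rename table holds only the slide's own example.

```python
import posixpath

# Hypothetical rename table seeded with the slide's own example.
CATEGORY_RENAMES = {"/english/news": "/english/exciting_news"}


def slugify(path):
    """Map <category>/<slug>/<yyyy>/<mm>/<dd>/index.html to
    <category>/<yyyymmdd>-<slug>.html."""
    head, filename = posixpath.split(path)
    parts = head.split("/")
    if filename != "index.html" or len(parts) < 6:
        raise ValueError("not a baked story page: %r" % path)
    slug, year, month, day = parts[-4:]
    if not (year + month + day).isdigit():
        raise ValueError("no yyyy/mm/dd in path: %r" % path)
    category = "/".join(parts[:-4])
    return "%s/%s%s%s-%s.html" % (category, year, month, day, slug)


def rename_category(path, renames=CATEGORY_RENAMES):
    """Swap the category prefix of a slug-i-fied path if it has a new name."""
    category, _, name = path.rpartition("/")
    return "%s/%s" % (renames.get(category, category), name)
```

For example, slugify("/english/news/symposium_talks_rfa/2008/03/12/index.html") returns "/english/news/20080312-symposium_talks_rfa.html", and rename_category() turns that into "/english/exciting_news/20080312-symposium_talks_rfa.html".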


Advanced Results (How to: Really Promise Too Much)

• Related links
  – At the bottom of each story are related links.
  – Slug-i-fy, then insert them inline.
  – Slug-i-fy, change the category, then insert them inline.
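A minimal sketch of rewriting related links in place, assuming they are root-relative href attributes pointing at dated index.html pages. A regular expression is used here for brevity; the function names and rename table are hypothetical. Links that do not match the dated-story shape are left untouched.

```python
import re

# Hypothetical rename table seeded with the slide's own example.
CATEGORY_RENAMES = {"/english/news": "/english/exciting_news"}
HREF = re.compile(r'href="(/[^"]*/index\.html)"')


def new_path(old):
    """Slug-i-fy a dated story URL and apply any category rename."""
    parts = old.split("/")
    # Expect ['', <category...>, slug, yyyy, mm, dd, 'index.html'].
    if len(parts) < 7 or not "".join(parts[-4:-1]).isdigit():
        return old  # not a dated story URL: leave it alone
    category = "/".join(parts[:-5])
    slug, year, month, day = parts[-5:-1]
    category = CATEGORY_RENAMES.get(category, category)
    return "%s/%s%s%s-%s.html" % (category, year, month, day, slug)


def rewrite_related_links(body_html):
    """Rewrite every matching href in a story body to its new location."""
    return HREF.sub(lambda m: 'href="%s"' % new_path(m.group(1)), body_html)
```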


No, Really…

• I promised too much.


The RFA Migration Story

• 10 language services

• 208,566 stories

• 5 different encodings

• 70 GB of content on the file system

• Hundreds of categories


The RFA Migration - E! True Hollywood Story

• Images everywhere
  – /english/category/story/2008/01/01/index.html has image foo.jpg at both /english/category/story/2008/01/01/foo.jpg and /english/images/foo.jpg.

• Audio everywhere

• Duplicate stories everywhere
  – Stories published as /english/category/story/2008/01/01/index.html were also published as /english/category2/story/2008/01/01/index.html.
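One way to spot the duplicates described above, sketched under the assumption that a story's language, slug and date identify it regardless of category. This is a hypothetical approach, not the talk's own fix, and the function names are illustrative.

```python
from collections import defaultdict


def story_key(path):
    """Key a baked story by language, slug and date, ignoring category.

    Assumes paths like /<language>/<category...>/<slug>/<yyyy>/<mm>/<dd>/index.html.
    """
    parts = path.strip("/").split("/")
    slug, year, month, day = parts[-5:-1]
    return (parts[0], slug, year + month + day)


def find_duplicates(paths):
    """Group paths by story_key and keep only groups with more than one path."""
    groups = defaultdict(list)
    for path in paths:
        groups[story_key(path)].append(path)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

Two genuinely different stories with the same slug on the same day would be grouped together, so comparing the extracted story bodies before skipping one is a sensible extra check.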


Sidebar: Buildout vs. Buildit

• Shortly after this project began, Buildout became the de facto standard for deploying a Plone site.

• Deploy migration code and sample data with your buildout.
  – e.g. bin/buildout -c migration.cfg, where migration.cfg installs your migration code and sample data.
  – Even better: bin/migrate


And now the moment you have all been waiting for!

• Run buildout

• Add site

• Configure migration

• Run migration


Run buildout and add site


Configure migration; run migration


Runme.py


Site wide results


Individual story results


Showcase of all language services


Wrap up

• Unexpected results

• Avoidable problems

• General wrap up


Unexpected results

• Missing content

• Wrong content

• Silent failures


Quick Fix for date!


Quick Fix for duplicates!


Quick Fix for broken content!


Avoidable problems

• Don’t promise too much

• Don’t write bad code (read: bare try/excepts, etc.)

• Don’t write slow code (use string methods over regular expressions, etc.)
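A sketch of the last two lessons together: catch the specific exceptions a file can raise, log them and return the failures for review instead of failing silently; and use a string method where a regular expression is overkill. migrate_all, migrate_one and the logger name are hypothetical, and the exception list is an assumption to adjust to what your own code actually raises.

```python
import logging

log = logging.getLogger("rfa.migration")  # hypothetical logger name


def migrate_all(paths, migrate_one):
    """Run migrate_one(path) on each story page; collect failures, don't hide them."""
    failures = []
    for path in paths:
        # A plain string method is enough here; no regular expression needed.
        if not path.endswith("index.html"):
            continue
        try:
            migrate_one(path)
        except (OSError, UnicodeDecodeError, ValueError) as exc:
            # Name the exceptions you expect: a bare `except:` would also
            # swallow typos (NameError) and Ctrl-C (KeyboardInterrupt).
            log.warning("skipped %s: %s", path, exc)
            failures.append((path, exc))
    return failures
```

The returned list turns "silent failures" into a report you can check after each run; genuine bugs still stop the migration loudly.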


General Wrap-up

• Client is happy

• May actually launch soon

• Huge rewards
  – Great learning experience
  – This talk
  – Helping others

• Things I would do differently?
  – unrestrictedTraverse instead of app.rfa['english']['news']['20080101-slug.html']
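In Zope, the lookup above would become app.unrestrictedTraverse('rfa/english/news/20080101-slug.html'), so the path can be built as a string (for example from a slug-i-fied filename) instead of being hard-coded as chained subscripts. The toy traverse function below only illustrates that idea over nested dicts; it is a hypothetical stand-in, not Zope's implementation, which also handles acquisition and hooks such as __bobo_traverse__.

```python
def traverse(root, path, default=None):
    """Follow a slash-separated path through nested containers.

    Toy stand-in for unrestrictedTraverse: no acquisition, no security,
    no traversal hooks; returns `default` if any step is missing.
    """
    obj = root
    for name in path.strip("/").split("/"):
        try:
            obj = obj[name]
        except (KeyError, TypeError):
            return default
    return obj
```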


Questions/Comments?

• Email me: [email protected]

• http://aclark.net

• ACLARK.NET, LLC