Wikisource technical infrastructure
What we have done and what we could do

II

Thomas Pellissier Tanon
User:Tpt
@Tpt93

Wikisource Conference 2015

Thomas Pellissier Tanon Wikisource technical infrastructure 2 Wikisource Conference 2015 1 / 20

Wikisource?

Current state:

4 million pages
2.1 million proofread pages
600 active editors (> 5 edits)

Strong issues:

books not easily accessible
no real bibliographic database
contributing is quite difficult

Wikisource technical infrastructure

MediaWiki

but with custom extensions like ProofreadPage

developed and maintained by volunteer contributors and a few GSoC projects

Outline

1 What we have done

2 What we could do

What we have done

Wsexport

A "magic" export tool

adapted to Wikisource needs

ePub is the base format

Wsexport

Migrated to Wikimedia Tool Labs at https://tools.wmflabs.org/wsexport

Integrated into the UI of most Wikisources

48,000 exports in October 2015

Supports PDF, mobi...

ePub 3 is the default format
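
As an illustration of how an export link for one book might be built, here is a minimal sketch. The endpoint path (`tool/book.php`) and the query parameter names (`lang`, `page`, `format`) are assumptions that do not appear on the slides, and the helper `wsexport_url` is hypothetical; check them against the tool itself before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint below the tool URL given on the slide; not confirmed by the talk.
WSEXPORT_ENDPOINT = "https://tools.wmflabs.org/wsexport/tool/book.php"


def wsexport_url(lang: str, page: str, fmt: str = "epub") -> str:
    """Build an export URL for one Wikisource page (hypothetical helper).

    lang: language code of the Wikisource, e.g. "fr"
    page: title of the page to export
    fmt:  requested output format name (assumed values: "epub", "pdf", "mobi")
    """
    # urlencode percent-encodes non-ASCII characters and turns spaces into "+".
    query = urlencode({"lang": lang, "page": page, "format": fmt})
    return f"{WSEXPORT_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(wsexport_url("fr", "Les Misérables"))
```

The function only builds the URL and makes no network request, so it can be used to generate download links in a page or a script.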

Refactoring of ProofreadPage

Goals:

More maintainable code

Use new MediaWiki features (ContentHandler...)

Better performance

Fewer breakages

Content validation

Refactoring of ProofreadPage

Done:

Rewrote the editing interfaces in PHP

Tried to keep the code not too badly architected

Automated tests

JSON encoding of Page: pages in the API
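
To give an idea of what a client can do with a JSON-encoded Page: page, here is a minimal sketch of a parser. The field names (`header`, `body`, `footer`, and a nested `level` object holding `level` and `user`) are assumptions about the serialization and are not stated on the slides; the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class ProofreadPageContent:
    """The parts of a Page: page, under the assumed JSON layout."""
    header: str
    body: str
    footer: str
    level: int  # proofreading quality level
    user: str   # user who last set the level


def parse_page_content(serialized: str) -> ProofreadPageContent:
    """Parse the assumed JSON serialization of a Page: page."""
    data = json.loads(serialized)
    level = data.get("level", {})
    return ProofreadPageContent(
        header=data.get("header", ""),
        body=data.get("body", ""),
        footer=data.get("footer", ""),
        # Missing level information falls back to 1 (an assumed default).
        level=int(level.get("level", 1)),
        user=level.get("user", ""),
    )


if __name__ == "__main__":
    sample = json.dumps({
        "header": "",
        "body": "Text of the scanned page.",
        "footer": "<references/>",
        "level": {"level": 3, "user": "Example"},
    })
    page = parse_page_content(sample)
    print(page.level, page.user)
```

Having the header, body, footer and level as separate fields means a tool no longer has to split them out of raw wikitext itself.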

Interproject sidebar

IA-Upload

An Internet Archive to Commons import tool (http://tools.wmflabs.org/ia-upload)

BUB

A tool for importing books into the Internet Archive from Google Books and other sources (https://tools.wmflabs.org/bub/)

What we could do

Some ideas...

A Wikisource contributors survey done in Fall 2013 with 251 answers

Figure: What do you think are the core priorities for the Wikisource community?

17 %  Integrated ePub exporter
15 %  Localized OCR
15 %  Better and easier workflow
11 %  Wikidata integration
9 %   Visual Editor (Page: namespace)
9 %   Visual Editor (main namespace)
14 %  Metadata management system
10 %  Import-export systems

VisualEditor support

Improve Parsoid rendering of Wikisource content

Support tags used on Wikisource (pages, poem, section...)

Custom interface for Page: pages

Mobile support

Custom edit interface for Page: and Index: pages

We should have a nice UI for both browsing and editing

Future: gamification?

Content export

Improve Wsexport performance

Use Parsoid

Nice book browsing interface + OPDS?

Integrate it into the Wikimedia infrastructure?

Some other ideas

Use Wikidata as much as possible

Gamification (CAPTCHA...)

...

Conclusion

What we need now:

Stronger interwiki collaboration

People to build the Wikisource of tomorrow

Stronger support from the Wikimedia movement

Thanks a lot for your attention!
