120
Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode 3: Test Suite Episode 4: Building Efficient Indexes Episode 5: Load-balancing Conclusions Invenio Technology Selected Practical Software Development Lessons From A Large Digital Library System Tibor Šimko Department of Information Technology CERN August 2010 / openlab talk

Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio TechnologySelected Practical Software Development Lessons

From A Large Digital Library System

Tibor Šimko<[email protected]>

Department of Information TechnologyCERN

August 2010 / openlab talk

Page 2: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 3: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 4: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

What is Digital Library?

“library in which collections are stored in digital formats(as opposed to print, microform, or other media) andaccessible by computers”(1) institutional document repositories(2) world-wide subject-based information systems

Example: CERN Document Server

managing CERN and selected non-CERN high-energyphysics and related documents since ∼1993more than 1,000,000 recordsarticles, books, theses, photos, videos, and morepowered by Invenio, free digital library softwarehttp://cdsweb.cern.ch/

Page 5: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

What is Digital Library?

“library in which collections are stored in digital formats(as opposed to print, microform, or other media) andaccessible by computers”(1) institutional document repositories(2) world-wide subject-based information systems

Example: CERN Document Server

managing CERN and selected non-CERN high-energyphysics and related documents since ∼1993more than 1,000,000 recordsarticles, books, theses, photos, videos, and morepowered by Invenio, free digital library softwarehttp://cdsweb.cern.ch/

Page 6: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Collection Tree

Page 7: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Search for Books

Page 8: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Search for Photos

Page 9: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS Features: Commenting

Page 10: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Features: Reviewing

Page 11: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Create Personal Alert

Page 12: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Add to Personal Basket

Page 13: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Display Personal Basket

Page 14: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Organize and Share Your Baskets

Page 15: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

CDS: Journals and Bulletins

Page 16: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 17: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Key Features

navigable collection tree (regular, virtual)powerful search engine

Google-like speed for up to 5M recordscombined metadata, reference and fulltext search

flexible metadata (MARC, OA)handling any kind of document (multimedia)customizable input, formatting and linking

personalization and collaborative features:alerts, baskets, groups, reviews, commentsinternationalization (26 languages)

open source, GNU General Public Licenseco-developed by CERN (2002–), EPFL (2004–),DESY/FNAL/SLAC (2008–), CfA (2009–)installed at ∼30 institutions world-wide

Page 18: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Page 19: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Ingestion

Page 20: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Ingestion

Database

Page 21: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Ingestion

Database

Sources

Page 22: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Ingestion

Database

Sources

Processing

Page 23: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Ingestion

Database

Sources

Processing

Dissemination User

Page 24: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Ingestion

Database

Sources

Processing

Dissemination User

Curation

Librarian

Page 25: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Architecture: Overview

Author

Ingestion

Database

Sources

Processing

Dissemination User

Curation

Librarian

Overview

Page 26: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Ingestion

Author

Page 27: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Ingestion

Author

WebSubmit

WebSession, WebAccess

Page 28: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Ingestion

Author

WebSubmit

WebSession, WebAccess

Metadata Full-text

full-text documentmetadata

Page 29: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Ingestion

Author

WebSubmit

WebSession, WebAccess

Metadata Full-text

full-text document

BibUpload

BibSched

BibConvert

metadata

MARCXML

Page 30: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Ingestion

Author

WebSubmit

WebSession, WebAccess

Metadata Full-text

full-text document

BibUpload

BibSched

BibConvert

metadata

MARCXML

BibHarvest

OAI Data Source

Page 31: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Ingestion

Author

WebSubmit

WebSession, WebAccess

Metadata Full-text

full-text document

BibUpload

BibSched

BibConvert

metadata

MARCXML

BibHarvest

OAI Data Source

ElmSubmit

Non-OAI Data Source

Page 32: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Ingestion

Author

WebSubmit

WebSession, WebAccess

Metadata Full-text

full-text document

BibUpload

BibSched

BibConvert

metadata

MARCXML

BibHarvest

OAI Data Source

ElmSubmit

Non-OAI Data Source

Ingestion

Page 33: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Processing

Metadata Full-text

Page 34: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Processing

Metadata Full-textRefExtract

BibClassify

Page 35: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Processing

Metadata Full-textRefExtract

BibClassify

Clusters BibIndex

Page 36: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Processing

Metadata Full-textRefExtract

BibClassify

Clusters BibIndex

WebColl

Page 37: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Processing

Metadata Full-textRefExtract

BibClassify

Clusters BibIndex

WebColl

BibRank

Page 38: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Processing

Metadata Full-textRefExtract

BibClassify

Clusters BibIndex

WebColl

BibRank

BibFormat

Page 39: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Processing

Metadata Full-textRefExtract

BibClassify

Clusters BibIndex

WebColl

BibRank

BibFormat

Processing

Page 40: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

Page 41: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

Page 42: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasket

Page 43: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag

Page 44: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert

Page 45: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI Harvester

Page 46: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI HarvesterWebComment

Page 47: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI HarvesterWebComment

WebMessage

Page 48: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI HarvesterWebComment

WebMessageWebJournal

Page 49: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI HarvesterWebComment

WebMessageWebJournal BibCirculation

Page 50: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI HarvesterWebComment

WebMessageWebJournal BibCirculation

WebStat

Page 51: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI HarvesterWebComment

WebMessageWebJournal BibCirculation

WebStat WebHelp

Page 52: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Dissemination

Metadata Full-textClusters

WebSearch

User

WebBasketWebTag WebAlert BibHarvest

OAI HarvesterWebComment

WebMessageWebJournal BibCirculation

WebStat WebHelpDissemination

Page 53: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-text

Page 54: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

Page 55: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

Page 56: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

Page 57: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

Page 58: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

Page 59: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

Page 60: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

Page 61: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

RefExtract

Page 62: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

RefExtract

Tasks

BibCatalog

Page 63: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

RefExtract

Tasks

BibCatalog

Knowledge Bases

BibKnowledge

Page 64: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

RefExtract

Tasks

BibCatalog

Knowledge Bases

BibKnowledge

BibExport

Page 65: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

RefExtract

Tasks

BibCatalog

Knowledge Bases

BibKnowledge

BibExportBibMatch

Page 66: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

RefExtract

Tasks

BibCatalog

Knowledge Bases

BibKnowledge

BibExportBibMatch

BibMerge

Page 67: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Curation

Metadata Librarian Full-textBibEdit

MultiEdit

BatchUploader

BibCheck

BibCirculation

BibDocFile

BibClassify

RefExtract

Tasks

BibCatalog

Knowledge Bases

BibKnowledge

BibExportBibMatch

BibMerge

Curation

Page 68: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Invenio Modules: Summary

∼33 modulescodebase

∼250,000 lines of Python code∼10,000 lines of JavaScript code∼6,000 lines of XSL code∼5,000 lines of autotools code

∼40 authorsmany short-term studentsimportance of informal coding standards

∼10 years of developmentstarted at CERN, first release in 2002now co-developed world-wide (EU, US)

lego programming... but no silver bullet

Page 69: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 70: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Why Python?

easy to read and understand(good for many temporary developers)suitable for rapid prototyping(good for organic-growth software development model)write code to throw it away

Page 71: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Art of Ikebana

Japanese art of flowerarrangement“way of flowers”natural shapes, gracefullinesminimalism“disciplined art form inwhich nature andhumanity are broughttogether”

Page 72: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Art of Ikebana Programming

Java?

new Callable() {public Object call(Object x) {

return x.times(k)}

}

Python!

lambda x: k * x

Page 73: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Art of Ikebana Programming

Java?

new Callable() {public Object call(Object x) {

return x.times(k)}

}

Python!

lambda x: k * x

Page 74: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Speeding Up Python

bytecode interpreted languagebut Cython permits to write C extensions easilycombining efficiency of C with high-levelness of Python

Example: intbitset.pyx

ctypedef unsigned long long int word_t

ctypedef struct IntBitSet:int sizeint allocatedword_t trailing_bitsint totword_t *bitset

Page 75: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 76: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Why Git?

good for distributed teamsoffline development possible“pull on demand” collaboration model(as opposed to “shared push” collaboration model)

inherent,natural code review process

commit early, commit often (to private repositories)rebase and clean (before pushing for publicconsumption)interplay with SVN

Page 77: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 master

Page 78: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2

Page 79: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

Page 80: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

Page 81: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenance

Page 82: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

Page 83: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

Page 84: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1

Page 85: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1M4

Page 86: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1M4

N1 next

Page 87: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1M4

N1 next

C6

Page 88: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1M4

N1 next

C6

N2

Page 89: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1M4

N1 next

C6

N2

C7

v1.1.0

Page 90: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1M4

N1 next

C6

N2

C7

v1.1.0

M5

v1.0.2

Page 91: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Branches

C1 masterC2 C3

v1.0.0

C4

M1 maintenanceM2

C5

M3

v1.0.1M4

N1 next

C6

N2

C7

v1.1.0

M5

v1.0.2

maint — release maintenance branch

master — new feature branch

next — things not yet release-ready

Page 92: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

Page 93: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

Page 94: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

Page 95: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

Page 96: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

Page 97: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

Page 98: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

F1 some-new-feature

Page 99: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

F1 some-new-featureF2

Page 100: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

F1 some-new-featureF2

C4

merge

Page 101: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

F1 some-new-featureF2

C4

merge

N3

merge

Page 102: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

F1 some-new-featureF2

C4

merge

N3

merge

E1 some-experimental-feature

Page 103: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

F1 some-new-featureF2

C4

merge

N3

merge

E1 some-experimental-featureE2

Page 104: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git Development

C1

M1

N1

master

maint

next

B1 some-bugfix

C2

M2

N2

B2

M3

merge

C3

merge

F1 some-new-featureF2

C4

merge

N3

merge

E1 some-experimental-featureE2

N4

merge

Page 105: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Git collaboration model

Page 106: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 107: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Unit testing

test-driven development when appropriatee.g. before/while developing strip_accents(), write:

Example: search_engine_tests.py

class TestStripAccents(unittest.TestCase):"""Test for handling of UTF-8 accents."""

def test_strip_accents(self):"""search engine - stripping of accented letters"""self.assertEqual("memememe",

search_engine.strip_accents('mémêmëmè'))self.assertEqual("MEMEMEME",

search_engine.strip_accents('MÉMÊMËMÈ'))

Page 108: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Functional testing

functional/acceptance/regression testingtestbed site (Atlantis of Institute Fictive Science)e.g. Python mechanize module to emulate browser

Example: websearch_regression_tests.py

class WebSearchSearchEnginePythonAPITest(unittest.TestCase):

"Check typical search engine Python API calls on the demo data."

def test_search_engine_python_api_for_failed_query(self):

"websearch - search engine Python API for failed query"

self.assertEqual([],

perform_request_search(p='aoeuidhtns'))

def test_search_engine_python_api_for_successful_query(self):

"websearch - search engine Python API for successful query"

self.assertEqual([8, 9, 10, 11, 12, 13, 14, 15, 16, 17,

18, 47],

perform_request_search(p='ellis'))

Page 109: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Web testing

sometimes we need to run tests in real browsere.g. pages with heavy JavaScript

using Selenium IDE extension for Firefoxrecord and replay browser actionstest for text existence or non-existence on pagestest for link labels and targets

Example: test_search_ellis.html

<tr><td>open</td><td>http://localhost</td><td></td> </tr>

<tr><td>type</td><td>p</td><td>ellis</td> </tr>

<tr><td>clickAndWait</td><td>action_search</td><td></td> </tr>

<tr><td>verifyTextPresent</td><td>1. Thermal conductivity of dense quark matter and cooling of stars</td><td></td> </tr>

Page 110: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 111: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Designing A Search Engine

performance-driven design assumptions:high number of selects, low number of updatesfast searching, slow indexationcache everything cacheable

search functionality:search for words, phrases, regular expressionssearch in any field, authors, titles, etc

index design:forward indexes: word1 −→ [rec1, rec2, . . . ]

word2 −→ [rec2, rec7, . . . ]reverse indexes: rec1 −→ [word1, word8, . . . ]

rec2 −→ [word1, word2, . . . ]Zipf’s law on word frequency:

few words occur very often (e.g. the)most words are infrequent (even e.g. boson)

Page 112: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Search Engine Under Cover

Page 113: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Measuring the Performance

three important speed factors to consider:speed of finding sets (DB Server)speed of demarshaling sets (DB↔Web App Server)speed of intersecting sets (Web App Server)

Example: speed of various parts (2002, before optimization)

action / query: "CERN 2002" "of the this"

-----------------------------------------------

fetching 0.28 sec 0.34 sec

demarshaling 0.78 sec 1.10 sec

adding colls 0.37 sec 0.63 sec

intersecting 0.64 sec 1.19 sec

-----------------------------------------------

total search time 2.07 sec 3.22 sec

Page 114: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Optimizing Data Structures

data structures tested:‘sorted’ (lists, Patricia trees)‘unsorted’ (hashed sets, binary vectors)

fast prototyping: (Python, Lisp in 2002)throw-away coding to test ideas

Example: lists vs dicts, 350K sets in 800K universe

marshaling lists ..... 532616+532571 bytes in 1.33 sec

demarshaling lists ... 350000+350000 items in 0.10 sec

merging lists ........ 546965 items in 0.34 sec

intersecting lists ... 153035 items in 0.35 sec

marshaling dicts ..... 576491+576450 bytes in 0.87 sec

demarshaling dicts ... 350000+350000 items in 0.36 sec

merging dicts ........ 546965 items in 0.09 sec

intersecting dicts ... 153035 items in 0.15 sec

Page 115: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

. . . and the winner is:

binary vectors found the best compromise!using Numeric Python module (in 2002)typical search time gain: 4.0 sec→ 0.2 sec (in 2002)typical indexing time loss: 7 hours→ 4 days (in 2002)mostly spare data modelled via mostly dense datastructure?free your mind, think critically

further optimization:Numeric module not addressing real bits, only bytesso home-made intbitset C extension in 2007

addressing real bits (factor of 8 already)saving space, saving (indexing) time

Page 116: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Outline

1 IntroductionDigital LibraryInvenio

2 Case StudiesEpisode 1: PythonEpisode 2: GitEpisode 3: Test SuiteEpisode 4: Building Efficient IndexesEpisode 5: Load-balancing

3 Conclusions

Page 117: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Splitting Web App Server and DB Server

load of CDS Web and DB servers at the split time:

web + db→ web idle→ dbsplit leads to efficient use of OS resources by lone,non-competing Web and DB daemon processes

Page 118: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Load-Balanced Setup

useful for “LHC First Beam Day” rush situations withmany concurrent visitorsApache mod_proxy_balancer

User Load Balancer App Worker 2

App Worker 1

App Worker 3

DB Server

Page 119: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Measuring Scalability

using siege to simulate concurrent users and tomeasure throughput on a sample of typical URLs

Example: inspirebeta.net under gentle siege

$ siege -d 1 -c 20 -t 1m -f inspirebeta_urls.txtTransactions: 1329 hitsAvailability: 100.00 %Elapsed time: 60.23 secsData transferred: 37.12 MBResponse time: 0.41 secsTransaction rate: 22.07 trans/secThroughput: 0.62 MB/secConcurrency: 8.96Successful transactions: 1329Failed transactions: 0Longest transaction: 3.05Shortest transaction: 0.01

Page 120: Invenio Technology - Selected Practical Software ... · Invenio Technology Tibor Šimko Introduction Digital Library Invenio Case Studies Episode 1: Python Episode 2: Git Episode

InvenioTechnology

Tibor Šimko

IntroductionDigital Library

Invenio

Case StudiesEpisode 1: Python

Episode 2: Git

Episode 3: Test Suite

Episode 4: BuildingEfficient Indexes

Episode 5:Load-balancing

Conclusions

Conclusions

building Invenio digital library system∼250,000 LOCs from ∼40 authors over ∼10 years

value of rapid prototypingvalue of organic-growth software development modelvalue of coding aesthetics and minimalismmorale from selected anecdotes?

“Never Lose A Holy Curiosity” (A. Einstein)