NLM Digital Repository Server Architecture January 18, 2011


NLM Digital Repository Server Architecture

January 18, 2011


Design Considerations

Consistency with NLM architecture and processes
Remove single points of failure
Data redundancy for preservation
Availability
Scalability
Ingest ease, speed


Single Server Architecture

[Diagram: one virtual server combining the Application Server, Database Server, and File Server roles]

Software
– NWU BookViewer
– Flash Video Player with Search
– Muradora 1.4b
– Fedora 3.2.1
– Solr
– GSearch
– Djatoka
– MySQL 5.0
– Tomcat

Storage
– Fedora Managed Storage
– External Storage
– Solr Index
– Resource Index

OS: CentOS
HW: virtual server, 3 CPU, 24 GB RAM


Content and code

Fedora managed content
Fedora database
Fedora Resource Index
Solr Index
External content
Application code

Can and should these items be shared across Fedora servers?


Data Center Environment

Two locations with two virtual servers each
– Primary: NLM data center
– Backup: contingency operations data center
– Active/active: both locations always in use
– Each virtual server has 3 CPU, 24 GB RAM

System tools
– 3DNS: wide-area load balancing
– BIG-IP: local load balancing
– Server monitoring, automatic failover
– SnapMirror: NetApp filesystem replication


System Architecture

[Diagram: Browsers, 3DNS, then one BIG-IP in front of the Fedora servers at each data center]

Primary Data Center
– BIG-IP
– Fedora Primary #1: Fedora DB, Managed Storage, Solr Index, Resource Index
– Fedora Primary #2: Fedora DB, Managed Storage, Solr Index, Resource Index
– External Storage (shown once per data center)

Backup Data Center
– BIG-IP
– Fedora Backup #1: Fedora DB, Managed Storage, Solr Index, Resource Index
– Fedora Backup #2: Fedora DB, Managed Storage, Solr Index, Resource Index
– External Storage (shown once per data center)
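The two-level load balancing in this architecture can be illustrated with a small model. This is only a sketch: the class and function names are hypothetical, and the round-robin selection is an assumption made for illustration, not the actual 3DNS or BIG-IP policy, which the slides do not describe.

```python
from dataclasses import dataclass, field


@dataclass
class FedoraServer:
    name: str
    healthy: bool = True


@dataclass
class DataCenter:
    name: str
    servers: list = field(default_factory=list)
    online: bool = True

    def healthy_servers(self):
        # An offline data center offers no servers at all.
        if not self.online:
            return []
        return [s for s in self.servers if s.healthy]


def route(data_centers, request_id):
    """Pick a server: wide-area choice first, then local choice.

    Round-robin by request_id is an illustrative assumption.
    """
    # Wide-area step (the role 3DNS plays): skip data centers that cannot serve.
    candidates = [dc for dc in data_centers if dc.healthy_servers()]
    if not candidates:
        raise RuntimeError("no data center available")
    dc = candidates[request_id % len(candidates)]

    # Local step (the role BIG-IP plays): spread load over healthy servers.
    servers = dc.healthy_servers()
    return servers[(request_id // len(candidates)) % len(servers)]
```

With both data centers online, successive request ids are spread over all four servers; marking one data center offline sends every request to the other, which is the behavior the update procedure later in the deck relies on.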


Ingest considerations

Our Fedora system is read-only with controlled periodic batch content updates

System is available during updates – use one data center while updating the other

Code and content should be identical across servers

Reduce the time needed to ingest to all servers in the system; a full re-ingest takes approx. 10 hours.
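One way to check that content is identical across servers is to compare checksum manifests of the storage directories. The sketch below is an assumption about how such a check could be done, not a tool described in the slides; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large content files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(reference, other):
    """Report files missing from, extra in, or changed in `other`."""
    shared = reference.keys() & other.keys()
    return {
        "missing": sorted(reference.keys() - other.keys()),
        "extra": sorted(other.keys() - reference.keys()),
        "changed": sorted(k for k in shared if reference[k] != other[k]),
    }
```

A manifest built on the server that received the ingest can be compared against one built on each of the other servers; three empty lists mean the directory trees match.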


Content replication

Content replication strategies
1. Fedora journaling (ingest to master, master-slave, messaging)
2. Ingest to master, copy managed content to slave, rebuild slave DB and resource index from managed content (rebuild is faster than full ingest)
3. Ingest to master, use system tools (NetApp SnapMirror) to copy all resources to slaves
4. Ingest to each server independently

Our approach
– Turn off primary data center, use backup data center to serve public
– Ingest to primary 1, copy managed content to primary 2, rebuild primary 2 ...
– Turn off backup data center, use primary data center to serve public
– Use SnapMirror to copy all resources from primary 1,2 to backup 1,2
– Turn on backup data center, both data centers available to serve public
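The update sequence above can be walked through as a small simulation that checks two properties at every step: at least one data center is serving the public, and all serving data centers hold the same content. This is a sketch under assumptions: each data center is reduced to a serving flag and a content version number, the switch from backup to primary is treated as a single step, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    serving: bool = True
    content_version: int = 1


def check_invariants(primary, backup):
    """Availability and consistency checks applied after every step."""
    serving = [dc for dc in (primary, backup) if dc.serving]
    assert serving, "no data center is serving the public"
    versions = {dc.content_version for dc in serving}
    assert len(versions) == 1, "serving data centers disagree on content"


def rolling_update(primary, backup, new_version):
    """Apply the five-step update and return a log of (step, primary, backup)."""
    log = []

    def record(step):
        check_invariants(primary, backup)
        log.append((step, primary.serving, backup.serving))

    primary.serving = False
    record("primary off, backup serves public")

    # Stands in for: ingest to primary 1, copy managed content, rebuild primary 2.
    primary.content_version = new_version
    record("ingest to primary 1, copy and rebuild primary 2")

    primary.serving, backup.serving = True, False
    record("backup off, primary serves public")

    # Stands in for the SnapMirror copy from primary 1,2 to backup 1,2.
    backup.content_version = primary.content_version
    record("SnapMirror primary 1,2 to backup 1,2")

    backup.serving = True
    record("backup on, both data centers serve public")
    return log
```

Calling `rolling_update(DataCenter("primary"), DataCenter("backup"), new_version=2)` returns five log entries; an assertion error at any step would indicate an ordering that leaves the public without service or serves mixed content.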


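NLM Content Replication

[Diagram: replication flow between the Primary Data Center and the Backup Data Center; arrow labels: Ingest, Rebuild, SnapMirror]

Primary Data Center
– Fedora Primary #1: Fedora DB, Managed Storage, Solr Index, Resource Index
– Fedora Primary #2: Fedora DB, Managed Storage, Solr Index, Resource Index
– External Storage (shown once per data center)

Backup Data Center
– Fedora Backup #1: Fedora DB, Managed Storage, Solr Index, Resource Index
– Fedora Backup #2: Fedora DB, Managed Storage, Solr Index, Resource Index
– External Storage (shown once per data center)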