ESGF + DOCKER ESGF/Docker Solr Cloud Architecture Shard 1 Shard 2 Shard 3 ESGF SOLR CLOUD (Jetty) Shard

  • View
    5

  • Download
    0

Embed Size (px)

Text of ESGF + DOCKER ESGF/Docker Solr Cloud Architecture Shard 1 Shard 2 Shard 3 ESGF SOLR CLOUD (Jetty)...

  • ESGF + DOCKER

    Luca Cinquini NASA/Jet Propulsion Laboratory + California Ins:tute of Technology

    ESGF F2F Workshop, Washington, DC, December 2016

  • Docker in a Nutshell

    • Docker is the leading “containeriza:on” technology: run an applica:on as a “black box” on any Docker-enabled server ‣ Build: images are built as bundles that include the applica:on itself, all of its dependencies, and “just-enough-opera:ng-system” ‣ Ship: images are hosted on online repositories such as DockerHub ‣ Run: images are run as containers on any host that includes a Docker daemon

    •Why using Docker for ESGF ? ‣ Promises to greatly improve installa:on and maintenance of an ESGF node ‣ Part of DREAM strategy for modularizing the ESGF architecture and por:ng it to other domains

  • Demo: Install and Run an ESGF Node

    • Pre-requisite: install Docker Engine on host (Linux, MacOSX, Windows) • Instruc:ons: ‣ git clone h`ps://github.com/ESGF/esgf-docker.git ‣ cd esgf-docker ‣ export ESGF_HOSTNAME= ‣ export ESGF_CONFIG= ‣ ./esgf_node_init.sh ‣ docker-compose up

    h`ps://hub.docker.com/esggub/

    h`ps://my-node.esgf.org/

    CoG

    TDS

    Solr

    https://github.com/ESGF/esgf-docker.git

  • ESGF/Docker complete Node Architecture

    • All ESGF services are run as independent, interac:ng Docker containers • Specific Node configura:on (cer:ficates, passwords, XML files) saved in $ESGF_CONFIG • Specific Node data (Postgres db, Solr indexes, TDS catalogs, NetCDF files, CoG site media) stored on Docker volumes

    • Custom networks isolate applica:ons for addi:onal security (for example, Postgres db)

    Threads Data Server + OpenID Relying Party

    (Tomcat)

    ESGF DATA NODE

    ESGF POSTGRES DB

    TDS CATALOGS

    ESGF CONFIG DATA

    ESGF SOLR (Jetty)

    SOLR DATA

    ESGF CONFIGURATION (site specific)

    esgf-search webapp (Tomcat)

    ESGF INDEX NODE

    ESGF COG

    COG DATA

    httpd + mod_wsgi

    ESGF WEBUI NODE

    ESGF IDP NODE

    esgf-idp webapp (Tomcat)

    ESGF/Docker complete Node Architecture

    POSTGRES DATA

    ESGF DATA

    ESGF FTP

    docker-compose -f docker-compose.yml up

  • Docker Details

    Dockerfile: “recipe” for building a Docker image

    docker-compose.yml: configura:on file for bundling several images

  • ESGF/Docker Data Node Architecture

    docker-compose -f docker-compose-data-node.yml up

    Threads Data Server + OpenID Relying Party

    (Tomcat)

    ESGF DATA NODE

    ESGF POSTGRES DB

    TDS CATALOGS

    ESGF CONFIG DATA

    ESGF CONFIGURATION (site specific)

    ESGF/Docker Data Node Architecture

    POSTGRES DATA

    ESGF DATA

    ESGF FTP

  • ESGF/Docker Index Node Architecture

    docker-compose -f docker-compose-index-node.yml up

    Using standard Solr replica:on + distributed search

    ESGF CONFIG DATA

    ESGF SOLR (Jetty)SOLR DATA

    ESGF CONFIGURATION (site specific)

    esgf-search webapp (Tomcat)

    ESGF INDEX NODE

    httpd + mod_wsgi

    ESGF WEBUI NODE

    ESGF/Docker Index Node Architecture

    Slave Shard Master Shard

  • ESGF/Docker Index Node with Solr Cloud

    docker-compose -f docker-compose-solr-cloud.yml up

    ESGF CONFIG DATA

    ESGF SOLR CLOUD (Jetty)SOLR DATA

    ESGF CONFIGURATION (site specific)

    esgf-search webapp (Tomcat)

    ESGF INDEX NODE

    httpd + mod_wsgi

    ESGF WEBUI NODE

    ESGF/Docker Solr Cloud Architecture

    Shard 1 Shard 2 Shard 3 ESGF SOLR CLOUD

    (Jetty)

    Shard 1 Shard 2 Shard 3 ESGF SOLR CLOUD

    (Jetty)

    Shard 7 Shard 8 Shard 9

    • Solr-Cloud advantages: ‣ Automa:c distributed indexing and searching (no custom configura:on) ‣ Load balancing and high availability ‣ Fault tolerance ‣ Scalability

  • Advantages of using Docker for ESGF

    • Installa:on: ‣ Installa:on scripts are much more modular and much smaller ‣ Installa:on process is mush easier: simply download the images, no compila:on involved ‣ Easier upgrades, possibly one module at a :me, and reversible ‣ Everybody runs exactly the same sokware

    • Architecture: ‣ Can define and deploy new architectures by simply wri:ng new configura:on files ‣ Can introduce new modules by simply wri:ng & wiring new images (solr-cloud, nginx, …)

    • Portability: ESGF node will run on any planorm (Linux, MacOSx, Windows) including Cloud

    • Scalability: modules can be scaled arbitrarily by running more containers (e.g. TDS, WPS…) ‣ Caveat: applica:on must be wri`en to enable distributed access to data

    • Other Docker/Swarm advanced features: ‣ Scalability onto mul:ple hosts clusters ‣ High availability, fault tolerance, rolling updates, …

  • Disadvantages of using Docker for ESGF

    •Docker is a new paradigm for building and running applica:ons: ‣ New knowledge for applica:on developers ‣ New training for node administrators ‣ Excellent and up-to-date documenta:on is a must

    •Must port the remaining ESGF modules to Docker: ‣ ESGF Publisher Client ‣ Globus ‣MyProxy or new OAuth server ‣ Desktop + Dashboard ‣ LAS

    •Must develop a process to migrate all applica:on data: ‣ ESGF Postgres database (users and data) ‣ TDS catalogs ‣ Solr indexes ‣ CoG Postgres database and site data

  • Possible Future Roadmap

    •Establish a “whale team” of experts to work on: ‣ Test the current infrastructure ‣ Port the remaining modules ‣Work on other outstanding :ckets ‣ Develop a tes:ng suite ‣ Develop data migra:on tools ‣ Revise and expand the documenta:on ‣ Re-use current esgf-iwt biweekly mee:ngs ?

    •Migra:ng to Docker would take 6-12 months ‣Must support current or upgraded installed in the mean:me ‣Might want to switch aker CMIP6 - looking into the long-term longevity of ESGF…