CBS-KNAW

The development of the MIRRI ICT infrastructure for microbial resources

Paolo Romano, Boyke Bunk, Anna Klindsworth, David Smith, Alexander Vasilenko, Frank Oliver Glöckner and Vincent Robert


A common situation …

[Figure: MS-Access, MySQL and MS-Excel]


Outlook

1. Management system for curators

2. Publication of data for third parties

3. Interoperability


1. Management system for curators

A. MANAGE COLLECTION’S DATA USING WEB-BASED APPLICATIONS

Pros:

• Access to databases from anywhere

• Access to databases from any device

• Possibly easy to use for basic operations

• Maintenance is easy for IT departments since the software is centrally installed and maintained

• No need for installations on curators’, researchers’ or technicians’ devices (desktop, laptop, tablet, smartphone, etc.) since access is done through browsers

• The same software might be used for the management and the publication of data

Cons:

• Development costs are usually higher

• Development can be significantly more complex, since all browsers and their versions must be supported

• Some advanced or even basic functionalities might be much more difficult or impossible to program

• Rich interfaces or memory-demanding operations might be impossible

• The interface can be much slower than that of desktop applications

• Interactions with other software might be more difficult or impossible

• Software maintenance might be more intensive, to keep the application working properly with new browser versions

• Security issues are more complex to handle with web apps than with desktop applications, since the application is potentially accessible from any device by anyone

• Stable Internet connections are needed


1. Management system for curators

B. MANAGE COLLECTION’S DATA USING DESKTOP APPLICATIONS

Pros:

• Rich software interface

• Easy to use

• Fast response to the user’s commands

• Memory-demanding or interface-rich operations can easily be performed (within the technical limits of the OS, computer, etc., of course)

• Relatively easy to develop (for basic functionalities at least)

• Interactions with other software can be easy to establish. Pipelines can be created, and import-export functionalities are easy to implement or use

• Data access security can easily be ensured

Cons:

• Installation can be problematic (different operating system (OS) versions, missing DLLs, etc.)

• Desktop applications (DA) are usually made for one OS (Windows, Mac or Linux) and won’t work on the others

• When DA are installed on different computers, updates and upgrades must be re-installed everywhere, making bug fixes and new versions harder to roll out

• DA are usually not accessible from a remote computer or device

• For software with limited installation options (fixed number of licenses), DA might become expensive and/or difficult to update/upgrade

• Can be heavy to manage for IT departments


1. Management system for curators

C. CREATE MANAGEMENT SOFTWARE USING IN-HOUSE RESOURCES

Pros:

• Tailor-made application fitting the curators’ needs perfectly (at design time at least)

• Fast implementation of new features and bug fixes

• This solution can be quite cheap if the software remains simple

• Possible if there is a strong team of stable developers

Cons:

• Curators or researchers are rarely good software designers or programmers, making the resulting solution hard to use, maintain and further develop

• Real developers are rarely available in culture collections (CC) because they are expensive.

• Good developers tend to leave the CC for better-paid positions, leaving the software unmaintained and hardly usable by newly recruited developers.

• This option can be extremely expensive when the wanted functionalities are complex and numerous.

• Most in-house solutions are not (easily, at least) scalable (adding/modifying/removing tables, fields, operations, etc.), and a redesign or complete rewrite of the software is often needed. This leads to interface instability for end-users, which is a key issue.

• Development takes a long time before the software is usable and stable, especially for single developers or small teams.

• Many software packages were abandoned after a few months/years because they were too slow, difficult to use, user-unfriendly, buggy or unstable. This is a common situation in CCs.

“If you think that professionals are expensive, wait until you work with amateurs …” Red Adair


1. Management system for curators

E. USE EXISTING OPEN-SOURCE OR FREE SOFTWARE

Pros:

• Wide range of options

• Very good and advanced software available

• Free of charge

• Unlimited use

• Good for collections with strong IT support and software developers

• Extensions possible by local developers (not always)

Cons:

• No complete solution for culture collections available

• Pipelines must be created, which can be difficult to achieve in a user-friendly way

• Using open-source software is far from easy, and in practice it may be impossible to get into other people’s code: access to code can be an illusion

• Support might be a serious issue in case of problems … and there are always problems with any software …


1. Management system for curators

D. USE EXISTING COMMERCIAL SOFTWARE

Pros:

• Wide range of options

• Very good and advanced software available

• Support available

• Custom developments can be made by professionals

• Complete or near-complete solutions are available for culture collections, so why reinvent the wheel?

Cons:

• Few complete solutions for culture collections available

• Some solutions are not extensible/flexible or not adapted to all collections

• Pure software companies have little biological background, making communication difficult

• Costs associated with software can be substantial

• Maintenance costs should remain under control

“If you think that professionals are expensive, wait until you work with amateurs …” Red Adair

But select the right professionals too …


1. Management system for curators

F. DATABASE TYPES

[Figure: NoSQL]


1. Management system for curators

F. DATABASE TYPES

Good:

• Relational databases:

    • MySQL

    • PostgreSQL

    • MSSQL

    • Oracle

• Document-based or other advanced databases:

    • MongoDB

    • Vertica

    • etc.

• All data should be in databases

Not good:

• Proprietary databases

• Catalogs on paper

• Word

• Excel

• MS-Access

• FileMaker Pro
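To illustrate the recommendation that all data should be in databases rather than in Word, Excel or paper catalogs, here is a minimal sketch using Python’s built-in `sqlite3` module. The two-table schema (`strain`, `stock`), the helper functions and every column name are hypothetical; the point is that rules such as a unique accession number or a non-negative vial count are enforced by the database itself, which a spreadsheet cannot do.

```python
import sqlite3

# Hypothetical minimal schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS strain (
    id            INTEGER PRIMARY KEY,
    accession     TEXT NOT NULL UNIQUE,  -- collection number, must not repeat
    species       TEXT NOT NULL,
    isolated_from TEXT
);
CREATE TABLE IF NOT EXISTS stock (
    id         INTEGER PRIMARY KEY,
    strain_id  INTEGER NOT NULL REFERENCES strain(id),
    method     TEXT NOT NULL,            -- e.g. 'freeze-dried', 'liquid nitrogen'
    vials_left INTEGER NOT NULL CHECK (vials_left >= 0)
);
"""


def open_catalog(path=":memory:"):
    """Open (or create) the database and enforce foreign-key constraints."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_strain(conn, accession, species, isolated_from=None):
    """Insert a strain; a duplicate accession raises sqlite3.IntegrityError."""
    cur = conn.execute(
        "INSERT INTO strain (accession, species, isolated_from) VALUES (?, ?, ?)",
        (accession, species, isolated_from),
    )
    return cur.lastrowid


def add_stock(conn, strain_id, method, vials):
    """Record a preserved stock lot for an existing strain."""
    conn.execute(
        "INSERT INTO stock (strain_id, method, vials_left) VALUES (?, ?, ?)",
        (strain_id, method, vials),
    )


def vials_per_strain(conn):
    """Total vials per strain; strains without any stock report 0."""
    return conn.execute(
        """SELECT s.accession, COALESCE(SUM(k.vials_left), 0)
           FROM strain s LEFT JOIN stock k ON k.strain_id = s.id
           GROUP BY s.id ORDER BY s.accession"""
    ).fetchall()
```

For example, after `sid = add_strain(conn, "DEMO-0001", "Saccharomyces cerevisiae")` and `add_stock(conn, sid, "freeze-dried", 12)`, `vials_per_strain(conn)` returns `[("DEMO-0001", 12)]`, and inserting the same accession a second time raises `sqlite3.IntegrityError`.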


1. Management system for curators

G. DATABASE ACCESS & BACKUP

Good:

• Backups twice a day

• Sharding, i.e. storing data records across multiple servers

• Live replication

• Databases should be physically close to the application, especially for large data exchanges or sequence alignments (for example)

Not good:

• No backup

• Remote databases are slow
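Sharding is listed above as good practice. A minimal sketch of hash-based sharding follows, assuming records are keyed by their accession number; the three shard names are hypothetical placeholders for real database servers. Each key always maps to the same shard, so all reads and writes for one strain go to one server.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to servers.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for(accession: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the record key."""
    digest = hashlib.sha256(accession.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def partition(accessions, shards=SHARDS):
    """Group record keys by the shard that would store them."""
    groups = {s: [] for s in shards}
    for acc in accessions:
        groups[shard_for(acc, shards)].append(acc)
    return groups
```

`hashlib` is used instead of Python’s built-in `hash()`, whose value for strings changes between processes. Plain modulo hashing moves most keys when a shard is added; consistent hashing is the usual remedy if the number of servers is expected to change.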


1. Management system for curators

H. INSTALLATION OF SOFTWARE, VERSIONING AND INFORMATION TECHNOLOGY (IT) RESOURCE NEEDS

Good:

• No installation, or a simple/minimal one (true for web apps, less so for desktop apps if installed on all computers)

• Hosted solutions are super easy for both IT and users

Not good:

• Very complex installations or parameter settings

• Some LIMS software can be extremely hard/long/expensive to set up

• Client-server apps are more difficult to maintain if installed on all computers, and updates can be challenging

• IT costs can be high:

    • Salaries

    • Servers, hardware, firewalls, SAN, etc.

    • Management software like VMware, etc.


1. Management system for curators

I. HOSTED SOLUTIONS

Pros:

• No installation

• Super easy for both IT and users

• Available anywhere, anytime, on any device (computer, smartphone, tablet, etc.)

• Fast and reliable if backed by a good IT infrastructure and using Citrix

• Easy maintenance of software/databases

• No need to buy hardware (servers, SAN, firewalls, etc.)

• No need to buy and maintain expensive and sophisticated software for the management and monitoring of the system (VMware vSphere, for example)

• No need to hire IT staff

• Continuous monitoring and support

• Given the number of services provided, hosted solutions are often much cheaper than running a complete infrastructure in house

• The CC management software and its associated database can be connected directly to the website used for publishing CC data

Cons:

• Require recurrent payments (monthly or annually), which means that these costs must be part of the CC’s annual budget

• Access to the database engine might not be possible (only backups of databases could be requested from time to time)

• Dependency on the hosting company

• An Internet connection is needed to work

• Not possible with extremely slow or erratic Internet connections/networks


1. Management system for curators

J. MOST WANTED FUNCTIONALITIES

We love:

• Collection maintenance

• Strain distribution

• Research

• Screening

• Dynamic system (curators/researchers can change the system without the need for IT staff or developers)

• Advanced security and access management

• Tracking of database modifications by each user

• Ability to import and export data as text, images, DNA trace files, microplate reader data, MS-Excel, HTML, XML, FASTA, NCBI and more

• Linking or exporting data to other websites such as GBIF, StrainInfo, NCBI, etc.

• Ability to create custom layouts such as invoices, catalogs and sample labels

• Strain stock management

• Customer information management

• Orders and invoices management
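Tracking database modifications by each user could, for instance, be done with an append-only change log that stores who changed which field, from what value to what value, and when. The sketch below is an in-memory illustration only; the `AuditedStore` and `Change` classes and their field names are hypothetical, and a real system would write the log to the database in the same transaction as the change itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    user: str
    record_id: str
    field_name: str
    old: object
    new: object
    at: datetime


@dataclass
class AuditedStore:
    records: dict = field(default_factory=dict)  # record_id -> {field: value}
    log: list = field(default_factory=list)      # append-only list of Change

    def update(self, user, record_id, **changes):
        """Apply field changes and log each one that actually alters a value."""
        rec = self.records.setdefault(record_id, {})
        now = datetime.now(timezone.utc)
        for name, value in changes.items():
            old = rec.get(name)
            if old != value:
                self.log.append(Change(user, record_id, name, old, value, now))
                rec[name] = value

    def history(self, record_id):
        """All logged changes to one record, oldest first."""
        return [c for c in self.log if c.record_id == record_id]

    def changes_by(self, user):
        """All logged changes made by one user."""
        return [c for c in self.log if c.user == user]
```

Unchanged values are not logged, so re-saving a record without edits leaves the history clean.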


1. Management system for curators

J. MOST WANTED FUNCTIONALITIES

We love:

• LIMS module to manage and track DNA sequencing projects, including revival of strains from collection stocks, DNA prep, PCR, gels, viewing, aligning and editing DNA sequences, and depositing consensus DNA sequences into the database and online catalog

• Scripting and debugger tools to automate routine tasks and extend the functionalities of the software

• Integration of scripts within existing menus of the software

• Reporting functions allowing export of data in many formats, including tab-delimited, text, MS-Word, MS-Excel, HTML, FASTA, NCBI, etc.

• Integrated content management system for the administration of CC websites and associated communication devices

• Polyphasic identification and classification, to identify and classify strains based on a custom weighted combination of DNA sequence, physiological, morphological and other data

• Species determinations

• Cluster analysis using various algorithms such as UPGMA, WPGMA, Single and Complete Linkage, Ward’s Minimum Variance, and Neighbor Joining

• Dendrogram generation
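UPGMA is one of the clustering algorithms listed above. A minimal sketch follows, assuming a symmetric distance matrix has already been computed (for example from sequence or phenotypic data); the function name and the Newick-string return format are illustrative choices. At each step the two closest clusters are merged, and the distance from the new cluster to every other cluster is the size-weighted average of the two merged clusters’ distances.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Cluster taxa with UPGMA (size-weighted average linkage).

    labels: list of n names; matrix: n x n symmetric distance matrix.
    Returns (newick, merges), where merges lists (left, right, height).
    """
    # Active clusters: integer id -> (newick fragment, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        merges.append((ti, tj, dij / 2))  # node height in an ultrametric tree
        # Distance from the merged cluster to every remaining cluster.
        new = {}
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            new[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        # Drop every entry involving i or j, then add the new distances.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new)
        clusters[next_id] = (f"({ti},{tj})", ni + nj)
        next_id += 1
    ((tree, _),) = clusters.values()
    return tree + ";", merges
```

With labels `A`, `B`, `C`, `D` and distances A-B = 2, A-C = B-C = 6 and 10 from D to every other taxon, the result is `"(D,(C,(A,B)));"` with merge heights 1, 3 and 5. Replacing the weighted average by the plain mean `(dik + djk) / 2` turns the sketch into WPGMA, and the Newick string can be passed to a dendrogram viewer.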


1. Management system for curators

J. MOST WANTED FUNCTIONALITIES

We love:

• Pairwise DNA sequence alignment

• Multiple DNA sequence alignment

• Storage of data in many formats, including text, dates, calculations, literature references, DNA sequence trace files, electrophoresis gel photos, GPS coordinates, microplate reader data (96 or 384 wells) and photos. Data types can thus include morphological, physiological, molecular, chemical, ecological, geographic and literature reference data

• DNA gel analysis

• Cell size determination

• Import, management, analysis and export of spectral data from MALDI-TOF or other systems

• Generation of dynamic geographic distribution maps using Google Maps

• etc.
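Pairwise DNA sequence alignment heads the list above. As one classic way to do it, here is a minimal sketch of Needleman-Wunsch global alignment with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, and production tools normally use substitution matrices and affine gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""

    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "ACT")` returns `(1, "ACGT", "AC-T")`: three matches and one gap.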


2. Publication of data for third parties

Curators want:

• Direct access to published data

• Easy/live release of new strains and associated data

• Restrict data access for Internet users/clients if needed

• Easy/live adaptation of webpages and website content

• Websites should be seen as a way to communicate with clients and end-users. This could be done by:

    • simple webpages

    • forums

    • news systems

• Change the look and some functionalities of the website on the fly without the intervention of website developers

• Allow depositors of strains to fill in deposit forms online, so that data do not have to be re-typed manually

• Allow clients to easily select strains to be ordered via a cart system

• Know the pending orders, payments and data associated with any client

• Allow end-users to search their databases according to the specificities of their collection

• Allow third parties to take advantage of the CC’s data to increase traffic to the CC’s websites. This can be done via friendly URLs and simple or advanced web services (REST, SOAP, etc.).

• etc.
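As an illustration of friendly URLs and a simple REST-style web service, the sketch below maps a path such as `/strains/DEMO-0001` to a JSON record, and also shows one way of hiding restricted records from anonymous Internet users. The URL scheme, the in-memory `CATALOG` and its `availability` field are hypothetical; a real service would query the collection database and run behind a web server or framework.

```python
import json
import re

# Hypothetical in-memory catalog standing in for the CC database.
CATALOG = {
    "DEMO-0001": {"species": "Saccharomyces cerevisiae", "availability": "public"},
    "DEMO-0002": {"species": "Aspergillus niger", "availability": "restricted"},
}

# Hypothetical friendly URL scheme: /strains/<accession>
STRAIN_URL = re.compile(r"^/strains/([A-Za-z0-9-]+)$")


def handle_get(path, public_only=True):
    """Return (HTTP status code, JSON body) for a GET on a strain URL."""
    match = STRAIN_URL.match(path)
    if not match:
        return 404, json.dumps({"error": "unknown resource"})
    accession = match.group(1)
    record = CATALOG.get(accession)
    # Restricted records are reported as missing to anonymous users.
    if record is None or (public_only and record["availability"] == "restricted"):
        return 404, json.dumps({"error": "strain not found"})
    return 200, json.dumps({"accession": accession, **record}, sort_keys=True)
```

Because the URL contains only the accession number, third-party sites can link to a strain page or fetch its data without knowing anything about the underlying database.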


2. Publication of data for third parties

Clients want:

• An easy search system covering as many features as possible

• A simple cart system allowing easy (de-)selection of strains to be ordered

• Not having to retype all personal or institutional information each time they order strains

• Fast and easy communication with curators or sales departments of the CC

• A frequently asked questions (FAQ) section answering most of their questions

• etc.
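A cart that lets clients select and deselect strains while reusing stored customer details could be modelled along these lines. This is a minimal sketch; the `Customer` and `Cart` classes, their fields and the order format are hypothetical and would map onto the collection’s own customer and order tables.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Customer:
    """Stored once, so clients need not retype it for every order."""
    name: str
    institution: str
    address: str


@dataclass
class Cart:
    customer: Customer
    items: dict = field(default_factory=dict)  # accession -> number of units

    def select(self, accession: str, quantity: int = 1) -> None:
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self.items[accession] = self.items.get(accession, 0) + quantity

    def deselect(self, accession: str) -> None:
        # Removing a strain that is not in the cart is silently ignored.
        self.items.pop(accession, None)

    def to_order(self) -> dict:
        """Snapshot the cart as an order, reusing the stored customer data."""
        if not self.items:
            raise ValueError("cart is empty")
        return {"customer": asdict(self.customer), "lines": sorted(self.items.items())}
```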


2. Publication of data for third parties

End-users want:

• An easy search system covering as many features as possible

• An advanced query system allowing queries to be combined into complex ones using AND, OR and NOT operators (including brackets to group conditions)

• Easy copy-pasting of data

• Easy export of selected data, manually or via software (web services)

• Pairwise DNA or protein sequence alignments against reference databases

• Polyphasic identifications and/or classifications against reference databases

• MLST (or similar methods) allowing identification or typing of strains

• etc.
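The advanced query system described above (AND, OR and NOT with brackets) can be sketched as a small recursive-descent parser that turns a query string into a predicate over strain records. The `field:value` condition syntax, the case-insensitive exact match and the record layout are assumptions made for illustration; values containing spaces are not supported in this sketch.

```python
import re


def _cond(token):
    """A `field:value` condition, compared case-insensitively."""
    name, sep, value = token.partition(":")
    if not sep or not name:
        raise ValueError(f"expected field:value, got {token!r}")
    return lambda rec: str(rec.get(name, "")).lower() == value.lower()


def _and(left, right):
    return lambda rec: left(rec) and right(rec)


def _or(left, right):
    return lambda rec: left(rec) or right(rec)


def _not(inner):
    return lambda rec: not inner(rec)


class QueryParser:
    """Recursive descent: NOT binds tightest, then AND, then OR."""

    def __init__(self, text):
        # Tokens are brackets or runs of non-space, non-bracket characters.
        self.tokens = re.findall(r"\(|\)|[^\s()]+", text)
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _take(self):
        token = self._peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self):
        predicate = self._or_expr()
        if self._peek() is not None:
            raise ValueError(f"unexpected token {self._peek()!r}")
        return predicate

    def _or_expr(self):
        left = self._and_expr()
        while self._peek() == "OR":
            self._take()
            left = _or(left, self._and_expr())
        return left

    def _and_expr(self):
        left = self._factor()
        while self._peek() == "AND":
            self._take()
            left = _and(left, self._factor())
        return left

    def _factor(self):
        token = self._take()
        if token == "NOT":
            return _not(self._factor())
        if token == "(":
            inner = self._or_expr()
            if self._take() != ")":
                raise ValueError("missing closing bracket")
            return inner
        return _cond(token)


def search(records, query):
    """Return the records matching a boolean query string."""
    predicate = QueryParser(query).parse()
    return [r for r in records if predicate(r)]
```

For instance, `search(strains, "genus:Candida AND (source:soil OR country:FR)")` keeps Candida strains that were either isolated from soil or come from France; without the brackets, AND would bind before OR.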


3. Interoperability

DATA STANDARDS AND PROTOCOLS

• BioSharing (http://biosharing.org/)

• Biodiversity Information Standards (TDWG; http://www.tdwg.org/)

• Genomic Standards Consortium (GSC; http://en.wikipedia.org/wiki/Genomic_Standards_Consortium)

• etc.

LINKS TO EXISTING RESOURCES

• STRAININFO

• WDCM

• TAXONOMIC DATABASES (MYCOBANK, DSMZ, ETC)

• GBIF

• INSDC (NCBI, EMBL, DDBJ, ETC)

• BOLD

• LIFEWATCH, BIOVEL, VIBRANT, LIFELINK, ELIXIR, Q-BANK, ETC

• MANY MORE …
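Linking to resources such as GBIF usually means exporting records in a TDWG standard such as Darwin Core. The sketch below writes strain records as a Darwin Core-style CSV using terms like `catalogNumber`, `scientificName` and `basisOfRecord`. The mapping from in-house column names, the institution and collection codes, and the `urn:catalog:` form of `occurrenceID` (a common convention) are assumptions to adapt, and a real GBIF publication would normally package such a file as a Darwin Core Archive.

```python
import csv
import io

# Hypothetical mapping from in-house column names to Darwin Core terms.
DWC_MAP = {
    "accession": "catalogNumber",
    "species": "scientificName",
    "country_code": "countryCode",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(strains, institution_code, collection_code):
    """Return a CSV string with one Darwin Core-style row per strain."""
    fields = ["occurrenceID", "basisOfRecord", "institutionCode", "collectionCode"]
    fields += list(DWC_MAP.values())
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for strain in strains:
        # Missing in-house values become empty cells.
        row = {dwc: strain.get(local, "") for local, dwc in DWC_MAP.items()}
        row.update(
            occurrenceID=f"urn:catalog:{institution_code}:{collection_code}:{strain['accession']}",
            basisOfRecord="LivingSpecimen",  # living cultures held in a collection
            institutionCode=institution_code,
            collectionCode=collection_code,
        )
        writer.writerow(row)
    return buf.getvalue()
```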


Work in progress

We need your help, opinions, suggestions and criticism

Contact us: [email protected]