Data standardization within a research information system framework - Steve Revucky


CASRAI ReConnect 2015
INTELLECTUAL PROPERTY AND SCIENCE

STEVE REVUCKY, PRE-SALES SOLUTIONS SPECIALIST
May 3, 2023

FOLLOWING AN EXAMPLE
• Should be easy, right?

STANDARDIZATION IS THIS EASY

YOU SAY POTATO…
– Classifications and standardizations exist so that we can all understand each other and streamline communication
– Taxonomies and lexica designed by organizations such as CASRAI, among others, allow researchers, funders, partners, government authorities, and others to understand each other
– But what about balancing flexibility with standardization?

CONVERIS integrates with internal and external systems

-VIVO, etc.

Engagement in Industry Standards

CONVERIS

CONFIGURATION
• Workflow processes
• Labels
• Entities (create or adapt)
• Roles and rights

Integrating Systems with Converis

ETL Concept

• Extract data in a certain format (CSV, XML, JSON, etc.) from a source location

• Transform and apply business logic to data including aggregation, counting, concatenation, scripting, lookups, merging, push files, etc.

• Load data in a certain format (CSV, XML, JSON) to a destination location
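The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the general ETL idea, not how Converis or Kettle implement it. The function names, the sample columns (`first`, `last`, `dept`), and the sample data are all hypothetical.

```python
import csv
import io
import json


def extract(csv_text):
    """Extract: read rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: concatenate name parts and count records per department."""
    counts = {}
    for row in rows:
        counts[row["dept"]] = counts.get(row["dept"], 0) + 1
    return [
        {
            "fullName": f'{row["first"]} {row["last"]}',  # concatenation
            "dept": row["dept"],
            "deptHeadcount": counts[row["dept"]],  # lookup of the aggregated count
        }
        for row in rows
    ]


def load(records):
    """Load: serialize the records as JSON for the destination."""
    return json.dumps(records, indent=2)


# Hypothetical sample input standing in for a file at the source location.
SAMPLE = "first,last,dept\nAda,Lovelace,Math\nAlan,Turing,Math\nGrace,Hopper,CS\n"

if __name__ == "__main__":
    print(load(transform(extract(SAMPLE))))
```

Keeping each step as its own function mirrors the slide: the source format and the destination format can change without touching the business logic in the middle.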

Extract Transform Load

ETL and Converis
• General ETL (all output formats/steps allowed)
• Converis ETL (fixed output step)


Plugin is installed on your Converis server
It needs to be installed on your workstation too

Implementing Integrations
Requirements documents cover three points:

File Handling
− Format (*.csv)
− Location (/dir/*)
− Frequency (e.g. nightly)

Data/Field Mapping
− “hrID” = “converisID”
− “surname” = “lastName”

Business Logic
− What records should be added?
− What updates/changes to data can/should be made in Converis?
− Bidirectional integration?
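The mapping and business-logic points can be made concrete with a small sketch. Only the two field pairs (`hrID`/`converisID`, `surname`/`lastName`) come from the slide. The function names, the `existing` lookup, and the add/update rule are hypothetical and do not represent a Converis API. The sketch also assumes every source row carries an `hrID`.

```python
# Field mapping taken from the slide: source (HR) field -> Converis field.
FIELD_MAP = {"hrID": "converisID", "surname": "lastName"}


def map_record(source):
    """Rename source fields to their target names, dropping unmapped fields."""
    return {target: source[src] for src, target in FIELD_MAP.items() if src in source}


def plan_changes(source_rows, existing):
    """Decide which records to add and which to update.

    `existing` is a dict of target records keyed by converisID, a hypothetical
    stand-in for whatever the destination system currently holds.
    """
    to_add, to_update = [], []
    for row in source_rows:
        record = map_record(row)
        current = existing.get(record["converisID"])
        if current is None:
            to_add.append(record)  # no match in the destination: new record
        elif any(current.get(key) != value for key, value in record.items()):
            to_update.append(record)  # a mapped field differs: update
    return to_add, to_update
```

Writing the mapping as data rather than code keeps it close to the requirements document, so a change to the agreed field list is a one-line edit.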

Sample Banner Req. Doc

System architecture

[Architecture diagram. Components shown:]
• CONVERIS (Java EE, GlassFish): Java Server Faces (JSF), Business logic (EJB), Mapping Engine
• API: REST Web services, OAI-PMH
• Database: PostgreSQL
• Login Server
• Search Engine: Apache Solr
• Research Analytics: Pentaho
• Data Integration: Kettle ETL
• Internal Data Sources: Fin-system, HR-system, …
• External Data Sources: Scopus, WoS, PubMed, ORCID, …
• Institutional Repositories: DSpace, Fedora, EPrints

CONVERIS is a Java EE application following the typical Java EE three-tier architecture, with a modular design of user interface, business logic (i.e. functionality), and data management (i.e. data model).

CUSTOMIZATION (WITH LIMITS)
• XML templates
• Choicegroup modification
• Field formatting

RESEARCH AREA CLASSIFICATIONS
– Keyword classifications:

AUTHOR DISAMBIGUATION

TO THE WIDER WORLD

FUTURE POSSIBILITIES AND PLANS
• Any structured data can be ingested into Converis
• Fields can be mapped to existing or new fields
• Each implementation is customized, so potential exists to follow CASRAI guidelines:
– CRediT – Contributor roles taxonomy
• Canadian Common CV (CCV) coming soon – early 2016
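As one illustration of following a CASRAI guideline, local contributor labels could be mapped onto the CRediT roles. The fourteen role names below are those of the published CRediT taxonomy. The local labels, the particular mapping choices, and the function name are hypothetical, and nothing here reflects an actual Converis configuration.

```python
# The 14 roles of the CRediT contributor roles taxonomy.
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})

# Hypothetical institution-specific labels mapped to CRediT roles.
LOCAL_TO_CREDIT = {
    "PI": "Supervision",
    "Statistician": "Formal analysis",
    "Programmer": "Software",
    "Data manager": "Data curation",
}


def to_credit(local_label):
    """Return the CRediT role for a local label, or None if it is unmapped."""
    role = LOCAL_TO_CREDIT.get(local_label)
    if role is not None and role not in CREDIT_ROLES:
        # Guards against typos in the mapping table itself.
        raise ValueError(f"{role!r} is not a CRediT role")
    return role
```

Returning `None` for unmapped labels leaves room for the flexibility the talk asks about: local values survive until someone decides how they fit the standard.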

Thank you

Steve Revucky
Pre-Sales Solutions Specialist
IP & Science
Thomson Reuters
Philadelphia, PA

Tel: +1 215 823 1760
steve.revucky@thomsonreuters.com
