Future Asset Management Architecture SAMBA WP4 report
Executive summary
This report defines and describes the existing and future architecture for asset management in
Statnett. TOGAF methodology has been used for assessing, analyzing and documenting the
architecture. The architecture has been described using multiple layers and viewpoints of ArchiMate
3.0 modelling language including strategy and motivation, application layer and technology layer.
The report builds on the results and conclusions made in other Big Data and Analytics related projects
at Statnett. In particular, it builds on the results of the Finbeck and Fia projects as well as the AutoDig
2.0 projects.
The report describes as well the future Big Data and Analytics platform and defines a number of
capabilities that are required from such a platform. The asset management solution itself is expected
to be a hybrid solution based on Big Data and Analytics platform and combined with functionality
implemented in several existing internal systems as well as new components.
This report describes as well several areas that are not yet addressed in the platform being currently
introduced at Statnett and need to be further explored. The most important among these areas are
cloud integration, advanced PaaS and SaaS cloud services offering advanced AI services like natural
language comprehension, data exchange APIs and gateways with third parties as well as improving the
infrastructure for ingestion of sensor data.
Contents
Abbreviations
1 Introduction
1.1 Underlying idea of the SAMBA-project
2 Methodology
3 Big Data and Analytics technology
3.1 Main On-premise Big Data distributions
3.2 Other on-premise solutions
3.3 Cloud solutions
3.4 Other solutions
4 Current solution – BASELINE ARCHITECTURE
5 Big Data lake – TRANSITION ARCHITECTURE
5.1 AutoDig 2.0 project
5.2 ArcGIS environment
6 Reference Architecture – TARGET ARCHITECTURE
6.1 Overall Reference Architecture
6.2 Strategy and motivation layers
6.3 Capabilities and information needs
6.4 Business architecture
6.5 Overall Strategy and Motivation layer
6.6 Application Architecture
6.7 Capability to application component mapping
6.8 Technology Architecture
6.9 Technology to application component mapping
6.10 Governance Principles
6.11 Principles for Big Data and Analytics platform
6.12 APIs for ingestion and integration
7 Concluding remarks
8 References
V1 EA Diagrams
8.1 Strategy and Motivation
Abbreviations
AWS Amazon Web Services – cloud platform from Amazon
ADM Architecture Development Method
APM Asset Performance Management
BPMN Business Process Model and Notation
CEN European Committee for Standardization
CENELEC European Committee for Electrotechnical Standardization
CIM Common Information Model
COTS Commercial Off The Shelf
CPU Central Processing Unit (processor)
DL Deep Learning
DSO Distribution System Operator
ETSI European Telecommunications Standards Institute
EA Sparx Enterprise Architect
ETL Extract, Transform, Load
GCP Google Cloud Platform
GPU Graphics Processing Unit
HDF Hortonworks Dataflow
HDFS Hadoop Distributed File System
HDP Hortonworks Data Platform
MapRFS MapR Filesystem
ML Machine Learning
NIST National Institute of Standards and Technology
PaaS Platform as a Service
SaaS Software as a Service
SGAM Smart Grids Architecture Model
TOGAF The Open Group Architecture Framework
TSO Transmission System Operator
1 Introduction
The main objective of WP 4 is to design and develop a reference ICT architecture that utilizes a
common integration environment and the "common data models" developed in WP 3. The architecture
must facilitate openness, security and safety, in addition to big data analytics and business intelligence
through rule-based filtering techniques. Open interfaces, message buses and message queuing, and
standardized models and protocols are important.
In particular, WP 4 provides:
- An overall description of different stakeholders, drivers, outcomes and tactics to address the needs related to Big Data and asset management at Statnett
- Available data sources and their requirements for the data harvesting services and suggestions for improvements with regards to data ingestion
- Specification of critical capabilities of the future asset management system and how the different use cases identified in WP 2 and WP 3 map to these capabilities
- Comparison of the reference architecture with industry standards and international suppliers as well as different best practice implementations
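The rule-based filtering mentioned above can be sketched in a few lines. This is an illustration only; the rule format and event fields ("source", "severity") are invented for the example and are not part of the SAMBA specification:

```python
# Minimal sketch of rule-based filtering of asset events.
# Field names and operators are illustrative assumptions.

def make_rule(field, op, value):
    """Build a predicate over an event dict."""
    ops = {
        "eq": lambda a, b: a == b,
        "gt": lambda a, b: a > b,
        "in": lambda a, b: a in b,
    }
    return lambda event: ops[op](event.get(field), value)

def filter_events(events, rules):
    """Keep only events that satisfy every configured rule."""
    return [e for e in events if all(rule(e) for rule in rules)]

events = [
    {"source": "PMU", "severity": 3},
    {"source": "DFR", "severity": 1},
    {"source": "PMU", "severity": 1},
]
rules = [make_rule("source", "eq", "PMU"), make_rule("severity", "gt", 2)]
print(filter_events(events, rules))  # → [{'source': 'PMU', 'severity': 3}]
```

In a real platform the rules would be externalized configuration rather than code, so that business-intelligence users can adjust the filtering without redeployment.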
This report provides an overview of results of the WP 4 and describes the ICT architecture that could
in the future support the needs for data collection and analysis primarily within the asset management
domain in Statnett. In particular, this report describes the assessment and analysis of the architecture
with the input from WP 1 report [1], the use cases from WP 2-WP 3 report [2] and conclusions from
the WP 6 report [3].
The report is organized as follows. Chapter two describes the methodology for assessing, analyzing,
developing and documenting the architecture. Chapter three gives an overview of the technology
landscape in the Big Data and Analytics field.
Chapter four describes the baseline architecture. Chapter five describes the transition architecture,
which is being implemented as part of the AutoDig 2.0 project.
Chapter six describes the reference model as defined by the Finbeck project, as well as the implications
for asset management and the SAMBA project. Chapters seven and eight contain concluding remarks
and references, respectively. The appendix includes full-size versions of the diagrams of the
architecture models described in this report.
1.1 Underlying idea of the SAMBA-project
Asset management in Statnett can be improved by utilizing new developments in ICT, such as big data
technology, data fusion and business intelligence. The underlying idea of the project is to use these
generic ICT-developments together with existing domain research results (such as models on ageing
and lifetime of power system components) to establish a reference architecture for data collection,
communication and handling. This can optimize maintenance and reinvestments by facilitating a
more efficient analysis of incipient failures, ageing mechanisms and remaining lifetime of power
system components.
The amounts of sensor data related to asset management can be overwhelming. Big Data and Analytics
technology is an important prerequisite for gathering, processing, distributing and visualising this data,
as well as for providing open access so that the data can be integrated and reused in other systems.
2 Methodology
TOGAF (The Open Group Architecture Framework) has been used as the main framework
and process for creating, assessing and documenting the architecture in the SAMBA project. TOGAF is
a framework for enterprise architecture that provides an approach for designing, planning,
implementing, and governing an enterprise information technology architecture [4]. In particular, the
following phases of the Architecture Development Method (ADM) have been used:
- Preliminary
- A. Architecture Vision
- B. Business Architecture
- C. Information Systems Architectures
- D. Technology Architecture
Figure 1 TOGAF – Architecture Development Method
The remaining phases (E-H) are not relevant for projects like SAMBA, which focus on research and
do not directly intend to implement the architecture.
For documentation purposes, ArchiMate 3.0 has been used. ArchiMate defines three main layers:
Business, Application and Technology [5]:
- Business layer describes business processes, services, functions and events. It describes the products and services offered to customers and users
- Application layer describes application services and components
- Technology layer describes hardware, communication infrastructure and system software

These three layers provide a structured way of bridging the different perspectives from business to
technology and infrastructure. The full ArchiMate 3.0 model adds or enhances three more very
useful layers:

- Strategy and Motivation layer – introduced in 2016 in ArchiMate 3.0 for modeling the capabilities of an organization and for explaining the impact of changes on the business (it gives a better connection between strategic and tactical planning)
- Implementation and Migration layer – supports modeling related to project, portfolio or program management
- Physical layer – for modeling physical assets like factories

In SAMBA, Business layer, Application layer and Technology layer models have been created, in
addition to a Strategy and Motivation layer model.
Moreover, as explained in the WP 1 report [1], the complex research challenges in this project are specific
to the transition towards the Smart grid. The SAMBA-project uses the Smart Grids Architecture Model
(SGAM), see Figure 2, from the CEN-CENELEC-ETSI Smart Grid Coordination Group to describe the
project's central R&D challenges and scientific methods [6].
The SGAM framework consists of five interoperability layers representing business objectives and
processes, functions, information exchange and models, communication protocols and components.
Interoperability is an important issue and research challenge in smart grids.
The component layer covers the physical infrastructure: the electrical components, sensors, networks,
routers, computers and so on that form the basis for any form of communication and information
gathering. Gaining an overview of this layer will be a starting point for the project. In the
communication layer, different protocols are used to send and receive data between components.
However, just enabling better communication does not guarantee that useful information is
exchanged.
Figure 2 SGAM framework
The information layer describes the data models and information objects included in use cases, in order
for the information to be interpreted correctly when testing use cases. A data model using open
standards (i.e. CIM) is an important prerequisite for the SAMBA-project.
In the function layer in Figure 2, functions and services are represented as use cases independently of
their physical realization in systems and components. This level ensures that the right information
reaches the right process and the right actor. This represents a large research challenge, as the
information must feed into Statnett's asset management, a high-level process in any company.
3 Big Data and Analytics technology
This chapter discusses and describes the different Big Data technologies and architectures, both
on-premise and in the cloud, that have been assessed in the course of the project. This includes the
technology already selected and acquired as the basis for the initial Big Data and Analytics solution at Statnett.
The overall Big Data technology landscape is extensive, spanning infrastructure, storage, analytics,
data sources, API tools and applications. This has been summarized in the following overview by Matt
Turck [7] (see Figure 3).
Figure 3 Big Data Landscape
3.1 Main On-premise Big Data distributions
Several Big Data technology suppliers have developed their own software suites for Big Data containing
Hadoop and other components. These software suites are called Hadoop distributions. They
package multiple tools and technologies into a technology stack ready for customers to use.
Suppliers often offer technical support as well as a comprehensive product with several
complementary tools that can be customized for specific tasks.
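The programming model at the heart of all these distributions can be illustrated with a toy, single-process sketch. This is only an illustration of the map/shuffle/reduce pattern; the real frameworks distribute each phase across a cluster:

```python
# Toy illustration of the MapReduce model that Hadoop distributions
# execute at scale: map each record to key/value pairs, shuffle by
# key, then reduce each group. Single-process only.
from collections import defaultdict

def map_phase(records, mapper):
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    return {key: reducer(values) for key, values in groups.items()}

# Word count, the canonical example.
lines = ["sensor fault", "sensor ok", "fault"]
pairs = map_phase(lines, lambda line: [(w, 1) for w in line.split()])
counts = reduce_phase(shuffle(pairs), sum)
print(counts)  # → {'sensor': 2, 'fault': 2, 'ok': 1}
```

The value the distributions add is everything around this pattern: distributed storage, scheduling, fault tolerance and the tool ecosystem compared in Table 1.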
Hortonworks Data Platform
Hortonworks was established in 2011 and is the only distribution that uses pure Apache Hadoop
without any proprietary tools and components. Hortonworks Data Platform is also the only pure
Open Source project of the three distributions. Hortonworks is now also an integral part of IBM
BigInsights [8].
In 2016, Hortonworks created a separate product line for processing streaming data.
Hortonworks DataFlow (HDF) is optimized to ingest, curate and handle data in flow and contains several
additional tools to facilitate that, e.g. NiFi, MiNiFi and Schema Registry [9].
Cloudera
Cloudera, established in 2008, was one of the first Hadoop distributions. Cloudera is based to a large
extent on Open Source components, but not to the same degree as Hortonworks. Cloudera is easier to
install and use than Hortonworks. The most important difference from Hortonworks is the
proprietary management stack [8].
MapR
MapR does not use the HDFS file system, but replaces it with the proprietary MapRFS, because
MapRFS offers better robustness and redundancy and greatly simplified use. MapR is most likely the
on-premise distribution that offers the best performance, redundancy and user friendliness. MapR also
improves the performance of other components, including HBase (called MapR-DB). MapR also offers
extensive documentation, courses and other materials [8].
3.2 Other on-premise solutions
Oracle Cloudera
Oracle Cloudera is a joint solution from Oracle/Cloudera. Oracle based their Big Data platform on a
Cloudera distribution. This distribution offers some additional and useful tools and solutions that give
increased performance, in particular Oracle Big Data Appliance, Oracle Big Data Discovery, Oracle
NoSQL database and Oracle R Enterprise.
Oracle Big Data Appliance is an integrated hardware and software Big Data solution running on a platform based
on Engineered Systems (like Exadata). Oracle adds the Big Data Discovery visualization tools on top
of Cloudera/Hadoop, while Oracle R Enterprise includes R – an open source, advanced statistical
analysis tool [8].
IBM BigInsights
IBM BigInsights for Apache Hadoop is a solution from IBM that also builds on top of Hadoop. In addition
to Hadoop, BigInsights offers some proprietary tools for analysis like BigSQL, BigSheets and BigInsights
Data Scientist, which includes BigR.
IBM BigInsights for Hadoop also offers the BigInsights Enterprise Management solution and the IBM Spectrum
Scale-FPO file system as an alternative to HDFS [8].
SAP HANA and Vora
SAP HANA is an in-memory, column-oriented, relational database management system developed and
marketed by SAP SE. Its primary function as a database server is to store and retrieve data as requested
by the applications. In addition, it performs advanced analytics (predictive analytics, spatial data
processing, text analytics, text search, streaming analytics, graph data processing) and includes ETL
capabilities as well as an application server [10].
SAP HANA Vora is an in-memory computing engine designed to make big data from Hadoop more
accessible and usable for enterprises. SAP developed Vora out of SAP HANA as a way to address specific
business cases involving big data. Hadoop offers lower-cost storage for vast amounts of data, but
adoption initially lagged in the enterprise because the data in a data lake is unstructured and can be
hard to deal with. SAP HANA Vora builds structured data hierarchies for the Hadoop data and
integrates it with data from HANA to enable OLAP-style in-memory analysis on the combined data
through an Apache Spark structured query language (SQL) interface [11].
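The idea Vora implements, imposing structure on lake data and joining it with warehouse data for combined SQL analysis, can be sketched with an in-memory SQL engine. Here sqlite3 merely stands in for the Spark SQL interface, and the table and column names are invented for the example:

```python
# Sketch of OLAP-style combined analysis over "lake" and "warehouse"
# data, the pattern SAP HANA Vora enables. sqlite3 stands in for the
# Spark SQL interface; the schemas are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "lake" side: raw sensor readings carrying only a component id
    CREATE TABLE readings (component_id TEXT, temp_c REAL);
    INSERT INTO readings VALUES ('T1', 61.0), ('T1', 65.0), ('T2', 40.0);
    -- "warehouse" side: master data describing the components
    CREATE TABLE components (component_id TEXT, station TEXT);
    INSERT INTO components VALUES ('T1', 'Oslo'), ('T2', 'Bergen');
""")
rows = conn.execute("""
    SELECT c.station, COUNT(*) AS n, AVG(r.temp_c) AS avg_temp
    FROM readings r JOIN components c USING (component_id)
    GROUP BY c.station ORDER BY c.station
""").fetchall()
print(rows)  # → [('Bergen', 1, 40.0), ('Oslo', 2, 63.0)]
```

The point is the join itself: once the lake data has a declared structure, standard aggregation over the combined data becomes a one-statement operation.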
OSIsoft PI
OSIsoft PI is a suite of software products used for collecting, historizing, finding, analyzing, delivering
and visualizing data. It is marketed as an enterprise infrastructure for management of
real-time data and events. The term PI System is often used to refer to the PI Server, but the two are
not the same: the PI System refers to all OSIsoft software products, whereas the PI Server is the core
product of the PI System [12].
The following table gives a quick overview of the main on-premise Hadoop distributions and their features
[8] [13].
Table 1 Comparison of most important Hadoop distributions (based on: “Hadoop buyers guide”) [8] [13]
Category | Feature | Hortonworks | Cloudera | MapR
Data access | SQL | Hive | Impala, Hive | MapR-DB, Impala, Drill, SparkSQL
Data access | NoSQL | HBase, Accumulo, Phoenix | HBase | HBase
Data access | Scripting | Pig | Pig | Pig
Data access | Batch | MapReduce, Spark, Hive | MapReduce, Spark, Pig | MapReduce
Data access | Search | Solr | Solr | Solr
Data access | Graph/ML | – | – | GraphX, MLib, Mahout
Data access | RDBMS | – | Kudu | MySQL
Data access | File system access | Limited, not standard NFS | Limited, not standard NFS | HDFS, read/write NFS (Posix)
Data access | Authentication | Kerberos | Kerberos | Kerberos and native
Data access | Streaming | Storm | Spark | Storm, Spark, MapR-Streams
Ingestion | Ingestion | Sqoop, Flume, Kafka | Sqoop, Flume, Kafka | Sqoop, Flume
Operations | Scheduling | Oozie | Oozie | –
Operations | Data lifecycle | Falcon, Atlas | Cloudera Navigator | –
Operations | Resource management | YARN | YARN | Myriad
Operations | Coordination | ZooKeeper | ZooKeeper | ZooKeeper, Sahara
Security | Security | – | Sentry, Record Service | –
Performance | Data ingestion (write) | Batch | Batch | Batch and streaming
Performance | Metadata architecture | Centralized | Centralized | Distributed
Redundancy | HA | Survives single fault | Survives single fault | Survives multiple faults (self-healing)
Redundancy | MapReduce HA | Restart of jobs | Restart of jobs | Continuous without restart
Redundancy | Upgrades | With planned downtime | Rolling upgrades | Rolling upgrades
Redundancy | Replication | Data only | Data only | Data and metadata
Redundancy | Snapshots | Consistent for closed files | Consistent for closed files | Consistent for all files and tables
Redundancy | Disaster recovery | None | Scheduled file copy | Data mirroring
Management | Tools | Ambari, Cloudbreak | Cloudera Manager | MapR Control System
Management | Heat map, alarms | Supported | Supported | Supported
Management | ReST API | Supported | Supported | Supported
Management | Data and job placement | None | None | Yes
3.3 Cloud solutions
IBM Cloud – Watson Data Platform
IBM provides a comprehensive cloud-based data platform, the Watson Data Platform, for data
ingestion, data storage and analytics [14].
Amazon EMR
Amazon EMR (Elastic MapReduce) is a Hadoop distribution put together by Amazon and running in the
Amazon cloud. Amazon EMR is easier to take into use than on-premise Hadoop. Amazon is by far
the biggest cloud provider, but when it comes to Big Data its solution is relatively new compared to
Google's [8].
Microsoft Azure
Microsoft offers three different cloud solutions based on Azure: the Hadoop-based HDInsights, HDP for
Windows and the Microsoft Analytics Platform System.
Google Cloud Platform
Google also offers Big Data cloud services. The most popular services in GCP (Google Cloud Platform)
are BigQuery (a SQL-like database), Cloud Dataflow (a processing framework) and Cloud
Dataproc (Spark and Hadoop services). Google has been working on Big Data technologies for a long
time, which gives it a head start when it comes to advanced Big Data tools. GCP offers analysis
and visualization tools as well as an advanced platform to test the solutions (known as Cloud Datalab)
[8].
The following table gives a quick overview of the main cloud-based Hadoop distributions and their features
[8] [13].
Table 2 Comparison of most important Big Data cloud solutions [8] [13]
Category | Feature | Amazon Web Services | Azure (HDInsights) | IBM Cloud Watson Data Platform | Google Cloud Platform
Data access | File system storage | Hadoop | – | Cloud Object Storage | Cloud Storage
Data access | NoSQL | HBase | HBase | Cloudant | Cloud Bigtable
Data access | SQL | Hive, Hue, Presto | Hive | DB2 on Cloud | BigQuery, Cloud SQL
Data access | RDBMS | Phoenix | – | Compose | Cloud SQL
Data access | Batch | Pig, Spark, MapReduce | Pig, Spark | – | Cloud Dataflow
Data access | Streaming | Spark | Storm, Spark | Streaming Analytics | Google Cloud Pub/Sub
Data access | Script | Pig | – | – | –
Data access | Search | Solr | – | – | –
Ingestion | Ingestion | Sqoop | – | Streaming Analytics | Cloud Dataflow
Visualization | Visualization | – | – | Data Science Experience | Cloud Datalab
Analytics | Machine Learning | Mahout | R Server, Azure Machine Learning | Streaming Analytics, DSX, Analytics Engine | Cloud Machine Learning, Speech API, Natural Language API, Translate API, Vision API
Operations | Logging | – | – | – | Logging, Error Reporting, Trace
Operations | Coordination | ZooKeeper | – | – | –
Operations | Scheduling | Oozie | – | – | –
Operations | Resource management | HCatalog, Tez | – | – | Cloud Console, Cloud Resource Manager
Operations | Monitoring | Ganglia | – | – | Monitoring
3.4 Other solutions
Predix
Predix is General Electric's software platform for the collection and analysis of data from industrial
machines. General Electric plans to support the growing industrial IoT with cloud servers and an app store.
Predix, a cloud-based PaaS (Platform as a Service), is claimed to enable industrial-scale analytics for
asset performance management (APM) and operations optimization by providing a standard way to
connect machines, data, and people. Predix provides a microservices-based delivery model with a
distributed architecture (cloud and on-premise) [15].
Figure 4 GE Predix platform
Insights Foundation for Energy
IBM® Insights Foundation for Energy is an energy analytics, data management and visualization
software solution for utility and energy companies. It provides a single energy analytics platform to
support various analytic applications. This includes situational awareness (visualizing patterns,
predicting actions and connecting data points to derive insights), predictive maintenance (using
historical data to determine asset repair or replacement), and asset health and risk analytics (to measure
asset status and assess risk and consequences in near real-time). It is available through IBM
software-as-a-service (SaaS) subscription services or as an on-premise solution [16].
IFE (Insights Foundation for Energy) creates operational insights based on energy analytics to optimize
business outcomes, provides a single energy analytics platform that can expand over time to meet
evolving analytics needs, and unifies systems and business processes for more innovative, effective
business procedures [16].
Figure 5 IBM IFE
ABB Asset Health Center
ABB also offers an energy analytics, data management and visualization software solution based on
the Azure cloud platform and Cortana. ABB Asset Health Center uses predictive and prescriptive
analytics, as well as customized models incorporating industry expertise, to identify and prioritize
emerging maintenance needs based on probability of failure and asset criticality. ABB Asset Health
Center offers ingestion of asset and sensor data in the Azure BLOB Storage as well as Azure SQL
Database, Azure Machine Learning and Power BI visualization [17].
Cognite
Cognite is a Norwegian company specializing in customized Big Data, Analytics and IoT solutions, mainly
for the energy sector (offshore), in particular for Aker BP and Kværner. Its platform is based on several components
from the Google Cloud Platform.
Kongsberg Digital
Kongsberg Digital has also built a similar platform for the energy sector (offshore). The platform from
Kongsberg Digital is based on the Microsoft Azure cloud platform.
4 Current solution – BASELINE ARCHITECTURE
The current status of asset management has already been documented in SAMBA through reports from
WP 1 [1] and WP 2 and WP 3 [2]. Below is a summary of the most important findings.
Statnett's ICT support for asset management has been developed over time, in the form of different
information systems, and often based on a per-need approach. See Figure 6 for the asset
management ICT landscape.
Figure 6 Asset management system landscape
Table 3 describes the most important components of the AS-IS architecture for asset management.
Table 3 Most important components in AS-IS architecture for asset management
Component | Layer | Comment
AutoDig | Visualization, Data Store | Fault analysis tool which collects and presents data from various sensors and systems in an efficient way
Innsikt / HIS web | Visualization | Visualization / analytics platform at Statnett
DDK-GUI | Visualization | Visualization of asset data from various sources in a tabular way. Front-end to SYSBAS.
ArcGIS | Visualization, Data Store | Map visualization tool at Statnett
IFS | Visualization, Data Store | ERP system
TPV-T/TPV-P | Visualization | Total planning tool with visualization of all stations and switchgear
TKP | Visualization | Project module for overview of activities
BiCycle | Visualization | A maintenance DWH solution serving as visualization and planning tool for asset management
FASIT | Visualization, Data Store | System for handling the fault reports from Statnett and Norwegian DSOs
SYSBAS | Data Store | Data hub for asset data from various sources
FOS common | Data Store | Landing area for asset data provided by the Norwegian DSOs
Innsikt DWH | Data Store | Storage part of Innsikt DWH
The most important systems for asset management are IFS, SYSBAS and FOS. In addition, there are a
couple of fault analysis systems which are also important for asset management, including
AutoDig and FASIT. Innsikt, as a common analysis platform, naturally plays an important role for asset
management. HIS web stores historical data.
Among all the systems, it is important to mention TPV, TKP and BiCycle. The TPV-T ("Total Planning Tool")
database is practically a "mirror" of IFS, showing data for all Statnett stations, with switchgear at all
voltage levels and all components/equipment with technical data and age, and the equivalent for
overhead lines and cables. The tool generates proposals for "equipment replacement measures" based on the age of
the different types of components. TPV-P (the project module) gives an overview of activities and is used
to group activities together, manually. BiCycle, on the other hand, is a specialized analytics solution for
RCM (Reliability Centered Maintenance).
Statnett's ERP system, IFS, is the kernel of the asset database and asset management functionality.
However, most of the analyses are performed in a series of additional tools which, combined, solve
most current user needs. However, the analyses are fragmented and mostly have different logic for
data collection and storage. The current architecture is not a good basis for growth. The largest data
storage is a traditional data warehouse with a BI tool on top.
Today, Statnett has still not realized the possibilities that big data concepts can provide. The main
reasons for this are:
- Data is not easily accessible for access, integration and sharing, and is often locked in proprietary systems
- There is no common data store / data hub which makes it possible to access and assemble data from various sources
- There is no uniform way of collecting the data from the sensors, and the distribution systems for collecting these data are often unreliable, not monitored and not properly maintained. Data is often of poor quality, delayed or missing
- Organizational silos make it difficult and time consuming to integrate the systems
- Use of obsolete integration paradigms (i.e. SOA – Service Oriented Architecture) which mandate exchange of data and restrict sharing of data
- The current analytics platform (i.e. Innsikt) does not provide sufficient capacity and performance to implement the asset management use cases efficiently
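A first mitigation for the delayed/missing data problem listed above is automated monitoring of the incoming series. A minimal sketch follows; the threshold and record layout are illustrative assumptions, not an existing Statnett mechanism:

```python
# Sketch of a data-quality check that flags missing or delayed
# sensor series. Threshold and record layout are illustrative.
from datetime import datetime, timedelta

def quality_flags(series, now, max_delay=timedelta(minutes=5)):
    """Return per-sensor flags: 'missing', 'delayed' or 'ok'."""
    flags = {}
    for sensor, timestamps in series.items():
        if not timestamps:
            flags[sensor] = "missing"
        elif now - max(timestamps) > max_delay:
            flags[sensor] = "delayed"
        else:
            flags[sensor] = "ok"
    return flags

now = datetime(2018, 1, 1, 12, 0)
series = {
    "pmu-1": [now - timedelta(minutes=1)],   # fresh data
    "pmu-2": [now - timedelta(hours=2)],     # stale data
    "dfr-1": [],                             # nothing received
}
print(quality_flags(series, now))
# → {'pmu-1': 'ok', 'pmu-2': 'delayed', 'dfr-1': 'missing'}
```

Running such a check continuously against the common data store would make the reliability problems visible instead of silently degrading the analyses.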
5 Big Data lake – TRANSITION ARCHITECTURE
This chapter describes the Big Data and Analytics platforms at Statnett: the platform to be developed
within the AutoDig 2.0 project and the ArcGIS environment.
The AutoDig 2.0 project is an important first step on the road to implementing the future Big Data and
Analytics platform at Statnett. The platform will be further developed in another project, the Finbeck
project, which defines the long-term roadmap and high-level reference architecture for Big Data,
Analytics, IoT and adjacent areas. The Finbeck project will extend the architecture and the platform
with additional software components and features, as suggested in the long-term strategy for Big Data
and Analytics.
5.1 AutoDig 2.0 project
Statnett is in the process of implementing a platform for Big Data and Analytics as a part of the AutoDig
2.0 project. In this project, a solution is being tested that will be crucial for the future development of the asset
management platform to support the needs described in the SAMBA use cases and beyond.
AutoDig is a system for acquisition, sorting, presentation and analysis of information regarding power
system disturbances [18]. The software in use today is a prototype developed within an R&D project.
The prototype has been successfully taken into use, but there is a need to develop an improved, more
stable and efficient tool to help perform this analysis work. Statnett has initiated a project which will
deliver a new and improved operative solution in close integration with Statnett's ICT infrastructure.
The AutoDig 2.0 system will gather, store and analyze large amounts of data collected from multiple
sources and sensors in the network (See Table 4 and Figure 7).
Table 4 AutoDig 2.0 data sources
Data source | Description
PMU data | Multiple time series (1 kHz sampling)
DFR (Digital Fault Recorder) | Time series
Power Quality Measurements / Elspec | Several time series containing aggregated parameters (50 Hz sampling) and raw data time series (50 kHz sampling)
Power Quality Measurements / Metrum | Several time series containing aggregated parameters
Distance Relay Protection | Comtrade
Operation and Maintenance Database | Events, breaker positions, network configuration and operational measurements (P, Q, I, U, f)
Network Repository | Power grid model
Time variable data / met.no | Weather and lightning data
ERP | Asset data
Operation Management Support system | Operation and fault reports
The AutoDig 2.0 solution that Statnett is implementing is based on the use of a Big Data lake / Data
Lake architecture (see Figure 7), which includes the following elements:
- Data collected from multiple sensors and data sources, after initial processing (ELT1 / ingestion). This data is stored in a Big Data lake
- All relevant data is stored in the Big Data lake in a structured format, either in a CIM (common information model) format or as time series
- Data stored in the Big Data lake must be available for reuse in new / future applications and solutions at Statnett

The solution consists of analysis and visualization components, as detailed in Table 5.
Figure 7 AutoDig 2.0 incl. Big Data lake
1 ELT – Extract Load Transform
In AutoDig 2.0, collected data will be retained for a long time and be available for use in future analyses.
Statnett aims to be able to retain raw data for up to 10 years and processed/aggregated data for at
least 60 years or the lifetime of the assets.
For the analyses performed with the help of AutoDig 2.0, it is crucial that the time elapsed from the
moment the data becomes available until it is ingested and the results are presented is kept as short as
possible (preferably below one minute).
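The split between raw data (up to 10 years) and processed/aggregated data (60+ years) implies a downsampling step somewhere in the pipeline. The following sketch shows the idea; the window size and the chosen statistics (min/mean/max) are illustrative assumptions, not the project's actual aggregation scheme:

```python
# Sketch of downsampling raw high-frequency samples into the kind
# of aggregates kept for long-term retention. Window size and the
# min/mean/max statistics are illustrative assumptions.
def aggregate(samples, window):
    """Group (timestamp_s, value) samples into windows of `window`
    seconds and keep (min, mean, max) per window."""
    buckets = {}
    for t, v in samples:
        buckets.setdefault(int(t // window), []).append(v)
    return {
        w * window: (min(vs), sum(vs) / len(vs), max(vs))
        for w, vs in sorted(buckets.items())
    }

raw = [(0.0, 1.0), (0.5, 3.0), (60.2, 2.0)]  # sparse sample data
print(aggregate(raw, 60))  # → {0: (1.0, 2.0, 3.0), 60: (2.0, 2.0, 2.0)}
```

At 1 kHz PMU sampling, per-minute aggregates reduce the volume by a factor of 60 000 while keeping the envelope of the signal, which is why such aggregates can realistically be kept for the lifetime of the assets.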
Table 5 AutoDig 2.0 application components
Component | Description
AutoDig Dashboard | Tailored web app providing a consolidated and configurable work surface and integrating data visualization components, analysis components (Advanced Analytics and Self-service Analytics) as well as visualization of data stored in the Big Data lake. AutoDig Dashboard will also be used to configure and select a set of triggers/criteria for performing analysis, as well as to perform analyses when these criteria are satisfied
Advanced Analytics | COTS2 component for visualization and self-service analytics
AutoDig Analysis Engine | Component that will analyze collected data. Currently implemented in MATLAB along with a number of MATLAB algorithms
AutoDig AI/Rule Engine | Component that detects patterns in data (both model-based analyses as well as machine learning and pattern recognition)
Figure 8 presents high-level design of the Big Data lake platform and its components.
The Big Data lake will provide APIs for data access and data ingestion. The storage and processing
infrastructure will primarily support structured storage of time series (used to store sensor data)
and measurements, as well as structured storage of files used for storing raw data. The Big Data lake will
support real-time processing, batch processing and analytics functions.
The initial Big Data and Analytics platform currently being introduced at Statnett consists of the following main software components:
- IBM BigInsights and Hortonworks - the acquired platform consists of the IBM BigInsights component; however, as IBM is in the process of restructuring it, in practice the platform will consist of Hortonworks 2.6 as the main component
- IBM BigSQL
- IBM Streams
- Tableau Server and Desktop
- IBM SPSS Modeler
- IBM BigR
These software components are described in the subchapters below and also presented in Figure 25.
2 COTS – Commercial Off The Shelf
Figure 8 High-level design of the Big Data Lake platform
(The diagram shows information sources such as SCADA, PMU, Digital Fault Recorder, power quality, oscillation registration, distance relay protection, video, lightning data from Met.no, asset data, CIM objects, GIS, ERP, Fault Management (FOSWeb), Operation Management Support and Market and Settlement systems feeding the Big Data Lake through APIs. The storage and processing infrastructure provides structured time series and structured file storage with batch and real-time processing, serving analytics, data science tools and AutoDig as information consumers.)
Hortonworks Data Platform
Hortonworks Data Platform (HDP) is a scalable open source Hadoop distribution and platform for
storing, processing and analyzing large amounts of data [19]. See also Chapter 3.1 for more details
about on premise Hadoop distributions.
Figure 9 Hortonworks platform [19]
IBM BigSQL
IBM BigSQL is a SQL layer on top of Hadoop/HDFS, which makes it possible to create tables and query data using SQL syntax. The SQL query engine supports joins, unions, grouping, common table expressions, windowing functions, and other familiar SQL expressions.
Depending on the nature of the query, the data volumes, and other factors, Big SQL can use Hadoop's MapReduce framework to process various query tasks in parallel or execute the query locally within the Big SQL server on a single node. [20]
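As a rough illustration of the SQL constructs Big SQL supports (grouping, window functions), the sketch below runs an equivalent query against an in-memory SQLite database standing in for a Big SQL connection; the table and column names are invented for the example:

```python
import sqlite3

# Illustrative only: the SQL below (window function with PARTITION BY) is the
# kind of query Big SQL can run over Hadoop data; sqlite3 stands in for a
# Big SQL connection, and the sensor table is invented for the example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sensor (asset TEXT, ts INTEGER, temp REAL)")
con.executemany("INSERT INTO sensor VALUES (?, ?, ?)",
                [("T1", 1, 20.0), ("T1", 2, 25.0), ("T2", 1, 30.0)])

# Running average temperature per asset using a window function.
rows = con.execute("""
    SELECT asset, ts,
           AVG(temp) OVER (PARTITION BY asset ORDER BY ts) AS running_avg
    FROM sensor ORDER BY asset, ts
""").fetchall()
for r in rows:
    print(r)
```

In a real deployment the same statement would be submitted through a Big SQL client connection rather than sqlite3.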
Figure 10 IBM BigSQL
IBM Streams
IBM Streams is an advanced computing platform that allows user-developed applications to ingest, analyze, and correlate information as it arrives from real-time sources. The solution can handle very high data throughput rates, up to millions of events or messages per second. [21]
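The ingest-analyze-correlate pattern can be sketched in plain Python. This stand-in, with invented event shapes and thresholds, only illustrates the idea of correlating events inside a sliding window; it is not the IBM Streams API:

```python
from collections import deque

# A minimal sketch of the ingest-analyze-correlate pattern that a streaming
# platform such as IBM Streams provides; this pure-Python stand-in keeps a
# sliding window and flags bursts. Event shape and thresholds are assumptions.
def correlate(events, window=3, threshold=2):
    """Yield an alert whenever `threshold` events from the same source
    fall inside the last `window` arrivals."""
    recent = deque(maxlen=window)
    for source, value in events:
        recent.append(source)
        if list(recent).count(source) >= threshold:
            yield (source, value)

stream = [("pmu-1", 0.1), ("pmu-2", 0.4), ("pmu-1", 0.9), ("pmu-1", 1.2)]
print(list(correlate(stream)))  # [('pmu-1', 0.9), ('pmu-1', 1.2)]
```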
Tableau Server and Desktop
Tableau is an advanced and highly performant visualization tool. It is an industry leading BI tool that focuses on data visualization, dashboarding and data discovery [22].
IBM SPSS Modeler
SPSS3 is a statistical tool from IBM used for both interactive (non-batch) and batch statistical analysis [23].
IBM SPSS Modeler is a part of the SPSS suite, which provides a set of data mining tools to develop
predictive models using business expertise and deploy them into operations to improve decision-
making. IBM SPSS Modeler supports a variety of modeling methods taken from machine learning,
artificial intelligence, and statistics. [24]
5.2 ArcGIS environment
The ArcGIS environment at Statnett is also a Big Data and Analytics platform with the following main
components:
- The GeoAnalytics Server
- The GeoEvent Server
- Image Server
3 SPSS was originally named Statistical Package for Social Sciences
- Insights for ArcGIS
GeoAnalytics Server and GeoEvent Server are a powerful combination.
Statnett uses GeoEvent as a development and production environment to stream and analyze lightning data and ship data in real time. GeoEvent has, among other things, great potential for use with real-time sensor data.
GeoAnalytics can be used in combination with Python scripting and can use the archived data from GeoEvent, which Statnett stores in the spatiotemporal big data store, as well as data from various forms of file shares (Hadoop, AWS/Azure cloud, etc.).
Figure 11 ArcGIS Big Data and Analytics landscape
ArcGIS GeoAnalytics Server
ArcGIS GeoAnalytics Server is designed to handle the analysis of massive datasets. GeoAnalytics tools
are a subset of Esri geoprocessing tools that use distributed and parallelized computing to run space-
time analyses on extremely large datasets. These tools can be executed using the Portal for ArcGIS
map viewer, ArcGIS Pro, the ArcGIS Server REST API, or from the new ArcGIS API for Python. ArcGIS
GeoAnalytics Server can connect to data from the Hadoop Distributed File System (HDFS), Hive, local
file shares, and data from within ArcGIS Enterprise, including using the archived spatiotemporal output
from ArcGIS GeoEvent Server as input. Because ArcGIS GeoAnalytics Server uses the base ArcGIS
Enterprise deployment to write and store analytical output, it is easy to use and share the resultant
layers and data [25] [26].
ArcGIS GeoEvent Server
ArcGIS GeoEvent Server is designed to handle high-volume, high-velocity real-time and streaming data.
It provides solutions through on-the-fly analysis and dynamic aggregation of large datasets, which
makes data visualization simple. When connected to the base ArcGIS Enterprise deployment, ArcGIS
GeoEvent Server can archive data to the spatiotemporal data store for further data analyses. [27] [28]
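Conceptually, the kind of on-the-fly filtering GeoEvent performs on streaming positions resembles the sketch below; the bounding box, vessel names and event format are invented for illustration and do not use the GeoEvent API:

```python
# A conceptual stand-in for the on-the-fly filtering ArcGIS GeoEvent Server
# performs on streaming positions; bounding box and events are invented.
def inside(bbox, lon, lat):
    """Return True if a lon/lat point falls inside a (west, south, east, north) box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

NORWAY_BBOX = (4.0, 57.9, 31.3, 71.2)  # rough lon/lat extent, illustrative

ship_positions = [("MS Nord", 10.7, 59.9), ("MS Sud", -70.0, 40.0)]
hits = [name for name, lon, lat in ship_positions if inside(NORWAY_BBOX, lon, lat)]
print(hits)  # ['MS Nord']
```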
ArcGIS Image Server
ArcGIS Image Server provides capabilities for serving, processing, analyzing, and extracting value from massive collections of imagery, rasters, and remotely sensed data. [29] [30]
Insights for ArcGIS
Insights for ArcGIS is a web-based data analytics workbench where you can explore spatial and non-spatial data.
Insights for ArcGIS is somewhat similar to Tableau, and can be used for example against real-time data
stored in our internal Spatiotemporal Big Data Store via GeoEvent Server. The features in GeoAnalytics
server can also be used from Insights. [31] [32]
6 Reference Architecture – TARGET ARCHITECTURE
6.1 Overall Reference Architecture
The architecture for smarter asset management is aligned with the overall conceptual model for the
reference architecture at Statnett developed in the Finbeck project.
The Finbeck project has assessed several reference models defined by international institutions and
third parties. The most relevant reference architecture to be adopted by Statnett is the one defined
by the National Institute of Standards and Technology (NIST) in 2015. The NIST reference model is a supplier-neutral, technology- and infrastructure-independent conceptual model for Big Data
architecture.
Figure 12 NIST Reference Model
The most important elements of a reference model as defined by NIST are:
- System Orchestrator - ensures system requirements. This applies to business, architecture, management, policy and resource requirements. In addition, the system orchestrator must also monitor the system's compliance with the requirements. The system orchestrator role is typically taken care of by one or more actors, which can be human or machine (software), or a combination of the two
- Data Provider - the different data providers, which supply the system with data. An important characteristic of a Big Data system is the ability to import and use data from a variety of different sources in different formats. Examples of sources: internal and public documents, images, audio files, video, sensor data and logs. Asset management and asset health management systems are examples of systems that can be sources of data as well
- Big Data Application Provider - ensures execution of the data life cycle in accordance with the security requirements and the requirements set by the system orchestrator. The life cycle of the data consists of five main activities that are relatively similar to those found in traditional data processing systems. The difference now is that the data characteristics in Big Data systems (volume, velocity, variety, etc.) require a radical change in the data processing mechanisms. These must be customized and optimized to, for example, be able to meet response time requirements in a world of ever-increasing data volumes. The five main activities in the Big Data Application Provider are Collection, Preparation/curation, Analytics, Visualization and Access
- Big Data Framework Provider - most of the progress made in recent years has been on frameworks that scale performance even though the data sets being processed have Big Data characteristics (volume, velocity, variety, etc.)
- Data Consumer - the end user, which can be either a person or another system that consumes data. Data from the analysis and visualization activities are accessed through the service interface offered by the Big Data Application Provider. The communication can either be pull-based, where the Big Data Application Provider responds to Data Consumer requests, or push-based, where the Data Consumer listens for automated output from the Big Data Application Provider. All decision levels within asset management are examples of systems that can be consumers of data.
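The pull- and push-based interaction styles between the Big Data Application Provider and a Data Consumer can be sketched as follows; the class and method names are assumptions chosen for the example, not part of the NIST specification:

```python
# Sketch of the two NIST interaction styles: pull (consumer requests the
# latest result) and push (consumer subscribes and is notified on publish).
# Class and method names are invented for this illustration.
class ApplicationProvider:
    def __init__(self):
        self._latest = None
        self._subscribers = []

    def publish(self, result):
        self._latest = result
        for callback in self._subscribers:   # push: notify all listeners
            callback(result)

    def query(self):                         # pull: consumer asks on demand
        return self._latest

    def subscribe(self, callback):
        self._subscribers.append(callback)

provider = ApplicationProvider()
received = []
provider.subscribe(received.append)          # push-based consumer
provider.publish({"asset": "T1", "health": 0.93})
print(provider.query())                      # pull-based consumer
print(received)
```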
Another important framework that has been used as a basis for the architecture of the Big Data Lake in the AutoDig project is the IBM Reference Model for Big Data and Analytics, presented in Figure 13.
Figure 13 IBM Reference Model for Big Data and Analytics4
4 The IBM Reference Model has been created and provided to Statnett as a part of the AutoDig 2.0 project
IBM has divided their model into 12 different areas with an increased focus on the Analytics (Analytics
in-motion and Analytical Data Lake Storage) and the consumer side (Discovery and Exploration,
Actionable Insights and Enhanced Applications).
In the course of multiple workshops and discussions in the Finbeck project, and with input from the NIST and IBM models, Statnett has defined its own reference architecture (Figure 14). Statnett's overall reference model has been divided into four main areas: data provider, Big Data and Analytics platform, data consumer, and security and governance.
The Big Data and Analytics platform consists of several high level components including ingestion,
distribution, analysis, storage and access (Figure 14). The high-level architecture defined by Finbeck matches the NIST reference architecture except for the visualization component. Visualization components can exist both inside and outside the reference architecture; in Statnett, visualization has been defined outside the platform. In practice there will be a few technical software components also implemented as a part of the Big Data and Analytics platform5.
Figure 14 Overall Statnett reference architecture model for Big Data & Analytics
The descriptions of components in the high-level reference architecture and their relation to asset
management are explained in the following table.
Table 6 Descriptions of components in the High Level Reference model as defined by Finbeck project
Component | Description
Data provider | Considered as a component outside of the reference architecture. The detailed architecture will still contain a description of which data sources the data platform will handle at all times. Data sources could be systems and sensors. Asset management is an example of a system that can be a data provider as well
Ingestion | Data can be retrieved from several different data sources, which must be collected and integrated with the data platform for further handling. The components that will handle this are described under the Ingestion component.
Distribution | Data needs to be distributed from source to consumer using one or more distribution mechanisms. For Statnett, distribution of data consists mainly of data processing, in addition to handling historical data using various storage technologies
Analysis | The data needs to be processed in different ways, e.g. in real time (such as data streams) and batch wise. Parts of the data processing will also handle data storage. The analysis component also covers the platform's ability to ensure that data supports advanced analysis such as machine learning, deep learning, etc.
Batch | Processing data batch wise, i.e. a periodization in handling the data. This means that data is collected over time before it is distributed in the system. Data that does not need to be visualized or analyzed in real time will normally be handled batch wise
Real time | Statnett has large amounts of data handled in real time. In order for these data to be distributed to more consumers, the platform must be able to handle streaming data to meet new needs and analyses. Data ingested from the data sources should be able to flow as fast as they occur in the sensors, source systems or external parties
Storage | The data platform must contain several different storage components to ensure access to historical information, traceability and access to real time information. The data platform must handle storage such as relational databases, distributed storage, graph databases and time series. Some storage will also be handled in processing (intermediate storage of data)
Access | Data must be made available to different consumers, and the architecture must support several different ways of making the data available, consisting of API/HMI and search
API/HMI | APIs (Application Programming Interface) and Human Machine Interfaces (HMI) are components that make data on the platform available to persons/systems outside the data platform. This also includes APIs that ensure the exchange of data with external actors
Search | The data platform will provide fast, secure and easy access to the data you need. This will require a form of search function, or Data Catalog, containing metadata about what is stored within the architecture
Security and governance | Security and governance provides a description of mechanisms for access control, monitoring and safe handling of data stored in the solution, including the data exchange with external actors.
Visualization | The architecture must support visualization of data and/or analyses. Applications for visualization can be seen as consumers of the data contained in the data platform. These are key applications for realizing the business needs of Statnett, and one of the key consumers. Certain technical components of the visualization will still need to be provided as a part of the platform.
Data consumer | Consumers are stakeholders of the architecture, and are described as the people or systems that will need access to data stored in the data platform. Asset management and asset health management systems are examples of systems that can be consumers of data
5 I.e. Tableau or Cognos
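The Search / Data Catalog capability described above can be sketched as keyword search over dataset metadata; the catalog entries and record fields below are invented for illustration:

```python
# Minimal sketch of a Data Catalog: metadata records describing datasets held
# in the platform, searched by keyword. Entries and fields are invented.
CATALOG = [
    {"name": "pmu_measurements", "tags": ["sensor", "time series", "real time"]},
    {"name": "asset_register",   "tags": ["asset management", "master data"]},
    {"name": "lightning_events", "tags": ["sensor", "geospatial"]},
]

def search(keyword):
    """Return names of datasets whose name or tags mention the keyword."""
    keyword = keyword.lower()
    return [d["name"] for d in CATALOG
            if keyword in d["name"].lower()
            or any(keyword in t for t in d["tags"])]

print(search("sensor"))  # ['pmu_measurements', 'lightning_events']
```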
6.2 Strategy and motivation layers
As explained in Chapter 2, ArchiMate 3.0 defines different layers to document the architecture. This chapter focuses on the Strategy and Motivation layers and sums up the requirements and expectations that a Big Data platform for SAMBA has to meet, including:
- explicit requirements from the projects
- implicit requirements gathered from various sources, e.g. the eSmart report [33] and other reports.
The following subchapters explain the link between drivers, goals, tactics and capabilities that a future
SAMBA platform will support.
Strategy layer
The big data and analytics reference model analyzed and described in the Finbeck project is based on
TOGAF methodology and described using ArchiMate. The Finbeck reference model has been based on
the outcome of the analysis of the strategic aspects of the architecture using the ArchiMate 3.0
Strategy layer. The strategy layer explains the impact of technology changes on the business. In our
case the strategy layer explains also how the capabilities of Big Data and Analytics platform relate to
the overall strategic drivers, goals and outcomes and how they support the expectations from the
stakeholders. The strategy layer has been created based on interviews with several stakeholders in the
organization and with the input from earlier phases in SAMBA. Figure 15 presents one of the early
SAMBA models from WP1 that shows different elements including roles and stakeholders that are
important for asset management.
Figure 15 Elements in asset management – SAMBA model for asset management introduced in WP1
The Fia6 project has also provided important input to the assessment process, which has been used to identify and align different roles and stakeholders in the Strategy layer. Figure 16 presents the main segments and information categories in Statnett as defined by Fia. It is apparent that asset management has been identified as one of the most central segments by the Fia project.
Figure 16 Segments and information categories identified in the Fia project
During the assessment and analysis of the strategy layer, eight main stakeholders/roles have been identified in Statnett for which Big Data and Analytics is of relevance:
- Grid Owner - this is one of the three main responsibilities that Statnett has been chartered with by the authorities, in which Statnett acts as the owner of the Norwegian transmission grid and the cable connections abroad. The grid owner role is also the one where asset management plays a central part.
- Grid Development - this is another of the three main responsibilities of Statnett. Grid development is about planning the future grid to meet future needs, not only for Statnett but for the complete Norwegian power system.
- System Operation - this is the last of the three main responsibilities of Statnett. System operation is about operating the transmission system, ensuring balance in the system as well as ensuring fair and equal treatment of all the market actors.
- NVE7 - the Norwegian regulator
6 Fia project focuses on Information Architecture at Statnett
7 NVE stands for The Norwegian Water Resources and Energy Directorate
- CFO, CIO, CEO and CISO8 - internal Statnett stakeholders
- External stakeholders, e.g. other DSOs, TSOs, research institutions, universities, consultants and so on, performing analysis on Statnett data or sending data to Statnett.
On the lower level, there are a number of stakeholders and roles which support the main roles. However, according to the evaluation performed in the Finbeck and SAMBA projects, currently only a limited set of the roles and stakeholders actually relate to and are affected by the adoption of Big Data and Analytics technology. The most affected roles are fault analysis, asset management, system operation and long term planning. No direct relations have been identified for e.g. the CFO or short term planning. Neither market nor settlement were identified as major users of the Big Data and Analytics technology at the time this report was written.
The strategy model presented here covers all areas that require the use of Big Data and Analytics. For the purpose of the SAMBA project, the focus is mainly on asset management; however, due to the WP 6 [3] focus on risk monitoring, the other roles, in particular fault analysis and system operation, are also of interest.
Figure 17 presents the strategy layer of the Big Data & Analytics reference architecture. A full size
Enterprise Architect (EA) diagram is also attached in Appendix V1.
8 CFO - Chief Financial Officer, CIO – Chief Information Officer, CEO – Chief Executive Officer, CISO – Chief Information Security Officer
Figure 17 Reference Model - Strategy Layer
The following table summarizes the different drivers identified during the assessment phase and explains their relation to the stakeholders:
Table 7 Overview strategy layer – stakeholders and drivers
Stakeholder | Driver | Meaning / Rationale
Grid Owner / Fault Analysis Engineer | Grid costs / Analysis costs | The overall costs related to fault analysis / problem management
Grid Owner / Fault Analysis Engineer | Grid costs / Incident costs | The overall costs related to actual incidents
Grid Owner / Asset Management; Grid Owner / Construction; Grid Development / Planning | Grid costs | The overall costs of the grid related to asset management, construction and planning. Also includes analysis costs and incident costs.
Grid Development / Planning; System Operation; System Operation / Fault Analysis Coordination; NVE | Power Supply Reliability | The reliability of the power supply as mandated by NVE and OED.
System Operation; System Operation / Market and Settlement; NVE | Network Balance | Keeping the network in balance as a system.
Grid Development / Long Term Planning | Increased Capacity | Increasing the capacity to meet future demand
CEO | Strategic Use of Modern Technology | Use of technology which will result in increased efficiency and safety in the future, comprising an increased level of automation, machine learning and real-time processing.
NVE | Administration costs | Optimizing costs of the administration (i.e. reporting)
CISO | Secure sensitive data | Securing data
Finally, as a part of the Strategy layer model, a number of goals and outcomes related to Big Data and Analytics at Statnett have been identified. The goals and outcomes and their relation to drivers are summarized in the following table:
Table 8 Overview strategy layer – drivers, goals and outcomes
Driver | Goal | Outcome | Meaning
Analysis costs | More efficient problem management | Improved configurable personalized visualization | Improvements in visualization support, which will be more configurable and can be adapted to individual needs
Analysis costs | More efficient problem management | Quicker fault analysis | More efficient and quicker fault analysis
Incident costs; Power Supply Reliability; Network Balance | Reduce number and consequence of the incidents | Improved real time monitoring | Reduced latency, increased data quality and better reliability are the most important examples of improvements in real time monitoring.
Incident costs; Power Supply Reliability; Network Balance | Reduce number and consequence of the incidents | More accurate fault analysis | More accurate findings in fault analysis
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Improved more dynamic visualization | Visualization which dynamically shifts the focus to issues/faults in the grid
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Improved real time monitoring | See above
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Automated and autonomous inspection | Predefined inspections, which are initiated by an operator but performed by drones and robots, as well as inspections which are initiated and performed autonomously
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Improved inspection - remote or virtual | Inspections that are performed by the operator remotely/virtually
Network Balance | Optimize short term imbalance | Improved more dynamic visualization | See above
Network Balance | Optimize short term imbalance | Improved real time monitoring | See above
Network Balance | Optimize short term imbalance | Increased precision of imbalance predictions | Improvement of the precision of imbalance prediction down to 5 minutes
Grid costs | Optimize maintenance costs | Automated and autonomous inspection | See above
Grid costs | Optimize maintenance costs | Predictive maintenance | Predictive ML algorithms designed to help determine the condition of in-service equipment in order to predict when maintenance should be performed
Grid costs | Optimize maintenance costs | Optimized condition based maintenance | More optimal condition based maintenance based on analysis of high volumes of asset data and sensor data, both batch and real time data
Grid costs | Optimize maintenance costs | More frequent and accurate inspection | Automated inspections that can be performed more frequently to support more traditional condition based maintenance and reliability based maintenance
Grid costs | Optimize maintenance costs | Improved inspection - remote or virtual | See above
Grid costs | Optimize maintenance costs | Improved configurable personalized visualization | See above
Grid costs | Optimize investment costs | Asset health based reinvestments | Investments and reinvestments based on the actual asset health (i.e. an asset health index derived from the sensor and asset management data).
Grid costs | Optimize investment costs | Improved socio-economic benefit analysis | Quicker and more accurate socio-economic analysis due to more performant tools and platforms
Increased capacity | Reduce bottlenecks | Improved socio-economic benefit analysis | Improved socio-economic benefit analysis is the most important outcome.
Increased capacity | Add new customers | Improved socio-economic benefit analysis | See above
Strategic use of modern technology | Increase efficiency and safety | All dependent goals | The goal of increased efficiency and safety relates to several dependent goals (in practice all of them) and supports the driver of strategic use of modern technology
Grid costs; Network balance | Increase automation of data quality control | Better data quality | Improve data quality in all involved systems and sensors. This includes improvements in the infrastructure for collecting and transporting the sensor data as well as improvements in data consistency.
Administration costs | Reduce reporting costs | More accurate fault analysis | See above
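The asset-health-based outcomes above can be illustrated with a minimal health-index sketch; the weights, inputs and threshold are invented assumptions, not Statnett's actual model:

```python
# Illustrative asset-health-index calculation: normalized degradation factors
# combined into a single 0..1 score used to prioritize maintenance and
# reinvestments. Weights, factor names and threshold are invented.
WEIGHTS = {"temperature": 0.5, "vibration": 0.3, "age": 0.2}

def health_index(measurements):
    """Return a 0..1 score; 1.0 means perfect condition.
    `measurements` maps each factor to a normalized 0..1 degradation value."""
    degradation = sum(WEIGHTS[k] * measurements[k] for k in WEIGHTS)
    return round(1.0 - degradation, 3)

def needs_maintenance(measurements, threshold=0.6):
    return health_index(measurements) < threshold

good = {"temperature": 0.1, "vibration": 0.1, "age": 0.2}
worn = {"temperature": 0.8, "vibration": 0.7, "age": 0.9}
print(health_index(good), needs_maintenance(good))  # 0.88 False
print(health_index(worn), needs_maintenance(worn))  # 0.21 True
```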
Motivation layer
The motivation layer (Figure 18) links the strategy with the actual capabilities of the Big Data and Analytics platform. This gives a more complete connection between the strategic and tactical planning levels, and explains why the different capabilities are necessary and how they support and affect the strategy. A full size EA diagram is also attached in Appendix V1.
Figure 18 Reference Model – motivation layer
The following table describes how the different tactics (courses of action) support the outcomes from the strategy layer, and how they are realized by different capabilities of the Big Data and Analytics platform. Since the assessment covers the whole Finbeck project and all initiatives at Statnett, the last column on the right explains how the different capabilities relate to the SAMBA project. This assessment has been performed based on the analysis of the strategy and motivation layer models.
Table 9 Overview motivation layer – outcome, course of action and capability
Outcome | Course of action | Capability | Meaning | Relation to SAMBA use cases and expectations
Improved configurable and personalized visualization | Introduce configurable and personalized visualization | Configurable and personalized visualization | Self-service, personalized visualization, which is highly configurable, is crucial to achieve sufficient flexibility to be able to explore the data without delay and without involving the IT department. | Yes, for analysis purposes.
Quicker fault analysis; Improved real time monitoring | Reduce latency of data acquisition | Low latency IoT data transport; Handling of real time information and streaming analysis | The low latency data transport capability, as well as support for the handling of real time information and streaming analysis capability, are important to reduce the latency of data acquisition and to achieve quicker fault analysis and improved real time monitoring. | Yes, as an important basic prerequisite
Improved more dynamic visualization | Introduce condition based visualization | User friendly visualization; Handling of real time information and streaming analysis; Rule Engine support (Classic, ML, DL, High CPU/GPU) | Introduce condition based visualization that automatically shifts focus to issues/faults in the grid. Condition based visualization requires the handling of real time information and streaming analysis, user friendly visualization and rule engine support capabilities. | WP 6
Improved real time monitoring; Improved more dynamic visualization; Predictive maintenance; Optimized condition based maintenance; Asset health based reinvestments | Introduce smart event processing | Handling of real time information and streaming analysis; Event notification, filtering and distribution; Processing of batch data | Introduce smart event processing, in particular streaming analytics that can predict deviations and faults before they occur. Smart event processing and streaming analytics require the handling of real time information, streaming analysis, event notification, filtering and distribution, as well as processing of batch data capabilities. | WP 2
More accurate fault analysis | Introduce fault classification rules | Machine Learning support | Introduce fault classification rules, which require the machine learning support capability. | Weak
More accurate fault analysis; Optimized condition based maintenance | Introduce fault detection rules | Machine Learning support | Introduce fault detection rules, which require the machine learning support capability. | WP 2
Automated and autonomous inspection; Predictive maintenance; Optimized condition based maintenance | Introduce rule based analysis | Rule Engine support (Classic, ML9, DL10, High CPU11/GPU12) | Introduce classic rule based analysis support, which requires the rule engine support capability. | WP 2 and WP 6
More accurate fault analysis | Allow new high frequency sensors | High throughput IoT transport; Sensor time synchronization support; High volume data storage | Allow new high frequency sensors, which require high throughput IoT transport, sensor time synchronization support and high volume data storage. Note that the existing infrastructure and sensors also require improvements with respect to these capabilities. | Related
More accurate fault analysis; Improved insights and business understanding | Increase storage capacity | High volume data storage | Increase storage capacity, which requires the high volume data storage platform capability | WP 2
Improved inspections - remote and virtual; Predictive maintenance; More frequent and accurate inspection | Introduce drones and robotics | Video and picture analysis; Drone fleet management and data capture | Introduce drones and robotics, which requires the video and picture analysis capability as well as the drone fleet management and data capture capability. | WP 6
9 ML – Machine Learning 10 DL – Deep Learning 11 CPU – Central Processing Unit 12 GPU – Graphics Processing Unit
Asset health based
reinvestments
Increased precision
of imbalance
predictions
Predictive
maintenance
Introduce
predictive
analytics
Data Science
Tools (DL, ML,
High CPU/GPU)
Introduce predictive analytics,
which requires data science tools
including deep learning, machine
learning and high cpu/gpu
capabilities.
WP 2
Improved insights
and business
understanding
Introduce Data
Science
Data Science
Tools (DL, ML,
High CPU/GPU)
Introduce predictive analytics,
which requires data science tools
including deep learning, machine
learning and high cpu/gpu
capabilities.
Related
Improved socio-
economic benefit
analysis
Introduce
probabilistic
reliability
assessment
Data Science
Tools (DL, ML,
High CPU/GPU)
Introduce probabilistic reliability
assessment, which requires data
science tools including deep
learning, machine learning and
high cpu/gpu capabilities.
WP2 and
WP6
Improved
inspections -
remote and virtual
Introduce digital
twin concept
High CPU and
GPU power
Actor
framework
Triple store and
graph storage
Aligning and
harmonizing
facts from
various sources
Introduce digital twin concept,
that enables real time 3D
visualization and control of the
assets as well as means to model
and reproduce the condition of
the grid and assets at a given
point of time.
Digital twin requires high CPU
and GPU power, actor
framework, triple store and graph
storage as well as aligning and
harmonizing facts from various
WP 6
46
Outcome Course of
action
Capability Meaning Relation to
SAMBA use
cases and
expectations
(Data
Catalogue)
sources including data catalogue
capabilities.
Improved
inspections -
remote and virtual
Introduce
augmented/virt
ual reality
High CPU and
GPU power
Introduce augmented/virtual
reality, which requires high CPU
and GPU power capability in the
platform.
WP 6
Optimized
condition based
maintenance
Asset health based
reinvestments
Improved insights
and business
understanding
Increased precision
of imbalance
predictions
Introduce
common data
nav/lake
Triple store and
graph storage
Open access to
data and data
sharing
Aligning and
harmonizing
facts from
various sources
(Data
Catalogue)
Introduce common data
nav/lake, which requires triple
store and graph storage
capability, open access to data
and data sharing, aligning and
harmonizing facts from various
sources including data catalogue.
WP 2
Better data quality Improve data
quality
Data quality and
consistency
check
Improve data quality in all
involved systems and sensors.
This includes the improvements
in the infrastructure for collecting
and transporting the sensor data
as well as improvements in the
data consistency.
This requires data quality checks
and consistency checks
capabilities in the platform.
Related
More efficient
access to
information
Introduce
interactive
information
access
Natural
language
understanding
Document
storage
Chatbot
conversation
support
Introduce interactive information
access, which requires natural
language understanding
capability, document storage and
chatbot conversation support
capabilities.
Related
Adequate security
of sensitive and
important data
Implement
measures to
secure the data
Fine grained
access control
and perimeter
security/AAA
Implement measures to secure
the data, which requires fine-
grained access control and
perimeter security/AAA
Related
47
Outcome Course of
action
Capability Meaning Relation to
SAMBA use
cases and
expectations
Redundancy
and disaster
recovery
capabilities as well as redundancy
and disaster recovery capability.
6.3 Capabilities and information needs
This chapter summarizes all the platform capabilities identified in the Finbeck project. As concluded in
chapter 6.2, most of these capabilities are also required for asset management and the SAMBA project.
Figure 19 presents all the capabilities identified by the Finbeck project [34].
Figure 19 Big Data and Analytics platform capabilities [34]
The table below explains each of the capabilities in detail.
Table 10 Big Data and Analytics platform capabilities
Capability name Related /
required for
asset
management?
Comment
Triple Store and Graph
storage
Yes Ability to store and process Triples and graph data.
Video and picture analysis Yes Ability to analyze and match patterns in the video and
photo files.
Open Access to Data and
Data Sharing
Yes Open APIs and access to data. The system provides
unconstrained access to the data in a number of different
ways.
Processing of batch data Yes Processing the data (often static data) in a periodic / batch
way.
Data Quality and
Consistency Check
Yes Ability to assess, control and rate the quality and accuracy
of the information stored in the data lake. In addition,
mechanisms allowing consistency checking of the
information stored in the lake.
Configurable and
personalized visualization
Yes Visualization tools that provide high level of customization
and configurability on personal level.
Redundancy and disaster
recovery
Yes Ability to continue operation of the system despite losing
some of the computational power and storage.
Actor Framework Yes Framework allowing implementation of concurrent
computation model with actors as universal primitives of
concurrency.
Deep Learning support Yes Ability to simulate / run deep neural networks in order to
analyze / train and run the predictive models.
Drone Fleet Management &
Data Capture
Yes Feature allowing steering / controlling and managing a
fleet of drones and acquiring captured data.
High CPU and GPU power Yes High computational power both CPU and GPU (graphical)
Low latency IoT data
transport
Less relevant Ability to transport the data with low delay.
The sensor data are important for asset management;
however, it is less relevant that the data has very low
latency. This capability is important for operations and
Fault Analysis, but less important for asset management.
Self Service Analytics Yes Analytics and visualization tools and views that can be
tailored to meet the needs of each individual and can be
adapted individually by each user.
Granular access control and
perimeter security/AAA
Yes Basic security features of the system allowing sufficient
control of the authentication, authorization and audit.
Rule Engine support Yes Feature allowing creating configurable rules that alter the
business logic of the application. Comprises use of AI,
Machine Learning and Deep Learning.
Data Catalogue Yes Metadata store providing information which enables
finding the right information in the data lake.
Data Science Tools Yes Various tools used by the Data Scientist including
Jupyter/zeppelin notebooks, R studio, SPSS, SparkML and
Python/scikit.
Event Notification, Filtering
and Distribution
Yes Ability to receive, filter and distribute events.
Handling of real time
information / streaming
analysis
Yes Ability to process streams of data, detect patterns and
generate events based on that.
High volume data storage Yes Data storage capable of storing amounts of data that are
not practical to store or process in traditional (i.e.
relational) databases.
Machine Learning support Yes Libraries allowing the use of statistical methods to analyze
and predict the output based on given parameters, using
e.g. libraries like SparkML.
User friendly visualization Yes User-friendly visualization.
Map visualization Yes Integrated support for map visualization.
High throughput IoT data
transport
Yes Data transport capability that allows sending high volumes
of data in short time.
The sensor data are important for asset management;
however, it is less relevant that the data transport has very
high throughput. This capability is important for
operations and Fault Analysis, but less important for asset
management.
Aligning and harmonizing
facts from various sources
Yes Ability to relate, combine and align the information from
multiple sources/silos.
Audio analysis support Yes Ability to analyze and match patterns in the audio file
Natural language
understanding
Yes Ability to comprehend the meaning of natural
language, e.g. in documents/documentation.
Chatbot conversation
support
Yes Support for interaction using the chatbot conversations
Document storage Yes Support for storing the documents
Classic Rule Engine support Yes Classic rule engine support without AI, ML/DL
Sensor Time
Synchronization support
Less relevant Capability to ensure time synchronization and time
alignment of data from various sources, as well as
preserving the time delay, time gap and jitter in the stored
data at microsecond level.
Very precise (sub-second) time synchronization is less
important for asset management. This capability is
important for operations and Fault Analysis, but less
important for asset management.
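Capabilities such as "Handling of real time information / streaming analysis" and "Event Notification, Filtering and Distribution" boil down to detecting patterns in a data stream and emitting events. A minimal Python sketch of that idea is shown below; the sliding-window rule, threshold and simulated readings are invented for illustration and this is not the platform's IBM Streams implementation:

```python
from collections import deque

def detect_events(stream, window=5, threshold=10.0):
    """Emit an event whenever the mean of the last `window`
    readings exceeds `threshold` (illustrative rule only)."""
    buf = deque(maxlen=window)
    events = []
    for t, value in stream:
        buf.append(value)
        if len(buf) == window and sum(buf) / window > threshold:
            events.append({"time": t, "mean": sum(buf) / window})
    return events

# Simulated sensor stream: (timestamp, value); values jump at t = 10
readings = [(t, 8.0 + (3.0 if t >= 10 else 0.0)) for t in range(20)]
alarms = detect_events(readings, window=5, threshold=10.0)
```

A streaming engine applies the same logic continuously to unbounded input instead of a finite list, but the windowing and event-emission pattern is the same.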
6.4 Business architecture
The following diagram (Figure 20) shows dependencies between some SAMBA WP2 use cases and a
limited set of identified capabilities at a more detailed level. Such a detailed assessment has only been
done for a limited number of WP2 use cases:
Figure 20 Business Architecture – WP2 use case mapped to Big Data & Analytics platform capabilities
The following table explains the relationships in more detail; these are also documented in Enterprise
Architect at Statnett.
Area / Function: Transformer
Use Case: T2.1 Online gas data analysis
Capability: Event notification, filtering and distribution; Rule Engine support; Machine Learning support; User friendly visualization; Handling of real time information/streaming analysis; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The online gas data analysis use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, machine learning, user friendly visualization, handling of real time information/streaming analysis, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient real time data collection, flexible analysis of online gas data as well as providing notification of detected deviations.

Use Case: T3.1 Thermal winding aging; T3.5 Periodic oil and gas analysis
Capability: Rule engine support; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The thermal winding aging and periodic oil and gas analysis use cases require a set of platform capabilities, in particular rule engine support, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient data collection, flexible analysis of the data and pattern detection.

Use Case: T3.6 Health index
Capability: Rule engine support; Video and picture analysis; Processing of batch data
Meaning / Rationale: The health index use case requires a set of platform capabilities, in particular rule engine support, video and picture analysis as well as processing of batch data. These capabilities are necessary to ensure efficient data collection and flexible analysis of the data and collected video and pictures.

Area / Function: Cable
Use Case: C2.4 DTS
Capability: Event notification, filtering and distribution; Rule Engine support; High volume data storage; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The DTS use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, high volume data storage, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient collection of high volumes of data, flexible analysis of DTS data and data from other sources as well as providing notification of detected deviations.

Use Case: C2.3 Oil filled termination
Capability: Event notification, filtering and distribution; Rule Engine support; Machine Learning support; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The oil filled termination use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, machine learning support, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient collection of data, flexible analysis of data from multiple sources as well as providing notification of detected deviations.

Area / Function: Breaker
Use Case: Reignition monitor of reactor breakers
Capability: Event notification, filtering and distribution; Rule Engine support; Machine Learning support; Handling of real time information/streaming analysis; High volume data storage; Aligning and harmonizing facts from various sources
Meaning / Rationale: The reignition monitoring of reactor breakers use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, machine learning support, handling of real time information/streaming analysis, high volume data storage as well as aligning and harmonizing facts from various sources. These capabilities are necessary to ensure efficient real time collection of high volumes of data, flexible analysis of data from multiple sources as well as providing notification of detected deviations.
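To make the rule engine and event notification capabilities behind the T2.1 online gas data analysis use case concrete, the sketch below checks a dissolved-gas sample against per-gas limits and produces a notification event. The limit values and field names are invented for this illustration; they are not Statnett's actual gas analysis rules:

```python
# Illustrative only: these threshold values are invented for this sketch
# and are not Statnett's actual gas analysis rules.
GAS_LIMITS_PPM = {"H2": 100, "CH4": 120, "C2H2": 1, "CO": 350}

def evaluate_gas_sample(sample_ppm):
    """Classic rule engine step: compare a dissolved-gas sample against
    per-gas limits and produce a notification event listing the
    exceeded gases, or None when the sample is within limits."""
    exceeded = {g: v for g, v in sample_ppm.items()
                if v > GAS_LIMITS_PPM.get(g, float("inf"))}
    if not exceeded:
        return None
    return {"event": "gas-limit-exceeded", "gases": sorted(exceeded)}

note = evaluate_gas_sample({"H2": 40, "CH4": 95, "C2H2": 3, "CO": 120})
```

In the platform, a rule like this would run in the rule engine over incoming samples, and the resulting event would be handed to the event notification, filtering and distribution capability.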
As explained in the previous chapter, in addition to the WP2 related capabilities there is a much larger
set, which relates to smarter asset management and the SAMBA project indirectly through the
assessment of the outcomes of the WP6 report.
6.5 Overall Strategy and Motivation layer
The following diagram (Figure 21) shows the complete strategy and motivation model layer as
described above in this chapter. A full size Enterprise Architect (EA) diagram is also attached in
Appendix V1.
Figure 21 Reference model - strategy and motivation - summary
(Full diagram: ArchiMate 3 Strategy & Motivation layer; author Leslaw Lopacki, version 1.0, created 11.10.2017, updated 03.12.2017.)
6.6 Application Architecture
This chapter presents the long-term application architecture of the Big Data and Analytics platform in
Statnett. As explained earlier, this is the overall platform as defined in the Finbeck project. Asset
management is an important user of this platform.
The platform supports the principles of context mapping and anti-corruption layer. It provides a
number of APIs and interfaces to access the data including:
- API - HTTPS, JSON and programmatic: the main API that internal systems can use to access the data
- Notification/streaming API: used to provide notifications and stream data from the platform to other internal systems
- Data Exchange API: used to offer the data for public access as well as to third parties, for instance other TSOs, DSOs and regulators
- Cloud GW: represents the gateway to the cloud; it is also crucial for the integration of any cloud based services
- Data Science Tools: a set of tools required by the data scientists
- Self Service visualization: the generic visualization component provided as a part of the platform; there will also be visualization components implemented within each client
- Classic Visualization: visualization components which were used / introduced prior to establishing the Big Data and Analytics platform
- API - ingestion: the API and set of tools for ingestion of data, both streaming and batch, from internal and external sources
The platform itself provides a number of components supporting the capabilities described in earlier
chapters. The following components are defined as part of the platform:
- Storage components: multiple types of storage, including a file store, graph/RDF store, time series data store, metadata store and relational data store, for various structured and unstructured data
- Analysis components: the Analytics engine as well as the Deep Learning and Machine Learning engines, where Statnett will implement and execute advanced AI algorithms
- Processing components (batch and real time): the Batch processing engine and the Streaming processing engine, which will detect patterns in real time / streaming data
- Access components: the Data Catalogue, which is important for structuring the data stored in the platform and for being able to find information
There are a number of internal systems and sources which will communicate with the platform. These
systems can act as both providers and consumers of data. The notification and streaming API
provides a means for more complex interactions, e.g. when an internal system needs to be notified
about a pattern implemented and detected within the platform.
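The interaction just described - an internal system registering interest in a pattern and being notified when the platform detects it - can be sketched as a small publish/subscribe contract. The class below is an in-process stand-in for illustration, not the platform's actual notification/streaming API; the pattern name and event fields are invented:

```python
class NotificationBroker:
    """Minimal stand-in for a notification/streaming API: internal
    systems subscribe to a named pattern, and the platform publishes
    an event each time the pattern is detected."""
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, pattern, callback):
        self._subscribers.setdefault(pattern, []).append(callback)

    def publish(self, pattern, event):
        for cb in self._subscribers.get(pattern, []):
            cb(event)

broker = NotificationBroker()
received = []
# e.g. an asset health component subscribing to a hypothetical pattern
broker.subscribe("transformer-gas-deviation", received.append)
broker.publish("transformer-gas-deviation",
               {"asset": "T42", "severity": "high"})
```

In the real platform this decoupling would be provided by the notification/streaming API (e.g. backed by Kafka topics), so that the detecting component does not need to know which internal systems consume its events.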
The architecture for the asset management functionality is planned as a hybrid architecture. Some of
the functionality will be placed in the Big Data and Analytics platform, but not all of it. It is clear that
certain algorithms in the area of asset management will require specialized systems; the Big Data and
Analytics platform cannot meet all these needs. Therefore, there will still be a need for other internal
components like the asset health management component and Bi-Cycle.
(Technology platform viewpoint diagram: the Big Data & Analytics platform comprising the Hadoop/Hortonworks cluster with Spark 2, Zeppelin, Ambari, YARN, HBase, HDFS, Solr, Hive, Kafka, Knox, Ranger, ZooKeeper, Atlas, Sqoop, Flume, BigIntegrate, BigSQL and BigR; the IBM Streams cluster; the IGC cluster with the Information Governance Catalogue; JanusGraph; OpenTSDB; the Cognos, Oracle DBMS, Tableau and SPSS clusters; the Insights for ArcGIS node; user and data science PCs; and the ingestion (batch & streaming), HTTP/programmatic API and notification/streaming API interfaces.)
Figure 22 describes the application layer of the Big Data & Analytics Reference Model.
Figure 22 Reference Model - application layer
(Application platform viewpoint diagram: the Big Data & Analytics Platform with its datastores (file, time series, metadata, graph/RDF, relational, public and private), engines (analytics, machine learning, deep learning, rule, batch and streaming processing), Data Catalogue, visualization components and APIs (HTTPS/JSON & programmatic, ingestion, notification/streaming, Data Exchange, Cloud GW), surrounded by internal systems and sources such as EMS, IFS, AutoDIG, FASIT2018, ArcGIS, Asset Health Management and BI-Cycle, internal and external sensors, cloud services, and internal and external consumers including TSOs, DSOs, regulators and the public.)
It is important to point out that several of the components above are not yet supported by the current
Big Data and Analytics platform implemented as part of the AutoDig 2.0 project. In particular, the
Cloud GW integration and the Data Exchange API are not yet implemented.
6.7 Capability to application component mapping
The diagram in Figure 23 explains how the different application components relate to the identified
capabilities. It is important to observe that this mapping reflects the long-term target model, beyond
the currently implemented Big Data lake platform (AutoDig 2.0 project).
The current platform does not support several of the necessary components. From the current state
assessment we have identified key areas with existing gaps and future platform improvement
opportunities:
Area / Component: Data Exchange API component
Gap: Insufficient means of exchanging data with other parties in a secure and reliable way, i.e. lacking the gateway functionality to isolate the data exposed to external users.
Opportunity: Support for data exchange both for public access and for third party companies, i.e. DSOs, TSOs and regulators.

Area / Component: Cloud GW and cloud support component
Gap: Insufficient means of integrating the platform with cloud services, i.e. IaaS and PaaS.
Opportunity: New complex services that can be provided as PaaS or SaaS services in the cloud, i.e. natural language understanding and chatbot conversation support.

Area / Component: API - ingestion component
Gap: Limited functionality as implemented in the AutoDig project; several data sources have significant data transport delays, causing delays in the detection of events, and the data is of poor quality and often missing. This in turn limits the value delivered to Fault Analysis. As a result, the "Low latency IoT transport" capability is poorly supported.
Opportunity: More accurate sensor data; less delay in data collection and quicker analysis. An important prerequisite for low latency real time data processing.13

Area / Component: Drone Fleet Management & Data Capture capability
Gap: The current application architecture does not include this capability.
Opportunity: Automated capturing and processing of data collected by the drones, as well as quicker, automated and unassisted drone deployment.
13 See also Figure 28 for a suggested detailed architecture for streaming sensor data
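To make the missing gateway functionality concrete, the sketch below shows one way a Data Exchange gateway could isolate internal data: only fields whitelisted for a given external consumer class pass through. The field names and consumer classes are hypothetical; the real gateway policy would be defined by Statnett:

```python
# Hypothetical per-consumer field whitelists; illustrative only.
EXPOSED_FIELDS = {
    "public":    {"asset_id", "asset_type"},
    "regulator": {"asset_id", "asset_type", "health_index"},
}

def gateway_filter(record, consumer_class):
    """Isolate internally stored data before external exposure by
    passing through only the fields whitelisted for the consumer;
    unknown consumer classes receive nothing."""
    allowed = EXPOSED_FIELDS.get(consumer_class, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {"asset_id": "TR-7", "asset_type": "transformer",
          "health_index": 0.83, "internal_cost": 1.2e6}
```

A production gateway would additionally handle authentication, rate limiting and auditing, but field-level isolation of the exposed data is the core of the gap described above.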
Figure 23 Reference Model - capability to component mapping
(The diagram maps each of the identified platform capabilities to the application components that realize them, including the datastores, engines, APIs, Cloud GW and visualization components.)
6.8 Technology Architecture
This chapter presents the current technology architecture of the Big Data and Analytics platform in
Statnett. It focuses on the architecture as it looked at the time of writing of this report, based on the
Big Data platform acquired within the AutoDig 2.0 project. As explained earlier in chapter 5 and
chapter 6.1, this is the same overall platform defined in the Finbeck and AutoDig 2.0 projects. Asset
management is expected to become one of the most important users of this platform.
The most important components within this platform are:
- The Hortonworks platform, consisting of several standard components. Notice that only the components that are in use or planned to be used are presented, not all of the Hortonworks components.
- IBM specific components that also run on the Hortonworks platform, like BigIntegrate, an ETL14 tool used for ingestion of data into the data lake, and IBM Big SQL, an SQL interface to query data stored in Hive or HBase.
- The IBM Streams component, used for processing of streaming data and streaming analytics.
- The Tableau visualization component.
- The IBM SPSS server, for designing analytics and machine learning functions.
- The Information Governance Catalogue, which provides means of structuring the data in the data lake.
- Cognos and Oracle RDBMS, which are part of the current Innsikt data warehouse portfolio at Statnett.
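As an example of how a sensor measurement would be shaped for the platform's time series store, the sketch below builds a data point in the JSON structure accepted by OpenTSDB's HTTP /api/put endpoint (metric, timestamp, value and at least one tag). The metric and tag names are invented for illustration, and the sketch only constructs the payload without contacting a server:

```python
import json

def tsdb_datapoint(metric, timestamp, value, **tags):
    """Build one data point in the JSON structure accepted by
    OpenTSDB's /api/put endpoint."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per point")
    return {"metric": metric, "timestamp": timestamp,
            "value": value, "tags": tags}

# Hypothetical metric and tags for a transformer oil temperature reading
point = tsdb_datapoint("transformer.oil.temp", 1512300000, 57.5,
                       station="ExampleStation", unit="celsius")
payload = json.dumps([point])  # /api/put accepts a list of points
```

In the platform, an ingestion component would POST such payloads to the OpenTSDB instance running on top of HBase/HDFS.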
Table 11 describes the components in the platform in detail:
Table 11 Technology components
Area Component SW Environment
Description15
Table rows below follow the column order Area, Component, SW Environment, Description.
Access Hive Apache Hadoop Hortonworks (HDP)
Apache Hive is an access tool for providing data summarization, query, and analysis. SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop [19].
Phoenix HDP Apache Phoenix is a massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix hides the intricacies of the NoSQL store enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL [19]
Pig HDP Apache Pig is a high-level platform for creating programs that run on Apache Hadoop [19]
BigSQL Hadoop IBM
IBM provided BigSQL is a software layer for creating tables and query data in BigInsights using SQL similar to Phoenix and based on Hive [20]
14 ETL – Extract, Transform, Load 15 Suppliers description of the SW component
63
Area Component SW Environment
Description15
Zeppelin HDP Apache Zeppelin is a data science tool. It is a multi-purposed web-based notebook enabling data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark [19]
Solr HDP Apache Solr is a highly scalable full-text search engine [19]
Kafka HDP Apache Kafka is a distributed streaming platform developed by the Apache Software Foundation written in Scala and Java [19]
Storage JanusGraph/Titan (Atlas)
The Linux Foundation
The Linux Foundation provided JanusGraph is a distributed graph database [35].
Oracle Innsikt/DWH Oracle provided Relational Database Management System
Accumulo HDP Apache Accumulo is a distributed key-value store based on Google's Bigtable [19]
HBase HDP Apache Hbase is a NoSQL/non-relational, distributed database modeled after Google's Bigtable and is written in Java [19]
OpenTSDB Open Source LPGL
OpenTSDB is a scalable time series database built on top of Hadoop and HBase. It simplifies the process of storing and analyzing large amounts of time-series data generated by endpoints like sensors or servers [36]
Storm HDP Apache Storm is a distributed platform for processing streaming data in real time [19]
Ingestion Sqoop HDP Apache Sqoop is a command-line interface application for transferring data between relational databases and Hadoop [19]
Flume HDP Apache Flume is a distributed, reliable, and highly available service for efficiently collecting, aggregating, and moving/streaming large amounts of log data [19]
Kafka HDP Apache Kafka is a distributed streaming platform developed by the Apache Software Foundation written in Scala and Java [19]
Big Integrate IBM IBM Big Integrate is an advanced ETL tool, which is a flavor of IBM DataStage
Streams IBM IBM Streams is an advanced streaming platform that can ingest large amounts of continuous data streams [21]
Big SQL IBM IBM provided BigSQL is a software layer for creating tables and query data in BigInsights using SQL similar to Phoenix and based on Hive [20]
64
Area Component SW Environment
Description15
Operations Oozie HDP Apache Oozie is a server-based workflow scheduling system to manage Hadoop jobs [19]
Ambari HDP Apache Ambari is a system for provisioning, managing, and monitoring Apache Hadoop clusters [19]
YARN HDP Apache YARN is one of the key features in the Hadoop. YARN is now characterized as a large-scale, distributed operating system for big data applications [19]
ZooKeeper HDP Apache ZooKeeper is a distributed configuration service, synchronization service, and naming registry for Hadoop [19]
Security Ranger HDP Apache Ranger is a centralized platform to define, administer and manage security policies consistently across Hadoop components [19]
Knox HDP Apache Knox is a perimeter security gateway system, which 'authenticates' user credentials (mostly against AD/LDAP). Only the successfully authenticated user are allowed access to Hadoop cluster [19]
Visualization Tableau Server Tableau Tableau provided Tableau server component is a high performance data visualization software capable of processing data from various sources incl. Hadoop/BigSQL enabling self-service analytics [22]
Tableau Desktop Tableau Non-server/desktop version of Tableau [22]
Cognos IBM IBM Cognos is a web-based, integrated business intelligence suite by IBM [37]
IBM SPSS IBM IBM provided tool for modelling of predictive algorithms using data from Hadoop distributions and Spark applications [23]
Insights for ArcGIS Esri Esri provided Insights for ArcGIS is a data analytics visualization tool for spatial and non-spatial data [32]
Data catalogue
IBM Information Governance Catalogue
IBM IBM provided catalogue service for storing the metadata and making it possible to structure the data in the Big data lake [38]
Falcon HDP Apache Falcon is a framework for managing data life cycle in Hadoop clusters [19]
Atlas HDP Apache Atlas is a scalable and extensible set of core governance services. Catalogue service for storing the metadata and making it possible to structure the data in the Big data lake, similar to IBM Information Governance Catalogue [19]
Processing Spark 2 HDP Apache Spark is a fast and general engine for large-scale data processing [19]
65
Area Component SW Environment
Description15
MapReduce HDP MapReduce is an original framework for writing applications that process large amounts of structured and unstructured data stored in the Hadoop Distributed File System [19]
IBM BigR Hadoop IBM
IBM provided Big R is a library of functions that provide end-to-end integration with the R language and BigInsights [39]
IBM Streams IBM IBM provided Streams is an advanced stream processing platform that can ingest, filter, analyze and correlate massive volumes of continuous data streams [21]
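To make the OpenTSDB storage row concrete, the sketch below builds the JSON body that OpenTSDB's HTTP /api/put endpoint accepts for a single data point. The metric name, tag names and values are invented for illustration and are not Statnett naming conventions.

```python
import json
import time

def opentsdb_put_body(metric, value, tags, timestamp=None):
    """Build the JSON body for OpenTSDB's HTTP /api/put endpoint.

    OpenTSDB expects a metric name, a Unix timestamp, a numeric value
    and a non-empty set of tags identifying the data source.
    """
    return json.dumps({
        "metric": metric,
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "value": value,
        "tags": tags,
    })

# Hypothetical transformer oil-temperature reading (all names are assumptions).
body = opentsdb_put_body(
    "asset.transformer.oil_temp",
    value=67.5,
    tags={"station": "NO-OSL-01", "asset_id": "T1"},
    timestamp=1514538000,
)
print(body)
```

In the platform, such a body would be POSTed to the OpenTSDB endpoint by an ingestion component rather than assembled by hand.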
Figure 24 describes the technology layer of the Big Data and Analytics Reference Model.
[Figure 24 shows the ArchiMate technology platform viewpoint: a Hadoop/Hortonworks cluster (Spark 2, Zeppelin, Ambari, YARN, ZooKeeper, HBase, HDFS, Hive, Solr, Kafka, Knox, Ranger, BigSQL, BigR, BigIntegrate, Sqoop, Flume, JanusGraph, OpenTSDB, Atlas), a Streams cluster (IBM Streams), an IGC cluster (Information Governance Catalogue), Cognos, Oracle DBMS, Tableau and SPSS clusters, an Insights for ArcGIS node, and user and data science PCs with Tableau Desktop, SPSS client and browsers, exposed through ingestion (batch and streaming), HTTP and programmatic, and notification and streaming APIs.]
Figure 24 Reference Model – technology layer
Within the AutoDig project, an alternative simplified model has also been created, based on the IBM
Reference Architecture for Analytics. This model is presented in Figure 25.
Figure 25 Alternative model for technology layer based on IBM Reference Architecture for Analytics
6.9 Technology to application component mapping
The following diagram explains how the different technology components relate to the application
components described in previous chapters.
It is important to observe that this mapping primarily reflects the transition model as currently
implemented in the Big Data lake platform (AutoDig 2.0 project), not the long-term model. As
explained in chapter 6.7, the current platform does not support several of the necessary components,
in particular:
- Data Exchange API
- Cloud GW and cloud support
- API – ingestion, which currently has limited functionality with respect to data transport latency
The following table describes how the application components are mapped to corresponding components in the technology architecture and which technologies are used to implement the architecture.
[Figure 25 shows the IBM Reference Architecture for Analytics layers: data sources (machine and sensor data, image and video, content services, social data, weather data, commercial and internet data sets, third-party, transactional, application and system-of-record data), ingestion and integration, analytical data lake storage, data access, analytics in-motion, discovery and exploration, actionable insight and enhanced applications, with information management and governance, security and platform services across them. These are mapped to on-premise components such as DataStage/BigIntegrate, BigSQL, Cognos Analytics, SPSS, IBM Streams, Spark and Spark Streaming, Kafka, HBase, Hive, YARN, OpenTSDB, Zeppelin, JanusGraph, Solr, Oracle DBMS (Innsikt), Ambari, HDFS, Flume, Sqoop, Knox, Ranger, Oozie, Tableau and the Governance Catalog.]
Table 12 Mapping of application components to corresponding technology components
Application Component | Technology Component
API HTTP/JSON and programmatic | IBM Big SQL, HBase, OpenTSDB, Hive, Phoenix, Spark 2
Notification and Streaming API | Kafka, IBM Streams
Topology Graph store | JanusGraph/Titan (Atlas)
Relational Data store | Oracle
Time series data store | HBase, OpenTSDB
Data Science Tools | Zeppelin, Tableau Server, Tableau Desktop, IBM SPSS
Self-service visualization | Tableau Server, Tableau Desktop, Insights for ArcGIS
Classic Visualization | Cognos
Data Catalogue | IBM Information Governance Catalogue
Analytics Engine | Spark 2, BigR
Machine Learning | IBM Streams, Spark 2
Rule Engine | IBM Streams, Spark 2
Batch Processing Engine | Spark 2
Deep Learning Engine | Spark 2
Streaming processing Engine | Kafka, Spark 2, IBM Streams
Data Exchange API | Not mapped
Cloud GW | Not mapped
Public Data store | Not mapped
Private Data store | Not mapped
Cloud Analytics | Not mapped
Other / platform | Oozie, Ambari, YARN, ZooKeeper
Security | Ranger, Knox
Not in use | Falcon, MapReduce, Accumulo, Pig, Solr, Storm, Sqoop, Flume, NiFi16, MiNiFi16, Schema Registry16
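The Rule Engine and Streaming processing Engine components above are realized by Kafka, Spark 2 and IBM Streams. As a platform-independent sketch of the kind of logic such an engine evaluates, the following applies a simple threshold rule to a stream of sensor readings; the readings, field names and the 80-degree limit are invented examples, not actual Statnett rules.

```python
def apply_rules(readings, limit):
    """Yield an alert for every reading that breaks the threshold rule."""
    for reading in readings:
        if reading["value"] > limit:
            yield {
                "asset": reading["asset"],
                "alert": "over-limit",
                "value": reading["value"],
            }

# Hypothetical oil-temperature stream; in the platform these readings
# would arrive continuously via Kafka or IBM Streams.
stream = [
    {"asset": "T1", "value": 72.0},
    {"asset": "T2", "value": 95.5},
    {"asset": "T1", "value": 81.3},
]
alerts = list(apply_rules(stream, limit=80.0))
print(alerts)  # two alerts: T2 and T1
```

A production rule engine would evaluate many such rules continuously over unbounded streams, but the structure (filter over a stream, emit alerts) is the same.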
Although the technology platform as currently defined seems to cover most of the current needs,
there are a number of new technologies that might be of interest to better cover deficiencies in the
platform. One important supplement that is relevant for inclusion is the other part of the
Hortonworks platform, Hortonworks DataFlow (HDF), with technologies like NiFi/MiNiFi and Schema
Registry, which provide support for collecting, curating, analyzing and acting on data in flow [9].
16 NiFi, MiNiFi and Schema Registry are not part of the current platform. These components are included in Hortonworks DataFlow (HDF)
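As a rough illustration of the collect, curate and act pattern that HDF/NiFi automates, the sketch below chains such steps over a small batch of records. The record layout and the cleaning rule are invented; NiFi would express the same flow graphically with processors and connections.

```python
def collect():
    """Stand-in source; NiFi would pull these from files, MQTT, Kafka, etc."""
    return [{"sensor": "S1", "value": "67.5"}, {"sensor": "S2", "value": "bad"}]

def curate(records):
    """Drop records whose value does not parse as a number."""
    for rec in records:
        try:
            yield {**rec, "value": float(rec["value"])}
        except ValueError:
            continue

def act(records):
    """Terminal step; NiFi would route these to a downstream processor."""
    return list(records)

clean = act(curate(collect()))
print(clean)  # only S1 survives the curation step
```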
Figure 26 Reference Model – technology to application component mapping
[Figure 26 repeats the technology platform viewpoint of Figure 24, annotating each node with the application components it realizes: APIs (HTTPS/JSON and programmatic, ingestion batch and streaming, notification/streaming, data exchange), processing engines (analytics, batch, machine learning, deep learning, rule, streaming), data stores (relational, time series, topology/graph/RDF, file, metadata, private and public), classic and self-service visualization, data catalogue, data science tools, security, Cloud GW and cloud analytics.]
6.10 Governance Principles
Statnett has established a number of Architecture Governance principles, which also apply to systems
and platform at Statnett. Table 13 describes the most important principles that affect the Big Data and
Analytics platform and adjacent systems.
Table 13 Architecture Governance Principles affecting the Big Data and Analytics platform
ID | Principle Name | Explanation
O5 | Information Management | Information shall be handled in a comprehensive manner as a common asset for Statnett. Data and metadata should be uniquely identifiable using common keys across systems.
D1 | Comprehensive information architecture | Statnett's information has value independent of any single ICT system and shall be linked to a common structure and management.
D2 | Information security and business criticality | All information shall be used, stored and shared according to confidentiality, integrity, availability and preservation requirements. There must be control of what information is mission critical, and it must be stored and protected to meet accessibility requirements.
D3 | Data Quality | All information must have a known quality state, and similar information should have similar quality tests. Known data quality ensures proper use and combination of data.
D4 | Master data management and life cycle | Statnett shall have comprehensive and unified management of the information's source and ownership (master database/system), even when these change over the life of the information.
D5 | Storage and sharing of information | Data storage and integration must be done according to common rules and architecture. The business should always know where data is generated, flows, and is shared, changed and saved.
6.11 Principles for Big Data and Analytics platform
Looking at the data platform framework, the Finbeck project also defined several principles to
address different concerns at Statnett:
Table 14 Principles for Big Data and Analytics platform
Principle: The data platform and the data sets in it shall be a common resource for the business
Description: The data platform should be developed in line with business needs and be a common resource for the business. Data sets that are collected, structured and stored in the data platform should be usable across different purposes, and must be organized and managed based on this principle.
Rationale: Complies with: O5 – Information Management

Principle: Data sets shall be described and classified
Description: All data sets handled through the data platform should have a description (metadata) covering at least content, ownership, origin and valuation.
Rationale: Complies with: O5 – Information Management; D1 – Comprehensive information architecture; D4 – Master data management and life cycle

Principle: Data sets shall have ownership and management
Description: All data sets that are collected and structured in the data platform should have a defined ownership and well-functioning governance and management.
Rationale: Complies with: D5 – Storage and sharing of information

Principle: Data sets shall be subject to access control and tracking
Description: Access to data sets shall be restricted to identified users explicitly authorized to create, modify, read and delete. Where necessary, it should be possible to apply access control to subsets within a data set, for example columns and/or rows in tabular data sets. The data platform shall ensure that all access to data sets is traceable.
Rationale: Complies with: D2 – Information security and business criticality

Principle: Use of data sets comes with responsibility
Description: The use of data sets must be in line with policies and the interests of the business. Access to data sets must be protected, and further processing of data sets beyond the data platform's control shall follow the instructions on use agreed with the information owners. Responsibility also includes understanding the data sets used, and whether their quality meets the quality requirements underlying the use.
Rationale: The data platform will eventually contain a large collection of data sets. The fact that these are collected and easily accessible will in itself constitute both an opportunity and a risk. Anyone given access to parts of these data sets must understand and exercise accountability in their use. It is equally important that the data user understands which data sets are used and whether they are suitable for the particular application.

Principle: Data sets shall be processed and have a retention period in accordance with guidelines for information processing
Description: Data sets must be processed in accordance with established guidelines. Each data set should have a defined retention period, set by the information owner in accordance with established guidelines.
Rationale: General instructions at Statnett

Principle: Data quality is a common responsibility
Description: Data sets should have ownership and management, and the main responsibility for data quality lies there. However, anyone using a data set is responsible for reporting data quality issues back to the owner/manager so they can be corrected at the source.
Rationale: It is not cost effective if individual data users separately work around data quality problems in a data set. It should be a shared responsibility to report issues and ensure that good and correct data sets are developed that can be used by many. Data must be corrected at the source and in the associated work processes.

Principle: Data must be stored in a cost-effective manner
Description: Data should be stored cost-effectively; this is achieved through a well-defined information architecture that follows given standards and best practice. This architecture is defined by the FIA project.
Rationale: Complies with: O5 – Information Management
6.12 APIs for ingestion and integration
Integration architecture is a central aspect of a Big Data and Analytics platform, particularly in a hybrid architecture where business logic related to asset management is implemented in specialized asset management systems outside the platform. Several types of integration are needed: interfaces to access the data in the Big Data lake, as well as interfaces for streaming data and sending notifications.
The technology platform provides a number of tools and integration technologies to satisfy the integration needs covered by the application components "API – HTTPS, JSON and programmatic" and "Notification/Streaming API". In addition, there are integration platforms available at Statnett that can be used to integrate the platform with other systems at Statnett.
Table 15 Integration components to integrate Big Data and Analytics platform with specialized asset management systems
Application Component | Integration/technology Component | Type of integration
API HTTP/JSON and programmatic | IBM Big SQL, Hive | SQL
API HTTP/JSON and programmatic | HBase, OpenTSDB | Web services/ReST
API HTTP/JSON and programmatic | Spark 2 | Programmatic: Java, Scala, Python
Notification and Streaming API | Kafka, IBM Streams | Streaming, Notifications
Ingestion | IBM Big Integrate | ETL
Other (available at Statnett) | RedHat JBoss Fuse | Notifications, Web services
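To illustrate the Web services/ReST type of integration, the sketch below only composes an OpenTSDB-style /api/query URL that a consuming asset management system could call over HTTPS; the host name, metric and tags are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode, urlunsplit

def timeseries_query_url(host, metric, start, end, tags):
    """Compose an OpenTSDB-style /api/query URL for a metric over a time range."""
    tag_filter = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    query = urlencode({
        "start": start,
        "end": end,
        "m": f"avg:{metric}{{{tag_filter}}}",  # aggregated metric expression
    })
    return urlunsplit(("https", host, "/api/query", query, ""))

# Hypothetical query for one transformer's oil temperature over a day.
url = timeseries_query_url(
    "bigdata.example.internal",
    "asset.transformer.oil_temp",
    start="2018/01/01-00:00:00",
    end="2018/01/02-00:00:00",
    tags={"asset_id": "T1"},
)
print(url)
```

A consuming system would issue this as an HTTP GET and receive the data points as JSON.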
Moreover, there are new, emerging technologies that Statnett should consider adopting, in particular Hortonworks DataFlow and NiFi.
In practice, future solutions will use a combination of these methods. Figure 27 presents an example of a possible future integration of the Big Data and Analytics platform with an asset health management system, including visualization of alerts related to asset management.
Figure 27 Example of possible integration of asset health management system with Big Data and Analytics platform
In relation to the "API – ingestion" component and the gap related to the quality of sensor data collection, Figure 28 presents a more detailed view of the future architecture for ingestion and distribution of sensor data, which will better address the issues related to sensor data quality and the reliability of the infrastructure.
Figure 28 A detailed view on the future architecture of the ingestion and distribution of sensor data
[Figure 28 shows sensor data providers (Qualitrol fault recorders, PMUs, PQ Elspec/Metrum, protection devices, oil/gas and other sensors) feeding protocol adapters (IEC 61850, IEEE C37.118, PQScada/Metrum and others) into a microservice/container-based ingestion layer and a pub-sub distribution layer based on Kafka, alongside AutoDig. Consumers include monitoring/operations, data science, predictive maintenance, machine learning and the asset health management system, which exchanges asset data (ETL), health scores (ETL/web service), sensor data and alerts (Kafka) and notifications (Kafka/web service) with the ERP system, the sensor historian, the monitoring system and the Big Data and Analytics platform.]
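The ingestion and distribution pattern of Figure 28, with protocol adapters publishing sensor readings to a pub-sub layer and several independent consumers, can be sketched in-process as follows. A toy broker stands in for Kafka here; a real deployment would use Kafka topics and containerized ingestion microservices.

```python
from collections import defaultdict

class Broker:
    """Toy pub-sub broker standing in for Kafka in Figure 28."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
received = {"ml": [], "monitoring": []}

# Two independent consumers, as in the figure's data usage column.
broker.subscribe("sensor-data", lambda m: received["ml"].append(m))
broker.subscribe("sensor-data", lambda m: received["monitoring"].append(m))

# A hypothetical IEC 61850 adapter publishes a reading.
broker.publish("sensor-data", {"asset": "T1", "oil_temp": 67.5})
print(received)
```

The decoupling shown here is the point of the pattern: adapters need not know which consumers exist, and new consumers can subscribe without changing the ingestion side.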
7 Concluding remarks
The WP4 report defines and describes the future architecture for asset management in Statnett.
TOGAF methodology has been used for assessing, analyzing and documenting the architecture. The
architecture has been described using multiple layers and viewpoints of ArchiMate 3.0 modelling
language including strategy and motivation, application layer and technology layer.
The WP4 architecture is based on conclusions made in the Finbeck project and is anchored in the long-
term Statnett Reference Model for Big Data and Analytics. The report defines and describes a number
of capabilities that are required from the Big Data and Analytics platform. Although the focus in WP4
was on the Big Data and Analytics architecture, the asset management solution itself is a hybrid
solution, based on the Big Data and Analytics platform combined with functionality implemented in
several existing internal systems as well as new components.
Although a Big Data and Analytics platform is being introduced within the AutoDig 2.0 project at
Statnett, this platform does not yet cover all future needs. Several areas still need to be explored,
including cloud integration; advanced PaaS and SaaS cloud services offering AI capabilities such as
natural language comprehension; data exchange APIs and gateways towards third parties; and
improved infrastructure for ingestion of sensor data.
8 References
[1] Statnett, Status and further work - Results from WP1 in the SAMBA project, Oslo: Statnett,
2016.
[2] Statnett, "Use case collection - SAMBA WP2 and WP3 report," Statnett, Oslo, 2018.
[3] Statnett, "Risk monitoring in Statnett - SAMBA WP6 report," Statnett, Oslo, 2018.
[4] Wikipedia, "The Open Group Architecture Framework," [Online]. Available:
https://en.wikipedia.org/wiki/The_Open_Group_Architecture_Framework. [Accessed 22
December 2017].
[5] NimbleMind, "ArchiMate 3.0 – a modern modeling language for digital age," [Online].
Available: http://www.nimblemind.no/2017/09/05/archimate-3-0-a-modern/. [Accessed 22
December 2017].
[6] Smart Grids Coordination Group, "Reference Architecture for the Smart Grid,"
CEN/CENELEC/ETSI, 2012.
[7] M. Turck, "Matt Turck," [Online]. Available: http://mattturck.com/bigdata2017/. [Accessed 22
December 2017].
[8] NimbleMind, "Big Data - quick overview," [Online]. Available:
http://www.nimblemind.no/2016/09/21/big-data-quick-overview/. [Accessed 22 December
2017].
[9] Hortonworks, "Hortonworks Dataflow," [Online]. Available:
https://hortonworks.com/products/data-platforms/hdf/. [Accessed 29 December 2017].
[10] Wikipedia, "SAP Hana," [Online]. Available: https://en.wikipedia.org/wiki/SAP_HANA.
[Accessed 22 December 2017].
[11] SAP, "SAP Vora," [Online]. Available: https://www.sap.com/products/hana-vora-hadoop.html.
[Accessed 22 December 2017].
[12] Wikipedia, "OSIsoft," [Online]. Available: https://en.wikipedia.org/wiki/OSIsoft. [Accessed 22
December 2017].
[13] Ubuntu/Canonical, "Ubuntu," [Online]. Available: https://insights.ubuntu.com/wp-
content/uploads/HadoopBuyersGuide_sm.pdf. [Accessed 22 December 2017].
[14] IBM, "Watson Data Platform," [Online]. Available:
https://www.ibm.com/analytics/us/en/watson-data-platform/. [Accessed 26 February 2018].
[15] GE, "Predix," [Online]. Available: https://www.ge.com/digital/predix. [Accessed 22 December
2017].
[16] Engerati, "IBM Insights Foundation for Energy," [Online]. Available:
https://www.engerati.com/sites/default/files/Day2-1640-Etienne%2520Pelletier-
IBM.compressed.pdf. [Accessed 22 December 2017].
[17] ABB, "ABB launches next-generation asset management solution to improve efficiency and
optimize costs," [Online]. Available:
http://www.abb.com/cawp/seitp202/ed4ee9084a2f169fc12580ba0039beaa.aspx. [Accessed
22 December 2017].
[18] Sintef, "NEF Teknisk Møte 2014," 2014. [Online]. Available:
http://www.sintef.no/projectweb/nef-tm/presentasjoner/. [Accessed 28 December 2017].
[19] Hortonworks, "Maximize the value of data-at-rest to deliver Big Data Analytics," [Online].
Available: https://hortonworks.com/products/data-platforms/hdp/. [Accessed 28 December 2017].
[20] IBM, "What's the big deal about Big SQL?," [Online]. Available:
https://www.ibm.com/developerworks/library/bd-bigsql/index.html. [Accessed 28 December 2017].
[21] IBM, "IBM Streams," [Online]. Available:
https://www.ibm.com/support/knowledgecenter/en/SSCRJU/SSCRJU_welcome.html.
[Accessed 28 December 2017].
[22] Tableau, "2017 Gartner Magic Quadrant," [Online]. Available:
https://www.tableau.com/resource/2017-gartner-magic-quadrant. [Accessed 28 December 2017].
[23] IBM, "SPSS statistical software," [Online]. Available: https://www.ibm.com/analytics/data-
science/predictive-analytics/spss-statistical-software. [Accessed 28 December 2017].
[24] IBM, "About IBM SPSS Modeler," [Online]. Available:
https://www.ibm.com/support/knowledgecenter/en/SS3RA7_18.1.1/modeler_mainhelp_clie
nt_ddita/clementine/entities/clem_family_overview.html. [Accessed 28 December 2017].
[25] Esri, "Geoanalytics Server," [Online]. Available:
https://www.esri.com/arcgis/products/geoanalytics-server.
[26] Esri, "What is ArcGIS GeoAnalytics Server?," [Online]. Available:
http://server.arcgis.com/en/server/latest/get-started/windows/what-is-arcgis-geoanalytics-
server-.htm. [Accessed 22 December 2017].
[27] Esri, "ArcGIS GeoEvent Server," [Online]. Available:
http://www.esri.com/arcgis/products/geoevent-server. [Accessed 22 December 2017].
[28] Esri, "GeoEvent Server," [Online]. Available: https://server.arcgis.com/en/geoevent/.
[Accessed 22 December 2017].
[29] Esri, "ArcGIS Image Server," [Online]. Available: https://www.esri.com/arcgis/products/image-
server. [Accessed 22 December 2017].
[30] Esri, "What is ArcGIS Image Server?," [Online]. Available:
http://server.arcgis.com/en/server/latest/get-started/windows/what-is-arcgis-image-server-
.htm. [Accessed 22 December 2017].
[31] Esri, "Insights for ArcGIS," [Online]. Available: http://www.esri.com/products/arcgis-
capabilities/insights. [Accessed 22 December 2017].
[32] Esri, "Insights for ArcGIS," [Online]. Available: https://server.arcgis.com/en/insights/.
[Accessed 22 December 2017].
[33] eSmart Systems, "Strategiske innspill på hvordan ny teknologi kan brukes til smartere
anleggsforvaltning," Statnett, Halden, 2016.
[34] Statnett, "Finbeck – Roadmap for IKT-arkitektur, fremtidig analyseplattform - Sluttraport fase
1," Statnett, Oslo, 2018.
[35] Linux Foundation, "JanusGraph," [Online]. Available: http://janusgraph.org/. [Accessed 29
December 2017].
[36] OpenTSDB, [Online]. Available: http://opentsdb.net/. [Accessed 29 December 2017].
[37] IBM, "Cognos Analytics," [Online]. Available: https://www.ibm.com/products/cognos-
analytics. [Accessed 29 December 2017].
[38] IBM, "IBM InfoSphere Information Governance Catalog," [Online]. Available:
https://www.ibm.com/us-en/marketplace/information-governance-catalog. [Accessed 29
December 2017].
[39] IBM, "Overview of IBM BigInsights Big R," [Online]. Available:
https://www.ibm.com/support/knowledgecenter/. [Accessed 29 December 2017].
V1 EA Diagrams
8.1 Strategy and Motivation
Statnett SF
Nydalen allé 33, Oslo
PB 4904 Nydalen, 0423 Oslo
Telefon: 23 90 30 00
Fax: 23 90 30 01
E-post: firmapost@statnett.no
Nettside: www.statnett.no