Future Asset Management Architecture SAMBA WP4 report
Executive summary
This report defines and describes the existing and future architecture for asset management in
Statnett. TOGAF methodology has been used for assessing, analyzing and documenting the
architecture. The architecture has been described using multiple layers and viewpoints of ArchiMate
3.0 modelling language including strategy and motivation, application layer and technology layer.
The report builds on the results and conclusions made in other Big Data and Analytics related projects
at Statnett. In particular, it builds on the results of the Finbeck and Fia projects as well as the AutoDig
2.0 projects.
The report describes as well the future Big Data and Analytics platform and defines a number of
capabilities that are required from such a platform. The asset management solution itself is expected
to be a hybrid solution based on Big Data and Analytics platform and combined with functionality
implemented in several existing internal systems as well as new components.
This report describes as well several areas that are not yet addressed in the platform being currently
introduced at Statnett and need to be further explored. The most important among these areas are
cloud integration, advanced PaaS and SaaS cloud services offering advanced AI services like natural
language comprehension, data exchange APIs and gateways with third parties as well as improving the
infrastructure for ingestion of sensor data.
Contents
Abbreviations
1 Introduction
1.1 Underlying idea of the SAMBA-project
2 Methodology
3 Big Data and Analytics technology
3.1 Main On-premise Big Data distributions
3.2 Other on-premise solutions
3.3 Cloud solutions
3.4 Other solutions
4 Current solution – BASELINE ARCHITECTURE
5 Big Data lake – TRANSITION ARCHITECTURE
5.1 AutoDig 2.0 project
5.2 ArcGIS environment
6 Reference Architecture – TARGET ARCHITECTURE
6.1 Overall Reference Architecture
6.2 Strategy and motivation layers
6.3 Capabilities and information needs
6.4 Business architecture
6.5 Overall Strategy and Motivation layer
6.6 Application Architecture
6.7 Capability to application component mapping
6.8 Technology Architecture
6.9 Technology to application component mapping
6.10 Governance Principles
6.11 Principles for Big Data and Analytics platform
6.12 APIs for ingestion and integration
7 Concluding remarks
8 References
V1 EA Diagrams
8.1 Strategy and Motivation
Abbreviations
AWS Amazon Web Services – cloud platform from Amazon
ADM Architecture Development Method
APM Asset Performance Management
BPMN Business Process Model and Notation
CEN European Committee for Standardization
CENELEC European Committee for Electrotechnical Standardization
CIM Common Information Model
COTS Commercial Off The Shelf
CPU Central Processing Unit (processor)
DL Deep Learning
DSO Distribution System Operator
ETSI European Telecommunications Standards Institute
EA Sparx Enterprise Architect
ETL Extract, Transform, Load
GCP Google Cloud Platform
GPU Graphics Processing Unit
HDF Hortonworks Dataflow
HDFS Hadoop Distributed File System
HDP Hortonworks Data Platform
MapRFS MapR Filesystem
ML Machine Learning
NIST National Institute of Standards and Technology
PaaS Platform as a Service
SaaS Software as a Service
SGAM Smart Grids Architecture Model
TOGAF The Open Group Architecture Framework
TSO Transmission System Operator
1 Introduction
The main objective of WP 4 is to design and develop a reference ICT architecture that utilizes a
common integration environment and the "common data models" developed in WP 3. The architecture
must facilitate openness, security and safety, in addition to big data analytics and business intelligence
through rule-based filtering techniques. Open interfaces, message buses and message queuing, and
standardized models and protocols are important.
In particular, WP 4 provides:
- An overall description of different stakeholders, drivers, outcomes and tactics to address the needs related to Big Data and asset management at Statnett
- Available data sources and their requirements for the data harvesting services and suggestions for improvements with regards to data ingestion
- Specification of critical capabilities of the future asset management system and how the different use cases identified in WP 2 and WP 3 map to these capabilities
- Comparison of the reference architecture with industry standards and international suppliers as well as different best practice implementations
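The rule-based filtering mentioned above can be sketched in a few lines. This is an illustration only; the rule format and event fields ("source", "severity") are invented for the example and are not part of the SAMBA specification:

```python
# Minimal sketch of rule-based filtering of asset events.
# Field names and operators are illustrative assumptions.

def make_rule(field, op, value):
    """Build a predicate over an event dict."""
    ops = {
        "eq": lambda a, b: a == b,
        "gt": lambda a, b: a > b,
        "in": lambda a, b: a in b,
    }
    return lambda event: ops[op](event.get(field), value)

def filter_events(events, rules):
    """Keep only events that satisfy every configured rule."""
    return [e for e in events if all(rule(e) for rule in rules)]

events = [
    {"source": "PMU", "severity": 3},
    {"source": "DFR", "severity": 1},
    {"source": "PMU", "severity": 1},
]
rules = [make_rule("source", "eq", "PMU"), make_rule("severity", "gt", 2)]
print(filter_events(events, rules))  # → [{'source': 'PMU', 'severity': 3}]
```

In a real platform the rules would be externalized configuration rather than code, so that business-intelligence users can adjust the filtering without redeployment.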
This report provides an overview of results of the WP 4 and describes the ICT architecture that could
in the future support the needs for data collection and analysis primarily within the asset management
domain in Statnett. In particular, this report describes the assessment and analysis of the architecture
with the input from WP 1 report [1], the use cases from WP 2-WP 3 report [2] and conclusions from
the WP 6 report [3].
The report is organized as follows. Chapter two describes the methodology for assessing, analyzing,
developing and documenting the architecture. Chapter three gives an overview of the technology
landscape in the Big Data and Analytics field.
Chapter four describes the baseline architecture. Chapter five describes the transition architecture,
which is being implemented as part of the AutoDig 2.0 project.
Chapter six describes the reference model as defined by the Finbeck project, as well as the implications
for asset management and the SAMBA project. Chapters seven and eight contain concluding remarks
and references, respectively. The appendix includes full-size versions of the diagrams of the
architecture models described in this report.
1.1 Underlying idea of the SAMBA-project
Asset management in Statnett can be improved by utilizing new developments in ICT, such as big data
technology, data fusion and business intelligence. The underlying idea of the project is to use these
generic ICT-developments together with existing domain research results (such as models on ageing
and lifetime of power system components) to establish a reference architecture for data collection,
communication and handling. This can optimize maintenance and reinvestments by facilitating a
more efficient analysis of incipient failures, ageing mechanisms and remaining lifetime of power
system components.
The amounts of sensor data related to asset management can be overwhelming. Big Data and Analytics
technology is an important prerequisite for gathering, processing, distributing and visualising this data,
as well as for providing open access so that the data can be integrated and reused in other systems.
2 Methodology
TOGAF (The Open Group Architecture Framework) has been used as the main framework
and process for creating, assessing and documenting the architecture in the SAMBA project. TOGAF is
a framework for enterprise architecture that provides an approach for designing, planning,
implementing, and governing an enterprise information technology architecture [4]. In particular, the
following phases of the Architecture Development Method (ADM) have been used:
- Preliminary
- A. Architecture Vision
- B. Business Architecture
- C. Information Systems Architectures
- D. Technology Architecture
Figure 1 TOGAF – Architecture Development Method
The remaining phases (E-H) are not relevant for projects like SAMBA, which focus on research and
do not directly intend to implement the architecture.
For documentation purposes, ArchiMate 3.0 has been used. ArchiMate defines three main layers:
Business, Application and Technology [5]:
- Business layer describes business processes, services, functions and events. It describes the products and services offered to customers and users
- Application layer describes application services and components
- Technology layer describes hardware, communication infrastructure and system software

These three layers provide a structured way of bridging the different perspectives from business to
technology and infrastructure. The full ArchiMate 3.0 model adds or enhances three more very
useful layers:

- Strategy and Motivation layer – introduced in 2016 in ArchiMate 3.0 for modeling the capabilities of an organization and for explaining the impact of changes on the business (it gives a better connection between strategic and tactical planning)
- Implementation and Migration layer – supports modeling related to project, portfolio or program management
- Physical layer – for modeling physical assets like factories

In SAMBA, Business layer, Application layer and Technology layer models have been created, in
addition to a Strategy and Motivation layer model.
Moreover, as explained in the WP 1 report [1], the complex research challenges in this project are specific
to the transition towards the Smart grid. The SAMBA-project uses the Smart Grids Architecture Model
(SGAM), see Figure 2, from the CEN-CENELEC-ETSI Smart Grid Coordination Group to describe the
project's central R&D challenges and scientific methods [6].
The SGAM framework consists of five interoperability layers representing business objectives and
processes, functions, information exchange and models, communication protocols and components.
Interoperability is an important issue and research challenge in smart grids.
The component layer covers the physical infrastructure: the electrical components, sensors, networks,
routers, computers and so on that form the basis for any form of communication and information
gathering. Gaining an overview of this layer will be a starting point for the project. In the
communication layer, different protocols are used to send and receive data between components.
However, just enabling better communication does not guarantee that useful information is
exchanged.
Figure 2 SGAM framework
The information layer describes the data models and information objects included in use cases, in order
for the information to be interpreted correctly when testing use cases. A data model using open
standards (i.e. CIM) is an important prerequisite for the SAMBA-project.
In the function layer in Figure 2, functions and services are represented as use cases independently of
their physical realization in systems and components. This level ensures that the right information
reaches the right process and the right actor. This represents a large research challenge, as the
information must feed into Statnett's asset management, a high-level process in any company.
3 Big Data and Analytics technology
This chapter discusses and describes the different Big Data technologies and architectures, both
on-premise and in the cloud, that have been assessed in the course of the project. This includes the
technology already selected and acquired as the basis for the initial Big Data and Analytics solution at Statnett.
The overall Big Data technology landscape is extensive, spanning infrastructure, storage, analytics,
data sources, API tools and applications. This has been summarized in the following overview by Matt
Turck [7] (see Figure 3).
Figure 3 Big Data Landscape
3.1 Main On-premise Big Data distributions
Several Big Data technology suppliers have developed their own software suites for Big Data containing
Hadoop and other components. These software suites are called Hadoop distributions. They
package multiple tools and technologies into a technology stack ready for customers to use.
Suppliers often offer technical support as well as a comprehensive product with several
complementary tools that can be customized for specific tasks.
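The programming model at the heart of all these distributions can be illustrated with a toy, single-process sketch. This is only an illustration of the map/shuffle/reduce pattern; the real frameworks distribute each phase across a cluster:

```python
# Toy illustration of the MapReduce model that Hadoop distributions
# execute at scale: map each record to key/value pairs, shuffle by
# key, then reduce each group. Single-process only.
from collections import defaultdict

def map_phase(records, mapper):
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    return {key: reducer(values) for key, values in groups.items()}

# Word count, the canonical example.
lines = ["sensor fault", "sensor ok", "fault"]
pairs = map_phase(lines, lambda line: [(w, 1) for w in line.split()])
counts = reduce_phase(shuffle(pairs), sum)
print(counts)  # → {'sensor': 2, 'fault': 2, 'ok': 1}
```

The value the distributions add is everything around this pattern: distributed storage, scheduling, fault tolerance and the tool ecosystem compared in Table 1.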
Hortonworks Data Platform
Hortonworks was established in 2011 and is the only distribution that uses pure Apache Hadoop
without any proprietary tools and components. Hortonworks Data Platform is also the only pure
Open Source project of the three distributions. Hortonworks is now also an integral part of IBM
BigInsights [8].
In 2016, Hortonworks created a separate product line for processing streaming data.
Hortonworks DataFlow (HDF) is optimized to ingest, curate and handle data in flow and contains several
additional tools to facilitate that, e.g. NiFi, MiNiFi and Schema Registry [9].
Cloudera
Cloudera, established in 2008, was one of the first Hadoop distributions. Cloudera is based to a large
extent on Open Source components, but not to the same degree as Hortonworks. Cloudera is easier to
install and use than Hortonworks. The most important difference from Hortonworks is the
proprietary management stack [8].
MapR
MapR does not use the HDFS file system, but replaces it with the proprietary MapRFS, because
MapRFS offers better robustness and redundancy and greatly simplified use. MapR is most likely the
on-premise distribution that offers the best performance, redundancy and user friendliness. MapR also
improves the performance of other components, including HBase (called MapR-DB). MapR also offers
extensive documentation, courses and other materials [8].
3.2 Other on-premise solutions
Oracle Cloudera
Oracle Cloudera is a joint solution from Oracle/Cloudera. Oracle based their Big Data platform on a
Cloudera distribution. This distribution offers some additional and useful tools and solutions that give
increased performance, in particular Oracle Big Data Appliance, Oracle Big Data Discovery, Oracle
NoSQL database and Oracle R Enterprise.
Oracle Big Data Appliance is an integrated hardware and software Big Data solution running on a platform based
on Engineered Systems (like Exadata). Oracle adds the Big Data Discovery visualization tools on top
of Cloudera/Hadoop, while Oracle R Enterprise includes R – an open source, advanced statistical
analysis tool [8].
IBM BigInsights
IBM BigInsights for Apache Hadoop is a solution from IBM that also builds on top of Hadoop. In addition
to Hadoop, BigInsights offers some proprietary tools for analysis like BigSQL, BigSheets and BigInsights
Data Scientist, which includes BigR.
IBM BigInsights for Hadoop also offers the BigInsights Enterprise Management solution and the IBM Spectrum
Scale-FPO file system as an alternative to HDFS [8].
SAP HANA and Vora
SAP HANA is an in-memory, column-oriented, relational database management system developed and
marketed by SAP SE. Its primary function as a database server is to store and retrieve data as requested
by the applications. In addition, it performs advanced analytics (predictive analytics, spatial data
processing, text analytics, text search, streaming analytics, graph data processing) and includes ETL
capabilities as well as an application server [10].
SAP HANA Vora is an in-memory computing engine designed to make big data from Hadoop more
accessible and usable for enterprises. SAP developed Vora out of SAP HANA as a way to address specific
business cases involving big data. Hadoop offers lower-cost storage for vast amounts of data, but
adoption initially lagged in the enterprise because the data in a data lake is unstructured and can be
hard to deal with. SAP HANA Vora builds structured data hierarchies for the Hadoop data and
integrates it with data from HANA to enable OLAP-style in-memory analysis on the combined data
through an Apache Spark structured query language (SQL) interface [11].
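The idea Vora implements, imposing structure on lake data and joining it with warehouse data for combined SQL analysis, can be sketched with an in-memory SQL engine. Here sqlite3 merely stands in for the Spark SQL interface, and the table and column names are invented for the example:

```python
# Sketch of OLAP-style combined analysis over "lake" and "warehouse"
# data, the pattern SAP HANA Vora enables. sqlite3 stands in for the
# Spark SQL interface; the schemas are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "lake" side: raw sensor readings carrying only a component id
    CREATE TABLE readings (component_id TEXT, temp_c REAL);
    INSERT INTO readings VALUES ('T1', 61.0), ('T1', 65.0), ('T2', 40.0);
    -- "warehouse" side: master data describing the components
    CREATE TABLE components (component_id TEXT, station TEXT);
    INSERT INTO components VALUES ('T1', 'Oslo'), ('T2', 'Bergen');
""")
rows = conn.execute("""
    SELECT c.station, COUNT(*) AS n, AVG(r.temp_c) AS avg_temp
    FROM readings r JOIN components c USING (component_id)
    GROUP BY c.station ORDER BY c.station
""").fetchall()
print(rows)  # → [('Bergen', 1, 40.0), ('Oslo', 2, 63.0)]
```

The point is the join itself: once the lake data has a declared structure, standard aggregation over the combined data becomes a one-statement operation.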
OSIsoft PI
OSIsoft PI is a suite of software products used for collecting, historizing, finding, analyzing, delivering
and visualizing data. It is marketed as an enterprise infrastructure for management of
real-time data and events. The term PI System is often used to refer to the PI Server, but the two are
not the same: the PI System refers to all OSIsoft software products, whereas the PI Server is the core
product of the PI System [12].
The following table gives a quick overview of the main on-premise Hadoop distributions and their features
[8] [13].
Table 1 Comparison of most important Hadoop distributions (based on: “Hadoop buyers guide”) [8] [13]
Category | Feature | Hortonworks | Cloudera | MapR
Data access | SQL | Hive | Impala, Hive | MapR-DB, Impala, Drill, SparkSQL
Data access | NoSQL | HBase, Accumulo, Phoenix | HBase | HBase
Data access | Scripting | Pig | Pig | Pig
Data access | Batch | MapReduce, Spark, Hive | MapReduce, Spark, Pig | MapReduce
Data access | Search | Solr | Solr | Solr
Data access | Graph/ML | – | – | GraphX, MLib, Mahout
Data access | RDBMS | – | Kudu | MySQL
Data access | File system access | Limited, not standard NFS | Limited, not standard NFS | HDFS, read/write NFS (Posix)
Data access | Authentication | Kerberos | Kerberos | Kerberos and native
Data access | Streaming | Storm | Spark | Storm, Spark, MapR-Streams
Ingestion | Ingestion | Sqoop, Flume, Kafka | Sqoop, Flume, Kafka | Sqoop, Flume
Operations | Scheduling | Oozie | Oozie | –
Operations | Data lifecycle | Falcon, Atlas | Cloudera Navigator | –
Operations | Resource management | YARN | YARN | Myriad
Operations | Coordination | ZooKeeper | ZooKeeper | ZooKeeper, Sahara
Security | Security | – | Sentry, Record Service | –
Performance | Data ingestion (write) | Batch | Batch | Batch and streaming
Performance | Metadata architecture | Centralized | Centralized | Distributed
Redundancy | HA | Survives single fault | Survives single fault | Survives multiple faults (self-healing)
Redundancy | MapReduce HA | Restart of jobs | Restart of jobs | Continuous without restart
Redundancy | Upgrades | With planned downtime | Rolling upgrades | Rolling upgrades
Redundancy | Replication | Data only | Data only | Data and metadata
Redundancy | Snapshots | Consistent for closed files | Consistent for closed files | Consistent for all files and tables
Redundancy | Disaster recovery | None | Scheduled file copy | Data mirroring
Management | Tools | Ambari, Cloudbreak | Cloudera Manager | MapR Control System
Management | Heat map, alarms | Supported | Supported | Supported
Management | ReST API | Supported | Supported | Supported
Management | Data and job placement | None | None | Yes
3.3 Cloud solutions
IBM Cloud – Watson Data Platform
IBM provides a comprehensive cloud-based data platform, the Watson Data Platform, for data
ingestion, data storage and analytics [14].
Amazon EMR
Amazon EMR (Elastic MapReduce) is a Hadoop distribution put together by Amazon and running in the
Amazon cloud. Amazon EMR is easier to take into use than on-premise Hadoop. Amazon is by far
the biggest cloud provider, but when it comes to Big Data its solution is relatively new compared to
Google's [8].
Microsoft Azure
Microsoft offers three different cloud solutions based on Azure: the Hadoop-based HDInsights, HDP for
Windows and the Microsoft Analytics Platform System.
Google Cloud Platform
Google also offers Big Data cloud services. The most popular services in GCP (Google Cloud Platform)
are BigQuery (a SQL-like database), Cloud Dataflow (a processing framework) and Cloud
Dataproc (Spark and Hadoop services). Google has been working on Big Data technologies for a long
time, which gives it a head start when it comes to advanced Big Data tools. GCP offers analysis
and visualization tools as well as an advanced platform to test the solutions (known as Cloud Datalab)
[8].
The following table gives a quick overview of the main cloud-based Hadoop distributions and their features
[8] [13].
Table 2 Comparison of most important Big Data cloud solutions [8] [13]
Category | Feature | Amazon Web Services | Azure (HDInsights) | IBM Cloud Watson Data Platform | Google Cloud Platform
Data access | File system storage | Hadoop | – | Cloud Object Storage | Cloud Storage
Data access | NoSQL | HBase | HBase | Cloudant | Cloud Bigtable
Data access | SQL | Hive, Hue, Presto | Hive | DB2 on Cloud | BigQuery, Cloud SQL
Data access | RDBMS | Phoenix | – | Compose | Cloud SQL
Data access | Batch | Pig, Spark, MapReduce | Pig, Spark | – | Cloud Dataflow
Data access | Streaming | Spark | Storm, Spark | Streaming Analytics | Google Cloud Pub/Sub
Data access | Script | Pig | – | – | –
Data access | Search | Solr | – | – | –
Ingestion | Ingestion | Sqoop | – | Streaming Analytics | Cloud Dataflow
Visualization | Visualization | – | – | Data Science Experience | Cloud Datalab
Analytics | Machine Learning | Mahout | R Server, Azure Machine Learning | Streaming Analytics, DSX, Analytics Engine | Cloud Machine Learning, Speech API, Natural Language API, Translate API, Vision API
Operations | Logging | – | – | – | Logging, Error Reporting, Trace
Operations | Coordination | ZooKeeper | – | – | –
Operations | Scheduling | Oozie | – | – | –
Operations | Resource management | HCatalog, Tez | – | – | Cloud Console, Cloud Resource Manager
Operations | Monitoring | Ganglia | – | – | Monitoring
3.4 Other solutions
Predix
Predix is General Electric's software platform for the collection and analysis of data from industrial
machines. General Electric plans to support the growing industrial IoT with cloud servers and an app store.
Predix, a cloud-based PaaS (Platform as a Service), is claimed to enable industrial-scale analytics for
asset performance management (APM) and operations optimization by providing a standard way to
connect machines, data, and people. Predix provides a microservices-based delivery model with a
distributed architecture (cloud and on-premise) [15].
Figure 4 GE Predix platform
Insights Foundation for Energy
IBM® Insights Foundation for Energy is an energy analytics, data management and visualization
software solution for utility and energy companies. It provides a single energy analytics platform to
support various analytic applications. This includes situational awareness (visualizing patterns,
predicting actions and connecting data points to derive insights), predictive maintenance (using
historical data to determine asset repair or replacement), and asset health and risk analytics (to measure
asset status and assess risk and consequences in near real-time). It is available through IBM
software-as-a-service (SaaS) subscription services or as an on-premise solution [16].
IFE (Insights Foundation for Energy) creates operational insights based on energy analytics to optimize
business outcomes, provides a single energy analytics platform that can expand over time to meet
evolving analytics needs, and unifies systems and business processes for more innovative, effective
business procedures [16].
Figure 5 IBM IFE
ABB Asset Health Center
ABB also offers an energy analytics, data management and visualization software solution based on
the Azure cloud platform and Cortana. ABB Asset Health Center uses predictive and prescriptive
analytics, as well as customized models incorporating industry expertise, to identify and prioritize
emerging maintenance needs based on probability of failure and asset criticality. ABB Asset Health
Center offers ingestion of asset and sensor data in the Azure BLOB Storage as well as Azure SQL
Database, Azure Machine Learning and Power BI visualization [17].
Cognite
Cognite is a Norwegian company specializing in customized Big Data, Analytics and IoT solutions, mainly
for the energy sector (offshore), in particular for Aker BP and Kværner. Its platform is based on several components
from the Google Cloud Platform.
Kongsberg Digital
Kongsberg Digital has also built a similar platform for the energy sector (offshore). The platform from
Kongsberg Digital is based on the Microsoft Azure cloud platform.
4 Current solution – BASELINE ARCHITECTURE
The current status of asset management has already been documented in SAMBA through reports from
WP 1 [1] and WP 2 and WP 3 [2]. Below is a summary of the most important findings.
Statnett's ICT support for asset management has been developed over time, in the form of different
information systems, and often based on a per-need approach. See Figure 6 for the asset
management ICT landscape.
Figure 6 Asset management system landscape
Table 3 describes the most important components of the AS-IS architecture for asset management.
Table 3 Most important components in AS-IS architecture for asset management
Component | Layer | Comment
AutoDig | Visualization, Data Store | Fault analysis tool which collects and presents data from various sensors and systems in an efficient way
Innsikt / HIS web | Visualization | Visualization / analytics platform at Statnett
DDK-GUI | Visualization | Visualization of asset data from various sources in a tabular way. Front-end to SYSBAS.
ArcGIS | Visualization, Data Store | Map visualization tool at Statnett
IFS | Visualization, Data Store | ERP system
TPV-T/TPV-P | Visualization | Total planning tool with visualization of all stations and switchgear
TKP | Visualization | Project module for overview of activities
BiCycle | Visualization | A maintenance DWH solution serving as visualization and planning tool for asset management
FASIT | Visualization, Data Store | System for handling the fault reports from Statnett and Norwegian DSOs
SYSBAS | Data Store | Data hub for asset data from various sources
FOS common | Data Store | Landing area for asset data provided by the Norwegian DSOs
Innsikt DWH | Data Store | Storage part of Innsikt DWH
The most important systems for asset management are IFS, SYSBAS and FOS. In addition, there are a
couple of fault analysis systems which are also important for asset management, including
AutoDig and FASIT. Innsikt, as a common analysis platform, naturally plays an important role for asset
management. HIS web stores historical data.
Among all the systems, it is important to mention TPV, TKP and BiCycle. The TPV-T ("Total Planning Tool")
database is practically a "mirror" of IFS, showing data for all Statnett stations, with switchgear at all
voltage levels and all components/equipment with technical data and age, and the equivalent for
overhead lines and cables. The tool generates proposals for "equipment replacement measures" based on the age of
the different types of components. TPV-P (the project module) gives an overview of activities and is used
to group activities together, manually. BiCycle, on the other hand, is a specialized analytics solution for
RCM (Reliability Centered Maintenance).
Statnett's ERP system, IFS, is the kernel of the asset database and asset management functionality.
However, most of the analyses are performed in a series of additional tools which, combined, solve
most current user needs. However, the analyses are fragmented and mostly have different logic for
data collection and storage. The current architecture is not a good basis for growth. The largest data
storage is a traditional data warehouse with a BI tool on top.
Today, Statnett has still not realized the possibilities that big data concepts can provide. The main
reasons for this are:
- Data is not easily accessible for access, integration and sharing, and is often locked in proprietary systems
- There is no common data store / data hub which makes it possible to access and assemble data from various sources
- There is no uniform way of collecting the data from the sensors, and the distribution systems for collecting these data are often unreliable, not monitored and not properly maintained. Data is often of poor quality, delayed or missing
- Organizational silos make it difficult and time consuming to integrate the systems
- Use of obsolete integration paradigms (i.e. SOA – Service Oriented Architecture) which mandate exchange of data and restrict sharing of data
- The current analytics platform (i.e. Innsikt) does not provide sufficient capacity and performance to implement the asset management use cases efficiently
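A first mitigation for the delayed/missing data problem listed above is automated monitoring of the incoming series. A minimal sketch follows; the threshold and record layout are illustrative assumptions, not an existing Statnett mechanism:

```python
# Sketch of a data-quality check that flags missing or delayed
# sensor series. Threshold and record layout are illustrative.
from datetime import datetime, timedelta

def quality_flags(series, now, max_delay=timedelta(minutes=5)):
    """Return per-sensor flags: 'missing', 'delayed' or 'ok'."""
    flags = {}
    for sensor, timestamps in series.items():
        if not timestamps:
            flags[sensor] = "missing"
        elif now - max(timestamps) > max_delay:
            flags[sensor] = "delayed"
        else:
            flags[sensor] = "ok"
    return flags

now = datetime(2018, 1, 1, 12, 0)
series = {
    "pmu-1": [now - timedelta(minutes=1)],   # fresh data
    "pmu-2": [now - timedelta(hours=2)],     # stale data
    "dfr-1": [],                             # nothing received
}
print(quality_flags(series, now))
# → {'pmu-1': 'ok', 'pmu-2': 'delayed', 'dfr-1': 'missing'}
```

Running such a check continuously against the common data store would make the reliability problems visible instead of silently degrading the analyses.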
5 Big Data lake – TRANSITION ARCHITECTURE
This chapter describes the Big Data and Analytics platforms at Statnett: the platform to be developed
within the AutoDig 2.0 project and the ArcGIS environment.
The AutoDig 2.0 project is an important first step on the road to implementing the future Big Data and
Analytics platform at Statnett. The platform will be further developed in another project, the Finbeck
project, which defines the long-term roadmap and high-level reference architecture for Big Data,
Analytics, IoT and adjacent areas. The Finbeck project will extend the architecture and the platform
with additional software components and features, as suggested in the long-term strategy for Big Data
and Analytics.
5.1 AutoDig 2.0 project
Statnett is in the process of implementing a platform for Big Data and Analytics as a part of the AutoDig
2.0 project. In this project, a solution is being tested that will be crucial for the future development of the asset
management platform to support the needs described in the SAMBA use cases and beyond.
AutoDig is a system for acquisition, sorting, presentation and analysis of information regarding power
system disturbances [18]. The software in use today is a prototype developed within an R&D project.
The prototype has been successfully taken into use, but there is a need to develop an improved, more
stable and efficient tool to help perform this analysis work. Statnett has initiated a project which will
deliver a new and improved operative solution in close integration with Statnett's ICT infrastructure.
The AutoDig 2.0 system will gather, store and analyze large amounts of data collected from multiple
sources and sensors in the network (See Table 4 and Figure 7).
Table 4 AutoDig 2.0 data sources
Data source | Description
PMU data | Multiple time series (1 kHz sampling)
DFR (Digital Fault Recorder) | Time series
Power Quality Measurements / Elspec | Several time series containing aggregated parameters (50 Hz sampling) and raw data time series (50 kHz sampling)
Power Quality Measurements / Metrum | Several time series containing aggregated parameters
Distance Relay Protection | Comtrade
Operation and Maintenance Database | Events, breaker positions, network configuration and operational measurements (P, Q, I, U, f)
Network Repository | Power grid model
Time variable data / met.no | Weather and lightning data
ERP | Asset data
Operation Management Support system | Operation and fault reports
The AutoDig 2.0 solution that Statnett is implementing is based on the use of a Big Data lake / Data
Lake architecture (see Figure 7), which includes the following elements:
- Data collected from multiple sensors and data sources, after initial processing (ELT1 / ingestion). This data is stored in a Big Data lake
- All relevant data is stored in the Big Data lake in a structured format, either in a CIM (common information model) format or as time series
- Data stored in the Big Data lake must be available for reuse in new / future applications and solutions at Statnett

The solution consists of analysis and visualization components, as detailed in Table 5.
Figure 7 AutoDig 2.0 incl. Big Data lake
1 ELT – Extract Load Transform
In AutoDig 2.0, collected data will be retained for a long time and be available for use in future analyses.
Statnett aims to be able to retain raw data for up to 10 years and processed/aggregated data for at
least 60 years or the lifetime of the assets.
For the analyses performed with the help of AutoDig 2.0, it is crucial that the time elapsed from the
moment the data becomes available until it is ingested and the results are presented is kept as short as
possible (preferably below one minute).
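The split between raw data (up to 10 years) and processed/aggregated data (60+ years) implies a downsampling step somewhere in the pipeline. The following sketch shows the idea; the window size and the chosen statistics (min/mean/max) are illustrative assumptions, not the project's actual aggregation scheme:

```python
# Sketch of downsampling raw high-frequency samples into the kind
# of aggregates kept for long-term retention. Window size and the
# min/mean/max statistics are illustrative assumptions.
def aggregate(samples, window):
    """Group (timestamp_s, value) samples into windows of `window`
    seconds and keep (min, mean, max) per window."""
    buckets = {}
    for t, v in samples:
        buckets.setdefault(int(t // window), []).append(v)
    return {
        w * window: (min(vs), sum(vs) / len(vs), max(vs))
        for w, vs in sorted(buckets.items())
    }

raw = [(0.0, 1.0), (0.5, 3.0), (60.2, 2.0)]  # sparse sample data
print(aggregate(raw, 60))  # → {0: (1.0, 2.0, 3.0), 60: (2.0, 2.0, 2.0)}
```

At 1 kHz PMU sampling, per-minute aggregates reduce the volume by a factor of 60 000 while keeping the envelope of the signal, which is why such aggregates can realistically be kept for the lifetime of the assets.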
Table 5 AutoDig 2.0 application components
Component | Description
AutoDig Dashboard | Tailored web app providing a consolidated and configurable work surface and integrating data visualization components, analysis components (Advanced Analytics and Self-service Analytics) as well as visualization of data stored in the Big Data lake. AutoDig Dashboard will also be used to configure and select a set of triggers/criteria for performing analysis, as well as to perform analyses when these criteria are satisfied
Advanced Analytics | COTS2 component for visualization and self-service analytics
AutoDig Analysis Engine | Component that will analyze collected data. Currently implemented in MATLAB along with a number of MATLAB algorithms
AutoDig AI/Rule Engine | Component that detects patterns in data (both model-based analyses as well as machine learning and pattern recognition)
Figure 8 presents high-level design of the Big Data lake platform and its components.
The Big Data lake will provide APIs for data access and data ingestion. The storage and processing
infrastructure will primarily support structured storage of time series (used to store sensor data)
and measurements, as well as structured storage of files used for storing raw data. The Big Data lake will
support real-time processing, batch processing and analytics functions.
The initial Big Data and Analytics platform currently being introduced at Statnett consists of the following main software components:
- IBM BigInsights and Hortonworks - the acquired platform consists of the IBM BigInsights component; however, as IBM is in the process of restructuring it, in practice the platform will consist of Hortonworks 2.6 as the main component
- IBM BigSQL
- IBM Streams
- Tableau Server and Desktop
- IBM SPSS Modeler
- IBM BigR
These software components are described in the subchapters below and also presented in Figure 25.
2 COTS – Commercial Off The Shelf
Figure 8 High-level design of the Big Data Lake platform
(The diagram shows information sources such as SCADA, PMU, Digital Fault Recorder, power quality, oscillation registration, distance relay protection, video, lightning data from Met.no, asset data, CIM objects, GIS, ERP, Fault Management (FOSWeb), Operation Management Support and Market and Settlement systems feeding the Big Data Lake through APIs. The storage and processing infrastructure provides structured time series and structured file storage with batch and real-time processing, serving analytics, data science tools and AutoDig as information consumers.)
Hortonworks Data Platform
Hortonworks Data Platform (HDP) is a scalable open source Hadoop distribution and platform for
storing, processing and analyzing large amounts of data [19]. See also Chapter 3.1 for more details
about on premise Hadoop distributions.
Figure 9 Hortonworks platform [19]
IBM BigSQL
IBM BigSQL is a SQL layer on top of Hadoop/HDFS, which makes it possible to create tables and query data using SQL syntax. The SQL query engine supports joins, unions, grouping, common table expressions, windowing functions, and other familiar SQL expressions.
Depending on the nature of the query, the data volumes, and other factors, Big SQL can use Hadoop's MapReduce framework to process various query tasks in parallel or execute the query locally within the Big SQL server on a single node. [20]
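As a rough illustration of the SQL constructs Big SQL supports (grouping, window functions), the sketch below runs an equivalent query against an in-memory SQLite database standing in for a Big SQL connection; the table and column names are invented for the example:

```python
import sqlite3

# Illustrative only: the SQL below (window function with PARTITION BY) is the
# kind of query Big SQL can run over Hadoop data; sqlite3 stands in for a
# Big SQL connection, and the sensor table is invented for the example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sensor (asset TEXT, ts INTEGER, temp REAL)")
con.executemany("INSERT INTO sensor VALUES (?, ?, ?)",
                [("T1", 1, 20.0), ("T1", 2, 25.0), ("T2", 1, 30.0)])

# Running average temperature per asset using a window function.
rows = con.execute("""
    SELECT asset, ts,
           AVG(temp) OVER (PARTITION BY asset ORDER BY ts) AS running_avg
    FROM sensor ORDER BY asset, ts
""").fetchall()
for r in rows:
    print(r)
```

In a real deployment the same statement would be submitted through a Big SQL client connection rather than sqlite3.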
Figure 10 IBM BigSQL
IBM Streams
IBM Streams is an advanced computing platform that allows user-developed applications to ingest, analyze, and correlate information as it arrives from real-time sources. The solution can handle very high data throughput rates, up to millions of events or messages per second. [21]
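The ingest-analyze-correlate pattern can be sketched in plain Python. This stand-in, with invented event shapes and thresholds, only illustrates the idea of correlating events inside a sliding window; it is not the IBM Streams API:

```python
from collections import deque

# A minimal sketch of the ingest-analyze-correlate pattern that a streaming
# platform such as IBM Streams provides; this pure-Python stand-in keeps a
# sliding window and flags bursts. Event shape and thresholds are assumptions.
def correlate(events, window=3, threshold=2):
    """Yield an alert whenever `threshold` events from the same source
    fall inside the last `window` arrivals."""
    recent = deque(maxlen=window)
    for source, value in events:
        recent.append(source)
        if list(recent).count(source) >= threshold:
            yield (source, value)

stream = [("pmu-1", 0.1), ("pmu-2", 0.4), ("pmu-1", 0.9), ("pmu-1", 1.2)]
print(list(correlate(stream)))  # [('pmu-1', 0.9), ('pmu-1', 1.2)]
```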
Tableau Server and Desktop
Tableau is an advanced and highly performant visualization tool. It is an industry leading BI tool that focuses on data visualization, dashboarding and data discovery [22].
IBM SPSS Modeler
SPSS3 is a statistical tool from IBM used for both interactive (non-batch) and batch statistical analysis [23].
IBM SPSS Modeler is a part of the SPSS suite, which provides a set of data mining tools to develop
predictive models using business expertise and deploy them into operations to improve decision-
making. IBM SPSS Modeler supports a variety of modeling methods taken from machine learning,
artificial intelligence, and statistics. [24]
5.2 ArcGIS environment
The ArcGIS environment at Statnett is also a Big Data and Analytics platform with the following main
components:
- The GeoAnalytics Server
- The GeoEvent Server
- Image Server
3 SPSS was originally named Statistical Package for Social Sciences
- Insights for ArcGIS
GeoAnalytics Server and GeoEvent Server are a powerful combination.
Statnett uses GeoEvent as a development and production environment to stream and analyze lightning data and ship data in real time. GeoEvent has, among other things, great potential for use with real-time sensor data.
GeoAnalytics can be used in combination with Python scripting and can use the archived data from GeoEvent, which Statnett stores in the spatiotemporal big data store, as well as data from various forms of file shares (Hadoop, AWS/Azure cloud, etc.).
Figure 11 ArcGIS Big Data and Analytics landscape
ArcGIS GeoAnalytics Server
ArcGIS GeoAnalytics Server is designed to handle the analysis of massive datasets. GeoAnalytics tools
are a subset of Esri geoprocessing tools that use distributed and parallelized computing to run space-
time analyses on extremely large datasets. These tools can be executed using the Portal for ArcGIS
map viewer, ArcGIS Pro, the ArcGIS Server REST API, or from the new ArcGIS API for Python. ArcGIS
GeoAnalytics Server can connect to data from the Hadoop Distributed File System (HDFS), Hive, local
file shares, and data from within ArcGIS Enterprise, including using the archived spatiotemporal output
from ArcGIS GeoEvent Server as input. Because ArcGIS GeoAnalytics Server uses the base ArcGIS
Enterprise deployment to write and store analytical output, it is easy to use and share the resultant
layers and data [25] [26].
ArcGIS GeoEvent Server
ArcGIS GeoEvent Server is designed to handle high-volume, high-velocity real-time and streaming data.
It provides solutions through on-the-fly analysis and dynamic aggregation of large datasets, which
makes data visualization simple. When connected to the base ArcGIS Enterprise deployment, ArcGIS
GeoEvent Server can archive data to the spatiotemporal data store for further data analyses. [27] [28]
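Conceptually, the kind of on-the-fly filtering GeoEvent performs on streaming positions resembles the sketch below; the bounding box, vessel names and event format are invented for illustration and do not use the GeoEvent API:

```python
# A conceptual stand-in for the on-the-fly filtering ArcGIS GeoEvent Server
# performs on streaming positions; bounding box and events are invented.
def inside(bbox, lon, lat):
    """Return True if a lon/lat point falls inside a (west, south, east, north) box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

NORWAY_BBOX = (4.0, 57.9, 31.3, 71.2)  # rough lon/lat extent, illustrative

ship_positions = [("MS Nord", 10.7, 59.9), ("MS Sud", -70.0, 40.0)]
hits = [name for name, lon, lat in ship_positions if inside(NORWAY_BBOX, lon, lat)]
print(hits)  # ['MS Nord']
```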
ArcGIS Image Server
ArcGIS Image Server provides capabilities for serving, processing, analyzing, and extracting value from massive collections of imagery, rasters, and remotely sensed data. [29] [30]
Insights for ArcGIS
Insights for ArcGIS is a web-based data analytics workbench where you can explore spatial and non-spatial data.
Insights for ArcGIS is somewhat similar to Tableau, and can be used for example against real-time data
stored in our internal Spatiotemporal Big Data Store via GeoEvent Server. The features in GeoAnalytics
server can also be used from Insights. [31] [32]
6 Reference Architecture – TARGET ARCHITECTURE
6.1 Overall Reference Architecture
The architecture for smarter asset management is aligned with the overall conceptual model for the
reference architecture at Statnett developed in the Finbeck project.
The Finbeck project has assessed several reference models defined by international institutions and
third parties. The most relevant reference architecture to be adopted by Statnett is the one defined
by the National Institute of Standards and Technology (NIST) in 2015. The NIST reference model is a supplier-neutral, technology- and infrastructure-independent conceptual model for Big Data
architecture.
Figure 12 NIST Reference Model
The most important elements of a reference model as defined by NIST are:
- System Orchestrator - ensures system requirements. This applies to business, architecture, management, policy and resource requirements. In addition, the system orchestrator must also monitor the system's compliance with the requirements. The system orchestrator role is typically taken care of by one or more actors, which can be human or machine (software), or a combination of the two
- Data Provider - the different data providers, which supply the system with data. An important characteristic of a Big Data system is the ability to import and use data from a variety of different sources in different formats. Examples of sources: internal and public documents, images, audio files, video, sensor data and logs. Asset management and asset health management systems are examples of systems that can be sources of data as well
- Big Data Application Provider - ensures execution of the data life cycle in accordance with the security requirements and the requirements set by the system orchestrator. The life cycle of the data consists of five main activities that are relatively similar to those found in traditional data processing systems. The difference now is that the data characteristics in Big Data systems (volume, velocity, variety, etc.) require a radical change in the data processing mechanisms. These must be customized and optimized to, for example, be able to meet response time requirements in a world of ever-increasing data volumes. The five main activities in the Big Data Application Provider are Collection, Preparation/curation, Analytics, Visualization and Access
- Big Data Framework Provider - most of the progress made in recent years has been on frameworks that scale performance even though the data sets being processed have Big Data characteristics (volume, velocity, variety, etc.)
- Data Consumer - the end user, which can be either a person or another system that consumes data. Data from the analysis and visualization activities are accessed through the service interface offered by the Big Data Application Provider. The communication can either be pull-based, where the Big Data Application Provider responds to Data Consumer requests, or push-based, where the Data Consumer listens for automated output from the Big Data Application Provider. All decision levels within asset management are examples of systems that can be consumers of data.
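The pull- and push-based interaction styles between the Big Data Application Provider and a Data Consumer can be sketched as follows; the class and method names are assumptions chosen for the example, not part of the NIST specification:

```python
# Sketch of the two NIST interaction styles: pull (consumer requests the
# latest result) and push (consumer subscribes and is notified on publish).
# Class and method names are invented for this illustration.
class ApplicationProvider:
    def __init__(self):
        self._latest = None
        self._subscribers = []

    def publish(self, result):
        self._latest = result
        for callback in self._subscribers:   # push: notify all listeners
            callback(result)

    def query(self):                         # pull: consumer asks on demand
        return self._latest

    def subscribe(self, callback):
        self._subscribers.append(callback)

provider = ApplicationProvider()
received = []
provider.subscribe(received.append)          # push-based consumer
provider.publish({"asset": "T1", "health": 0.93})
print(provider.query())                      # pull-based consumer
print(received)
```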
Another important framework that has been used as a basis for the architecture of the Big Data Lake in the AutoDig project is the IBM Reference Model for Big Data and Analytics, presented in Figure 13.
Figure 13 IBM Reference Model for Big Data and Analytics4
4 The IBM Reference Model has been created and provided to Statnett as a part of the AutoDig 2.0 project
IBM has divided their model into 12 different areas with an increased focus on the Analytics (Analytics
in-motion and Analytical Data Lake Storage) and the consumer side (Discovery and Exploration,
Actionable Insights and Enhanced Applications).
In the course of multiple workshops and discussions in the Finbeck project, and with input from the NIST and IBM models, Statnett has defined its own reference architecture (Figure 14). Statnett's overall reference model has been divided into four main areas: data provider, Big Data and Analytics platform, data consumer, and security and governance.
The Big Data and Analytics platform consists of several high level components including ingestion,
distribution, analysis, storage and access (Figure 14). The high-level architecture defined by Finbeck matches the NIST reference architecture except for the visualization component. Visualization components can exist both inside and outside the reference architecture; in Statnett, visualization has been defined outside the platform. In practice there will be a few technical software components also implemented as a part of the Big Data and Analytics platform5.
Figure 14 Overall Statnett reference architecture model for Big Data & Analytics
The descriptions of components in the high-level reference architecture and their relation to asset
management are explained in the following table.
Table 6 Descriptions of components in the High Level Reference model as defined by Finbeck project
Component | Description
Data provider | Considered as a component outside of the reference architecture. The detailed architecture will still contain a description of which data sources the data platform will handle at all times. Data sources could be systems and sensors. Asset management is an example of a system that can be a data provider as well
Ingestion | Data can be retrieved from several different data sources, which must be collected and integrated with the data platform for further handling. The components that will handle this are described under the Ingestion component.
Distribution | Data needs to be distributed from source to consumer using one or more distribution mechanisms. For Statnett, distribution of data consists mainly of data processing, in addition to handling historical data using various storage technologies
Analysis | The data needs to be processed in different ways, e.g. in real time (such as data streams) and batch wise. Parts of the data processing will also handle data storage. The analysis component also covers the platform's ability to ensure that data supports advanced analysis such as machine learning, deep learning, etc.
Batch | Processing data batch wise, i.e. a periodization in handling the data. This means that data is collected over time before it is distributed in the system. Data that does not need to be visualized or analyzed in real time will normally be handled batch wise
Real time | Statnett has large amounts of data handled in real time. In order for these data to be distributed to more consumers, the platform must be able to handle streaming data to meet new needs and analyses. Data ingested from the data sources should be able to flow as fast as they occur in the sensors, source systems or external parties
Storage | The data platform must contain several different storage components to ensure access to historical information, traceability and access to real time information. The data platform must handle storage such as relational databases, distributed storage, graph databases and time series. Some storage will also be handled in processing (intermediate storage of data)
Access | Data must be made available to different consumers, and the architecture must support several different ways of making the data available, consisting of API/HMI and search
API/HMI | APIs (Application Programming Interface) and Human Machine Interfaces (HMI) are components that make data on the platform available to persons/systems outside the data platform. This also includes APIs that ensure the exchange of data with external actors
Search | The data platform will provide fast, secure and easy access to the data you need. This will require a form of search function, or Data Catalog, containing metadata about what is stored within the architecture
Security and governance | Security and governance provides a description of mechanisms for access control, monitoring and safe handling of data stored in the solution, including the data exchange with external actors.
Visualization | The architecture must support visualization of data and/or analyses. Applications for visualization can be seen as consumers of the data contained in the data platform. These are key applications for realizing the business needs of Statnett, and one of the key consumers. Certain technical components of the visualization will still need to be provided as a part of the platform.
Data consumer | Consumers are stakeholders of the architecture, and are described as the people or systems that will need access to data stored in the data platform. Asset management and asset health management systems are examples of systems that can be consumers of data
5 I.e. Tableau or Cognos
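The Search / Data Catalog capability described above can be sketched as keyword search over dataset metadata; the catalog entries and record fields below are invented for illustration:

```python
# Minimal sketch of a Data Catalog: metadata records describing datasets held
# in the platform, searched by keyword. Entries and fields are invented.
CATALOG = [
    {"name": "pmu_measurements", "tags": ["sensor", "time series", "real time"]},
    {"name": "asset_register",   "tags": ["asset management", "master data"]},
    {"name": "lightning_events", "tags": ["sensor", "geospatial"]},
]

def search(keyword):
    """Return names of datasets whose name or tags mention the keyword."""
    keyword = keyword.lower()
    return [d["name"] for d in CATALOG
            if keyword in d["name"].lower()
            or any(keyword in t for t in d["tags"])]

print(search("sensor"))  # ['pmu_measurements', 'lightning_events']
```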
6.2 Strategy and motivation layers
As explained in Chapter 2, ArchiMate 3.0 defines different layers to document the architecture. This chapter focuses on the Strategy and Motivation layers and sums up the requirements and expectations that a Big Data platform for SAMBA has to meet, including:
- explicit requirements from the projects
- implicit requirements gathered from various sources, e.g. the eSmart report [33] and other reports.
The following subchapters explain the link between drivers, goals, tactics and capabilities that a future
SAMBA platform will support.
Strategy layer
The big data and analytics reference model analyzed and described in the Finbeck project is based on
TOGAF methodology and described using ArchiMate. The Finbeck reference model has been based on
the outcome of the analysis of the strategic aspects of the architecture using the ArchiMate 3.0
Strategy layer. The strategy layer explains the impact of technology changes on the business. In our
case the strategy layer explains also how the capabilities of Big Data and Analytics platform relate to
the overall strategic drivers, goals and outcomes and how they support the expectations from the
stakeholders. The strategy layer has been created based on interviews with several stakeholders in the
organization and with the input from earlier phases in SAMBA. Figure 15 presents one of the early
SAMBA models from WP1 that shows different elements including roles and stakeholders that are
important for asset management.
Figure 15 Elements in asset management – SAMBA model for asset management introduced in WP1
The Fia6 project has also provided important input to the assessment process, which has been used to identify and align different roles and stakeholders in the Strategy layer. Figure 16 presents the main segments and information categories in Statnett as defined by Fia. It is apparent that asset management has been identified as one of the most central segments by the Fia project.
Figure 16 Segments and information categories identified in the Fia project
During the assessment and analysis of the strategy layer, eight main stakeholders/roles have been identified in Statnett for which Big Data and Analytics is of relevance:
- Grid Owner - this is one of the three main responsibilities that Statnett has been chartered with by the authorities, in which Statnett acts as the owner of the Norwegian transmission grid and the cable connections abroad. The grid owner role is also the one where asset management plays a central part.
- Grid Development - this is another of the three main responsibilities of Statnett. Grid development is about planning the future grid to meet future needs, not only for Statnett but for the complete Norwegian power system.
- System Operation - this is the last of the three main responsibilities of Statnett. System operation is about operating the transmission system, ensuring balance in the system as well as ensuring fair and equal treatment of all the market actors.
- NVE7 - the Norwegian regulator
6 Fia project focuses on Information Architecture at Statnett
7 NVE stands for The Norwegian Water Resources and Energy Directorate
- CFO, CIO, CEO and CISO8 - internal Statnett stakeholders
- External stakeholders, e.g. other DSOs, TSOs, research institutions, universities, consultants and so on, performing analysis on Statnett data or sending data to Statnett.
On the lower level, there are a number of stakeholders and roles which support the main roles. However, according to the evaluation performed in the Finbeck and SAMBA projects, currently only a limited set of the roles and stakeholders actually relate to and are affected by the adoption of Big Data and Analytics technology. The most affected roles are fault analysis, asset management, system operation and long term planning. No direct relations have been identified for e.g. the CFO or short term planning. Neither market nor settlement were identified as major users of the Big Data and Analytics technology at the time this report was written.
The strategy model presented here covers all areas that require the use of Big Data and Analytics. For the purpose of the SAMBA project, the focus is mainly on asset management; however, due to the WP 6 [3] focus on risk monitoring, the other roles, in particular fault analysis and system operation, are also of interest.
Figure 17 presents the strategy layer of the Big Data & Analytics reference architecture. A full size
Enterprise Architect (EA) diagram is also attached in Appendix V1.
8 CFO - Chief Financial Officer, CIO – Chief Information Officer, CEO – Chief Executive Officer, CISO – Chief Information Security Officer
Figure 17 Reference Model - Strategy Layer
The following table summarizes the different drivers identified during the assessment phase and explains their relation to the stakeholders:
Table 7 Overview strategy layer – stakeholders and drivers
Stakeholder | Driver | Meaning / Rationale
Grid Owner / Fault Analysis Engineer | Grid costs / Analysis costs | The overall costs related to fault analysis / problem management
Grid Owner / Fault Analysis Engineer | Grid costs / Incident costs | The overall costs related to actual incidents
Grid Owner / Asset Management; Grid Owner / Construction; Grid Development / Planning | Grid costs | The overall costs of the grid related to asset management, construction and planning. Also includes analysis costs and incident costs.
Grid Development / Planning; System Operation; System Operation / Fault Analysis Coordination; NVE | Power Supply Reliability | The reliability of the power supply as mandated by NVE and OED.
System Operation; System Operation / Market and Settlement; NVE | Network Balance | Keeping the network in balance as a system.
Grid Development / Long Term Planning | Increased Capacity | Increasing the capacity to meet future demand
CEO | Strategic Use of Modern Technology | Use of technology which will result in increased efficiency and safety in the future, comprising an increased level of automation, machine learning and real-time processing.
NVE | Administration costs | Optimizing costs of the administration (i.e. reporting)
CISO | Secure sensitive data | Securing data
Finally, as a part of the Strategy layer model, a number of goals and outcomes related to Big Data and Analytics at Statnett have been identified. The goals and outcomes and their relation to drivers are summarized in the following table:
Table 8 Overview strategy layer – drivers, goals and outcomes
Driver | Goal | Outcome | Meaning
Analysis costs | More efficient problem management | Improved configurable personalized visualization | Improvements in visualization support, which will be more configurable and can be adapted to individual needs
Analysis costs | More efficient problem management | Quicker fault analysis | More efficient and quicker fault analysis
Incident costs; Power Supply Reliability; Network Balance | Reduce number and consequence of the incidents | Improved real time monitoring | Reduced latency, increased data quality and better reliability are the most important examples of improvements in real time monitoring.
Incident costs; Power Supply Reliability; Network Balance | Reduce number and consequence of the incidents | More accurate fault analysis | More accurate findings in fault analysis
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Improved more dynamic visualization | Visualization which dynamically shifts the focus to issues/faults in the grid
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Improved real time monitoring | See above
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Automated and autonomous inspection | Predefined inspections, which are initiated by an operator but performed by drones and robots, as well as inspections which are initiated and performed autonomously
Grid costs; Power Supply Reliability; Network Balance | More efficient / quicker fault analysis | Improved inspection - remote or virtual | Inspections that are performed by the operator remotely/virtually
Network Balance | Optimize short term imbalance | Improved more dynamic visualization | See above
Network Balance | Optimize short term imbalance | Improved real time monitoring | See above
Network Balance | Optimize short term imbalance | Increased precision of imbalance predictions | Improvement of the precision of imbalance prediction down to 5 minutes
Grid costs | Optimize maintenance costs | Automated and autonomous inspection | See above
Grid costs | Optimize maintenance costs | Predictive maintenance | Predictive ML algorithms designed to help determine the condition of in-service equipment in order to predict when maintenance should be performed
Grid costs | Optimize maintenance costs | Optimized condition based maintenance | More optimal condition based maintenance based on analysis of high volumes of asset data and sensor data, both batch and real time data
Grid costs | Optimize maintenance costs | More frequent and accurate inspection | Automated inspections that can be performed more frequently to support more traditional condition based maintenance and reliability based maintenance
Grid costs | Optimize maintenance costs | Improved inspection - remote or virtual | See above
Grid costs | Optimize maintenance costs | Improved configurable personalized visualization | See above
Grid costs | Optimize investment costs | Asset health based reinvestments | Investments and reinvestments based on the actual asset health (i.e. an asset health index derived from the sensor and asset management data).
Grid costs | Optimize investment costs | Improved socio-economic benefit analysis | Quicker and more accurate socio-economic analysis due to more performant tools and platforms
Increased capacity | Reduce bottlenecks | Improved socio-economic benefit analysis | Improved socio-economic benefit analysis is the most important outcome.
Increased capacity | Add new customers | Improved socio-economic benefit analysis | See above
Strategic use of modern technology | Increase efficiency and safety | All dependent goals | The goal of increased efficiency and safety relates to several dependent goals (in practice all of them) and supports the driver of strategic use of modern technology
Grid costs; Network balance | Increase automation of data quality control | Better data quality | Improve data quality in all involved systems and sensors. This includes improvements in the infrastructure for collecting and transporting the sensor data as well as improvements in data consistency.
Administration costs | Reduce reporting costs | More accurate fault analysis | See above
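The asset-health-based outcomes above can be illustrated with a minimal health-index sketch; the weights, inputs and threshold are invented assumptions, not Statnett's actual model:

```python
# Illustrative asset-health-index calculation: normalized degradation factors
# combined into a single 0..1 score used to prioritize maintenance and
# reinvestments. Weights, factor names and threshold are invented.
WEIGHTS = {"temperature": 0.5, "vibration": 0.3, "age": 0.2}

def health_index(measurements):
    """Return a 0..1 score; 1.0 means perfect condition.
    `measurements` maps each factor to a normalized 0..1 degradation value."""
    degradation = sum(WEIGHTS[k] * measurements[k] for k in WEIGHTS)
    return round(1.0 - degradation, 3)

def needs_maintenance(measurements, threshold=0.6):
    return health_index(measurements) < threshold

good = {"temperature": 0.1, "vibration": 0.1, "age": 0.2}
worn = {"temperature": 0.8, "vibration": 0.7, "age": 0.9}
print(health_index(good), needs_maintenance(good))  # 0.88 False
print(health_index(worn), needs_maintenance(worn))  # 0.21 True
```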
Motivation layer
The motivation layer (Figure 18) links the strategy with the actual capabilities of the Big Data and Analytics platform. This gives a more complete connection between the strategic and tactical planning levels, and explains why the different capabilities are necessary and how they support and affect the strategy. A full size EA diagram is also attached in Appendix V1.
Figure 18 Reference Model – motivation layer
The following table describes how the different tactics (courses of action) support the outcomes from the strategy layer, and how they are realized by different capabilities of the Big Data and Analytics platform. Since the assessment covers the whole Finbeck project and all initiatives at Statnett, the last column on the right explains how the different capabilities relate to the SAMBA project. This assessment has been performed based on the analysis of the strategy and motivation layer models.
Table 9 Overview motivation layer – outcome, course of action and capability
Outcome | Course of action | Capability | Meaning | Relation to SAMBA use cases and expectations
Improved configurable and personalized visualization | Introduce configurable and personalized visualization | Configurable and personalized visualization | Self-service, personalized visualization, which is highly configurable, is crucial to achieve sufficient flexibility to be able to explore the data without delay and without involving the IT department. | Yes, for analysis purposes.
Quicker fault analysis; Improved real time monitoring | Reduce latency of data acquisition | Low latency IoT data transport; Handling of real time information and streaming analysis | The low latency data transport capability, as well as support for the handling of real time information and streaming analysis capability, are important to reduce the latency of data acquisition and to achieve quicker fault analysis and improved real time monitoring. | Yes, as an important basic prerequisite
Improved more dynamic visualization | Introduce condition based visualization | User friendly visualization; Handling of real time information and streaming analysis; Rule Engine support (Classic, ML, DL, High CPU/GPU) | Introduce condition based visualization that automatically shifts focus to issues/faults in the grid. Condition based visualization requires the handling of real time information and streaming analysis, user friendly visualization and rule engine support capabilities. | WP 6
Improved real time monitoring; Improved more dynamic visualization; Predictive maintenance; Optimized condition based maintenance; Asset health based reinvestments | Introduce smart event processing | Handling of real time information and streaming analysis; Event notification, filtering and distribution; Processing of batch data | Introduce smart event processing, in particular streaming analytics that can predict deviations and faults before they occur. Smart event processing and streaming analytics require the handling of real time information, streaming analysis, event notification, filtering and distribution, as well as processing of batch data capabilities. | WP 2
More accurate fault analysis | Introduce fault classification rules | Machine Learning support | Introduce fault classification rules, which require the machine learning support capability. | Weak
More accurate fault analysis; Optimized condition based maintenance | Introduce fault detection rules | Machine Learning support | Introduce fault detection rules, which require the machine learning support capability. | WP 2
Automated and autonomous inspection; Predictive maintenance; Optimized condition based maintenance | Introduce rule based analysis | Rule Engine support (Classic, ML9, DL10, High CPU11/GPU12) | Introduce classic rule based analysis support, which requires the rule engine support capability. | WP 2 and WP 6
More accurate fault analysis | Allow new high frequency sensors | High throughput IoT transport; Sensor time synchronization support; High volume data storage | Allow new high frequency sensors, which require high throughput IoT transport, sensor time synchronization support and high volume data storage. Note that the existing infrastructure and sensors also require improvements with respect to these capabilities. | Related
More accurate fault analysis; Improved insights and business understanding | Increase storage capacity | High volume data storage | Increase storage capacity, which requires the high volume data storage platform capability | WP 2
Improved inspections - remote and virtual; Predictive maintenance; More frequent and accurate inspection | Introduce drones and robotics | Video and picture analysis; Drone fleet management and data capture | Introduce drones and robotics, which requires the video and picture analysis capability as well as the drone fleet management and data capture capability. | WP 6
9 ML – Machine Learning 10 DL – Deep Learning 11 CPU – Central Processing Unit 12 GPU – Graphics Processing Unit
Asset health based
reinvestments
Increased precision
of imbalance
predictions
Predictive
maintenance
Introduce
predictive
analytics
Data Science
Tools (DL, ML,
High CPU/GPU)
Introduce predictive analytics,
which requires data science tools
including deep learning, machine
learning and high cpu/gpu
capabilities.
WP 2
Improved insights
and business
understanding
Introduce Data
Science
Data Science
Tools (DL, ML,
High CPU/GPU)
Introduce predictive analytics,
which requires data science tools
including deep learning, machine
learning and high cpu/gpu
capabilities.
Related
Improved socio-
economic benefit
analysis
Introduce
probabilistic
reliability
assessment
Data Science
Tools (DL, ML,
High CPU/GPU)
Introduce probabilistic reliability
assessment, which requires data
science tools including deep
learning, machine learning and
high cpu/gpu capabilities.
WP2 and
WP6
Improved
inspections -
remote and virtual
Introduce digital
twin concept
High CPU and
GPU power
Actor
framework
Triple store and
graph storage
Aligning and
harmonizing
facts from
various sources
Introduce digital twin concept,
that enables real time 3D
visualization and control of the
assets as well as means to model
and reproduce the condition of
the grid and assets at a given
point of time.
Digital twin requires high CPU
and GPU power, actor
framework, triple store and graph
storage as well as aligning and
harmonizing facts from various
WP 6
46
Outcome Course of
action
Capability Meaning Relation to
SAMBA use
cases and
expectations
(Data
Catalogue)
sources including data catalogue
capabilities.
Improved
inspections -
remote and virtual
Introduce
augmented/virt
ual reality
High CPU and
GPU power
Introduce augmented/virtual
reality, which requires high CPU
and GPU power capability in the
platform.
WP 6
Optimized
condition based
maintenance
Asset health based
reinvestments
Improved insights
and business
understanding
Increased precision
of imbalance
predictions
Introduce
common data
nav/lake
Triple store and
graph storage
Open access to
data and data
sharing
Aligning and
harmonizing
facts from
various sources
(Data
Catalogue)
Introduce common data
nav/lake, which requires triple
store and graph storage
capability, open access to data
and data sharing, aligning and
harmonizing facts from various
sources including data catalogue.
WP 2
Better data quality Improve data
quality
Data quality and
consistency
check
Improve data quality in all
involved systems and sensors.
This includes the improvements
in the infrastructure for collecting
and transporting the sensor data
as well as improvements in the
data consistency.
This requires data quality checks
and consistency checks
capabilities in the platform.
Related
More efficient
access to
information
Introduce
interactive
information
access
Natural
language
understanding
Document
storage
Chatbot
conversation
support
Introduce interactive information
access, which requires natural
language understanding
capability, document storage and
chatbot conversation support
capabilities.
Related
Adequate security
of sensitive and
important data
Implement
measures to
secure the data
Fine grained
access control
and perimeter
security/AAA
Implement measures to secure
the data, which requires fine-
grained access control and
perimeter security/AAA
Related
47
Outcome Course of
action
Capability Meaning Relation to
SAMBA use
cases and
expectations
Redundancy
and disaster
recovery
capabilities as well as redundancy
and disaster recovery capability.
6.3 Capabilities and information needs
This chapter summarizes all the platform capabilities identified in the Finbeck project. As concluded in
chapter 6.2, most of these capabilities are also required for asset management and the SAMBA project.
Figure 19 presents all the capabilities identified by the Finbeck project [34].
Figure 19 Big Data and Analytics platform capabilities [34]
The table below explains each of the capabilities in detail.
Table 10 Big Data and Analytics platform capabilities
Capability name Related /
required for
asset
management?
Comment
Triple Store and Graph
storage
Yes Ability to store and process Triples and graph data.
Video and picture analysis Yes Ability to analyze and match patterns in the video and
photo files.
Open Access to Data and
Data Sharing
Yes Open APIs and access to data. The system provides
unconstrained access to the data in a number of different
ways.
Processing of batch data Yes Processing the data (often static data) in a periodic / batch
way.
Data Quality and
Consistency Check
Yes Ability to assess, control and rate the quality and accuracy
of the information stored in the data lake. In addition,
mechanisms allowing consistency checking of the
information stored in the lake.
Configurable and
personalized visualization
Yes Visualization tools that provide high level of customization
and configurability on personal level.
Redundancy and disaster
recovery
Yes Ability to continue operation of the system despite losing
some of the computational power and storage.
Actor Framework Yes Framework allowing implementation of concurrent
computation model with actors as universal primitives of
concurrency.
Deep Learning support Yes Ability to simulate / run deep neural networks in order to
analyze / train and run the predictive models.
Drone Fleet Management &
Data Capture
Yes Feature allowing steering / controlling and managing a
fleet of drones and acquiring captured data.
High CPU and GPU power Yes High computational power both CPU and GPU (graphical)
Low latency IoT data
transport
Less relevant Ability to transport the data with low delay.
The sensor data are important for asset management;
however, it is less relevant that the data has very low
latency. This capability is important for operations and
Fault Analysis, but less important for asset management.
Self Service Analytics Yes Analytics and visualization tools and views that can be
tailored to meet the needs of each individual and can be
adapted individually by each user.
Granular access control and
perimeter security/AAA
Yes Basic security features of the system allowing sufficient
control of the authentication, authorization and audit.
Rule Engine support Yes Feature allowing creating configurable rules that alter the
business logic of the application. Comprises use of AI,
Machine Learning and Deep Learning.
Data Catalogue Yes Metadata store providing information which enables
finding the right information in the data lake.
Data Science Tools Yes Various tools used by the Data Scientist including
Jupyter/zeppelin notebooks, R studio, SPSS, SparkML and
Python/scikit.
Event Notification, Filtering
and Distribution
Yes Ability to receive, filter and distribute events.
Handling of real time
information / streaming
analysis
Yes Ability to process streams of data, detect patterns and
generate events based on that.
High volume data storage Yes Data storage capable of storing amounts of data that are
not practical to store or process in traditional (i.e.
relational) databases.
Machine Learning support Yes Libraries allowing the use of statistical methods to analyze
and predict the output based on given parameters, using
e.g. libraries like SparkML.
User friendly visualization Yes User-friendly visualization.
Map visualization Yes Integrated support for map visualization.
High throughput IoT data
transport
Yes Data transport capability that allows sending high volumes
of data in short time.
The sensor data are important for asset management;
however, it is less relevant that the data transport has very
high throughput. This capability is important for
operations and Fault Analysis, but less important for asset
management.
Aligning and harmonizing
facts from various sources
Yes Ability to relate, combine and align the information from
multiple sources/silos.
Audio analysis support Yes Ability to analyze and match patterns in the audio file
Natural language
understanding
Yes Ability to comprehend the meaning of natural
language, e.g. in documents/documentation.
Chatbot conversation
support
Yes Support for interaction using the chatbot conversations
Document storage Yes Support for storing the documents
Classic Rule Engine support Yes Classic rule engine support without AI, ML/DL
Sensor Time
Synchronization support
Less relevant Capability to ensure time synchronization and time
alignment of data from various sources, as well as
preserving the time delay, time gap and jitter in the stored
data at microsecond level.
Very precise (sub-second) time synchronization is less
important for asset management. This capability is
important for operations and Fault Analysis, but less
important for asset management.
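Capabilities such as "Handling of real time information / streaming analysis" and "Event Notification, Filtering and Distribution" boil down to detecting patterns in a data stream and emitting events. A minimal Python sketch of that idea is shown below; the sliding-window rule, threshold and simulated readings are invented for illustration and this is not the platform's IBM Streams implementation:

```python
from collections import deque

def detect_events(stream, window=5, threshold=10.0):
    """Emit an event whenever the mean of the last `window`
    readings exceeds `threshold` (illustrative rule only)."""
    buf = deque(maxlen=window)
    events = []
    for t, value in stream:
        buf.append(value)
        if len(buf) == window and sum(buf) / window > threshold:
            events.append({"time": t, "mean": sum(buf) / window})
    return events

# Simulated sensor stream: (timestamp, value); values jump at t = 10
readings = [(t, 8.0 + (3.0 if t >= 10 else 0.0)) for t in range(20)]
alarms = detect_events(readings, window=5, threshold=10.0)
```

A streaming engine applies the same logic continuously to unbounded input instead of a finite list, but the windowing and event-emission pattern is the same.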
6.4 Business architecture
The following diagram (Figure 20) shows dependencies between some SAMBA WP2 use cases and a
limited set of identified capabilities at a more detailed level. Such a detailed assessment has only been
done for a limited number of WP2 use cases:
Figure 20 Business Architecture – WP2 use case mapped to Big Data & Analytics platform capabilities
The following table explains the relationships in more detail; these are also documented in Enterprise
Architect at Statnett.
Area / Function: Transformer
Use Case: T2.1 Online gas data analysis
Capability: Event notification, filtering and distribution; Rule Engine support; Machine Learning support; User friendly visualization; Handling of real time information/streaming analysis; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The online gas data analysis use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, machine learning, user friendly visualization, handling of real time information/streaming analysis, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient real time data collection, flexible analysis of online gas data as well as providing notification of detected deviations.

Use Case: T3.1 Thermal winding aging; T3.5 Periodic oil and gas analysis
Capability: Rule engine support; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The thermal winding aging and periodic oil and gas analysis use cases require a set of platform capabilities, in particular rule engine support, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient data collection, flexible analysis of the data and pattern detection.

Use Case: T3.6 Health index
Capability: Rule engine support; Video and picture analysis; Processing of batch data
Meaning / Rationale: The health index use case requires a set of platform capabilities, in particular rule engine support, video and picture analysis as well as processing of batch data. These capabilities are necessary to ensure efficient data collection and flexible analysis of the data and collected video and pictures.

Area / Function: Cable
Use Case: C2.4 DTS
Capability: Event notification, filtering and distribution; Rule Engine support; High volume data storage; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The DTS use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, high volume data storage, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient collection of high volumes of data, flexible analysis of DTS data and data from other sources as well as providing notification of detected deviations.

Use Case: C2.3 Oil filled termination
Capability: Event notification, filtering and distribution; Rule Engine support; Machine Learning support; Aligning and harmonizing facts from various sources; Processing of batch data
Meaning / Rationale: The oil filled termination use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, machine learning support, aligning and harmonizing facts from various sources as well as processing of batch data. These capabilities are necessary to ensure efficient collection of data, flexible analysis of data from multiple sources as well as providing notification of detected deviations.

Area / Function: Breaker
Use Case: Reignition monitor of reactor breakers
Capability: Event notification, filtering and distribution; Rule Engine support; Machine Learning support; Handling of real time information/streaming analysis; High volume data storage; Aligning and harmonizing facts from various sources
Meaning / Rationale: The reignition monitoring of reactor breakers use case requires a set of platform capabilities, in particular event notification, filtering and distribution, rule engine support, machine learning support, handling of real time information/streaming analysis, high volume data storage as well as aligning and harmonizing facts from various sources. These capabilities are necessary to ensure efficient real time collection of high volumes of data, flexible analysis of data from multiple sources as well as providing notification of detected deviations.
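To make the rule engine and event notification capabilities behind the T2.1 online gas data analysis use case concrete, the sketch below checks a dissolved-gas sample against per-gas limits and produces a notification event. The limit values and field names are invented for this illustration; they are not Statnett's actual gas analysis rules:

```python
# Illustrative only: these threshold values are invented for this sketch
# and are not Statnett's actual gas analysis rules.
GAS_LIMITS_PPM = {"H2": 100, "CH4": 120, "C2H2": 1, "CO": 350}

def evaluate_gas_sample(sample_ppm):
    """Classic rule engine step: compare a dissolved-gas sample against
    per-gas limits and produce a notification event listing the
    exceeded gases, or None when the sample is within limits."""
    exceeded = {g: v for g, v in sample_ppm.items()
                if v > GAS_LIMITS_PPM.get(g, float("inf"))}
    if not exceeded:
        return None
    return {"event": "gas-limit-exceeded", "gases": sorted(exceeded)}

note = evaluate_gas_sample({"H2": 40, "CH4": 95, "C2H2": 3, "CO": 120})
```

In the platform, a rule like this would run in the rule engine over incoming samples, and the resulting event would be handed to the event notification, filtering and distribution capability.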
As explained in the previous chapter, in addition to the WP2 related capabilities there is a much larger
set, which relates to smarter asset management and the SAMBA project indirectly through the
assessment of the outcomes of the WP6 report.
6.5 Overall Strategy and Motivation layer
The following diagram (Figure 21) shows the complete strategy and motivation model layer as
described above in this chapter. A full size Enterprise Architect (EA) diagram is also attached in
Appendix V1.
Figure 21 Reference model - strategy and motivation - summary
(Full diagram: ArchiMate 3 Strategy & Motivation layer; author Leslaw Lopacki, version 1.0, created 11.10.2017, updated 03.12.2017.)
6.6 Application Architecture
This chapter presents the long-term application architecture of the Big Data and Analytics platform in
Statnett. As explained earlier, this is the overall platform as defined in the Finbeck project. Asset
management is an important user of this platform.
The platform supports the principles of context mapping and anti-corruption layer. It provides a
number of APIs and interfaces to access the data including:
- API - HTTPS, JSON and programmatic: the main API that internal systems can use to access the data
- Notification/streaming API: used to provide notifications and stream data from the platform to other internal systems
- Data Exchange API: used to offer the data for public access as well as to third parties, for instance other TSOs, DSOs and regulators
- Cloud GW: represents the gateway to the cloud; it is also crucial for the integration of any cloud based services
- Data Science Tools: a set of tools required by the data scientists
- Self Service visualization: the generic visualization component provided as a part of the platform; there will also be visualization components implemented within each client
- Classic Visualization: visualization components which were used / introduced prior to establishing the Big Data and Analytics platform
- API - ingestion: the API and set of tools for ingestion of data, both streaming and batch, from internal and external sources
The platform itself provides a number of components supporting the capabilities described in earlier
chapters. The following components are defined as part of the platform:
- Storage components: multiple types of storage, including a file store, graph/RDF store, time series data store, metadata store and relational data store, for various structured and unstructured data
- Analysis components: the Analytics engine as well as the Deep Learning and Machine Learning engines, where Statnett will implement and execute advanced AI algorithms
- Processing components (batch and real time): the Batch processing engine and the Streaming processing engine, which will detect patterns in real time / streaming data
- Access components: the Data Catalogue, which is important for structuring the data stored in the platform and for being able to find information
There are a number of internal systems and sources which will communicate with the platform. These
systems can act as both providers and consumers of data. The notification and streaming API
provides a means for more complex interactions, e.g. when an internal system needs to be notified
about a pattern implemented and detected within the platform.
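The interaction just described - an internal system registering interest in a pattern and being notified when the platform detects it - can be sketched as a small publish/subscribe contract. The class below is an in-process stand-in for illustration, not the platform's actual notification/streaming API; the pattern name and event fields are invented:

```python
class NotificationBroker:
    """Minimal stand-in for a notification/streaming API: internal
    systems subscribe to a named pattern, and the platform publishes
    an event each time the pattern is detected."""
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, pattern, callback):
        self._subscribers.setdefault(pattern, []).append(callback)

    def publish(self, pattern, event):
        for cb in self._subscribers.get(pattern, []):
            cb(event)

broker = NotificationBroker()
received = []
# e.g. an asset health component subscribing to a hypothetical pattern
broker.subscribe("transformer-gas-deviation", received.append)
broker.publish("transformer-gas-deviation",
               {"asset": "T42", "severity": "high"})
```

In the real platform this decoupling would be provided by the notification/streaming API (e.g. backed by Kafka topics), so that the detecting component does not need to know which internal systems consume its events.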
The architecture for the asset management functionality is planned as a hybrid architecture. Some of
the functionality will be placed in the Big Data and Analytics platform, but not all of it. It is clear that
certain algorithms in the area of asset management will require specialized systems; the Big Data and
Analytics platform cannot meet all these needs. Therefore, there will still be a need for other internal
components like the asset health management component and Bi-Cycle.
(Technology platform viewpoint diagram: the Big Data & Analytics platform comprising the Hadoop/Hortonworks cluster with Spark 2, Zeppelin, Ambari, YARN, HBase, HDFS, Solr, Hive, Kafka, Knox, Ranger, ZooKeeper, Atlas, Sqoop, Flume, BigIntegrate, BigSQL and BigR; the IBM Streams cluster; the IGC cluster with the Information Governance Catalogue; JanusGraph; OpenTSDB; the Cognos, Oracle DBMS, Tableau and SPSS clusters; the Insights for ArcGIS node; user and data science PCs; and the ingestion (batch & streaming), HTTP/programmatic API and notification/streaming API interfaces.)
Figure 22 describes the application layer of the Big Data & Analytics Reference Model.
Figure 22 Reference Model - application layer
(Application platform viewpoint diagram: the Big Data & Analytics Platform with its datastores (file, time series, metadata, graph/RDF, relational, public and private), engines (analytics, machine learning, deep learning, rule, batch and streaming processing), Data Catalogue, visualization components and APIs (HTTPS/JSON & programmatic, ingestion, notification/streaming, Data Exchange, Cloud GW), surrounded by internal systems and sources such as EMS, IFS, AutoDIG, FASIT2018, ArcGIS, Asset Health Management and BI-Cycle, internal and external sensors, cloud services, and internal and external consumers including TSOs, DSOs, regulators and the public.)
It is important to point out that several of the components above are not yet supported by the current
Big Data and Analytics platform implemented as part of the AutoDig 2.0 project. In particular, the
Cloud GW integration and the Data Exchange API are not yet implemented.
6.7 Capability to application component mapping
The diagram in Figure 23 explains how the different application components relate to the identified
capabilities. It is important to observe that this mapping reflects the long-term target model, beyond
the currently implemented Big Data lake platform (AutoDig 2.0 project).
The current platform does not support several of the necessary components. From the current state
assessment we have identified key areas with existing gaps and future platform improvement
opportunities:
Area / Component: Data Exchange API component
Gap: Insufficient means of exchanging data with other parties in a secure and reliable way, i.e. lacking the gateway functionality to isolate the data exposed to external users.
Opportunity: Support for data exchange both for public access and for third party companies, i.e. DSOs, TSOs and regulators.

Area / Component: Cloud GW and cloud support component
Gap: Insufficient means of integrating the platform with cloud services, i.e. IaaS and PaaS.
Opportunity: New complex services that can be provided as PaaS or SaaS services in the cloud, i.e. natural language understanding and chatbot conversation support.

Area / Component: API - ingestion component
Gap: Limited functionality as implemented in the AutoDig project; several data sources have significant data transport delays, causing delays in the detection of events, and the data is of poor quality and often missing. This in turn limits the value delivered to Fault Analysis. As a result, the "Low latency IoT transport" capability is poorly supported.
Opportunity: More accurate sensor data; less delay in data collection and quicker analysis. An important prerequisite for low latency real time data processing.13

Area / Component: Drone Fleet Management & Data Capture capability
Gap: The current application architecture does not include this capability.
Opportunity: Automated capturing and processing of data collected by the drones, as well as quicker, automated and unassisted drone deployment.
13 See also Figure 28 for a suggested detailed architecture for streaming sensor data
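To make the missing gateway functionality concrete, the sketch below shows one way a Data Exchange gateway could isolate internal data: only fields whitelisted for a given external consumer class pass through. The field names and consumer classes are hypothetical; the real gateway policy would be defined by Statnett:

```python
# Hypothetical per-consumer field whitelists; illustrative only.
EXPOSED_FIELDS = {
    "public":    {"asset_id", "asset_type"},
    "regulator": {"asset_id", "asset_type", "health_index"},
}

def gateway_filter(record, consumer_class):
    """Isolate internally stored data before external exposure by
    passing through only the fields whitelisted for the consumer;
    unknown consumer classes receive nothing."""
    allowed = EXPOSED_FIELDS.get(consumer_class, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {"asset_id": "TR-7", "asset_type": "transformer",
          "health_index": 0.83, "internal_cost": 1.2e6}
```

A production gateway would additionally handle authentication, rate limiting and auditing, but field-level isolation of the exposed data is the core of the gap described above.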
Figure 23 Reference Model - capability to component mapping
(The diagram maps each of the identified platform capabilities to the application components that realize them, including the datastores, engines, APIs, Cloud GW and visualization components.)
6.8 Technology Architecture
This chapter presents the current technology architecture of the Big Data and Analytics platform in
Statnett. It focuses on the architecture as it looked at the time of writing of this report, based on the
Big Data platform acquired within the AutoDig 2.0 project. As explained earlier in chapter 5 and
chapter 6.1, this is the same overall platform defined in the Finbeck and AutoDig 2.0 projects. Asset
management is expected to become one of the most important users of this platform.
The most important components within this platform are:
- The Hortonworks platform, consisting of several standard components. Notice that only the components that are in use or planned to be used are presented, not all of the Hortonworks components.
- IBM specific components that also run on the Hortonworks platform, like BigIntegrate, an ETL14 tool used for ingestion of data into the data lake, and IBM Big SQL, an SQL interface to query data stored in Hive or HBase.
- The IBM Streams component, used for processing of streaming data and streaming analytics.
- The Tableau visualization component.
- The IBM SPSS server, for designing analytics and machine learning functions.
- The Information Governance Catalogue, which provides means of structuring the data in the data lake.
- Cognos and Oracle RDBMS, which are part of the current Innsikt data warehouse portfolio at Statnett.
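As an example of how a sensor measurement would be shaped for the platform's time series store, the sketch below builds a data point in the JSON structure accepted by OpenTSDB's HTTP /api/put endpoint (metric, timestamp, value and at least one tag). The metric and tag names are invented for illustration, and the sketch only constructs the payload without contacting a server:

```python
import json

def tsdb_datapoint(metric, timestamp, value, **tags):
    """Build one data point in the JSON structure accepted by
    OpenTSDB's /api/put endpoint."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per point")
    return {"metric": metric, "timestamp": timestamp,
            "value": value, "tags": tags}

# Hypothetical metric and tags for a transformer oil temperature reading
point = tsdb_datapoint("transformer.oil.temp", 1512300000, 57.5,
                       station="ExampleStation", unit="celsius")
payload = json.dumps([point])  # /api/put accepts a list of points
```

In the platform, an ingestion component would POST such payloads to the OpenTSDB instance running on top of HBase/HDFS.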
Table 11 describes the components in the platform in detail:
Table 11 Technology components
Area Component SW Environment
Description15
Table rows below follow the column order Area, Component, SW Environment, Description.
Access Hive Apache Hadoop Hortonworks (HDP)
Apache Hive is an access tool for providing data summarization, query, and analysis. SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop [19].
Phoenix HDP Apache Phoenix is a massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix hides the intricacies of the NoSQL store enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL [19]
Pig HDP Apache Pig is a high-level platform for creating programs that run on Apache Hadoop [19]
BigSQL Hadoop IBM
IBM provided BigSQL is a software layer for creating tables and query data in BigInsights using SQL similar to Phoenix and based on Hive [20]
14 ETL – Extract, Transform, Load 15 Suppliers description of the SW component
63
Area Component SW Environment
Description15
Zeppelin HDP Apache Zeppelin is a data science tool. It is a multi-purposed web-based notebook enabling data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark [19]
Solr HDP Apache Solr is a highly scalable full-text search engine [19]
Kafka HDP Apache Kafka is a distributed streaming platform developed by the Apache Software Foundation written in Scala and Java [19]
Storage JanusGraph/Titan (Atlas)
The Linux Foundation
The Linux Foundation provided JanusGraph is a distributed graph database [35].
Oracle Innsikt/DWH Oracle provided Relational Database Management System
Accumulo HDP Apache Accumulo is a distributed key-value store based on Google's Bigtable [19]
HBase HDP Apache Hbase is a NoSQL/non-relational, distributed database modeled after Google's Bigtable and is written in Java [19]
OpenTSDB Open Source LPGL
OpenTSDB is a scalable time series database built on top of Hadoop and HBase. It simplifies the process of storing and analyzing large amounts of time-series data generated by endpoints like sensors or servers [36]
Storm HDP Apache Storm is a distributed platform for processing streaming data in real time [19]
Ingestion Sqoop HDP Apache Sqoop is a command-line interface application for transferring data between relational databases and Hadoop [19]
Flume HDP Apache Flume is a distributed, reliable, and highly available service for efficiently collecting, aggregating, and moving/streaming large amounts of log data [19]
Kafka HDP Apache Kafka is a distributed streaming platform developed by the Apache Software Foundation written in Scala and Java [19]
Big Integrate IBM IBM Big Integrate is an advanced ETL tool, which is a flavor of IBM DataStage
Streams IBM IBM Streams is an advanced streaming platform that can ingest large amounts of continuous data streams [21]
Big SQL IBM IBM provided BigSQL is a software layer for creating tables and query data in BigInsights using SQL similar to Phoenix and based on Hive [20]
64
Area Component SW Environment
Description15
Operations Oozie HDP Apache Oozie is a server-based workflow scheduling system to manage Hadoop jobs [19]
Ambari HDP Apache Ambari is a system for provisioning, managing, and monitoring Apache Hadoop clusters [19]
YARN HDP Apache YARN is one of the key features in the Hadoop. YARN is now characterized as a large-scale, distributed operating system for big data applications [19]
ZooKeeper HDP Apache ZooKeeper is a distributed configuration service, synchronization service, and naming registry for Hadoop [19]
Security Ranger HDP Apache Ranger is a centralized platform to define, administer and manage security policies consistently across Hadoop components [19]
Knox HDP Apache Knox is a perimeter security gateway system, which 'authenticates' user credentials (mostly against AD/LDAP). Only the successfully authenticated user are allowed access to Hadoop cluster [19]
Visualization Tableau Server Tableau Tableau provided Tableau server component is a high performance data visualization software capable of processing data from various sources incl. Hadoop/BigSQL enabling self-service analytics [22]
Tableau Desktop Tableau Non-server/desktop version of Tableau [22]
Cognos IBM IBM Cognos is a web-based, integrated business intelligence suite by IBM [37]
IBM SPSS IBM IBM provided tool for modelling of predictive algorithms using data from Hadoop distributions and Spark applications [23]
Insights for ArcGIS Esri Esri provided Insights for ArcGIS is a data analytics visualization tool for spatial and non-spatial data [32]
Data catalogue
IBM Information Governance Catalogue
IBM IBM provided catalogue service for storing the metadata and making it possible to structure the data in the Big data lake [38]
Falcon HDP Apache Falcon is a framework for managing data life cycle in Hadoop clusters [19]
Atlas HDP Apache Atlas is a scalable and extensible set of core governance services. Catalogue service for storing the metadata and making it possible to structure the data in the Big data lake, similar to IBM Information Governance Catalogue [19]
Processing Spark 2 HDP Apache Spark is a fast and general engine for large-scale data processing [19]
65
Area Component SW Environment
Description15
MapReduce HDP MapReduce is an original framework for writing applications that process large amounts of structured and unstructured data stored in the Hadoop Distributed File System [19]
IBM BigR Hadoop IBM
IBM provided Big R is a library of functions that provide end-to-end integration with the R language and BigInsights [39]
IBM Streams IBM IBM provided Streams is an advanced stream processing platform that can ingest, filter, analyze and correlate massive volumes of continuous data streams [21]
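To make the OpenTSDB storage row concrete, the sketch below builds the JSON body that OpenTSDB's HTTP /api/put endpoint accepts for a single data point. The metric name, tag names and values are invented for illustration and are not Statnett naming conventions.

```python
import json
import time

def opentsdb_put_body(metric, value, tags, timestamp=None):
    """Build the JSON body for OpenTSDB's HTTP /api/put endpoint.

    OpenTSDB expects a metric name, a Unix timestamp, a numeric value
    and a non-empty set of tags identifying the data source.
    """
    return json.dumps({
        "metric": metric,
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "value": value,
        "tags": tags,
    })

# Hypothetical transformer oil-temperature reading (all names are assumptions).
body = opentsdb_put_body(
    "asset.transformer.oil_temp",
    value=67.5,
    tags={"station": "NO-OSL-01", "asset_id": "T1"},
    timestamp=1514538000,
)
print(body)
```

In the platform, such a body would be POSTed to the OpenTSDB endpoint by an ingestion component rather than assembled by hand.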
Figure 24 describes the technology layer of the Big Data and Analytics Reference Model.
[Figure 24 shows the ArchiMate technology platform viewpoint: a Hadoop/Hortonworks cluster (Spark 2, Zeppelin, Ambari, YARN, ZooKeeper, HBase, HDFS, Hive, Solr, Kafka, Knox, Ranger, BigSQL, BigR, BigIntegrate, Sqoop, Flume, JanusGraph, OpenTSDB, Atlas), a Streams cluster (IBM Streams), an IGC cluster (Information Governance Catalogue), Cognos, Oracle DBMS, Tableau and SPSS clusters, an Insights for ArcGIS node, and user and data science PCs with Tableau Desktop, SPSS client and browsers, exposed through ingestion (batch and streaming), HTTP and programmatic, and notification and streaming APIs.]
Figure 24 Reference Model – technology layer
Within the AutoDig project, an alternative simplified model has also been created, based on the IBM
Reference Architecture for Analytics. This model is presented in Figure 25.
Figure 25 Alternative model for technology layer based on IBM Reference Architecture for Analytics
6.9 Technology to application component mapping
The following diagram explains how the different technology components relate to the application
components described in previous chapters.
It is important to observe that this mapping primarily reflects the transition model as currently
implemented in the Big Data lake platform (AutoDig 2.0 project), not the long-term model. As
explained in chapter 6.7, the current platform does not support several of the necessary components,
in particular:
- Data Exchange API
- Cloud GW and cloud support
- API – ingestion, which currently has limited functionality with respect to data transport latency
The following table describes how the application components are mapped to corresponding components in the technology architecture and which technologies are used to implement the architecture.
[Figure 25 shows the IBM Reference Architecture for Analytics layers: data sources (machine and sensor data, image and video, content services, social data, weather data, commercial and internet data sets, third-party, transactional, application and system-of-record data), ingestion and integration, analytical data lake storage, data access, analytics in-motion, discovery and exploration, actionable insight and enhanced applications, with information management and governance, security and platform services across them. These are mapped to on-premise components such as DataStage/BigIntegrate, BigSQL, Cognos Analytics, SPSS, IBM Streams, Spark and Spark Streaming, Kafka, HBase, Hive, YARN, OpenTSDB, Zeppelin, JanusGraph, Solr, Oracle DBMS (Innsikt), Ambari, HDFS, Flume, Sqoop, Knox, Ranger, Oozie, Tableau and the Governance Catalog.]
Table 12 Mapping of application components to corresponding technology components
Application Component | Technology Component
API HTTP/JSON and programmatic | IBM Big SQL, HBase, OpenTSDB, Hive, Phoenix, Spark 2
Notification and Streaming API | Kafka, IBM Streams
Topology Graph store | JanusGraph/Titan (Atlas)
Relational Data store | Oracle
Time series data store | HBase, OpenTSDB
Data Science Tools | Zeppelin, Tableau Server, Tableau Desktop, IBM SPSS
Self-service visualization | Tableau Server, Tableau Desktop, Insights for ArcGIS
Classic Visualization | Cognos
Data Catalogue | IBM Information Governance Catalogue
Analytics Engine | Spark 2, BigR
Machine Learning | IBM Streams, Spark 2
Rule Engine | IBM Streams, Spark 2
Batch Processing Engine | Spark 2
Deep Learning Engine | Spark 2
Streaming processing Engine | Kafka, Spark 2, IBM Streams
Data Exchange API | Not mapped
Cloud GW | Not mapped
Public Data store | Not mapped
Private Data store | Not mapped
Cloud Analytics | Not mapped
Other / platform | Oozie, Ambari, YARN, ZooKeeper
Security | Ranger, Knox
Not in use | Falcon, MapReduce, Accumulo, Pig, Solr, Storm, Sqoop, Flume, NiFi16, MiNiFi16, Schema Registry16
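The Rule Engine and Streaming processing Engine components above are realized by Kafka, Spark 2 and IBM Streams. As a platform-independent sketch of the kind of logic such an engine evaluates, the following applies a simple threshold rule to a stream of sensor readings; the readings, field names and the 80-degree limit are invented examples, not actual Statnett rules.

```python
def apply_rules(readings, limit):
    """Yield an alert for every reading that breaks the threshold rule."""
    for reading in readings:
        if reading["value"] > limit:
            yield {
                "asset": reading["asset"],
                "alert": "over-limit",
                "value": reading["value"],
            }

# Hypothetical oil-temperature stream; in the platform these readings
# would arrive continuously via Kafka or IBM Streams.
stream = [
    {"asset": "T1", "value": 72.0},
    {"asset": "T2", "value": 95.5},
    {"asset": "T1", "value": 81.3},
]
alerts = list(apply_rules(stream, limit=80.0))
print(alerts)  # two alerts: T2 and T1
```

A production rule engine would evaluate many such rules continuously over unbounded streams, but the structure (filter over a stream, emit alerts) is the same.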
Although the technology platform as currently defined seems to cover most of the current needs,
there are a number of new technologies that might be of interest to better cover deficiencies in the
platform. One important supplement that is relevant for inclusion is the other part of the
Hortonworks platform, Hortonworks DataFlow (HDF), with technologies like NiFi/MiNiFi and Schema
Registry, which provide support for collecting, curating, analyzing and acting on data in flow [9].
16 NiFi, MiNiFi and Schema Registry are not part of the current platform. These components are included in Hortonworks DataFlow (HDF)
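As a rough illustration of the collect, curate and act pattern that HDF/NiFi automates, the sketch below chains such steps over a small batch of records. The record layout and the cleaning rule are invented; NiFi would express the same flow graphically with processors and connections.

```python
def collect():
    """Stand-in source; NiFi would pull these from files, MQTT, Kafka, etc."""
    return [{"sensor": "S1", "value": "67.5"}, {"sensor": "S2", "value": "bad"}]

def curate(records):
    """Drop records whose value does not parse as a number."""
    for rec in records:
        try:
            yield {**rec, "value": float(rec["value"])}
        except ValueError:
            continue

def act(records):
    """Terminal step; NiFi would route these to a downstream processor."""
    return list(records)

clean = act(curate(collect()))
print(clean)  # only S1 survives the curation step
```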
Figure 26 Reference Model – technology to application component mapping
[Figure 26 repeats the technology platform viewpoint of Figure 24, annotating each node with the application components it realizes: APIs (HTTPS/JSON and programmatic, ingestion batch and streaming, notification/streaming, data exchange), processing engines (analytics, batch, machine learning, deep learning, rule, streaming), data stores (relational, time series, topology/graph/RDF, file, metadata, private and public), classic and self-service visualization, data catalogue, data science tools, security, Cloud GW and cloud analytics.]
6.10 Governance Principles
Statnett has established a number of Architecture Governance principles, which also apply to systems
and platform at Statnett. Table 13 describes the most important principles that affect the Big Data and
Analytics platform and adjacent systems.
Table 13 Architecture Governance Principles affecting the Big Data and Analytics platform
ID | Principle Name | Explanation
O5 | Information Management | Information shall be handled in a comprehensive manner as a common asset for Statnett. Data and metadata should be uniquely identifiable using common keys across systems.
D1 | Comprehensive information architecture | Statnett's information has value independent of any single ICT system and shall be linked to a common structure and management.
D2 | Information security and business criticality | All information shall be used, stored and shared according to confidentiality, integrity, availability and preservation requirements. There must be control of what information is mission critical, and it must be stored and protected to meet accessibility requirements.
D3 | Data Quality | All information must have a known quality state, and similar information should have similar quality tests. Known data quality ensures proper use and combination of data.
D4 | Master data management and life cycle | Statnett shall have comprehensive and unified management of the information's source and ownership (master database/system), even when these change over the life of the information.
D5 | Storage and sharing of information | Data storage and integration must be done according to common rules and architecture. The business should always know where data is generated, flows, and is shared, changed and saved.
6.11 Principles for Big Data and Analytics platform
Looking at the data platform framework, the Finbeck project also defined several principles to
address different concerns at Statnett:
Table 14 Principles for Big Data and Analytics platform
Principle: The data platform and the data sets in it shall be a common resource for the business
Description: The data platform should be developed in line with business needs and be a common resource for the business. Data sets that are collected, structured and stored in the data platform should be usable across different purposes, and must be organized and managed based on this principle.
Rationale: Complies with: O5 – Information Management

Principle: Data sets shall be described and classified
Description: All data sets handled through the data platform should have a description (metadata) covering at least content, ownership, origin and valuation.
Rationale: Complies with: O5 – Information Management; D1 – Comprehensive information architecture; D4 – Master data management and life cycle

Principle: Data sets shall have ownership and management
Description: All data sets that are collected and structured in the data platform should have a defined ownership and well-functioning governance and management.
Rationale: Complies with: D5 – Storage and sharing of information

Principle: Data sets shall be subject to access control and tracking
Description: Access to data sets shall be restricted to identified users explicitly authorized to create, modify, read and delete. Where necessary, it should be possible to apply access control to subsets within a data set, for example columns and/or rows in tabular data sets. The data platform shall ensure that all access to data sets is traceable.
Rationale: Complies with: D2 – Information security and business criticality

Principle: Use of data sets comes with responsibility
Description: The use of data sets must be in line with policies and the interests of the business. Access to data sets must be protected, and further processing of data sets beyond the data platform's control shall follow the instructions on use agreed with the information owners. Responsibility also includes understanding the data sets used, and whether their quality meets the quality requirements underlying the use.
Rationale: The data platform will eventually contain a large collection of data sets. The fact that these are collected and easily accessible will in itself constitute both an opportunity and a risk. Anyone given access to parts of these data sets must understand and exercise accountability in their use. It is equally important that the data user understands which data sets are used and whether they are suitable for the particular application.

Principle: Data sets shall be processed and have a retention period in accordance with guidelines for information processing
Description: Data sets must be processed in accordance with established guidelines. Each data set should have a defined retention period, set by the information owner in accordance with established guidelines.
Rationale: General instructions at Statnett

Principle: Data quality is a common responsibility
Description: Data sets should have ownership and management, and the main responsibility for data quality lies there. However, anyone using a data set is responsible for reporting data quality issues back to the owner/manager so they can be corrected at the source.
Rationale: It is not cost effective if individual data users separately work around data quality problems in a data set. It should be a shared responsibility to report issues and ensure that good and correct data sets are developed that can be used by many. Data must be corrected at the source and in the associated work processes.

Principle: Data must be stored in a cost-effective manner
Description: Data should be stored cost-effectively; this is achieved through a well-defined information architecture that follows given standards and best practice. This architecture is defined by the FIA project.
Rationale: Complies with: O5 – Information Management
6.12 APIs for ingestion and integration
Integration architecture is a central aspect of a Big Data and Analytics platform, particularly in a hybrid architecture where business logic related to asset management is implemented in specialized asset management systems outside the platform. Several types of integration are needed: interfaces to access the data in the Big Data lake, as well as interfaces for streaming data and sending notifications.
The technology platform provides a number of tools and integration technologies to satisfy the integration needs covered by the application components "API – HTTPS, JSON and programmatic" and "Notification/Streaming API". In addition, there are integration platforms available at Statnett that can be used to integrate the platform with other systems at Statnett.
Table 15 Integration components to integrate Big Data and Analytics platform with specialized asset management systems
Application Component | Integration/technology Component | Type of integration
API HTTP/JSON and programmatic | IBM Big SQL, Hive | SQL
API HTTP/JSON and programmatic | HBase, OpenTSDB | Web services/ReST
API HTTP/JSON and programmatic | Spark 2 | Programmatic: Java, Scala, Python
Notification and Streaming API | Kafka, IBM Streams | Streaming, Notifications
Ingestion | IBM Big Integrate | ETL
Other (available at Statnett) | RedHat JBoss Fuse | Notifications, Web services
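To illustrate the Web services/ReST type of integration, the sketch below only composes an OpenTSDB-style /api/query URL that a consuming asset management system could call over HTTPS; the host name, metric and tags are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode, urlunsplit

def timeseries_query_url(host, metric, start, end, tags):
    """Compose an OpenTSDB-style /api/query URL for a metric over a time range."""
    tag_filter = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    query = urlencode({
        "start": start,
        "end": end,
        "m": f"avg:{metric}{{{tag_filter}}}",  # aggregated metric expression
    })
    return urlunsplit(("https", host, "/api/query", query, ""))

# Hypothetical query for one transformer's oil temperature over a day.
url = timeseries_query_url(
    "bigdata.example.internal",
    "asset.transformer.oil_temp",
    start="2018/01/01-00:00:00",
    end="2018/01/02-00:00:00",
    tags={"asset_id": "T1"},
)
print(url)
```

A consuming system would issue this as an HTTP GET and receive the data points as JSON.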
Moreover, there are new, emerging technologies that Statnett should consider adopting, in particular Hortonworks DataFlow and NiFi.
In practice, future solutions will use a combination of these methods. Figure 27 presents an example of a possible future integration of the Big Data and Analytics platform with an asset health management system, including visualization of alerts related to asset management.
Figure 27 Example of possible integration of asset health management system with Big Data and Analytics platform
In relation to the "API – ingestion" component and the gap related to the quality of sensor data collection, Figure 28 presents a more detailed view of the future architecture for ingestion and distribution of sensor data, which will better address the issues related to sensor data quality and the reliability of the infrastructure.
Figure 28 A detailed view on the future architecture of the ingestion and distribution of sensor data
[Figure 28 shows sensor data providers (Qualitrol fault recorders, PMUs, PQ Elspec/Metrum, protection devices, oil/gas and other sensors) feeding protocol adapters (IEC 61850, IEEE C37.118, PQScada/Metrum and others) into a microservice/container-based ingestion layer and a pub-sub distribution layer based on Kafka, alongside AutoDig. Consumers include monitoring/operations, data science, predictive maintenance, machine learning and the asset health management system, which exchanges asset data (ETL), health scores (ETL/web service), sensor data and alerts (Kafka) and notifications (Kafka/web service) with the ERP system, the sensor historian, the monitoring system and the Big Data and Analytics platform.]
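The ingestion and distribution pattern of Figure 28, with protocol adapters publishing sensor readings to a pub-sub layer and several independent consumers, can be sketched in-process as follows. A toy broker stands in for Kafka here; a real deployment would use Kafka topics and containerized ingestion microservices.

```python
from collections import defaultdict

class Broker:
    """Toy pub-sub broker standing in for Kafka in Figure 28."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
received = {"ml": [], "monitoring": []}

# Two independent consumers, as in the figure's data usage column.
broker.subscribe("sensor-data", lambda m: received["ml"].append(m))
broker.subscribe("sensor-data", lambda m: received["monitoring"].append(m))

# A hypothetical IEC 61850 adapter publishes a reading.
broker.publish("sensor-data", {"asset": "T1", "oil_temp": 67.5})
print(received)
```

The decoupling shown here is the point of the pattern: adapters need not know which consumers exist, and new consumers can subscribe without changing the ingestion side.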
7 Concluding remarks
The WP4 report defines and describes the future architecture for asset management in Statnett.
TOGAF methodology has been used for assessing, analyzing and documenting the architecture. The
architecture has been described using multiple layers and viewpoints of ArchiMate 3.0 modelling
language including strategy and motivation, application layer and technology layer.
The WP4 architecture is based on conclusions made in the Finbeck project and is anchored in the long-
term Statnett Reference Model for Big Data and Analytics. The report defines and describes a number
of capabilities that are required from the Big Data and Analytics platform. Although the focus in WP4
was on the Big Data and Analytics architecture, the asset management solution itself is a hybrid
solution, based on the Big Data and Analytics platform combined with functionality implemented in
several existing internal systems as well as new components.
Although a Big Data and Analytics platform is being introduced within the AutoDig 2.0 project at
Statnett, this platform does not yet cover all future needs. Several areas still need to be explored,
including cloud integration; advanced PaaS and SaaS cloud services offering AI capabilities such as
natural language comprehension; data exchange APIs and gateways towards third parties; and
improved infrastructure for ingestion of sensor data.
8 References
[1] Statnett, Status and further work - Results from WP1 in the SAMBA project, Oslo: Statnett,
2016.
[2] Statnett, "Use case collection - SAMBA WP2 and WP3 report," Statnett, Oslo, 2018.
[3] Statnett, "Risk monitoring in Statnett - SAMBA WP6 report," Statnett, Oslo, 2018.
[4] Wikipedia, "The Open Group Architecture Framework," [Online]. Available:
https://en.wikipedia.org/wiki/The_Open_Group_Architecture_Framework. [Accessed 22
December 2017].
[5] NimbleMind, "ArchiMate 3.0 – a modern modeling language for digital age," [Online].
Available: http://www.nimblemind.no/2017/09/05/archimate-3-0-a-modern/. [Accessed 22
December 2017].
[6] Smart Grids Coordination Group, "Reference Architecture for the Smart Grid,"
CEN/CENELEC/ETSI, 2012.
[7] M. Turck, "Matt Turck," [Online]. Available: http://mattturck.com/bigdata2017/. [Accessed 22
December 2017].
[8] NimbleMind, "Big Data - quick overview," [Online]. Available:
http://www.nimblemind.no/2016/09/21/big-data-quick-overview/. [Accessed 22 December
2017].
[9] Hortonworks, "Hortonworks Dataflow," [Online]. Available:
https://hortonworks.com/products/data-platforms/hdf/. [Accessed 29 December 2017].
[10] Wikipedia, "SAP Hana," [Online]. Available: https://en.wikipedia.org/wiki/SAP_HANA.
[Accessed 22 December 2017].
[11] SAP, "SAP Vora," [Online]. Available: https://www.sap.com/products/hana-vora-hadoop.html.
[Accessed 22 December 2017].
[12] Wikipedia, "OSIsoft," [Online]. Available: https://en.wikipedia.org/wiki/OSIsoft. [Accessed 22
December 2017].
[13] Ubuntu/Canonical, "Ubuntu," [Online]. Available: https://insights.ubuntu.com/wp-
content/uploads/HadoopBuyersGuide_sm.pdf. [Accessed 22 December 2017].
[14] IBM, "Watson Data Platform," [Online]. Available:
https://www.ibm.com/analytics/us/en/watson-data-platform/. [Accessed 26 February 2018].
[15] GE, "Predix," [Online]. Available: https://www.ge.com/digital/predix. [Accessed 22 December
2017].
[16] Engerati, "IBM Insights Foundation for Energy," [Online]. Available:
https://www.engerati.com/sites/default/files/Day2-1640-Etienne%2520Pelletier-
IBM.compressed.pdf. [Accessed 22 December 2017].
[17] ABB, "ABB launches next-generation asset management solution to improve efficiency and
optimize costs," [Online]. Available:
http://www.abb.com/cawp/seitp202/ed4ee9084a2f169fc12580ba0039beaa.aspx. [Accessed
22 December 2017].
[18] Sintef, "NEF Teknisk Møte 2014," 2014. [Online]. Available:
http://www.sintef.no/projectweb/nef-tm/presentasjoner/. [Accessed 28 December 2017].
[19] Hortonworks, "Maximize the value of data-at-rest to deliver Big Data Analytics," [Online].
Available: https://hortonworks.com/products/data-platforms/hdp/. [Accessed 28 December 2017].
[20] IBM, "What's the big deal about Big SQL?," [Online]. Available:
https://www.ibm.com/developerworks/library/bd-bigsql/index.html. [Accessed 28 December 2017].
[21] IBM, "IBM Streams," [Online]. Available:
https://www.ibm.com/support/knowledgecenter/en/SSCRJU/SSCRJU_welcome.html.
[Accessed 28 December 2017].
[22] Tableau, "2017 Gartner Magic Quadrant," [Online]. Available:
https://www.tableau.com/resource/2017-gartner-magic-quadrant. [Accessed 28 December 2017].
[23] IBM, "SPSS statistical software," [Online]. Available: https://www.ibm.com/analytics/data-
science/predictive-analytics/spss-statistical-software. [Accessed 28 December 2017].
[24] IBM, "About IBM SPSS Modeler," [Online]. Available:
https://www.ibm.com/support/knowledgecenter/en/SS3RA7_18.1.1/modeler_mainhelp_clie
nt_ddita/clementine/entities/clem_family_overview.html. [Accessed 28 December 2017].
[25] Esri, "Geoanalytics Server," [Online]. Available:
https://www.esri.com/arcgis/products/geoanalytics-server.
[26] Esri, "What is ArcGIS GeoAnalytics Server?," [Online]. Available:
http://server.arcgis.com/en/server/latest/get-started/windows/what-is-arcgis-geoanalytics-
server-.htm. [Accessed 22 December 2017].
[27] Esri, "ArcGIS GeoEvent Server," [Online]. Available:
http://www.esri.com/arcgis/products/geoevent-server. [Accessed 22 December 2017].
[28] Esri, "GeoEvent Server," [Online]. Available: https://server.arcgis.com/en/geoevent/.
[Accessed 22 December 2017].
[29] Esri, "ArcGIS Image Server," [Online]. Available: https://www.esri.com/arcgis/products/image-
server. [Accessed 22 December 2017].
[30] Esri, "What is ArcGIS Image Server?," [Online]. Available:
http://server.arcgis.com/en/server/latest/get-started/windows/what-is-arcgis-image-server-
.htm. [Accessed 22 December 2017].
[31] Esri, "Insights for ArcGIS," [Online]. Available: http://www.esri.com/products/arcgis-
capabilities/insights. [Accessed 22 December 2017].
[32] Esri, "Insights for ArcGIS," [Online]. Available: https://server.arcgis.com/en/insights/.
[Accessed 22 December 2017].
[33] eSmart Systems, "Strategiske innspill på hvordan ny teknologi kan brukes til smartere
anleggsforvaltning," Statnett, Halden, 2016.
[34] Statnett, "Finbeck – Roadmap for IKT-arkitektur, fremtidig analyseplattform - Sluttraport fase
1," Statnett, Oslo, 2018.
[35] Linux Foundation, "JanusGraph," [Online]. Available: http://janusgraph.org/. [Accessed 29
December 2017].
[36] OpenTSDB, [Online]. Available: http://opentsdb.net/. [Accessed 29 December 2017].
[37] IBM, "Cognos Analytics," [Online]. Available: https://www.ibm.com/products/cognos-
analytics. [Accessed 29 December 2017].
[38] IBM, "IBM InfoSphere Information Governance Catalog," [Online]. Available:
https://www.ibm.com/us-en/marketplace/information-governance-catalog. [Accessed 29
December 2017].
[39] IBM, "Overview of IBM BigInsights Big R," [Online]. Available:
https://www.ibm.com/support/knowledgecenter/. [Accessed 29 December 2017].
V1 EA Diagrams
8.1 Strategy and Motivation
Statnett SF
Nydalen allé 33, Oslo
PB 4904 Nydalen, 0423 Oslo
Telefon: 23 90 30 00
Fax: 23 90 30 01
E-post: firmapost@statnett.no
Nettside: www.statnett.no