Browse the Book

In this chapter, you’ll see how data is governed and managed in SAP Data Intelligence. You’ll learn how to use the Metadata Explorer to discover, profile, and catalog your data, as well as create data quality rules, run rulebooks, and perform data lineage analysis.

Atluri, Bardhan, Ghosh, Ghosh, Saha
SAP Data Intelligence: The Comprehensive Guide
783 pages, 2022, $89.95, ISBN 978-1-4932-2162-2
www.sap-press.com/5369

Chapter 5

Metadata-Driven Data Governance

You’re now ready to learn about data governance and data quality management. In this chapter, you’ll learn how to use SAP Data Intelligence metadata governance to manage your data and generate data-driven insights. We’ll guide you through each step in the process with practical examples.

Governing your organization’s data and building a unified view of data stored in silos across multiple systems are key activities in creating a consistent information management ecosystem. A well-structured data governance framework enables the following benefits:

• A unified metadata catalog to gain visibility into the data assets in the enterprise information management (EIM) landscape

• Easy governance and management of metadata across disparate sources

• The ability to explore, analyze, and consume information on your data assets, with sharing, version management, and lineage assessment

• Data quality monitoring and active data governance to improve the reliability and trustworthiness of enterprise data

• Better insight into privacy-related data

• Quick turnaround on information requests with easy access to the information in the data and data models

• Self-service and data-driven decision-making by business users

• Support for nondomain experts and business users in relating IT data assets to business terminology

In this chapter, we’ll walk you through how to use SAP Data Intelligence to discover and curate your data, perform data quality assessments, and enrich your data with business semantics using other data sets. We’ll first discuss using the Metadata Explorer, a key data governance tool within SAP Data Intelligence, before we cover data profiling, management via the data catalog, data quality rulebooks, and data lineages.

Note

To use the exercises in this book, you’ll need to work with your system administrator to ensure you have the correct authorizations and roles to access several features in the Metadata Explorer in SAP Data Intelligence according to your user persona, whether you’re a data engineer, data or information steward, or business user. For more information related to roles and authorizations, refer to Chapter 17, Section 17.2.1.

5.1 Metadata Explorer for Data Governance

Once you log on to SAP Data Intelligence, tiles provide access to different sets of activities. An advantage of this modularization is that system administrators control access to tiles through roles and authorizations, which helps ensure that data security and privacy are managed effectively. A user can access only what they need. Refer to Chapter 17, Section 17.2.1, for more information on roles and user access control. The tile we’ll explore in this chapter is the Metadata Explorer. Figure 5.1 shows the homepage, which displays cards that help you navigate to your desired area, as discussed in detail in Chapter 4, Section 4.2.2.

Figure 5.1 Metadata Explorer Homepage

In this section, we’ll show you how to extract or crawl metadata from the different source and target systems in your information ecosystem, manage this metadata, and generate a complete picture of your various connected systems through the intuitive Discovery Dashboard.

5.1.1 Intelligent Information Management with the Discovery Dashboard

Let’s begin with the Discovery Dashboard, shown in Figure 5.2, which can be accessed from the Monitor tile in the Metadata Explorer homepage. This useful link provides a set of metrics to assess the performance and usage of the Metadata Explorer. You’ll see charts, graphs, tables, and hyperlinks to other tiles providing access to more details about each metric, such as the following:

• Memory Usage
Shows the memory utilization for data preparation, data cataloging, and data profiling. The Metadata and Preparation charts show the amount of memory used versus free memory and use different colors to indicate the level of utilization and alerts.

• Dataset Distribution
Displays data set distribution across connections. You can click on each section of the pie chart to display the number of data sets for a particular connection. You can also use the Manage Publications link, which will take you to Manage Publications and show you the published data sets.

• Monitoring
Shows you the overall status of various tasks being performed in the Metadata Explorer, like Profile, Publish, Rulebook, and Preparation, all classified by status (i.e., Error, Running, Completed, and Partial). Click Manage to go to the Monitoring page, where you can filter by date range, task type, or task status.

• Recently Run Rulebooks
Shows the number of available rulebooks and the statuses of the last five rulebooks. You’ll also see the number of rules contained in each rulebook. Click the number to see the rules and categories themselves.

• Catalog Metrics
Displays the number of available data sets by connection. You can also see trends in how the number of data sets has changed, by connection, in the last 7 days. This information gives you an idea of the most used connections in your enterprise information landscape.

• Profiling Metrics
A graphical representation of the number of data sets that have been profiled, by connection. Also displays the total number of successful fact sheets.

• Recently Published
Shows five links to the catalog where the published data sets are stored.

• Glossary Metrics
Shows up to five links to the user’s most recently created or updated terms in the business glossary. The top of the tile shows the number of terms and categories that have been created.

• Tags Usage
Provides the number of tags created in a tag hierarchy and the usage of tags by object (data sets or columns). You can search tags as well.

• Tags Hierarchy
Shows the default hierarchy and the five most recently used hierarchies. With each hierarchy, you can access further metadata about tags, like last changed or usage statistics, by data set and column.

Figure 5.2 Sample Discovery Dashboard
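The Monitoring tile described above groups tasks by type and status. The following Python sketch only illustrates that grouping; the task records are made-up sample data, and the code does not call any SAP Data Intelligence API.

```python
from collections import Counter

# Hypothetical task records; the task types and statuses are the ones named in the text.
tasks = [
    {"type": "Profile", "status": "Completed"},
    {"type": "Profile", "status": "Running"},
    {"type": "Publish", "status": "Completed"},
    {"type": "Rulebook", "status": "Error"},
    {"type": "Preparation", "status": "Partial"},
]


def summarize(task_list):
    """Count tasks per (type, status) pair, similar to how the tile classifies them."""
    return Counter((task["type"], task["status"]) for task in task_list)


summary = summarize(tasks)
for (task_type, status), count in sorted(summary.items()):
    print(f"{task_type:<12} {status:<10} {count}")
```

A `Counter` returns 0 for combinations that never occur, so a status such as Error can be queried for any task type without extra checks.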

5.1.2 Metadata Crawlers to Explore, Classify, and Label Data Assets

With SAP Data Intelligence’s Connection Management application, you can create connections, view metadata, and preview data in real time to start understanding your data set. This process is also referred to as crawling. You don’t need to physically stage the data set in SAP Data Intelligence to view its metadata.

Via Browse Connections, you can view data profiling fact sheets and explore information about the data set, review the column metadata, preview the data in real time, and more. We’ll discuss these capabilities in detail in Section 5.2.3.

5.1.3 Managing Metadata across a Connected System Landscape

Data landscapes of organizations today are quite complex and disparate. For example, in the same landscape you may find SAP ERP, SAP Business Warehouse (SAP BW), Amazon Redshift as a data warehouse, clouds like Amazon Web Services (AWS) or Microsoft Azure for storage, and Microsoft Power BI as a reporting solution. With Connection Management in SAP Data Intelligence, SAP provides connectivity options for various technologies, making it easy to bring the metadata from these distributed components into a central view. In this way, data managers gain complete transparency into data processes across all connected components.

As shown in Figure 5.3, SAP Data Intelligence provides connectivity to various types of systems, both on-premise and in the cloud, as most data landscapes nowadays are hybrid in nature. To arrive at the Connection Management application, go to Browse Connections under the Catalog section. We’ll discuss how to create a connection in Chapter 6, Section 6.2.

Figure 5.3 Various Connection Types in SAP Data Intelligence

5.2 Data Profiling to Understand Data

Data profiling is the process of analyzing and providing a detailed statistical report of the data set in question. The Metadata Explorer has a built-in feature for data profiling that provides additional information about the data stored in the object, including minimum-maximum, average length, null values, blank values, and distinct values. This information helps data engineers and data specialists assess the quality of the data and identify the nature of data transformation or data preparation required before the data set can be made available for reporting.
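To make these profiling statistics concrete, here is a minimal Python sketch that computes them for one column of a small CSV sample. It is an illustration under stated assumptions, not how the Metadata Explorer computes its profile: the sample data and the column names ITEM_ID and COUNTRY are hypothetical, and minimum and maximum are compared as plain strings.

```python
import csv
import io
from statistics import mean


def profile_column(values):
    """Return basic profiling metrics for one column of string values."""
    nulls = sum(1 for value in values if value is None)
    blanks = sum(1 for value in values if value is not None and value.strip() == "")
    present = [value for value in values if value is not None and value.strip() != ""]
    return {
        "min": min(present) if present else None,  # lexicographic for strings
        "max": max(present) if present else None,
        "avg_length": mean(len(value) for value in present) if present else 0,
        "nulls": nulls,
        "blanks": blanks,
        "distinct": len(set(present)),
    }


# Hypothetical sample standing in for a file such as Items.csv
SAMPLE = "ITEM_ID,COUNTRY\n1001,AU\n1002,\n1003,DE\n1004,AU\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
country_profile = profile_column([row["COUNTRY"] for row in rows])
print(country_profile)
```

A high share of blanks or an unexpectedly large number of distinct values in a column like this is the kind of signal that points to cleansing or transformation work before reporting.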

This section will teach you how to profile your data and understand the nature of your

data using additional tools like fact sheets.

5.2.1 Profiling Data Sets from Connections

To profile your data set, follow these steps:

1. From the SAP Data Intelligence landing page, access the Metadata Explorer tile and select Catalog • Browse Connections.

2. Select the connection (DI_DATA_LAKE, in this example) and navigate to the data object to be profiled. For our scenario, we’ll profile Items.csv in /shared/SAPDIMetadataExplorer, as shown on the left side of Figure 5.4.

3. Click on the icon for the data object, select Start Profiling, and confirm this action. The profiling task should be initiated, as indicated by the message shown at the bottom of Figure 5.4.

Figure 5.4 Executing a Profiling Task

5.2.2 Profiling Actions and Monitor

Once the profiling is initiated, you can check the status of the profiling task by navigating to Monitor • Monitor Tasks from the Metadata Explorer homepage. Figure 5.5 shows the various statuses of a profiling task for Items.csv, as initiated in Section 5.2.1. The top screen shows the profiling task in the Running status, and the bottom screen shows the profiling task in the Completed status.

All profiled data sets can be seen from the Catalog • View Profiled Datasets option

within the Metadata Explorer, as shown in Figure 5.6. You can check the history of data

profiling executed on a data set from the Version History field, arranged in descending

order of runtimes.

Figure 5.5 Data Profiling Task Statuses

Figure 5.6 Displaying the Data Profiling Version History of a Data Set

5.2.3 Viewing Profile Fact Sheets

A fact sheet provides detailed information on the metadata of a data set after data profiling has been completed successfully. It lists the data columns, data types, tags, unique keys, and description of the data set as well as the connection ID, type of data set, data set size, last modified and last published dates, and much more. Fact sheets also show trends in row count and size, including charts that give a better view of the data spread and the metadata. The Data Preview tab provides sample data from the data source.

You can access a fact sheet from Monitoring, Browse Connections, or Catalog. Choose the data set for the fact sheet, click on the icon for the data set, and then select the View Fact Sheet option.

Note

By default, the Data Preview tab is set to show only 100 records. The limit can be

increased up to 1,000 records. This value can be changed in the Data Preview tab using

the Maximum number of rows preview dropdown list.

Information is displayed in the following tabs and sections of the fact sheet, each of which shows different details about the data set:

• Overview
The Overview tab, shown in Figure 5.7, is organized into the following three sections:

– Dataset Overview: Displays information on the Connection ID, Last Published, Last Modified, Last Profiled, Number of Columns, Number of Rows, Delimiters, and Owner.

– Dataset Metrics: Provides the distribution of columns by data type, trend analysis of the count of records profiled, number of assigned data glossary terms, and any tags or hierarchies associated.

– People and Reviews: Provides details of any rating, commentary, or discussion associated with the data set that may have been provided by its users.

Figure 5.7 Fact Sheet Overview Tab

• Columns
Displays metadata like Name, Type, Minimum-Maximum, Average Length, % of Null or Blank Fields, Distinct Values, Uniqueness, and Number of Tags, as shown in Figure 5.8.

Figure 5.8 Fact Sheet Columns Tab

• Data Preview
Displays a set of records from the data set.

• Reviews
Shows ratings and additional information like comments and the comment history for the data set, as shown in Figure 5.9.

Figure 5.9 Fact Sheet Reviews Tab

• Relationships
Displays the Business Glossary, Terms and Tags, and Associated Data Quality Rulebooks for the data set. You can also assign tags from the Relationships tab, as we’ll discuss in Section 5.3.2. This tab has three sections:

– Terms and Tags: Provides details on any glossary terms or tags used to classify the data set or its individual attributes or columns.

– Data Quality: Gives details of any data quality rulebooks that have been set up on the data object.

– More Relationships: Displays any additional data objects that have been created from it.

5.3 Managing Publications and Data Catalogs

In this section, we’ll take you through the steps for creating and managing metadata for the source and target data sets available to your organization by publishing data sets. This section will also show you how to organize data and related attributes or fields by tagging them and organizing these tags.

5.3.1 Catalog of Published Data Sets

Publishing a data set makes a local copy of the data set’s metadata in the Metadata

Explorer. A published data object, also known as a published data set, can be generated

from various source object types: a connection; a schema or folder on a connection; or

an object such as a view, table, or file. In this section, we’ll teach you, step by step, how

to browse a connection, publish a data set, and create tags to classify and label it.

Note

For our exercises, we’ll be using the DI_DATA_LAKE connection, which is configured with Semantic Data Lake (SDL). For details on how to create this connection, refer to Chapter 6, Section 6.2.1. This option is available for Amazon Simple Storage Service (Amazon S3), Google Cloud Storage, Hadoop Distributed File System (HDFS), Azure Data Lake, Microsoft Windows Azure Storage Blob (WASB), and SDL. Make sure you have the right authorization and roles to perform this activity.

Figure 5.10 and Figure 5.11 show you how to create folders and upload files to them. Follow these steps:

1. Click on Browse Connections in the Metadata Explorer and select the connection where you want to create the folder. Drill down to the location where you want to create the folder. In this case, we’ll create a folder under DI_DATA_LAKE/shared.

2. Click on the New Folder icon, as shown in Figure 5.10 (1).

3. Provide a Folder Name and click OK (2). Once the folder is created successfully, a message will be displayed.

Figure 5.10 Creating Folders in Metadata Explorer for Supported Systems

4. Click on the Upload Files icon.

5. Browse to the location where the files were saved, select the file you want to upload, and click Upload, as shown in Figure 5.11.

Note

Before uploading a file, you can also edit the name of the data set, or you can rename

the data set once uploaded to the folder by clicking on the three dots icon shown in the

list view of the files.

As shown in Figure 5.11, you can rename a file by clicking on Edit (1). In our example, we renamed this file to Contacts1.csv. Once you click the Upload button, the file is uploaded (2) and can be found in the file list (3).

Figure 5.11 Renaming and Uploading Files in Metadata Explorer

Figure 5.12 shows the various actions you can perform on a data set. Now that we’ve uploaded our data sets, let’s see how we can publish one. A data set is available to other users for further analysis only after it has been published. Once an object has been published, its metadata is available for exploratory analysis under Catalog.

Figure 5.12 List of Actions You Can Perform on a Data Set

To publish a data set, follow these steps:

1. Go to Browse Connections and go to the location where the data set is located, in this case, under DI_DATA_LAKE/shared/SAPDIMetadataExplorer.

2. Click on the icon for the data set.

3. Select + (New Publication) and provide a Name and Description, as shown on the right side of Figure 5.13.

4. Click Publish.

Figure 5.13 Publishing a Data Set

Once published, the data set should be visible under Catalog and available for other users to access, as shown in Figure 5.14. Also, the published data set, including its parent folder and subfolders, will be displayed as Published in the Browse Connections screen even if not all the data sets in the folder or subfolder are published, as displayed in the top and bottom screens shown in Figure 5.15, respectively.

Figure 5.14 Published Data Set in Catalog
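The way the Published marker appears on folders can be pictured with a small sketch. The Python code below is one reading of the behavior described above, in which every folder on the path to a published data set is shown as Published; it is an illustration only and not the Metadata Explorer’s implementation. The function name published_markers is hypothetical.

```python
from pathlib import PurePosixPath


def published_markers(published_datasets):
    """Return every path shown as Published: the data sets plus their ancestor folders."""
    marked = set()
    for dataset in published_datasets:
        path = PurePosixPath(dataset)
        marked.add(str(path))
        for parent in path.parents:
            if str(parent) != "/":  # skip the root of the connection
                marked.add(str(parent))
    return marked


# Only Items.csv is published; Contacts1.csv in the same folder is not.
marked = published_markers(["/shared/SAPDIMetadataExplorer/Items.csv"])
for path in sorted(marked):
    print(path)
```

In this sketch the folder /shared/SAPDIMetadataExplorer is marked even though Contacts1.csv inside it is unpublished, which matches the situation described in the text.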

Note

When browsing connections, you can publish either a group of objects organized in folders or an individual data set. You can explore more options in the documentation on the Metadata Explorer, available at http://s-prs.co/v536910.

Figure 5.15 SAPDIMetadataExplorer Published as a Connection

Once the data set is published, you can view its metadata from the Catalog, as shown in

Figure 5.16.

Figure 5.16 Viewing the Metadata of a Published Object in Catalog View

To view this metadata, use the Browse Catalog feature under Catalog in the Metadata Explorer. Click on the icon beside the object and select View Metadata. Figure 5.16 shows two views of the metadata:

(1) Properties, which shows generic information on the data set like Name, Description, Type, Size, Last Modified, Owner, Connection ID, Schema, Folder, Status, Search Rank Matched Terms, Last Profiled, and Last Published.

(2) Columns, which shows metadata for columns, like Name and Type.

As shown in Figure 5.17, all the published data sets shown earlier in Figure 5.16 are also

available in the Catalog. Since we’ve already profiled Items.csv in Section 5.2.1, this file

has the status of PROFILED.

Figure 5.17 List of Published Data Sets in the Catalog

5.3.2 Automatic Tags and Hierarchical Tagging

Once published, the data set is available in the corresponding connection folder under Catalog. The Metadata Explorer in SAP Data Intelligence provides a hierarchical tagging method for data sets and data elements or columns, which allows you to organize, manage, and find relevant information. The Metadata Explorer includes a preexisting ContentType hierarchical tagging structure. Two tagging methods are available: automatic and manual.

After a data set is published, when you profile it, the Metadata Explorer is intelligent enough to recognize the type of data elements in the data set and assign tags automatically from the predefined ContentType tags. For example, as shown in Figure 5.18, when data profiling is run, automatic tags are assigned and counted in the Number of Tags column (1). By clicking the > icon in a row in the Columns preview in a fact sheet, you’ll open a screen to view the associated tag (2). You can decide whether to delete that tag and then assign a tag manually by clicking the Manage Tags button (3).

Figure 5.18 Tags in a Fact Sheet

To assign a tag manually to a data set using the Manage Tags button (3), follow these steps:

1. Browse for the connection and data set that you want to tag.

2. Click on the icon for the data set and select View Fact Sheet.

3. Go to the Relationships tab and click on Manage Tags, as shown in Figure 5.19.

4. In the Manage Tags window, select the tag you want to associate with the data. In this case, we’re working with the Customer.csv file, to which we want to assign PERSONAL INFORMATION.

5. To verify that the data set can be found by its tag, go back to the Catalog landing page and click the filter icon next to PERSONAL INFORMATION. Customer.csv should appear, as shown in Figure 5.20.

Figure 5.19 Manual Tagging of Published Data Set from Fact Sheet

Figure 5.20 Verifying the Manual Tag

To assign a tag manually to a column in the data set, as shown earlier in Figure 5.18, follow these steps:

1. Browse for the connection and data set that you want to tag.

2. Click on the icon next to the data set and select View Fact Sheet.

3. Go to the Columns view and click on the field row.

4. If automatic tagging was performed, you can delete these tags from the ContentType.

5. Go to Manage Tags and select the tag from the ContentType.

6. To delete a tag, go to the Manage Tags view and click the X icon beside the tag.

You can also create new tag hierarchies if the default ContentType hierarchy does not suit your purpose, for example, to classify data sets and data elements by functional area or business domain.

To create a new tag hierarchy, follow these steps:

1. Go to the Catalog view in the Metadata Explorer.

2. Click on the icon next to Select Tag Hierarchy and select More Actions.

3. Select Manage Tag Hierarchies, as shown on the left side of Figure 5.21, click on the +

sign, and provide a Name and Description.

4. Click Save and close.

Figure 5.21 Creating a New Tag Hierarchy

To add child tags to the new hierarchy, select the new hierarchy, click More Actions and

select Add Tag to Hierarchy. After maintaining the Name and Description fields, click

Save or Save and New to create the new tag.

Note

Using Add Tag to Hierarchy for a defined parent hierarchy, you can create tags that are children and grandchildren. You can perform other actions on tags, like Edit Tag Properties, Delete Tag from Hierarchy, and Add Tag as Search Filter on the child nodes of the top-level tag hierarchy, that is, for the FunctionalDomain case, from ContentType under Catalog.
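A tag hierarchy with children and grandchildren is a tree. The following Python sketch models one as a simple data structure to show how nested tags relate to their parent hierarchy. The class Tag and the child tags Sales, Customer Master, and Finance are hypothetical examples; only the hierarchy name FunctionalDomain comes from the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Tag:
    """One node in a tag hierarchy; the root node represents the hierarchy itself."""
    name: str
    description: str = ""
    children: List["Tag"] = field(default_factory=list)

    def add(self, name: str, description: str = "") -> "Tag":
        """Create a child tag under this node and return it."""
        child = Tag(name, description)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> Iterator[str]:
        """Yield the full path of this tag and of all its descendants, depth first."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        yield here
        for child in self.children:
            yield from child.paths(here)


root = Tag("FunctionalDomain", "Tags grouped by business area")
sales = root.add("Sales")             # child tag
sales.add("Customer Master")          # grandchild tag
root.add("Finance")                   # second child tag
hierarchy_paths = list(root.paths())
print("\n".join(hierarchy_paths))
```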

5.3.3 Using Tags as Search Filters

You can use tags to search for data sets with a particular tag or set of tags. As shown in Figure 5.22, you can search data elements using tags by clicking the filter icon next to a tag (1). The search tag filter is added to the top (2). The list of data elements will update in the results pane.

Figure 5.22 Searching Data Objects Using Tags
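Conceptually, a tag filter narrows the catalog to the data sets that carry the selected tags. The Python sketch below illustrates this under the assumption that a data set must carry every selected tag to match; whether the Metadata Explorer combines several tag filters this way is not stated in the text. The catalog contents are sample data, and the ADDRESS tag is hypothetical.

```python
def filter_by_tags(catalog, selected_tags):
    """Return the names of data sets that carry every selected tag."""
    selected = set(selected_tags)
    return sorted(name for name, tags in catalog.items() if selected <= set(tags))


# Sample catalog: data set name -> set of assigned tags
catalog = {
    "Customer.csv": {"PERSONAL INFORMATION"},
    "Contacts1.csv": {"PERSONAL INFORMATION", "ADDRESS"},
    "Items.csv": set(),
}

matches = filter_by_tags(catalog, ["PERSONAL INFORMATION"])
narrowed = filter_by_tags(catalog, ["PERSONAL INFORMATION", "ADDRESS"])
print(matches)
print(narrowed)
```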

5.3.4 Managing Publications in the Catalog

To view all publications for a particular connection, follow these steps:

1. Go to the Metadata Explorer and select Catalog • Browse Connections.

2. If you want to view all publications under a connection, click on New Publication and drag and drop the connection from the left side, as shown in Figure 5.23. Click on the corresponding icon to see all folders, subfolders, and data sets that have been published, as shown in Figure 5.24.

Figure 5.23 Browsing Connections for Published Data Sets

Figure 5.24 Displaying All Published Data Sets for a Connection

To update or delete a publication, after performing the previous steps, continue with

the following steps:

1. Click on the published data object you want to update or delete, as shown in Figure 5.25.

Figure 5.25 Navigating a Publication

2. To update the name or description of a publication without republishing the data set, change either attribute. The Update Publication button becomes active once you change one of these two attributes. Then click on Update Publication, as shown in Figure 5.26.

Figure 5.26 Modifying a Publication

3. If you want to include more files in a publication or select/deselect Include Subfolder (which is available when you create a new publication), you’ll need to use Update and Publish.

4. If you want to delete a publication, use the Delete option.

You can also manage publications from Data Intelligence Metadata Explorer • Administration • Manage Publications. This screen displays a different view of the publications, organized by connection, as shown in Figure 5.27. You can create a publication from this view as well via the Create Publication button.

Figure 5.27 Creating Publications from Manage Publications

5.3.5 Lineage Depth Set in Publication Processing

This optional setting is available for lineage analysis and, when selected, shows the source in a lineage graph. If you set the Lineage Depth to 0, no lineage analysis is returned. You can set a value between 1 and 100 to show up to that many levels. For example, if you set the depth to 50 and the lineage has 15 levels, you will see all 15 levels. However, if you set the lineage depth to 5 but the actual depth is 15, only the first 5 levels are shown.
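In other words, the number of levels shown is the smaller of the configured depth and the actual depth. A minimal Python sketch of this arithmetic follows; the function name levels_shown is hypothetical, and the range check simply mirrors the 0 to 100 range given above.

```python
def levels_shown(lineage_depth: int, actual_levels: int) -> int:
    """Return how many lineage levels are displayed for a given Lineage Depth setting."""
    if not 0 <= lineage_depth <= 100:
        raise ValueError("Lineage Depth must be between 0 and 100")
    # A depth of 0 returns no lineage; otherwise the setting caps the levels shown.
    return min(lineage_depth, actual_levels)


print(levels_shown(50, 15))
print(levels_shown(5, 15))
print(levels_shown(0, 15))
```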

5.4 Defining Data Quality Rules and Running Rulebooks

In previous sections, you learned how to publish your data sets and organize them using tags, how to profile your data sets to gather characteristics of your data attributes, and how to make this information available for further analysis and usage. However, you’ll need to continuously monitor the quality of your data from the start to ensure your data is useful and provides the insights you’re expecting. Thus, you’ll require rules created around data attributes or elements, and you’ll need to assess your data against those rules and then quantify and present the outcome of the assessment using dashboards. This section will show you how to execute all these tasks.

5.4.1 Rules Determining Business Data Compliance

As a data steward, you must ensure that your data follows the data quality standards defined by your organization’s master data management and data governance guidelines. End users and business users often need to assess or confirm the data used day to day against specific business rules to improve its quality. A simple example would be checking the completeness of contact information like the address or contact details of a customer. A rule must be created and implemented to perform this check.
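As a rough illustration of what such a completeness check measures, the Python sketch below scores a handful of contact records. It is not the rule syntax used in the Metadata Explorer; the function completeness, the field names, and the sample records are all hypothetical.

```python
def completeness(records, required_fields):
    """Return the percentage of records in which every required field is filled."""
    if not records:
        return 0.0
    complete = sum(
        1
        for record in records
        if all((record.get(name) or "").strip() for name in required_fields)
    )
    return 100.0 * complete / len(records)


# Hypothetical contact records; None and whitespace-only values count as missing.
contacts = [
    {"NAME": "A. Smith", "ADDRESS": "1 Main St", "PHONE": "555-0100"},
    {"NAME": "B. Jones", "ADDRESS": "", "PHONE": "555-0101"},
    {"NAME": "C. Brown", "ADDRESS": "7 High St", "PHONE": "555-0102"},
    {"NAME": "D. White", "ADDRESS": "9 Low Rd", "PHONE": None},
]

score = completeness(contacts, ["ADDRESS", "PHONE"])
print(f"{score:.1f}% of records are complete")
```

A percentage like this is the kind of result that a scorecard can later present for a rule.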

You’ll need to follow a sequence of steps to successfully implement a business rule. In this section, we’ll go through each step, showing you how to create a rule, create a rulebook, bind a rule to a data set, and execute the rulebook. A dashboard can then be created to reflect the outcome of the rules as scorecards. To work with data quality rules, click the Rules tile, shown earlier in Figure 5.1, from the Metadata Explorer homepage.

When you access the Rules tile, you’ll arrive at the screen shown in Figure 5.28. With SAP Data Intelligence’s Metadata Explorer, you can import existing SAP Information Steward rules as well, which we’ll explain in Chapter 12, Section 12.5.1. As shown in Figure 5.28, rules are usually organized under Rules Categories. SAP provides a predefined set of categories, but you can also create new categories, which we’ll explain in detail in Section 5.4.2. A sample data set, shown in Figure 5.29, will be used for our example exercises.

Figure 5.28 Rules Page from Metadata Explorer

Figure 5.29 Sample Contact Data

Let’s say we would like to check the accuracy of the Country field to determine if the file

has the correct country code for Australia, as defined by the business, which in this case

is AU. To create the rule, follow these steps:

1. Click on the icon next to the Accuracy category, shown earlier in Figure 5.28, and choose Create Rule. You can also click on the Create Rule icon.

2. On the Create Rule – Completeness screen, shown in Figure 5.30, provide a Rule ID, Name, and Description and then click Save. The Rule ID is free text but should be consistent with the defined data and information standards for your organization.

Figure 5.30 Creating a Data Quality Rule

3. On the next screen, add a parameter by clicking the corresponding icon, then accept the values and click Save. You need to fill out the necessary details: Name, Type, whether the parameter is case sensitive, and Description. In our example, we’ve set the parameter to check for case sensitivity, as shown in Figure 5.31.

4. Only after you’ve created a parameter can you add the condition to check by clicking the corresponding icon. Assign the P_CC parameter we created earlier by selecting Operator Condition from the Parameter Name dropdown list. Depending on the nature of the operator condition being checked or validated, you may have to fill in additional details. The Mode field has two options: User Entry, where you define the values or formats to be used in the condition, and Parameter Value, where you must identify one or more additional parameters with the same data type as the selected parameter. In our example, we’re checking that the value of the Country Code field is equal to “AU.”

5. If the rule has been defined correctly, the Rule is valid message will be displayed at

the top of the screen.

Figure 5.31 Defining a Data Quality Rule

You can apply the rule to a specific subset of records in a data set by using the Filters option on the Rule Definition screen.

The next step in the process is to test the rule we just created to ensure that it is working

as expected. This test can be performed by clicking the Test Rule button. A new screen

will open where you can define some test cases and test the rule by clicking the + but-

ton, as shown in Figure 5.32.

In this case, we’ve updated the rule to be case sensitive. In the top screen, shown in

Figure 5.33, you can enter test parameter values as inputs in the numbered rows. Then,

click the Run Tests button to review the results, as shown in the bottom screen. The only

test case that passes is where the value is “AU.”

Figure 5.32 Adding a Test Case

Figure 5.33 Creating Test Cases for Rule Validation
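The behavior of the case-sensitive country code rule can be illustrated outside the tool. The following is a minimal Python sketch, not the Metadata Explorer's implementation: the `EqualsRule` class and the test inputs are hypothetical names and values chosen for illustration, and only the expected value “AU” and the case-sensitivity setting come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqualsRule:
    """Hypothetical stand-in for an 'equals' condition on one rule parameter."""

    expected: str
    case_sensitive: bool = True

    def passes(self, value: str) -> bool:
        """Return True when the value satisfies the condition."""
        if self.case_sensitive:
            return value == self.expected
        # Case-insensitive comparison: fold both sides before comparing.
        return value.casefold() == self.expected.casefold()


# The rule from the example: the country code must be exactly "AU".
country_rule = EqualsRule(expected="AU", case_sensitive=True)

# Illustrative inputs only; these are not the test cases shown in the figures.
test_inputs = ["AU", "au", "Au", "AUS", ""]
test_results = {value: country_rule.passes(value) for value in test_inputs}

for value, passed in test_results.items():
    print(f"{value!r}: {'pass' if passed else 'fail'}")
```

With case sensitivity switched on, only the exact string “AU” passes in this sketch, which mirrors the outcome described for the test run.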

You can delete or edit header properties, parameters, conditions, and filters from the

Rule Definition dashboard. You can execute similar actions for these test cases.

5.4.2 Categories to Organize Business Rules

SAP provides a predefined set of categories for organizing business rules, which are shown in Table 5.1.

However, you may have rules that don’t fall into any of these categories. In this case,

you can create a new rule category from the Rule Overview screen, as shown in Figure

5.34. To create the category, click the + button, shown at the top of the screen. Then,

enter a Name for the category and a Description and then click the Save button. You can

then see the new category, in our example, Sensitivity, as shown in the bottom screen.

You can edit or delete the category by clicking on the icon next to the rule category.

Rule Category | Category Description | Example
Accuracy | Data has a standard value. | Country code is populated as standard value for all records.
Completeness | All necessary data is present. | Customer record should have address, email, and contact number.
Conformity | Confirm correctness of data type and format. | Contact number should be 9 digits.
Consistency | The data value is same across data sets. | If a record is inactive, the Inactive field is filled with X across data sets.
Integrity | Validate data relationships. | Check customer records have child records in customer contacts.
Timeliness | Validate data is current and available. | Report quarterly sales by a certain date.
Uniqueness | Check for duplicate records or primary keys. | Check that there is only one record for a product in a product data set.
Validity | Validate if data supports a policy or measurement. | A new product should be of a particular color.

Table 5.1 SAP-Defined Rule Categories
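To make two of these categories more concrete, the following Python sketch shows what a completeness check and a uniqueness check might look like on a handful of records. It is only an illustration under stated assumptions: the function names, field names, and sample records are hypothetical and are not part of SAP Data Intelligence.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def incomplete_records(records: Sequence[Record], required: Sequence[str]) -> List[Record]:
    """Completeness: return records where a required field is missing or blank."""
    return [
        record
        for record in records
        if any(not (record.get(field) or "").strip() for field in required)
    ]


def duplicate_keys(records: Sequence[Record], key: str) -> List[str]:
    """Uniqueness: return key values that occur more than once."""
    counts = Counter(record[key] for record in records)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical customer records used only for this illustration.
customers = [
    {"ID": "C1", "ADDRESS": "1 Main St", "EMAIL": "a@example.com", "CONTACT": "412345678"},
    {"ID": "C2", "ADDRESS": "", "EMAIL": "b@example.com", "CONTACT": "498765432"},
    {"ID": "C1", "ADDRESS": "9 High St", "EMAIL": "c@example.com", "CONTACT": "455555555"},
]

incomplete = incomplete_records(customers, ["ADDRESS", "EMAIL", "CONTACT"])
duplicates = duplicate_keys(customers, "ID")
```

Here the second record fails the completeness check because its address is blank, and the ID `C1` fails the uniqueness check because it appears twice.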

Figure 5.34 Creating a New Rule Category

5.4.3 Using the Match Pattern Operator

In some scenarios, you may want to check that the values or strings in a particular field follow a specific pattern. For example, you may want to check that a particular string has only alphanumeric characters or that the contact number provided is a 9-digit phone number. The Match Pattern operator lets you implement such data validation rules using the Metadata Explorer.

The example we’ll use in this section is validating a 9-digit phone number. As shown in

Figure 5.35, this rule has been defined using the steps described in Section 5.4.1. Once a

rule is defined, you can enter a test input value to ensure the rule works as expected.

The test results are shown in Figure 5.36.

Figure 5.35 Setting Up a Rule to Match the Pattern of a Contact Number

Figure 5.36 Test Result Showing the Correct Validation of Contact Numbers
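The logic of the 9-digit check can be sketched with a regular expression. This is a hedged illustration in Python: the pattern syntax accepted by the Match Pattern operator may differ from Python's `re` syntax, and the function name and sample values are hypothetical.

```python
import re

# Exactly nine ASCII digits and nothing else.
NINE_DIGITS = re.compile(r"[0-9]{9}")


def is_valid_contact_number(value: str) -> bool:
    """Return True when the whole value consists of exactly nine digits."""
    return NINE_DIGITS.fullmatch(value) is not None


# Illustrative inputs only.
samples = ["412345678", "41234567", "4123456789", "41234567a", "412 345 678"]
for sample in samples:
    print(f"{sample!r}: {'pass' if is_valid_contact_number(sample) else 'fail'}")
```

Using `fullmatch` rather than `search` matters here: a 10-digit number contains nine consecutive digits, so a partial match would wrongly accept it.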

5.4.4 Running and Monitoring Rulebooks

A rulebook is an object created in the Metadata Explorer to manage a set of rules that can be run on one or more data sets. You may often want to run a set of data quality checks on a specific set of data that is relevant for your department or business domain. The best approach for this goal is defining rules, creating rulebooks, binding rules to various entities in your data sets, and then executing the rulebook. A rulebook can have rules belonging to one or more categories, and rules may be bound to one or more data sets in the same rulebook. For example, we’ll show you how two different rules are bound to different data sets but included in the same rulebook.

First, let’s create a rulebook. Rulebook creation can be done from the rulebooks link on

the Rules tile, shown earlier in Figure 5.1. You’ll arrive at the Rulebook Overview screen,

shown in Figure 5.37. Click the + button to create a rulebook, enter a Name and Descrip-

tion, and click the Save button.

Figure 5.37 Creating a Rulebook

Once the rulebook is created, you’ll need to import the rules you want to execute in the

rulebook. Your new rulebook will appear as a tile on the Rulebook Overview screen.

Click the tile to arrive at the screen shown in Figure 5.38. Now, click the Import Rules

icon on the right to open the screen shown in Figure 5.39, where you can select the

required rules. Click Save.

Figure 5.38 Importing Rules into a Rulebook

Figure 5.39 Selecting and Adding Rules to a Rulebook

After your rules are imported, the next step is to bind the rules to data sets and col-

umns. In our example, we’ll bind our two rules to two different data sets. To bind a rule,

click on the icon next to the imported rule and select View Rule Bindings, as shown

in Figure 5.40.

Figure 5.40 Viewing Rule Bindings

Click on the + icon to open the Create Rule Binding screen, shown in Figure 5.41. In the

Qualified Name field, provide the full path of the data set to which you want to bind the

specific rule. The Binding Name is a unique identifier for the specific rule binding cre-

ated. You can also add a Description to set its context. Finally, map the field in the data

set to the parameter assigned to the rule to complete this step.

Figure 5.41 Creating Rule Bindings
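Conceptually, a rule binding connects a rule parameter to a column of a specific data set. The Python sketch below models that idea under stated assumptions: the `RuleBinding` class, the data set path, the column names, and the sample rows are all hypothetical, and only the parameter name P_CC and the “AU” check come from the earlier example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Sequence

Row = Dict[str, str]


@dataclass(frozen=True)
class RuleBinding:
    """Hypothetical model of a binding between rule parameters and columns."""

    binding_name: str
    qualified_name: str  # full path of the data set (placeholder value below)
    parameter_to_column: Mapping[str, str]


def failed_rows(binding: RuleBinding, rule: Callable[..., bool], rows: Sequence[Row]) -> List[Row]:
    """Evaluate the rule on every row and return the rows that fail."""
    failures = []
    for row in rows:
        # Look up each rule parameter in the column it is bound to.
        arguments = {
            parameter: row[column]
            for parameter, column in binding.parameter_to_column.items()
        }
        if not rule(**arguments):
            failures.append(row)
    return failures


def country_is_au(P_CC: str) -> bool:
    """The accuracy rule from the example: the country code must be 'AU'."""
    return P_CC == "AU"


binding = RuleBinding(
    binding_name="country_code_accuracy",
    qualified_name="/hypothetical/path/contacts.csv",
    parameter_to_column={"P_CC": "COUNTRY"},
)
contacts = [
    {"NAME": "A", "COUNTRY": "AU"},
    {"NAME": "B", "COUNTRY": "au"},
    {"NAME": "C", "COUNTRY": "NZ"},
]
failures = failed_rows(binding, country_is_au, contacts)
```

Because the rule only knows about its parameter, the same rule could be bound to a differently named column in another data set by changing the mapping alone.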

Figure 5.42 shows the first rule binding we created earlier, for checking the accuracy of

the country code, and Figure 5.43 shows our second rule binding, for validating the pat-

tern of the contact number.

Figure 5.42 First Rule Binding

Figure 5.43 Adding a Second Rule Binding

Once the rulebook is created and rules are bound, run the rulebook using the Run All

option. Once the execution is completed, click View Results to check the results, as

shown in Figure 5.44.

Figure 5.44 Displaying the Number of Records Passing the Criteria for Each Rule Binding

As shown in Figure 5.44, the percentage of rows that passed the data quality check is

60%. A list of rows for which validation failed is also provided.

Thresholds determine the passing and failure values for the rulebook. You can also

change thresholds in the rulebook by clicking the icon shown in Figure 5.45.

Figure 5.45 Setting Rulebook Thresholds
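The relationship between the pass percentage and the thresholds can be sketched as follows. This is an assumption-laden illustration: the threshold values (75 and 50), the three status labels, and the function names are hypothetical and are not the product's defaults; only the 60% result comes from the example.

```python
from typing import Sequence


def pass_percentage(row_results: Sequence[bool]) -> float:
    """Return the percentage of rows that passed the rule."""
    if not row_results:
        raise ValueError("no rows were evaluated")
    return 100.0 * sum(row_results) / len(row_results)


def classify(percentage: float, pass_at: float = 75.0, fail_below: float = 50.0) -> str:
    """Map a pass percentage to a status using two assumed thresholds."""
    if percentage >= pass_at:
        return "pass"
    if percentage < fail_below:
        return "fail"
    return "warning"


# Three of five rows pass, matching the 60% result in the example.
row_results = [True, True, True, False, False]
score = pass_percentage(row_results)
status = classify(score)
print(f"{score:.0f}% of rows passed: {status}")
```

Raising or lowering the two thresholds changes the status without changing the underlying percentage, which is why adjusting thresholds is a business decision rather than a data fix.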

You can see the status of all completed activities via Monitoring • Monitoring Tasks in

the Metadata Explorer, as shown in Figure 5.46.

Figure 5.46 Checking Task Statuses

You can also create a quality dashboard from Rules • View Rules Dashboard to monitor rulebook results. Click the + button to create a new dashboard, arriving at the set of screens shown in Figure 5.47. First, add a dashboard Name and Description and click Save (1). Then, click the + icon to add a new data quality scorecard (2) and use the Scorecard Wizard to set up the dashboard (3). The wizard has five steps:

1. Select the rulebook for which you want to create the scorecard.

2. Select the type of reporting you would like to perform (i.e., reporting on rule catego-

ries, data sets, or the rulebook itself).

3. Choose a scorecard type. For more details, refer to http://s-prs.co/v536911.

4. Select one or more data sets, depending on the Scorecard Type option.

5. Maintain the Title and Subtitle fields and click Save.

Figure 5.47 Creating a New Scorecard

Figure 5.48 shows our example dashboard comparing two rule categories.

Figure 5.48 Data Quality Dashboard

5.4.5 Business Glossary of Terms and Definitions

Every organization benefits from maintaining a shared, central repository of terms and what they mean in a business context. Defining a business glossary ensures a consistent set of terminology is used to refer to data sets, entities, and relationships. The main aim of using a glossary across the enterprise is to ensure a better understanding of the information used across the organization.

A business glossary consists of three main areas:

• A term template defines additional information that is required or optional when the terms are defined.

• A category groups various terms.

• The defined terms provide clarity for the business.

In the Metadata Explorer, a default glossary placeholder is provided by SAP. You’ll need

to define the necessary categories and terms in this placeholder. For starters, a group of

individuals within the organization should agree on a set of terms and their definitions

for the glossary. This set of individuals could be data stewards, business reps, or end

users.

You can define a new glossary category by going to the Business Glossary tile, shown

earlier in Figure 5.1, and clicking glossaries. Then, click the + icon beside Category and

click the Create Category button.

To create a new term in the glossary, click Create Term to arrive at the screen shown in

Figure 5.49, where you’ll provide a Name for the term, a Definition of the term for busi-

ness users, and Keywords to identify data sets or attributes that can be identified by the

term. Click Save when you’re done with these settings.

Figure 5.49 Defining a New Term

A term can be linked to other terms, rules, rulebooks, published data sets, and columns.

With term relationships, you can visualize related information in a graph. This graph

can provide a complete picture of a term’s relevance in the EIM landscape. If a relationship is no longer relevant, you can remove the link. Likewise, when related objects are removed from the catalog, the related objects of the associated terms are updated automatically. For example, if the contacts table is removed from the connection, the links to that table and to the columns within it are removed from the Relationships tab of the affected terms.
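The way term relationships follow the catalog can be modeled with a small data structure. The following Python sketch is only an illustration: the `TermRelationships` class, the term names, and the object names are hypothetical, and the `table.column` naming is an assumption made to show how links to a table and its columns disappear together.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class TermRelationships:
    """Hypothetical model of links between glossary terms and related objects."""

    def __init__(self) -> None:
        self._related: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, term: str, related_object: str) -> None:
        """Relate a term to a data set, column, rule, rulebook, or other term."""
        self._related[term].add(related_object)

    def unlink(self, term: str, related_object: str) -> None:
        """Remove a single relationship that is no longer relevant."""
        self._related[term].discard(related_object)

    def remove_dataset(self, dataset: str) -> None:
        """Drop every link to a removed data set and to its columns."""
        prefix = dataset + "."
        for related in self._related.values():
            stale = {obj for obj in related if obj == dataset or obj.startswith(prefix)}
            related -= stale

    def related_objects(self, term: str) -> List[str]:
        """Return the objects currently related to a term, sorted by name."""
        return sorted(self._related.get(term, set()))


glossary = TermRelationships()
glossary.link("Customer Contact", "contacts")
glossary.link("Customer Contact", "contacts.PHONE")
glossary.link("Customer Contact", "rule:contact_number_pattern")
glossary.link("Country", "contacts.COUNTRY")

# Removing the contacts table also removes the links to its columns.
glossary.remove_dataset("contacts")
```

After the removal in this sketch, the term “Customer Contact” keeps only its link to the rule, and “Country” has no related objects left.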

Once a term is defined and saved, click Edit and go to the Relationships tab, where you

can click the Edit Related Objects button. Now, you can associate any Terms, Datasets/

Columns, Rules, or Rulebooks, as shown in Figure 5.50, by making selections and click-

ing Save Related Objects.

Figure 5.50 Editing Related Objects

Note

Data sets/columns are only available for association to terms when they have been

published to the catalog.

A graphical representation of the relationships created between terms is shown in

Figure 5.51.

Figure 5.51 Viewing Relationships

You can set a business glossary as the default, edit it, or delete it. Terms should be reviewed regularly to ensure they are up to date with the latest definitions and associations relevant to your organization or industry. If a term is no longer needed, it can be deleted from the glossary. You can also create categories and associate them with terms in the glossary. These categories can then be used as filter conditions.

5.5 Data Lineage from Transformation History

By now, you’ve seen how to bring in data from various systems, extract metadata,

assess data quality, and make data sets available to end users. However, one important

aspect of all these activities is the ability to quickly identify the root causes of issues

that may impact reports and end users. To perform this triage, you’ll need to under-

stand the relationships between these organized data sets, which is where data lineage

analysis comes into play. This section will help you understand how to work with rela-

tionships between data sets.

5.5.1 Lineage Analyses for Tracing Data Sets to Sources

Let’s consider a scenario where you’re using a report developed in a reporting tool for end-of-the-month reporting, but you realize the data doesn’t look correct. Or you open a report you used successfully yesterday, but today, it stops working because of an issue with a field on a data set you’ve used. Finding out which source system the data is coming from or what transformation has been done on the data can be a painstakingly difficult analysis. With lineage analysis in the Metadata Explorer, you can quickly identify the source data set, which can greatly reduce the turnaround time for fixing data.

You can see where your data is coming from when using multiple source systems and

complex transformations in your graphs. For example, Figure 5.52 shows the data lin-

eage for an SAP Business Explorer (SAP BEx) query for an SAP BW system, which might

be used to build a report in SAP Analytics Cloud, SAP BusinessObjects, or another

reporting tool. The data lineage of the query shows the associated SAP BW InfoProvider object as the source, the intermediate SAP BW objects, and the transformation steps that finally resulted in the output of the SAP BEx query.

Figure 5.52 SAP BEx Query Data Lineage
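Tracing a data set back to its sources amounts to walking a dependency graph upstream. The following Python sketch shows that idea in its simplest form. It is not how the Metadata Explorer stores or computes lineage: the `trace_sources` function and all object names in the sample graph are hypothetical.

```python
from typing import Dict, List, Set


def trace_sources(upstream: Dict[str, List[str]], dataset: str) -> Set[str]:
    """Return the objects with no upstream parents that feed the given data set."""
    sources: Set[str] = set()
    visited: Set[str] = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already handled; also protects against cycles
        visited.add(node)
        parents = upstream.get(node, [])
        if parents:
            stack.extend(parents)
        else:
            sources.add(node)  # nothing feeds this object, so it is a source
    return sources


# Hypothetical lineage: each object maps to the objects it is derived from.
lineage = {
    "sales_report_query": ["sales_composite"],
    "sales_composite": ["sales_store", "returns_store"],
    "sales_store": ["erp_sales_source"],
    "returns_store": ["erp_returns_source"],
}

sources = trace_sources(lineage, "sales_report_query")
print(sorted(sources))
```

In this sample graph, the query depends on two intermediate stores, and the traversal reports the two objects at the far end of the chain as its sources.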

5.5.2 Lineage Extraction and Supported Sources

Data lineage information can be extracted for several types of sources, such as the fol-

lowing:

• SAP BW: Data stores, InfoProviders, and SAP BW queries

• SAP HANA: SQL views, column views, and synonyms

• SAP Vora: Data source tables and views

Lineages can also be extracted from operators in Modeler graph tasks that use a data set referenced through a connection defined in the Connection Management application.

Notes

Some lineage extraction limitations exist with some operators in graphs. The list of

operators supporting graph extractions can be found at http://s-prs.co/v536912.

Lineages can be extracted using two different methods, which we’ll discuss in the fol-

lowing sections.

Extracting Lineage while Publishing Data Sets

To extract the lineage of a data set, enable the Lineage option (if supported) while cre-

ating a publication or when publishing a data set, as shown in Figure 5.53. You can refer

back to Section 5.3.1 for more details on publication of data sets. To extract lineage from

Modeler graphs, you can use similar steps.

Figure 5.53 Lineage Toggle for Supported Data Set

Often, when a lineage extraction is enabled on a data set used in a graph, a lineage

extraction graph is automatically triggered, as shown in Figure 5.54. In this example, an

SAP HANA table has been used as a target system in the graph, and an SAP HANA data-

base metadata extractor graph has been initiated.

Figure 5.54 Lineage Extractor Graph Initiated

Automatic Lineage Extraction

You can enable automatic lineage extraction on Modeler graphs and during Metadata

Explorer data preparation. With automatic lineage enabled, you can create a history of

lineage analysis, showing details on how the graph has changed with respect to sources

or targets that have been added or removed as well as the transformations that have

been performed. A few additional settings must be enabled in the System Management

application for SAP Data Intelligence. Under the General tab, the options shown in

Figure 5.55 should be configured.

For the Metadata Explorer: Automatic lineage extraction of Modeler Graphs and Metadata Explorer: Automatic lineage extraction of Data Preparations options, you must choose one of the following values:

• enabled_and_publish_datasets
Extracts lineage and publishes the data set to the catalog in the Metadata Explorer. You can access this lineage by browsing the connection or the catalog.

• enabled_and_do_not_publish_datasets
Extracts lineage but does not publish the data set to the catalog. You can access this lineage by browsing the connection in the Metadata Explorer.

• disabled
Does not automatically extract lineage.

For the Metadata Explorer: Days until deletion of automatic lineage option, set the

value to “-1” to ensure all automatic lineage is maintained.

For the Metadata Explorer: Automatic lineage extraction frequency setting, ensure the

number of minutes is set for the extraction interval.

Figure 5.55 Enabling Automatic Lineage Extraction
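The constraints on these settings can be summarized as a small validation routine. The Python sketch below is an illustration only: these options are maintained in the System Management application, not through code, and the dictionary keys and the rules for positive values are assumptions. The three extraction values and the special value -1 come from the text above.

```python
from typing import Dict, List, Union

# The three values described above for the two extraction options.
ALLOWED_MODES = {
    "enabled_and_publish_datasets",
    "enabled_and_do_not_publish_datasets",
    "disabled",
}

# Hypothetical keys standing in for the two extraction options.
MODE_KEYS = ("modeler_graphs", "data_preparations")


def validate_lineage_settings(settings: Dict[str, Union[str, int]]) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    for key in MODE_KEYS:
        if settings.get(key) not in ALLOWED_MODES:
            problems.append(f"{key}: expected one of {sorted(ALLOWED_MODES)}")
    days = settings.get("days_until_deletion")
    # -1 keeps all automatic lineage; otherwise a positive day count is assumed.
    if not isinstance(days, int) or (days < 1 and days != -1):
        problems.append("days_until_deletion: use -1 or a positive number of days")
    minutes = settings.get("frequency_minutes")
    # A positive extraction interval in minutes is assumed.
    if not isinstance(minutes, int) or minutes < 1:
        problems.append("frequency_minutes: use a positive number of minutes")
    return problems


good_settings = {
    "modeler_graphs": "enabled_and_publish_datasets",
    "data_preparations": "enabled_and_do_not_publish_datasets",
    "days_until_deletion": -1,
    "frequency_minutes": 30,
}
bad_settings = dict(good_settings, modeler_graphs="enabled", frequency_minutes=0)

good_problems = validate_lineage_settings(good_settings)
bad_problems = validate_lineage_settings(bad_settings)
```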

5.5.3 Understanding and Configuring the Lineage View

The view for data lineage can be configured to suit your needs through the settings under the Lineage tab in the Catalog. Click on the Settings icon, as shown in Figure 5.56, and select the options shown in Table 5.2.

Figure 5.56 Lineage View Settings

Settings | Description
Fixed Node Width | Adjusts how object names are displayed (on or off).
Orientation | Changes the orientation of how nodes are displayed.
Node Placement | Adjusts the number of straight edges and the placement of nodes. Three options are available: Brandes-Koepf, Linear Segments, and Simple.
Node Spacing | Adjusts the distance between the nodes.
Line Types | Manages how the lines connecting the nodes are displayed. Merge combines lines that go in the same direction and splits them when necessary. Split separates each line.

Table 5.2 Lineage View Settings: Options

5.6 Summary

This chapter explored SAP Data Intelligence’s Metadata Explorer in depth, including its features that you can implement to develop your organization’s data governance framework. We discussed features like data quality assessment, data lineage tracking, and cataloging and showed you how to implement these capabilities.

In the next chapter, you’ll learn how to create data pipelines and ingest, cleanse, transform, and store data.

Contents

Preface

Part I Getting Started

1 The Data Fabric for the Intelligent Enterprise
1.1 Data Fabric
1.1.1 Trends
1.1.2 Benefits
1.2 Data Orchestration
1.3 SAP Business Technology Platform
1.4 SAP Data Intelligence
1.5 Summary

2 Architecture and Capabilities
2.1 Genesis of SAP Data Intelligence
2.1.1 Features from SAP Leonardo Machine Learning Foundation
2.1.2 Evolution from SAP Data Hub to SAP Data Intelligence
2.2 SAP Data Intelligence Architecture
2.3 Deployment Options and Bring Your Own License Model
2.4 Kubernetes Cluster and Containers
2.4.1 Overview of Kubernetes
2.4.2 Kubernetes Cluster Architecture
2.4.3 Container Runtimes
2.4.4 Pods and Workloads
2.4.5 Resources and Policies
2.4.6 Kubernetes and SAP Data Intelligence
2.5 SAP Data Intelligence Launchpad
2.5.1 Persona-Based Application
2.5.2 Overview of Applications
2.6 Summary

3 Setup and Installation
3.1 Landscape Sizing
3.1.1 Sizing Various SAP Data Intelligence Components
3.1.2 Minimum Sizing and Initial Sizing for SAP Data Intelligence
3.1.3 Understanding the T-Shirt Sizing Approach
3.2 SAP Cloud Appliance Library
3.2.1 Getting Started with SAP Cloud Appliance Library
3.2.2 Deploying SAP Solutions in the Cloud
3.2.3 Activating and Creating Solution Instances
3.2.4 Security Considerations for SAP Cloud Appliance Library
3.3 On-Demand Cloud Provisioning and Instance Sizing
3.3.1 Sizing with SAP Cloud Appliance Library
3.3.2 Supported Cloud Providers for SAP Cloud Appliance Library
3.3.3 Understanding Costs and Payments
3.3.4 Backing Up, Restoring, and Terminating an Instance
3.4 Setting Up SAP Data Intelligence on SAP Cloud Appliance Library
3.4.1 Prerequisites for Cloud Provider Account
3.4.2 Connecting to SAP Cloud Appliance Library
3.4.3 Creating and Accessing the Solution
3.4.4 Accessing the Jump Box for Monitoring and Troubleshooting
3.4.5 Running the Solution
3.4.6 Access through Browser Using Local Hosts File
3.4.7 Personalization
3.5 SAP Data Intelligence 3.0 Installation On-Premise
3.5.1 Planning and Prerequisites for an On-Premise Installation
3.5.2 Modular Deployment with SLC Bridge
3.5.3 Installing SAP Data Intelligence with the Maintenance Planner and SLC Bridge
3.6 Summary

4 Using SAP Data Intelligence Applications
4.1 SAP Data Intelligence Launchpad Applications
4.2 Applications for Data Engineers
4.2.1 Connection Management
4.2.2 Metadata Explorer
4.2.3 Modeler
4.2.4 Customer Data Export
4.3 Applications for Data Scientists
4.3.1 ML Scenario Manager
4.3.2 Vora Tools
4.4 Applications for Modelers and Auditors
4.4.1 Monitoring Applications
4.4.2 Audit and System Logs
4.5 Applications for System Administrators
4.5.1 Policy Management
4.5.2 Handling Privileges
4.5.3 System Management
4.5.4 License Management
4.6 Summary

Part II Data Management, Orchestration, and Machine Learning

5 Metadata-Driven Data Governance
5.1 Metadata Explorer for Data Governance
5.1.1 Intelligent Information Management with the Discovery Dashboard
5.1.2 Metadata Crawlers to Explore, Classify, and Label Data Assets
5.1.3 Managing Metadata Data across a Connected System Landscape
5.2 Data Profiling to Understand Data
5.2.1 Profiling Data Sets from Connections
5.2.2 Profiling Actions and Monitor
5.2.3 Viewing Profile Fact Sheets
5.3 Managing Publications and Data Catalogs
5.3.1 Catalog of Published Data Sets
5.3.2 Automatic Tags and Hierarchical Tagging
5.3.3 Using Tags as Search Filters
5.3.4 Managing Publications in the Catalog
5.3.5 Lineage Depth Set in Publication Processing
5.4 Defining Data Quality Rules and Running Rulebooks
5.4.1 Rules Determining Business Data Compliance
5.4.2 Categories to Organize Business Rules
5.4.3 Using the Match Pattern Operator
5.4.4 Running and Monitoring Rulebooks
5.4.5 Business Glossary of Terms and Definitions
5.5 Data Lineage from Transformation History
5.5.1 Lineage Analyses for Tracing Data Sets to Sources
5.5.2 Lineage Extraction and Supported Sources
5.5.3 Understanding and Configuring the Lineage View
5.6 Summary

6 Modeling Data Processing Pipelines
6.1 Using the SAP Data Intelligence Modeler
6.1.1 Flow-Based Paradigm as a Network of Information
6.1.2 Data Pipeline Engine in the Flow-Based Modeler
6.1.3 Navigating the Modeler Panes and Toolbars
6.1.4 Built-In Operators
6.1.5 Creating and Validating Graphs
6.2 Creating and Managing Connections
6.2.1 Creating Connections
6.2.2 Connecting to Cloud Foundry
6.2.3 Managing Certificates
6.2.4 Authorizations for Connections
6.3 Self-Service Data Preparation with the Metadata Explorer
6.3.1 Preparing Data for Accurate Results and Better Insights
6.3.2 Self-Service Data Preparation with the Metadata Explorer
6.3.3 Transforming Structured Data Sets
6.3.4 Managing Data Preparation Actions
6.3.5 Processing Data Preparation Actions
6.4 Integrating, Processing, and Orchestrating Workflows
6.4.1 Graph Snippets as a Group of Operators
6.4.2 Working with Data Workflow Operators
6.4.3 Integrating SAP Cloud Applications
6.4.4 Change Data Capture Graph

6.4.5 Custom Operators .................................................................................................. 267

6.5 Scheduling and Monitoring Data Pipelines ............................................................... 270

6.5.1 Scheduling and Monitoring Data Pipelines ................................................... 270

6.5.2 Trace Messages ....................................................................................................... 272

6.5.3 Tracking Model Metrics ........................................................................................ 273

6.5.4 Kubernetes Dashboard and Cluster Logs ....................................................... 273

6.6 Summary ................................................................................................................................... 273

7 Creating Operators and Data Types 275

7.1 Creating Custom Operators .............................................................................................. 276

7.1.1 Visibility of Events .................................................................................................. 277

7.1.2 Compatibility of Port Types ................................................................................. 277

7.1.3 Creating and Editing Operators ......................................................................... 281

7.2 Implementing Runtime Operators ................................................................................ 288

7.2.1 Subengines in SAP Data Intelligence Modeler .............................................. 288

7.2.2 Working with Subengines to Create Operators ........................................... 289

7.3 Creating Data Types ............................................................................................................. 290

7.3.1 Predefined Global Scalar Types .......................................................................... 291

7.3.2 Defining Your Own Custom Data Types ......................................................... 292

7.3.3 Leveraging Data Types in Graphs ...................................................................... 293

7.4 Summary ................................................................................................................................... 293

8 Building Docker Images 295

8.1 Containers in Pods and Pods in Clusters ..................................................................... 295

8.1.1 Delivery of Data-Driven Applications .............................................................. 295

8.1.2 Helm: Package Manager for Kubernetes ........................................................ 296

8.1.3 Dockerfiles: Predefined Runtime Environments .......................................... 297

8.2 Assembling a Docker Image ............................................................................................. 298

8.2.1 Building Docker Images through Dockerfiles ............................................... 298

8.2.2 Enhancing Docker Images with Different Package Managers ................ 302

8.3 Dockerfile Inheritance ......................................................................................................... 303

8.4 Using Docker with Python ................................................................................................. 305

8.5 Summary ................................................................................................................................... 308

9 Machine Learning 309

9.1 Machine Learning with SAP .............................................................................................. 310

9.1.1 Machine Learning Solutions in the SAP Landscape .................................... 311

9.1.2 TEI Methodology in Machine Learning ........................................................... 313

9.1.3 Transforming Business Use Cases with Machine Learning ..................... 318

9.1.4 Data-Driven Approach versus Traditional Rule-Based Approach ........... 319

9.1.5 Machine Learning Tasks in Enterprise Contexts .......................................... 321

9.1.6 Architectural Principles for Machine Learning ............................................. 325

9.2 Machine Learning with SAP Data Intelligence ......................................................... 328

9.2.1 Scalable Data Pipelines in Complex Data Landscapes ............................... 329

9.2.2 Data and Algorithms as Assets for Machine Learning ............................... 331

9.2.3 Leveraging Open-Source Environments and Skills ...................................... 331

9.3 Using the ML Scenario Manager ..................................................................................... 333

9.3.1 ML Scenario Manager Overview ........................................................................ 333

9.3.2 Setting Up a Scenario in ML Scenario Manager ........................................... 334

9.3.3 Integrating Hyperscale Data and Targets ...................................................... 339

9.3.4 Leveraging Scenario Templates for Machine Learning .............................. 340

9.3.5 Dockerfile Building and Grouping .................................................................... 345

9.3.6 Implementing TensorFlow Pipelines ............................................................... 347

9.3.7 Training and Deploying Models with New Versions .................................. 350

9.3.8 Metrics Explorer and Machine Learning Tracking SDK .............................. 360

9.3.9 Run Collection and Run Performance .............................................................. 363

9.3.10 Visualizing SAP Data Intelligence Metrics with SAP Analytics Cloud ... 363

9.4 ML Data Manager in Data Workspaces and Data Collections ........................... 365

9.4.1 Data Workspaces and Data Collections .......................................................... 365

9.4.2 Organizing Data Sets in Data Lakes ................................................................. 367

9.4.3 Curating a Data Collection .................................................................................. 368

9.4.4 Registering a Data Set ........................................................................................... 369

9.5 Summary ................................................................................................................................... 371

10 Jupyter Notebook 373

10.1 Jupyter Notebook Fundamentals ................................................................................... 374

10.1.1 Interactive Tool for Data Science Projects ...................................................... 374

10.1.2 Jupyter Notebook Dashboard and User Interface ....................................... 379

10.1.3 Data Analysis in Jupyter Notebook ................................................................... 381

10.2 Working with SAP HANA Cloud ...................................................................................... 386

10.2.1 SAP HANA Cloud: Cloud Database as a Service ............................................ 387

10.2.2 Exploring SAP HANA Cloud on an SAP BTP Trial Account ......................... 389

10.2.3 Understanding the SAP HANA Cockpit and SAP HANA Database Explorer .................................................................................................. 391

10.2.4 Using Jupyter Notebook in SAP BTP and Integration with SAP HANA Cloud .............................................................................................................. 393

10.2.5 SAP Data Intelligence Connection .................................................................... 402

10.3 Data Science Experiments with Jupyter Notebook ................................................ 405

10.3.1 SAP HANA Embedded Machine Learning ....................................................... 406

10.3.2 Machine Learning Core Operators .................................................................... 413

10.3.3 SAP HANA ML Training Operator ...................................................................... 423

10.3.4 SAP HANA ML Inference Operator .................................................................... 425

10.4 JupyterLab as the Next-Gen Jupyter Notebook ....................................................... 430

10.4.1 JupyterLab: The Next-Gen User Interface with Built-In Libraries .......... 431

10.4.2 Accessing Jupyter Notebook Artifacts from JupyterLab ............................ 434

10.4.3 SAP HANA Python Client API .............................................................................. 436

10.5 Summary ................................................................................................................................... 437

11 SAP Data Intelligence Python SDK 439

11.1 Using SAP Data Intelligence Python SDK .................................................................... 440

11.1.1 Setting a Context in Jupyter Notebook ........................................................... 440

11.1.2 Data Lake API for SDL ............................................................................................ 441

11.1.3 Retrieving Machine Learning Scenario Metadata ....................................... 443

11.1.4 Training Container Using the SDK .................................................................... 444

11.1.5 Executing and Deploying Pipelines .................................................................. 447

11.2 Accessing Artifacts Using Methods ............................................................................... 448

11.3 Machine Learning Tracking SDK ..................................................................................... 450

11.3.1 Initializing Run for an Experiment .................................................................... 451

11.3.2 Grouping Runs in Run Collections .................................................................... 451

11.3.3 Analyzing Metrics and Logs ................................................................................ 454

11.4 Summary ................................................................................................................................... 454

Part III Integration

12 Integrating with ABAP Systems 459

12.1 Integration Scenarios ........................................................................................................... 459

12.1.1 Scenarios and Use Cases for Integration ........................................................ 460

12.1.2 ABAP Metadata in the Metadata Explorer ..................................................... 461

12.2 Provisioning Data from ABAP Systems ........................................................................ 465

12.2.1 Exposing the CDS View ........................................................................................ 465

12.2.2 Connection Prerequisites for Data Extraction .............................................. 466

12.2.3 Connecting On-Premise Systems with the Cloud Connector .................. 467

12.3 Using Operators to Trigger Execution in an ABAP System ................................. 472

12.3.1 ABAP Operators to Trigger Function Modules or BAPIs ............................. 472

12.3.2 Prerequisites for ABAP Operators in Remote Systems ............................... 474

12.4 SAP BW/4HANA and SAP Data Intelligence Hybrid Data Virtualization ...... 478

12.4.1 Prerequisites in SAP Business Warehouse ..................................................... 478

12.4.2 Using Connection Type HANA_DB ................................................................... 480

12.4.3 Authorization Check for Services ...................................................................... 481

12.4.4 SAP BW Operator for Pipeline ............................................................................ 484

12.5 Additional Connectivity ...................................................................................................... 485

12.5.1 SAP Information Steward .................................................................................... 485

12.5.2 SAP HANA for SQL Data Warehousing ............................................................ 489

12.6 Summary ................................................................................................................................... 495

13 Integrating with Non-SAP Systems 497

13.1 Non-SAP Cloud System Connectivity ............................................................................ 497

13.1.1 Amazon S3 ................................................................................................................ 498

13.1.2 Amazon Redshift .................................................................................................... 500

13.1.3 Windows Azure Storage Blob ............................................................................. 501

13.1.4 Microsoft Azure SQL Data Warehouse ............................................................ 502

13.1.5 Microsoft Azure Data Lake .................................................................................. 503

13.1.6 Google Cloud Storage ........................................................................................... 506

13.1.7 Google BigQuery ..................................................................................................... 508

13.1.8 IBM Cloud Storage ................................................................................................. 509

13.2 Non-SAP On-Premise System Connectivity ............................................................... 510

13.2.1 Oracle Relational Database Management System ..................................... 510

13.2.2 Microsoft SQL Server ............................................................................................. 512

13.3 Summary ................................................................................................................................... 513

14 Integrating Big Data Workloads with SAP Vora 515

14.1 SAP Vora in Kubernetes Framework ............................................................................. 516

14.1.1 System Management ............................................................................................ 516

14.1.2 SAP Vora Engine Architecture ............................................................................ 517

14.1.3 Accessing SAP Vora User Interface ................................................................... 520

14.1.4 SAP Vora Data Preview ......................................................................................... 521

14.1.5 Using SQL Editor ..................................................................................................... 522

14.1.6 Using SQL Scripts .................................................................................................... 523

14.2 Data Modeling in SAP Vora ............................................................................................... 524

14.2.1 Creating Database Schemas ............................................................................... 524

14.2.2 Creating Partition Schemes ................................................................................ 525

14.2.3 Creating Tables and Views .................................................................................. 527

14.2.4 Creating Calculated Columns ............................................................................ 532

14.2.5 Additional Functions for Views .......................................................................... 533

14.3 Hierarchies in SAP Vora ...................................................................................................... 536

14.3.1 SAP Vora SQL for Hierarchical Data Analysis ................................................. 537

14.3.2 Using Adjacency Table to Render a Hierarchy .............................................. 539

14.3.3 Caching Hierarchies with Materialized Views .............................................. 539

14.4 Full-Text Search in SAP Vora ............................................................................................. 540

14.4.1 Text Analysis Graphs in Modeler ...................................................................... 540

14.4.2 Linguistic and Semantic Analysis ...................................................................... 541

14.4.3 Full-Text Search on a Document Collection .................................................. 542

14.5 Summary ................................................................................................................................... 542

15 Integrating with SAP Data Warehouse Cloud 543

15.1 Overview of SAP Data Warehouse Cloud ................................................................... 543

15.1.1 SAP Cloud Services Ecosystem ........................................................................... 544

15.1.2 Setting Up the Trial Tenant ................................................................................. 546

15.2 Understanding Spaces ......................................................................................................... 549

15.2.1 Spaces as Virtual Workspaces ............................................................................ 549

15.2.2 Development in a Space ....................................................................................... 554

15.2.3 Managing Spaces ................................................................................................... 556

15.3 Exploring Connections and Using the Data Builder ............................................... 561

15.3.1 Available Connection Types ................................................................................ 561

15.3.2 Data Builder: Model to Business Catalog ....................................................... 562

15.3.3 Space-Aware Integrated Story Builder ............................................................ 566

15.4 Data Builder in SAP Data Warehouse Cloud versus Pipelines in SAP Data Intelligence .......................................................................................................... 570

15.5 Summary ................................................................................................................................... 570

16 Integrating with SAP Analytics Cloud 571

16.1 Overview of SAP Analytics Cloud ................................................................................... 571

16.1.1 Solution to Analyze, Plan, Predict, and Collaborate .................................... 572

16.1.2 Fundamental Components: Data, Models, and Stories ............................ 574

16.2 Use Operators: Read File, Formatter, and Producer .............................................. 582

16.2.1 Read File Operator .................................................................................................. 583

16.2.2 Decode Table Operator ......................................................................................... 584

16.2.3 SAP Analytics Cloud Formatter .......................................................................... 585

16.2.4 SAP Analytics Cloud Producer ............................................................................ 586

16.3 Pipelines to Train, Predict, and Visualize Data ......................................................... 587

16.3.1 Using the Dataset API ........................................................................................... 587

16.3.2 Data Set Provision and Consumption ............................................................. 589

16.4 Summary ................................................................................................................................... 591

Part IV System Management, Security, and Operations

17 Administration 595

17.1 System Management Command-Line Client Reference ...................................... 595

17.1.1 Command-Line Client for SAP Data Intelligence ......................................... 596

17.1.2 Using the VCTL Tool: JavaScript Utility ........................................................... 597

17.1.3 Useful Commands for Command-Line Client ............................................... 598

17.2 Administration Applications ............................................................................................ 599

17.2.1 Administrator Access ............................................................................................ 600

17.2.2 System Management ............................................................................................ 600

17.2.3 License Management ............................................................................................ 611

17.2.4 Connection Management ................................................................................... 613

17.3 Monitoring the SAP Data Intelligence Modeler ....................................................... 616

17.3.1 Monitoring the Status of Graph Execution ................................................... 616

17.3.2 Tracing Messages to Isolate Problems and Errors ....................................... 621

17.3.3 Downloading Diagnostic Information for Graphs ...................................... 623

17.4 SAP Data Intelligence System Logging ........................................................................ 626

17.4.1 Kubernetes Cluster-Level Logging Mechanism ............................................ 627

17.4.2 Browsing Application Logs in the Diagnostics Kibana Web User Interface .......................................................................................................... 629

17.4.3 Aggregating Logs in External Logging Service .............................................. 630

17.5 System Diagnostics ............................................................................................................... 631

17.5.1 SAP Data Intelligence Diagnostics: Diagnostics Grafana ......................... 631

17.5.2 Kubernetes Cluster Metrics ................................................................................ 633

17.5.3 Integrating Diagnostics with External APM Solution ................................ 635

17.6 Summary ................................................................................................................................... 637

18 Security 639

18.1 Approach to Data Protection ............................................................................................ 639

18.1.1 Business Semantics for Industry-Specific Legislations .............................. 640

18.1.2 Functions for Data Privacy Compliance .......................................................... 641

18.1.3 Security Features for Data Protection and Privacy ...................................... 641

18.2 Authenticating Services and Users ................................................................................ 642

18.2.1 Roles and Scope-Driven User Access Control ................................................ 642

18.2.2 SAP BTP User Account and Authentication ................................................... 644

18.2.3 Self-Signed Certificate Authority and TLS ...................................................... 649

18.2.4 Leveraging Policy Management for Access Control .................................... 649

18.2.5 Enabling Security Features on Kubernetes Cluster ..................................... 657

18.3 Securely Connecting On-Premise Systems ................................................................. 658

18.3.1 Cloud Connector ..................................................................................................... 658

18.3.2 Site-to-Site Virtual Private Network ................................................................ 659

18.3.3 Virtual Private Cloud Peering ............................................................................. 659

18.4 Summary ................................................................................................................................... 659

19 Maintenance 661

19.1 Understanding Operational Modes or Run Levels .................................................. 661

19.2 Switching the Platform to Maintenance Mode ....................................................... 662

19.2.1 Enabling or Disabling Maintenance Mode .................................................... 663

19.2.2 Restarting SAP Data Intelligence Services ...................................................... 664

19.2.3 Setting Up a Remote Connection to SAP ........................................................ 664

19.3 Increasing System Management Persistent Volume Size ................................... 665

19.3.1 Persistent Volume Error Handling .................................................................... 665

19.3.2 Changing the Persistent Storage Size of the SAP Vora Disk Engine ...... 667

19.3.3 Changing the Buffer and File Size of the SAP Vora Disk Engine ............. 668

19.4 Performing Backups ............................................................................................................. 668

19.5 Summary ................................................................................................................................... 671

20 Application Lifecycle Management 673

20.1 Version Control System ...................................................................................................... 673

20.2 Git ................................................................................................................................................. 674

20.2.1 Git Basics and Terminology ................................................................................. 675

20.2.2 Git Integration and CI/CD Process .................................................................... 678

20.2.3 Setting Up Your Environment for Git Workflows ........................................ 697

20.3 Continuous Integration and Continuous Delivery ................................................. 707

20.3.1 Continuous Integration Best Practices ............................................................ 707

20.3.2 Leveraging SAP Solutions for CI/CD ................................................................. 712

20.4 DevOps Fundamentals and Tools ................................................................................... 713

20.4.1 The Core Tenets of DevOps ................................................................................. 715

20.4.2 Implement Tooling for DevOps ......................................................................... 718

20.4.3 DevOps for Hybrid Architectures ...................................................................... 719

20.5 SAP Data Intelligence as the MLOps Platform .......................................................... 723

20.5.1 Production Lifecycle of Machine Learning Models ...................................... 724

20.5.2 MLOps Challenges .................................................................................................. 726

20.5.3 MLOps Capabilities ................................................................................................ 727

20.6 Migrating from SAP Leonardo Machine Learning Foundation ......................... 730

20.6.1 Bring Your Own Model ......................................................................................... 731

20.6.2 Migrating the Training Data ............................................................................... 733

20.6.3 Adding the Training Data to a Data Lake ....................................................... 734

20.7 Summary ................................................................................................................................... 734

21 Business Content and Use Cases 737

21.1 Digital Transformation and SAP Data Intelligence ................................................ 737

21.2 Business Content by Industry .......................................................................................... 740

21.3 Finance Use Cases .................................................................................................................. 746

21.4 Supply Chain Use Cases ...................................................................................................... 747

21.5 Manufacturing Use Cases .................................................................................................. 749

21.6 Summary ................................................................................................................................... 751

Appendices 753

A Outlook and Roadmap ........................................................................................................ 753

B The Authors .............................................................................................................................. 763

Index .......................................................................................................................................................... 765

Index

/vflow directory ..................................................... 687

/vhome folder ........................................................ 699

A

ABAP .......................................................................... 459

best practices ..................................................... 710

CDS views ............................................................ 465

certificate ............................................................ 466

connect with SAP BTP ..................................... 470

connect with SAP Data Intelligence ......... 471

connection prerequisites ............................... 466

data provisioning ............................................ 465

execute functions ............................................. 460

operator prerequisites .................................... 474

operators ............................................................. 472

use cases .............................................................. 460

ABAP CDS Reader operator ............ 465, 473, 477

ABAP Converter operator .................................. 472

ABAP integration ..................................................... 60

ABAP ODP operator ............................................. 474

Access control ........................................................ 642

manage policies ................................................ 649

Access control list (ACL) ..................................... 107

Access key ................................................................ 118

Access point ............................................................ 129

Account ........................................................... 100, 104

active ..................................................................... 106

assign users ........................................................ 126

choose ................................................................... 125

create .................................................................... 122

owner .................................................................... 100

user ........................................................................ 114

Adam algorithm .................................................... 348

Adjacency list ......................................................... 539

Administration ...................................................... 595

applications ........................................................ 599

monitoring ......................................................... 616

system diagnostics .......................................... 631

system logging .................................................. 626

tile ........................................................................... 175

Administrative service ....................................... 266

Administrator ........................... 100, 182, 600, 661

Algorithm ....................................................... 320, 331

APL ......................................................................... 410

data-driven approach .................................... 320

deep learning ..................................................... 348

embedded in SAP HANA ................................ 425

examples ............................................................. 409

PAL ......................................................................... 407

personalize ......................................................... 385

Alias ........................................................................... 534

Amazon Elastic Container Registry (Amazon ECR) ................................................... 144

Amazon Redshift .................................................. 500

Amazon Simple Storage Service (Amazon S3) .............................................. 498, 734

Amazon Web Services (AWS) ........................... 109

connect ................................................................ 122

console URL ........................................................ 114

monitor ................................................................ 144

policies ................................................................. 115

quota error ......................................................... 110

register as cloud provider ............................. 114

sizing .................................................................... 108

Analytics .................................................. 42, 618, 739

processing ........................................................... 326

SAP Analytics Cloud ....................................... 572

stories ................................................................... 567

usage .................................................................... 648

Anonymization ..................................................... 392

Apache Kafka ................................................. 332, 381

API server .................................................................... 76

Appliance ................................................................. 100

Application development and integration ... 42

Application development machine learning ............................................................... 327

Application Function Library (AFL) ............... 407

Application instance ........................................... 188

Application integration ........................................ 34

Application lifecycle management ............... 673

CI/CD .................................................................... 707

DevOps ................................................................. 713

Git .......................................................................... 674

MLOps .................................................................. 723

Application log ...................................................... 629

Application management ................................. 608

properties ............................................................ 610

Application management services (AMS) .................................................................... 722

Application performance management (APM) .................................................................... 635

Application programming interface (API) .... 56

Google Cloud Platform .................................. 120

public .................................................................... 758

Architecture ....................................................... 51, 60

decision points ..................................................... 64

Kubernetes ............................................................. 73

Kubernetes clusters ............................................ 75

microservices ........................................................ 74

Artifact ...................................................................... 415

class ............................................................. 439, 448

Artifact Consumer operator ................... 416, 428

inputs and outputs .......................................... 417

Artifact Producer operator ............. 276, 368, 414

configuration parameters ............................ 415

inputs and outputs .......................................... 416

Artificial intelligence (AI) ..................... 39, 53, 309

Attribute ................................................................... 555

Auditing .................................................................... 181

Auditor ............................................................ 179, 182

Authentication ...................................................... 642

Authorization ............................................... 107, 359

check ...................................................................... 481

connections ........................................................ 254

Google Cloud Storage ..................................... 507

OAuth .................................................................... 589

SAP BW users ..................................................... 484

SAP HANA users ................................................ 483

scenarios .............................................................. 555

type .............................................................. 120, 123

Automated acceptance testing ....................... 708

Automated Predictive Library (APL) ... 311, 410

prerequisites ....................................................... 410

Automatic invoice posting ............................... 744

Automatic lineage extraction .......................... 233

Automation ............................................................. 718

AutoML .................................................... 62, 312, 315

Autoscaling ............................................................. 755

B

Backup .................................................... 112, 167, 668

files ......................................................................... 669

Banking ..................................................................... 741

Base operator .......................................................... 282

Base strategy ........................................................... 603

Benchmark .............................................................. 319

Best practices .......................................................... 707

Bias ................................................................... 321, 384

Big data .................................................... 46, 178, 515

Binary Large Object (BLOB) file ....................... 412

Binary target ........................................................... 410

Blocking .................................................................... 640

Bokeh ......................................................................... 377

Box plot ..................................................................... 378

Brainstorming workshop .................................. 318

Branch ........................................................................ 701

Bring your own license (BYOL) .......................... 63

Bring your own model (BYOM) ....................... 731

Bugfix branch ......................................................... 704

Build server .................................................... 680, 690

Build step .................................................................. 695

Build trigger ............................................................. 694

Business Builder .......................................... 552, 565

artifacts ................................................................ 555

Business catalog ..................................................... 562

Business content ......................................... 737, 740

Business entity ....................................................... 555

Business Entity Recognition ............................... 58

Business glossary ........................................ 175, 228

create new term ................................................ 228

Business model innovation .............................. 739

Business purpose .................................................. 640

Business user .......................................................... 100

C

Caching ...................................................................... 539

Calculated column ...................................... 532, 533

Calendar .................................................................... 573

Canvas ........................................................................ 580

Capabilities ................................................................ 51

Cash flow analysis ................................................. 317

Catalog ............................................................. 175, 205

browse connections ......................................... 197

manage publications ...................................... 211

metrics .................................................................. 195

view metadata ................................................... 207

Certificate authority (CA) ................................... 642

self-signed ............................................................ 649

Certificates ............................................................... 253

ABAP ...................................................................... 466

import ................................................................... 254

manage ................................................................. 615

self-signed CAs ................................................... 649

Change data capture (CDC) ...................... 267, 461

Chart ................................................................. 296, 378

create ........................................................... 361, 581

SAP Vora .............................................................. 521

stories .................................................................... 568

Chemicals ................................................................. 741

Classification ........................................................... 425

model ..................................................................... 418

Client-server architecture .................................... 75

Client-side library .................................................. 436

Cloud application .................................................. 266

Cloud connector ................................ 251, 467, 710

access ..................................................................... 468

connect with SAP BTP ..................................... 469

exposed backend systems ............................. 470

features ................................................................ 468

security ................................................................. 658

use .......................................................................... 658

Cloud data integration .......................................... 60

API .......................................................................... 266

Cloud deployment ........................................ 66, 103

Cloud Foundry ............................................. 251, 713

enable ................................................................... 644

Cloud integration ................................................. 710

Cloud Native Computing Foundation (CNCF) ............................................................ 68, 296

Cloud provider ................................................ 44, 109

account ................................................................ 100

AWS ........................................................................ 114

costs ............................................................. 102, 109

Google Cloud Platform .................................. 120

Microsoft Azure ................................................ 119

monitoring ......................................................... 144

prerequisites ....................................................... 114

register ................................................................. 114

select ...................................................................... 122

sizing ..................................................................... 108

Cloud provisioning .............................................. 107

Cloud services ........................................................ 544

architecture ........................................................ 545

Cloud vendor ............................................................. 43

Cloud-enabled profile ......................................... 721

Cloud-native profile ............................................. 720

Cluster ................................................................ 68, 295

admin .................................................................... 604

IPython ................................................................. 380

manage ................................................................ 602

metrics .................................................................. 633

security ................................................................. 657

storage .................................................................. 665

subnet ................................................................... 129

view ........................................................................ 601

Cluster Overview dashboard ............................ 633

Clustering ................................................................. 425

Cluster-level logging ............................................ 627

Code management .................................................. 73

Cold data ................................................................... 388

Collaboration .......................................................... 573

Column ..................................................................... 532

transform ............................................................ 576

Command-line interface (CLI) ...... 186, 296, 595

commands .......................................................... 152

Communication scenario .................................. 467

Communication protocol ................................. 107

Communication security .................................. 642

Complex materials .............................................. 750

Concept drift .......................................................... 730

Configuration Manager ..................................... 329

Configuration pane ....... 240, 244, 284, 346, 414

Configuration type .............................................. 241

Connection Management ......... 61, 88, 172, 250, 402, 613

ABAP systems .................................................... 471

authorizations .................................................. 254

connect to Cloud Foundry ............................ 251

create connections .......................................... 250

manage certificates ............................... 253, 615

manage connections ...................................... 613

metadata crawling ......................................... 196

non-SAP cloud systems ................................. 497

options ................................................................. 173

SAP HANA ........................................................... 480

WASB .................................................................... 501

Connection Manager .......................................... 179

Connection type ............. 172, 250, 251, 497, 614

ABAP ..................................................................... 461

ADLS ...................................................................... 503

Amazon S3 .......................................................... 498

AZURE_SQL_DB ............................................... 502

cloud connector gateway ............................. 471

GCP_BIGQUERY ............................................... 508

HANA_DB ........................................ 402, 480, 489

MSSQL .................................................................. 512

Oracle ................................................................... 511

SAP BW ................................................................. 478

SAP Data Warehouse Cloud ........................ 561

SDL ......................................................................... 509

tables .................................................................... 528

TLS ......................................................................... 403

Consistency check ................................................ 669

Constant Generator operator ................ 286, 342, 417, 428

Consumer products ............................................. 741

Consumption model ........................................... 555

Container ............................ 68, 71, 78, 83, 295, 297

create .................................................................... 691

Docker .................................................................. 298

images ..................................................................... 75

registry ........................................................ 152, 167

runtimes ................................................................. 79

Container-based deployment ..................... 69, 71

containerd .................................................................. 79

ContentType tag ................................................... 207

Contextual AI ............................................................ 62

Continuous integration/continuous delivery (CI/CD) ............ 72, 315, 673, 678, 707

best practices ..................................................... 707

pipelines ............................................................... 690

SAP solutions ..................................................... 712

Controller manager ................................................ 77

Core data services (CDS) view ................ 461, 465

expose ................................................................... 465

operator ............................................................... 473

Cost calculator .......................................................... 66

Cost Explorer API .................................................. 111

Cost forecast ........................................................... 126

Crawling .................................................................... 196

CRI-O ............................................................................. 79

Cron job ....................................................................... 82

Cross channel integration ................................. 739

Cross industry ........................................................ 742

Custom ABAP operator ...................................... 474

Custom operator ................................................... 275

add ports ............................................................. 283

base ........................................................................ 282

configuration ........................................... 284, 285

create .......................................................... 276, 281

deploy ................................................................... 286

documentation ................................................. 286

edit ......................................................................... 287

output probability ........................................... 287

script ...................................................................... 284

subengines .......................................................... 289

Custom resource ................................................... 667

Customer Data Export ........................................ 176

D

DaemonSet ....................................................... 82, 630

Data analysis ........................................................... 381

statistical modeling .............................. 384, 386

Data Attribute Recommendation ..................... 58

Data Builder ......................................... 552, 561, 570

artifacts ................................................................ 554

connection types .............................................. 561

create graphical view ..................................... 563

create SQL view ................................................. 564

create table ......................................................... 563

import files .......................................................... 552

Data category .......................................................... 369

Data collection ............................................. 323, 365

create .................................................................... 366

curate .................................................................... 368

Data composition .................................................... 39

Data consumption ............................ 174, 482, 559

Data crawling .......................................................... 461

Data deletion ........................................................... 640

Data democratization ............................................ 38

Data drift ................................................................... 730

Data engineer .................................................. 59, 374

applications ........................................................ 172

Data exploration ......................................... 374, 375

Data fabric .................................................... 33, 34, 37

benefits ................................................................... 37

trends ....................................................................... 35

Data flow ......................................................... 239, 562

Data Frame API ....................................................... 312

Data governance .............................. 39, 61, 94, 193

sizing ....................................................................... 97

Data ingestion ................................................ 45, 574

Data integration ............................................ 60, 326

Data lake ...................................... 367, 388, 439, 441

access ..................................................................... 442

add training data ............................................. 734

SAP HANA ............................................................ 757

storage system ................................................... 367

Data Lake API ................................................ 440, 441

Data lineage ............................................................. 230

extract ......................................................... 231, 233

view ........................................................................ 234

Data modeler .......................................................... 179

Data orchestration ................... 34, 38, 48, 61, 237

connections ......................................................... 172

Data pipeline .......... 48, 49, 61, 89, 175, 239, 288, 325, 329, 413

best practices ...................................................... 709

CI/CD ..................................................................... 690

create ..................................................................... 340

schedule ................................................................ 270

sizing ....................................................................... 98

Data platform ........................................................... 36

Data preparation ................................................... 255

actions .............................................. 256, 258, 259

manage tasks ..................................................... 258

monitor ................................................................. 260

Data preview ................................................. 199, 404

SAP Vora .................................................... 521, 528

Data privacy ............................................................. 641

Data profiling ................................................ 197, 255

actions and monitor ....................................... 198

Data protection ...................................................... 639

Data provider service ........................................... 266

Data provisioning ................................................. 460

ABAP ...................................................................... 465

Data quality rule .................................................... 214

Data science ............................................................. 309

experiments ........................................................ 405

projects ....................................................... 318, 374

Data scientist ....... 49, 56, 59, 310, 316, 319, 323, 365, 374, 406

applications ........................................................ 177

approaches ......................................................... 320

Data serialization .................................................. 154

Data set ........................................................... 193, 333

ABAP ...................................................................... 461

actions ........................................................ 257, 259

balance ................................................................. 385

create collection ............................................... 366

distribution ......................................................... 195

document ............................................................ 519

exploratory analysis ....................................... 381

extract lineage ........................................ 232, 233

fact sheet ............................................................. 199

hierarchies .......................................................... 536

import ................................................................... 575

inference .............................................................. 425

manage tags ...................................................... 210

metadata ............................................................. 462

metrics .................................................................. 200

organize in data lakes .................................... 367

outliers ................................................................. 384

profile ................................................. 198, 207, 255

publish ........................................................ 202, 204

register ................................................................. 369

trace ....................................................................... 230

train and test ..................................................... 384

transform ............................................................ 256

view fact sheet ................................................... 464

view metadata .................................................. 206

visualize ............................................................... 383

Data source ....................................................... 46, 323

inference .............................................................. 425

streaming ............................................................ 340

tables ..................................................................... 529

Data sprawl ................................................................. 36

Data steward ................................................. 174, 215

Data tiering ............................................................. 544

Data Transfer operator ....................................... 484

Data Transform operator ................................... 492

Data Transport operator .................................... 482

Data type .................. 241, 242, 275, 278, 290, 382

create .................................................................... 292

leverage ................................................................ 293

Data visualization ................................................. 377

Data volume .............................................................. 94

Data workflow ........................................................ 264

Data workspace ...................................................... 365

Data wrangling ............................................. 324, 576

Database and data management ....................... 42

Database schema .................................................. 524

Database view ........................................................ 530

catalog ................................................................. 532

Data-driven application ..................................... 295

Data-driven approach ...................... 310, 319, 321

benefits ................................................................ 320

Dataset API .............................................................. 587

Debugging ............................................................... 709

Decode Table operator .............................. 364, 584

Deep learning ......................................................... 348

Default branch .............................................. 702, 703

Delivery team ........................................................ 708

Delta load ................................................................. 473

Deploy model ........................................................ 421

Deployment .............................................................. 63

cloud ..................................................................... 103

controller ............................................................... 81

custom operators ............................................ 286

decision making .................................................. 64

evolution ................................................................ 70

Kubernetes ..................................................... 73, 85

machine learning models ............................. 354

modular ............................................................... 151

on-premise ......................................................... 150

pipeline ................................................................ 448

pods .......................................................................... 79

stack.xml ............................................................. 162

traditional to container-based ...................... 70

URL ........................................................................ 429

version control system .................................. 674

Develop branch ..................................................... 703

Development environment ................................ 96

DevOps ........................................... 85, 327, 713, 715

design ................................................................... 716

hybrid architecture ......................................... 719

phases .................................................................. 716

six pillars ............................................................. 714

tools ............................................................. 718, 721

versus MLOps .................................................... 724

DI_DATA_LAKE .................................. 202, 367, 441

Diagnostic report ................................................. 621

download ............................................................ 623

structure .............................................................. 624

Diagnostics Grafana ..................................... 89, 631

cluster metrics ................................................... 633

dashboard .......................................................... 632

Diagnostics Kibana ........... 89, 181, 628, 631, 719

features ................................................................ 629

Digital transformation .............................. 737, 739

Dimension .............................................................. 577

Discovery dashboard .......................................... 195

Disk engine ............................................................. 667

sizing .................................................................... 668

Distributed data management ........................... 98

Distributed Logs (DLogs) ................................... 664

Docker .................................................. 46, 77, 79, 297

containers ................................................. 295, 298

create container ............................................... 691

images .............. 74, 80, 295, 297, 298, 302, 691

registry ................................................................. 152

use with Python ................................................ 305

Dockerfiles ............................................ 295, 297, 298

add to Python operator ................................. 346

best practices ..................................................... 711

build ....................................................................... 345

create ................................................. 298, 300, 345

create tags .......................................................... 301

inheritance .......................................................... 303

library installation .......................................... 345

Document Classification ...................................... 58

Document Information Extraction ........ 58, 747

Document store engine ........................... 518, 519

tables ..................................................................... 520

E

Eclipse ........................................................................ 396

ECMAScript ............................................................. 597

Elasticsearch ........................................................... 628

End of purpose ....................................................... 640

Endpoint ................................................ 134, 145, 643

Enterprise data .......................................................... 35

Enterprise information management (EIM) ............................................................... 51, 544

Enterprise platform ............................................. 327

Enterprise strategy .................................................. 67

Entitlements ........................................................... 646

Entity relationship model ................................. 554

create .................................................................... 564

Epoch ......................................................................... 348

etcd ................................................................................ 76

Event .......................................................................... 277

Evolution ............................................................. 52, 58

Execution log .......................................................... 523

execution.json file ................................................ 625

Experience data ........................................................ 42

Expert sizing .............................................................. 95

Exploratory data analysis .................................. 381

input data set .................................................... 381

Export/import ....................................................... 186

Extended Machine Learning Library (EML) ..................................................................... 311

Extended strategy ................................................. 603

eXtensible Access Control Markup Language (XACML) .......................................... 649

Extension manager .............................................. 432

External logging service ..................................... 630

F

Fact model ................................................................ 555

Fact sheet .................................................................. 199

view .............................................................. 404, 464

Feasibility study ..................................................... 319

Feature branch ....................................................... 704

Feature release cycle .............................................. 67

File browser ............................................................. 432

File system ............................................................... 605

FileHandler class .................................................... 448

Files ................................................................... 186, 379

engine .................................................................... 667

Finance ...................................................................... 746

Flow-based programming ................................. 238

Fluentd ............................................................ 628, 630

Food storage and maintenance ....................... 745

Force build ............................................................... 302

Freestyle project .................................................... 692

Full-text search ....................................................... 542

G

Garbage collection ........................................ 82, 710

Git ................................................................................ 674

/vflow folder ....................................................... 687

best practices ...................................................... 677

branching .................................................. 701, 702

commands ................................................. 675, 676

enable client ....................................................... 680

environment ....................................................... 679

file statuses ......................................................... 678

generate token ................................................... 683

integration ...................................... 678, 698, 728

repository ..................... 682, 685, 688, 700, 705

set up environment .......................................... 697

workflows .................................................. 697, 699

GitHub ................................................... 333, 682, 719

Jupyter ................................................................... 374

repository ............................................................ 685

trigger .................................................................... 695

GitOps .......................................................................... 85

approach ................................................................ 73

Global account ........................................................ 390

Glossary category .................................................. 228

Glossary metrics .................................................... 195

Google BigQuery .................................................... 508

Google Cloud Platform ....................................... 109

connect ................................................................. 124

console URL ........................................................ 114

quota error .......................................................... 110

register ................................................................. 120

sizing ..................................................................... 109

Google Cloud Storage .......................................... 506

Governance ............................................... 39, 61, 193

GPU support ........................................................... 167

Gradient boost classifier .................................... 410

Graph ............................................ 180, 239, 244, 248

categories ............................................................ 246

create .......................................................... 245, 413

dead instance ..................................................... 619

diagnostics ................................................ 621, 623

editor ..................................................................... 240

engine ......................................................... 518, 519

execute ................................................................. 248

inference .............................................................. 357

leverage data types ......................................... 293

monitor ................................................................ 616

operators ................................................... 239, 247

process logs ........................................................ 249

Push to SAP Analytics Cloud ....................... 582

reuse ...................................................................... 261

section .................................................................. 241

statuses ................................... 248, 287, 617, 618

templates ................................................... 330, 415

training ................................................................ 352

validate ................................................................ 248

Graph snippet ......................................................... 262

create .................................................................... 264

operators ............................................................. 262

types ...................................................................... 262

Graph Terminator operator .......... 239, 245, 446, 494, 710

graphs.json file ....................................................... 624

Grid search .............................................................. 321

Guided vendor onboarding .............................. 744

H

Hadoop ............................................. 43, 45, 178, 518

hana_ml .................................................................... 399

Handshake ............................................................... 123

Hash partitioning ................................................. 526

hdbcli ......................................................................... 398

Helm .......................................................................... 296

install .................................................................... 296

Hibernation ............................................................. 755

Hierarchical tagging ............................................ 207

Hierarchy ....................................................... 535, 536

build using adjacency tables ....................... 539

caching ................................................................ 539

SQL for data analysis ..................................... 537

Hold out data set .................................................. 349

Home page .............................................................. 170

Horizontal scaling ................................................... 85

Hot data .................................................................... 388

Hotfix branch ........................................................ 704

development ...................................................... 705

Human resources ................................................. 742

Hybrid data processing ......................................... 34

Hybrid data virtualization ................................ 478

Hybrid landscape .......................................... 36, 720

Hyperparameter grid search technique ...... 321

Hyperscale data ..................................................... 339

Hyperscaler ............................................. 47, 387, 389

Hypervisor ................................................................. 71

I

IBM Storage ............................................................ 509

Ideate phase ............................................................ 717

Image composer ...................................................... 85

Implicit access ....................................................... 440

Industry .................................................................... 740

use cases .............................................................. 741

Inference model .................................................... 421

Inference pipeline ....................................... 169, 355

deploy ................................................................... 357

Information Access (InA) protocol ................ 478

Ingress ....................................................................... 165

controller ............................................................ 165

Inheritance .............................................................. 303

Initial load ............................................................... 473

Initial sizing ........................................................ 95, 97

Inner join ................................................................. 537

Innovations ............................................................ 754

Installation ................................................ 64, 94, 154

download and initialize SLC Bridge ......... 155

on-premise ......................................................... 150

postinstallation configuration .................. 165

prepare Kubernetes environment ............. 154

prerequisites ...................................................... 150

run maintenance planner ............................ 158

test ......................................................................... 167

troubleshooting ............................................... 164

use SLC Bridge Base ........................................ 160

Instance .................................................................... 103

active .................................................................... 106

backup ................................................................. 112

basic versus advanced modes .................... 125

create ........................................................... 105, 124

details ................................................................... 126

monitoring ............................................... 144, 618

restore ................................................................... 113

SAP BTP cockpit ................................................ 390

sizing ..................................................................... 107

status .................................................................... 134

terminate ............................................................. 113

Integration .................................................................. 34

ABAP ...................................................................... 459

cloud best practices ......................................... 710

Git ........................................................................... 678

Google Cloud Storage ..................................... 506

non-SAP systems .................................... 497, 510

SAP Analytics Cloud ........................................ 571

SAP BW/4HANA ................................................ 478

SAP Data Warehouse Cloud ......................... 543

SAP HANA for SQL data warehousing ..... 489

SAP Information Steward ................... 485, 488

SAP Vora .............................................................. 515

third-party .......................................................... 757

use cases .............................................................. 460

Integrity constraint .............................................. 524

Intelligent enterprise ..................................... 33, 41

Intelligent information management ............ 38

Intelligent robotic process automation (iRPA) .............................................. 55

Intelligent suite ........................................................ 41

Intelligent technologies ................................ 40, 43

Interactive Python (IPython) ................. 373, 374

clusters ................................................................. 380

widgets ................................................................. 375

Interquartile range ..................................... 378, 383

Invoice Object Recommendation ........... 58, 311

IP address ................................................................. 127

IPython Parallel package .................................... 380

IT personnel ............................................................ 322

J

JavaScript ................................................................. 597

Jenkins ....................................................................... 690

access .................................................................... 691

build ....................................................................... 695

build output ....................................................... 695

create freestyle project ................................... 692

Jira ............................................................................... 719

Job .................................................................................. 82

manage ................................................................ 620

Jump box .................................................................. 128

access .................................................................... 139

external IP address .......................................... 135

import session .................................................... 142

set up ..................................................................... 136

status ..................................................................... 134

Jupyter Notebook ............... 44, 46, 311, 328, 332, 347, 373

access artifacts from JupyterLab ................ 434

basics ..................................................................... 374

connect to SAP HANA Cloud .............. 398, 400

create ........................................................... 336, 405

create file ............................................................. 380

dashboard ........................................................... 379

data analysis ...................................................... 381

data science experiments .............................. 405

install IPython widgets .................................. 375

optimizer and loss functions ....................... 348

run via SAP Business Application Studio ..................................................... 396, 398

set the context ................................................... 440

start ........................................................................ 379

working with SAP HANA Cloud .................. 386

write data into SAP HANA Cloud ............... 400

JupyterLab ......................... 336, 343, 430, 431, 441

access Jupyter Notebook artifacts ............. 434

completer ............................................................. 435

create experiment ............................................ 405

discover extensions ......................................... 433

features ................................................................. 431

output views ....................................................... 435

web interface ...................................................... 430

K

Kafka Consumer operator ................................... 61

Kafka Producer operator ...................................... 61

Keras ........................................................................... 347

Kernel ......................................................................... 432

Kibana Query Language (KQL) ......................... 629

kube .............................................................................. 77

kube-apiserver .......................................................... 76

kubeconfig ............................................................... 154

commands ........................................................... 155

Kubectl ...................................................... 76, 154, 662

Kubelet ........................................................................ 77

Kube-proxy ................................................................ 78

Kubernetes .................................. 45, 46, 68, 83, 719

advantages ........................................................... 85

best practices ...................................................... 711

cluster metrics ................................................... 633

cluster-level logging ........................................ 627

clusters ...... 68, 75, 85, 108, 129, 151, 295, 516

critical factors ...................................................... 72

dashboard ........................................................... 273

distributions ....................................................... 151

features ................................................................... 74

get nodes ............................................................. 140

overview .................................................................. 68

package managers .......................................... 296

prepare environment ..................................... 154

SAP Vora .............................................................. 516

security ................................................................. 657

sizing ..................................................................... 108

supported versions ............................................. 64

upgrade ................................................................ 144

L

Label ........................................................................... 453

Landscape sizing ...................................................... 93

Launchpad .................................................................. 86

access .................................................................... 147

add applications ............................................... 170

applications ........................................................ 169

home screen ....................................................... 170

personalize ............................................................. 87

Legislation ............................................................... 640

Library ....................................................................... 337

client-side ............................................................ 436

external ................................................................ 433

import ......................................................... 339, 399

install .................................................................... 345

Jupyter Notebook ............................................. 375

JupyterLab ........................................................... 431

machine learning tracking SDK ................. 361

plotting ................................................................ 377

License key .................................................... 189, 611

install .................................................................... 167

permanent .......................................................... 612

License Management ................................ 188, 611

system licenses .................................................. 612

Licensing ................................................. 64, 189, 611

Limit range ................................................................. 83

Line chart ................................................................. 377

Lineage analysis .................................................... 230

extract lineage .................................................. 232

view ........................................................................ 234

Lineage depth ......................................................... 214

Log ............................................................................... 627

aggregate ............................................................ 630

browse .................................................................. 629

message ............................................................... 628

metrics .................................................................. 452

Log file ...................................................................... 141

copy ....................................................................... 141

M

Machine learning ......... 40, 46, 55, 309, 310, 328

approaches ......................................................... 319

architectural principles ................................. 325

artifacts ...................................................... 178, 368

business use cases ........................................... 318

content .................................................................... 62

core operators ................................................... 413

data and algorithms ...................................... 331

data-driven ........................................................ 320

embedded applications .................................... 52

embedded in SAP HANA ...................... 406, 437

features ................................................................ 314

framework .......................................................... 327

migrate models ................................................ 732

model lifecycle .................................................. 724

models .................................................................. 324

object storage ................................................... 367

open-source environments .......................... 331

operations ....................................... 325, 328, 723

personas .............................................................. 322

solutions .............................................................. 311

tasks ............................................................. 321, 323

techniques .......................................................... 386

TEI methodology ............................................. 313

train and deploy models ............................... 350

workflow ............................................................. 430

Machine learning scenario ............................... 440

associate Dockerfiles ...................................... 345

create .................................................................... 335

create version .................................................... 353

display history .................................................. 354

Python SDK ........................................................ 444

retrieve metadata ............................................ 443

templates ............................................................ 340

upload data sets ............................................... 366

versions ................................................................ 350

Machine learning tracking SDK ............ 417, 419, 439, 450

collect metrics ................................................... 361

functions ............................................................. 452

use as a wrapper .............................................. 451

Main engine ............................................................ 289

Maintenance .......................................... 64, 331, 661

backups ................................................................ 668

Kubernetes ............................................................ 73

models .................................................................. 325

persistent volume size ................................... 665

restart services .................................................. 664

switch to maintenance mode ...................... 662

Maintenance planner ................ 64, 154, 158, 160

manifest.json file ........................................ 697, 700

Manufacturing ....................................................... 742

use case ............................................. 743, 748, 749

Markdown ................................................................ 380

command ............................................................ 374

Master branch ........................................................ 703

Master node ....................................................... 69, 75

components ........................................................... 75

Match Pattern operator ...................................... 220

Materialized view .................................................. 539

Measure .......................................................... 555, 577

Medical supply ordering .................................... 750

Memory calculator .................................................. 64

Memory usage ....................................................... 195

Metadata Catalog ..................................................... 45

Metadata Explorer ........................ 61, 89, 174, 194

ABAP ............................................................ 460, 461

browse connections .............................. 197, 256

business glossary ............................................. 228

connections ........................................................ 367

create folders ..................................................... 202

data profiling ..................................................... 197

data set actions ................................................ 204

import rules ........................................................ 485

lineage analysis ................................................ 231

manage preparation tasks ........................... 258

manage publications ............................ 202, 211

manage tags ...................................................... 208

rulebooks ............................................................. 221

rules ....................................................................... 215

self-service data preparation ...................... 255

tiles ......................................................................... 174

upload data ........................................................ 366

view fact sheet ................................ 199, 404, 464

view SAP HANA table ..................................... 403

Metric Overview dashboard ................... 273, 633

Metrics

graph ..................................................................... 249

history ................................................................... 453

Metadata Explorer .......................................... 195

run .......................................................................... 451

Metrics Explorer .............. 273, 360, 419, 439, 451

access .................................................................... 360

dashboard ........................................................... 360

notebook experiment ..................................... 362

Metrics Tracking API ........................................... 312

Microservices ............................................................ 74

Microsoft Azure ........................................... 109, 502

connect ................................................................. 123

console URL ......................................................... 114

data access .......................................................... 506

quota error .......................................................... 110

register .................................................................. 119

sizing ..................................................................... 109

Microsoft Azure Data Lake Storage (ADLS) .................................................................... 503

Microsoft Azure SQL Data Warehouse ......... 502

Microsoft SQL Server ........................................... 512

Microsoft Visual Studio ...................................... 396

Migration .................................................................. 730

models ................................................................... 731

training data ...................................................... 733

Minimum sizing ............................................... 95, 96

Kubernetes clusters .......................................... 108

MinIO ......................................................................... 734

Missing value .......................................................... 384

analysis ................................................................. 324

ML Data Manager ........................................ 365, 441

ML Scenario Manager ..... 62, 177, 311, 328, 333, 405, 440

deploy pipelines ................................................ 357

executions ........................................................... 352

integrate data sources .................................... 340

metrics .................................................................. 353

overview ............................................................... 333

register data sets ............................................... 370

set up a scenario ............................................... 334

templates ................................................... 340, 428

test machine learning models ..................... 359

training pipeline execution .......................... 351

use case ................................................................. 334

ML Training operator .......................................... 420

MLOps ....................................................... 85, 328, 723

capabilities .......................................................... 727

challenges ............................................................ 726

stages of maturity ............................................ 727

versus DevOps .................................................... 724

Model ......................................................................... 578

deploy .................................................................... 354

deployment service .......................................... 732

drift ......................................................................... 730

execute .................................................................. 352

ideate phase ........................................................ 717

import ................................................................... 562

maintenance ...................................................... 325

metrics .................................................................. 273

migration ............................................................. 732

name ...................................................................... 352

production lifecycle ......................................... 724

repository ..................................................... 56, 732

tab .......................................................................... 333

test ......................................................................... 359

train ............................................................. 349, 411

Model Serving operator ..................................... 421

inputs and outputs .......................................... 422

Modeler ...................... 45, 61, 84, 89, 175, 237, 413

ABAP integration ............................................. 472

configuration ..................................................... 240

container registry ............................................ 167

Dockerfiles .......................................................... 298

download diagnostics .................................... 623

graphs ................................................................... 244

monitoring ............................................... 180, 616

navigate ............................................................... 240

operators .......................................... 242, 276, 413

push data to SAP Analytics Cloud ............. 363

SAP Analytics Cloud ........................................ 578

SAP BW operators ............................................ 484

schedule data pipelines ................................. 270

subengines .......................................................... 288

text analysis ....................................................... 540

trace messages ........................................ 272, 622

ModelStorage library ........................................... 412

Modular deployment .......................................... 151

Monitor tile ................................................... 175, 195

Monitoring ...................................... 73, 89, 180, 617

data pipelines .................................................... 270

data preparation .............................................. 260

diagnostics .......................................................... 624

graphs ......................................................... 352, 357

instance ................................................................ 144

Metadata Explorer .......................................... 195

Modeler ...................................................... 180, 616

profiling ............................................................... 198

rulebooks ............................................................. 226

Multicloud hybrid deployment ......................... 63

Multicontainer pod ................................................. 80

Multitier landscape ................................................. 73

N

Native SAP HANA storage extension ............ 388

Nested applications ............................................. 610

Nested policy .......................................................... 651

Net present value (NPV) ..................................... 314

Network and communication security ....... 107

Networked workforce ......................................... 739

Node controller ........................................................ 77

Node Overview dashboard ...................... 273, 634

NodeJS Multiplexer operator ........................... 290

Non-SAP system ................................................... 497

cloud connectivity ........................................... 497

on-premise ......................................................... 510

Notebook .............................................. 333, 362, 434

create .................................................................... 336

NOTROOT installer .............................................. 396

O

Object store type .................................................. 509

OData services ....................................................... 266

On-premise deployment ............................... 63, 64

installation ......................................................... 150

Open Database Connectivity (ODBC) driver .................................................................... 509

Open Policy Agent (OPA) ................................... 649

Open VSX Registry ............................................... 396

OpenAPI Servlow operator ............................... 355

Open-source environment ............................... 331

Open-source programming language ......... 332

Operational data ...................................................... 42

Operational data processing (ODP) ............... 474

Operational mode ................................................ 661

Operator ......................................... 56, 100, 239, 242

ABAP ..................................................................... 472

add ports ............................................................. 277

built-in .............................................. 242, 275, 330

categories ........................................................... 243

cloud services .................................................... 266

compatibility check ....................................... 279

configuration ........................................... 243, 284

connectivity ....................................................... 329

create ................................................. 267, 281, 282

custom .............................................. 267, 275, 282

data workflow ................................................... 264

documentation ................................................. 276

edit ......................................................................... 287

events ................................................................... 277

graph snippets .................................................. 262

groups .................................................................. 262

hyperscale data ................................................ 340

machine learning ................................... 328, 413

ports ...................................................................... 283

runtime ......................................................... 84, 288

SAP Analytics Cloud .............................. 363, 582

SAP BW ................................................................. 484

section .................................................................. 241

tags ........................................................................ 284

Oracle ........................................................................ 510

Orchestration ............................................................ 61

Outer join ................................................................ 538

Outlier ....................................................................... 384

Outlook ........................................................... 753, 759

Overall equipment effectiveness (OEE) ....... 743

P

Package manager ............................... 296, 302, 375

pandas .......................................... 303, 306, 381, 399

Parent Strategy ...................................................... 603

Partition scheme ......................................... 525, 528

Password .................................................................. 130

Performance metrics ........................ 324, 353, 411

Permissions ............................................................. 115

Google Cloud Platform .................................. 121

Persist run ................................................................ 453

Persistence layer ................................................... 326

Persistent volume ................................................ 665

error handling ................................................... 665

scale up ................................................................ 666

Persona ....................................................... 54, 86, 172

machine learning ............................................. 322

Personal access token ......................................... 682

generate ............................................................... 683

Personal data .......................................................... 640

privacy .................................................................. 641

Personalization ........................................ 73, 87, 149

applications ........................................................ 172

pip ............................................................................... 302

Pipeline ................ 49, 89, 175, 237, 239, 288, 325, 329, 413

advantages ......................................................... 329

best practices ..................................................... 709

CI/CD ..................................................................... 690

create .................................................................... 340

create with template ....................................... 428

data transfer ...................................................... 484

deploy with Python SDK ................................ 447

engine ...................................................................... 85

improvements ................................................... 758

inference .................................................... 169, 354

interact via APIs ............................................... 344

modeling ................................................................. 61

runtime behavior ................................................ 84

schedule ............................................................... 270

tab .......................................................................... 333

TensorFlow ......................................................... 347

test ......................................................................... 359

training ...................................................... 351, 444

versus Data Builder ......................................... 570

Planning ................................................................... 572

Platform

core ........................................................................ 151

extended .............................................................. 151

full stack ............................................................... 151

Plotly .......................................................................... 377

Pod .......................... 68, 69, 76, 78, 79, 82, 295, 665

deployment options .......................................... 79

security ................................................................... 83

troubleshooting ................................................ 165

Policy ........................................................... 81, 83, 650

assign .................................................................... 654

AWS ........................................................................ 115

categories ............................................................ 652

create ..................................................................... 655

custom ........................................................ 184, 655

list ........................................................................... 183

manage ................................................................. 649

nested .................................................................... 651

predelivered policies ........................................ 650

users ....................................................................... 605

Policy decision point (PDP) ............................... 649

Policy Management ......................... 170, 182, 649

assign policies .................................................... 654

create custom policies .................................... 655

Port type ............................................... 277, 279, 283

Position hierarchy ................................................ 536

Postman .......................................................... 355, 359

Prediction ................................................................. 573

result ...................................................................... 356

Predictive Analysis Library (PAL) .......... 311, 407

output tables ...................................................... 408

prerequisites ....................................................... 407

procedures ........................................................... 408

Predictive pricing .................................................. 750

Privacy-Enhanced Mail (PEM) .......................... 132

Private cloud deployment ................................... 63

Private key ...................................................... 132, 137

Privileges .................................................................. 184

select ............................................................ 480, 483

Process Chain operator ....................................... 481

Process Data operator ......................................... 239

Process Executor operator ................................ 289

Process ID-based limits and reservations ..... 83

Production environment ........................... 96, 701

Profile ......................................................................... 172

fact sheet .............................................................. 199

Profiling .................................................................... 198

metrics .................................................................. 195

Progress flow ........................................................... 352

Project ........................................................................ 712

Prometheus ............................................................. 631

expose data ......................................................... 636

federation ............................................................ 636

third-party integration .................................. 636

Public APIs ............................................................... 758

Public hyperscaler deployment ......................... 63

Public sector ............................................................ 742

Public user ID ......................................................... 101

Published data set ...................................... 202, 205

Pull request ............................................................. 678

PuTTY ............................................................... 136, 138

PuTTY Secure Copy client (PSCP) .................... 144

PuTTYgen

files ......................................................................... 137

Python ................................. 305, 332, 337, 374, 375

execute script in terminal ............................. 397

install packages ................................................ 398

libraries ................................................................ 377

operator ............................................................... 290

set up in SAP Business Application Studio ............................................................... 396

Python API ............................................................... 312

Python Client API ................................................. 436

Python Consumer template ................... 354, 416

Python Producer operator ................................ 364

Python Producer template ............ 341, 415, 418

Python SDK ................................................... 312, 439

create pipelines ................................................. 444

execute and deploy pipelines ...................... 447

Jupyter Notebook ............................................. 440

methods ............................................................... 448

read data ............................................................. 442

templates ............................................................. 444

Python3 operator ........... 305, 307, 346, 415, 418

configure ............................................................. 356

Q

Quick link .................................................................... 88

Quick Sizer tool ......................................................... 95

R

R ................................................................................... 332

R Client operator ................................ 305, 308, 333

Range partitioning ............................................... 526

Raw NBConvert ...................................................... 380

Read File operator .......... 243, 247, 428, 443, 499, 504, 583

Recipe ........................................................................ 257

Red Hat OpenShift ......................................... 46, 332

Redshift SQL Consumer operator .................. 501

Redshift Table Consumer operator ............... 501

Reference object .................................................... 440

Regression ............................................................... 425

Relational database management system (RDBMS) ............................................... 511

Relational disk engine ............................... 518, 667

Relational in-memory engine ................ 518, 667

Relationship ........................................................... 201

terms ..................................................................... 229

Release management ......................................... 754

Remote connection ............................................. 664

Remote function call (RFC) ............................... 467

Remote table .......................................................... 561

ReplicaSet ..................................................... 78, 81, 82

Replication controller ............................................ 82

Repository ............................................... 84, 241, 290

ABAP ..................................................................... 460

create folders .................................. 245, 281, 299

Repository-based shipment channel (RBSC) ................................................................... 152

Resampling ............................................................. 385

Resource ..................................................... 81, 85, 649

management ........................................................ 85

quota .......................................... 83, 183, 655, 709

sizing ....................................................................... 96

types ............................................................. 649, 656

REST API ................................................................... 636

Retail .......................................................................... 742

Retention period ......................................... 640, 669

Return on investment (ROI) ................... 314, 317

Reusability ................................................................. 73

Roadmap .................................................................. 753

explorer ............................................................... 758

Roadmap Explorer ............................................... 753

Route controller ....................................................... 77

Rule ............................................................................ 214

bind .............................................................. 223, 488

categories ........................................ 215, 219, 487

create .................................................................... 216

create new categories .................................... 219

dashboards ........................................................ 226

import .................................................................. 222

parameters ......................................................... 217

SAP Information Steward ............................ 485

test ......................................................................... 217

Rule-based selection ........................................... 319

Rulebook ......................................................... 214, 221

create .................................................................... 222

monitor ................................................................ 226

recently run ........................................................ 195

rule bindings ...................................................... 223

run ......................................................................... 225

thresholds ........................................................... 225

Rules tile .......................................................... 175, 215

Run collection ............................ 360, 363, 419, 451

Run level ................................................................... 661

Runtime environment ....................................... 297

Runtime operator ................................................. 288

S

SAP ABAP operator ............................................... 474

SAP Agile Data Preparation ................................. 45

SAP AI Business Services ........ 46, 53, 55, 57, 311

evolution ................................................................ 56

ready-to-use scenarios ...................................... 58

SAP Analytics Cloud ....... 363, 544, 566, 571, 719

add OAuth client .............................................. 588

connectivity ........................................................ 587

create connections .......................................... 575

data import ........................................................ 576

data modeling ................................................... 577

functions .............................................................. 572

operators ............................................................. 582

SAP Data Warehouse Cloud ......................... 566

stories ................................................................... 579

SAP Analytics Cloud Formatter operator ..................................................... 364, 585

SAP Analytics Cloud Producer operator ............................................ 364, 586, 589

SAP BTP cockpit .................................. 388, 389, 644

services ....................................................... 390, 394

SAP Business Application Studio ......... 388, 393

create a project ................................................. 395

extend Python to Jupyter Notebook ......... 398

open ....................................................................... 395

run Jupyter Notebook ..................................... 396

set up Python ..................................................... 396

SAP Business Technology Platform (SAP BTP) ........................ 40, 103, 251, 373, 387, 389, 393, 467, 756

connect with the cloud connector ............. 469

connectors ............................................................. 60

explore .................................................................. 389

features ................................................................... 42

on-premise connection .................................. 470

user account authentication ....................... 644

SAP Business Warehouse (SAP BW) ........ 47, 478

prerequisites ....................................................... 478

user authorization ........................ 481, 483, 484

SAP BW Process Chain operator ..................... 484

SAP BW/4HANA ..................................................... 478

data consumption ........................................... 482

prerequisites ....................................................... 478

user authorization ........................................... 481

SAP Cash Application .......................................... 747

SAP Cloud Appliance Library ......... 59, 93, 97, 99

backup .................................................................. 112

cloud providers .................................................. 109

connect ................................................................. 122

costs ....................................................................... 102

create instances ................................................ 105

deploy solutions ................................................ 103

prerequisites ....................................................... 114

register .................................................................. 101

run solution ........................................................ 145

security ................................................................. 106

set up SAP Data Intelligence ........................ 113

sizing ..................................................................... 108

SAP Community .................................................... 104

SAP Continuous Integration and Delivery ................................................................ 712

SAP Conversational AI .................................... 52, 53

SAP Data Hub ..................................................... 52, 58

SAP Data Intelligence ............. 39, 45, 51, 55, 311, 545, 600

access through browser ................................. 148

add Visual Studio Code .................................. 680

administration .................................................. 595

administrator access ....................................... 600

application lifecycle management ............ 673

applications ................................................ 88, 169

architecture ........................................................... 60

capabilities ............................................................ 48

connect to SAP HANA Cloud ........................ 402

core components ................................................ 61

create solution instance ................................ 124

data sources .......................................................... 47

deployment options .......................................... 63

evolution ......................................................... 56, 58

features ................................................................... 44

genesis ..................................................................... 52

installation ................................................ 154, 160

integrate with business processes ............... 46

Kubernetes ............................................................ 83

launchpad ........................................... 86, 147, 169

libraries ................................................................. 337

log on ..................................................................... 146

machine learning ............................................. 328

maintenance ...................................................... 661

migration ............................................................. 731

objectives ............................................................... 48

on-premise .......................................................... 150

outlook .................................................................. 753

overview ................................................................. 43

personalize .......................................................... 149

recent innovations ........................................... 754

restart services ................................................... 664

security ................................................................. 639

setup ............................................................... 93, 113

trial edition ............................................................ 59

versus SAP Data Warehouse Cloud ........... 570

SAP Data Intelligence Cloud ............... 46, 64, 600

SAP Data Services ................................. 47, 754, 760

SAP Data Warehouse Cloud ........... 543, 545, 760

connection types .............................................. 561

create connections .......................................... 558

create database user ....................................... 559

create spaces ...................................................... 550

data visualization ............................................ 566

develop artifacts ............................................... 554

generate password .......................................... 547

landing page ...................................................... 549

SAP Analytics Cloud ........................................ 566

set up trial tenant ............................................ 546

SAP Distribution for Hadoop .............................. 45

SAP Fiori ................................................................... 395

SAP Gateway ........................................................... 481

SAP HANA ..................... 46, 47, 311, 373, 386, 515

access external view ....................................... 483

connection type ................................................ 480

create data frame ............................................ 401

data lakes ............................................................ 757

embedded machine learning .... 406, 425, 437

engine ................................................................... 388

machine learning libraries ........................... 407

Python API .......................................................... 312

Python Client API ............................................. 436

Python libraries ................................................ 328

SAP Vora .............................................................. 518

smart data integration .................................. 558

table .................................................... 342, 401, 404

tools ....................................................................... 388

user authorization ........................................... 483

Wire protocol ..................................................... 167

SAP HANA Client operator ................................ 342

SAP HANA Cloud ..... 41, 335, 373, 386–389, 544

architecture ........................................................ 388

central tool ......................................................... 391

connect ................................................................. 339

connect to Jupyter Notebook ............ 398, 400

connect to SAP Data Intelligence .............. 402

data sources ....................................................... 389

enable script server ......................................... 407

instance ................................................................ 388

preview data ...................................................... 336

read data into Jupyter Notebook ............... 402

trial account ....................................................... 390

SAP HANA cockpit ...................................... 388, 391

SAP HANA data warehousing foundation ......................................................... 489

SAP HANA database explorer ....... 388, 391, 392

catalog ................................................................. 392

extract properties ............................................ 399

show tables ........................................................ 401

SAP HANA for SQL data warehousing .......... 489

prerequisites ...................................................... 489

transfer data from ........................................... 493

transfer data into ............................................ 490

SAP HANA ML Inference operator ....... 425, 428

configuration parameters ........................... 426

inputs and outputs ......................................... 427

SAP HANA ML Training operator ................... 423

SAP HANA Wire protocol ......................... 520, 643

SAP Information Steward ..... 215, 485, 754, 760

use case ................................................................ 745

SAP Intelligent Robotic Process Automation (SAP Intelligent RPA) .............. 52

SAP Landscape Transformation Replication Server .................................. 460, 465

operator ............................................................... 474

SAP Leonardo Artificial Intelligence ................ 55

SAP Leonardo Machine Learning Foundation .......................................... 52–55, 730

evolution ................................................................ 56

feature comparison ........................................ 731

features ................................................................... 56

models .................................................................. 731

training data ..................................................... 733

SAP Model Company .......................................... 102

SAP NetWeaver ...................................................... 478

SAP S/4HANA .................................. 46, 54, 461, 467

ABAP system ...................................................... 472

SAP S/4HANA Cloud ........................................... 467

SAP Service Marketplace ................................... 101

SAP Vora ......................... 45, 46, 178, 340, 515, 518

access .................................................................... 520

application ............................................................ 89

create tables ............................................. 527, 529

create views ....................................................... 530

dashboard .......................................................... 634

data modeling .................................................. 524

data preview ...................................................... 521

DLog ...................................................................... 530

engines .............................................. 517, 518, 667

full-text search .................................................. 540

hierarchies ................................................. 536, 537

partition tables ................................................. 526

persistent storage ............................................ 667

sizing ...................................................... 96, 99, 668

transaction coordinator ...................... 167, 643

use SQL Editor .................................................... 522

SAP Vora Client operator ................................... 541

SAP Vora Deployment operator ..................... 667

sapdi library ............................................................ 361

SAProuter ................................................................. 664

Scalability ................................................... 73, 85, 330

Scalar type ...................................................... 278, 291

Scatter plot .................................................... 375, 383

Schedule details ..................................................... 131

Schedule-based retraining ................................ 730

Scheduled job ......................................................... 620

Scheduled publication ........................................ 574

Scheduler .................................................................... 77

scikit-sklearn ........................................................... 386

scipy.stats ................................................................. 286

Scorecard wizard ................................................... 226

Secret ......................................................................... 608

Secret key ................................................................. 118

Secure communication channel ....................... 56

Security ..................................................................... 639

data protection and privacy .............. 639, 641

on-premise connectivity ................................ 658

SAP Cloud Appliance Library ....................... 106

user authentication ......................................... 642

Segmentation ......................................................... 541

Selenium .................................................................. 719

Semantic analysis ....................................... 541, 552

Semantic Data Lake (SDL) ..... 202, 250, 367, 509

Separation by purpose ........................................ 642

Sequential class ..................................................... 349

Sequential neural network ............................... 348

Service account ............................................ 121, 124

Service controller ..................................................... 77

Service provider ....................................................... 56

Service Ticket Intelligence ................................... 58

Service user ............................................................. 114

Service-level agreement (SLA) ......................... 102

Shared access signature (SAS) token ............. 506

Shared tenant ......................................................... 756

Single container pod .............................................. 79

Single exponential smoothing ........................ 409

Sizing ........................................................... 64, 93, 124

calculator ....................................................... 65, 67

installation ............................................................ 94

instances .............................................................. 107

minimum ............................................................... 96

persistent volume ............................................ 665

SAP HANA Cloud .............................................. 389

SAP Vora .............................................................. 668

System Management ...................................... 610

t-shirt approach ................................................... 99

virtual machines ............................................... 128

sklearn ....................................................................... 384

SLC Bridge ................................................................. 151

deploy stack.xml ............................................... 161

expert mode configuration .......................... 156

initialize ................................................................ 155

run modes ............................................................ 156

SLC Bridge Base .................................. 152, 158, 164

SLT Connector operator ...................................... 474

Smart discovery ..................................................... 581

Smart predict .......................................................... 573

Snapshot ................................................................... 675

Software development kit (SDK) ..................... 439

Solution ................................................. 100, 103, 697

activated .............................................................. 106

develop .................................................................. 699

file ........................................................................... 602

password .............................................................. 130

run .......................................................................... 145

type ............................................................... 101, 103

Source code management ................................. 693

Space ................................................................. 390, 549

add users .................................................... 551, 556

auditing ................................................................ 559

classification ...................................................... 560

create ........................................................... 550, 556

create database users ..................................... 559

develop artifacts ............................................... 554

manage ................................................................. 556

priority .................................................................. 556

security ................................................................. 557

Spark ........................................................................... 518

SQL Console ................................................... 391, 392

SQL Editor ....................................................... 178, 522

SQL scripts ................................................................ 523

Stable branch .......................................................... 703

stack.xml ................................................................... 158

deploy .......................................................... 161, 164

Stakeholders meeting .......................................... 318

Standard connector ................................................ 60

StatefulSet .................................................................. 82

Statistical modeling ................................... 384, 386

Statistics .................................................................... 382

Stemming ................................................................. 541

Story ........................................................................... 579

add objects .......................................................... 581

builder ................................................................... 566

Strategy ........................................................... 384, 602

Streaming ................................................................... 61

table ....................................................................... 530

Structure type ............................................... 278, 291

Structured File Consumer operator .... 491, 504

Structured File Producer operator ................. 504

Subaccount .............................................................. 390

configure entitlements .................................. 646

create .................................................................... 644

mapping .............................................................. 469

Subengine ............................................. 275, 285, 288

advantages ......................................................... 289

create custom operators ............................... 289

Subject matter expert ......................................... 322

Submit Metrics API .............................................. 419

Submit Metrics operator ................................... 417

inputs and outputs .......................................... 418

Subnet ....................................................................... 130

Subscription .................................................. 104, 119

ID .................................................................. 119, 123

Sub-select ................................................................. 534

Supervised technique ......................................... 386

Supply chain ................................................. 742, 747

SUSE Linux Enterprise Server (SLES) ............. 302

System administrator ...................... 182, 184, 186

access .................................................................... 600

commands .......................................................... 598

maintenance ...................................................... 661

System diagnostics .............................................. 631

System logging ............................................ 181, 626

System Management ......... 46, 91, 184, 595, 600

access control .................................................... 642

applications .............................................. 187, 608

CI/CD ..................................................................... 690

cluster admin view .......................................... 601

command-line client ....................................... 595

expose ................................................................... 165

files ......................................................................... 605

login ....................................................................... 597

my workspace .................................................... 605

persistent volume size .................................... 665

SAP Vora .............................................................. 516

services ................................................................. 516

sizing ..................................................................... 610

tasks ...................................................................... 601

tenants ................................................................. 163

users ................................................... 604, 642, 698

System tenant ........................................................ 163

T

Table ................................................................. 527, 563

catalog .................................................................. 528

create in-memory ............................................ 527

create using disk engine ................................ 529

details ................................................................... 527

partitioning ....................................................... 525

types ............................................................. 278, 529

Table Consumer operator ................................. 493

Table Producer operator .................................... 492

Table-based replication ...................................... 460

Tag .............................................................................. 101

automatic ........................................................... 207

automatic inheritance .................................. 305

create .................................................................... 346

Dockerfiles .......................................................... 301

hierarchy .......................................... 196, 207, 210

manual ................................................................ 208

operators ............................................................. 284

search filters ...................................................... 211

set ........................................................................... 453

usage .................................................................... 196

Target

column ................................................................. 411

value ..................................................................... 411

Template

create pipeline .................................................. 428

graphs .................................................................. 415

inference .............................................................. 428

pipelines .............................................................. 341

Python SDK ........................................................ 444

Temporary branch ...................................... 702, 704

Tenant ....................................................................... 185

ID ............................................................................ 103

manage ................................................................ 599

shared ................................................................... 756

types ...................................................................... 163

workspace .................................................. 186, 607

Tenant admin ..................................... 187, 602, 604

create users ........................................................ 642

view ....................................................................... 600

TensorFlow .......................................... 311, 443, 445

inception model ............................................... 422

pipelines .............................................................. 347

Term template ....................................................... 228

Termination date ................................................. 131

Termination protection .................. 113, 129, 135

remove ................................................................. 148

Test cycle .................................................................. 705

Test Drive Center (TDC) ..................................... 144

Testing environment .......................................... 701

Text analysis ........................................................... 540

linguistics ............................................................ 541

operator ............................................................... 540

Threshold ................................................................. 225

Tiller ........................................................................... 296

Time series algorithm ........................................ 409

Time series engine ............................................... 518

Time to live (TTL) ..................................................... 83

Tokenization ........................................................... 541

Tooltip ....................................................................... 378

Total economic impact (TEI) .................. 309, 313

benefits ................................................................. 315

components ........................................................ 313

costs ....................................................................... 317

framework .......................................................... 314

Trace message ........................... 180, 272, 617, 621

Trace publisher ...................................................... 622

Traditional deployment ........................................ 70

Training data .......................................................... 733

add to data lake ................................................ 734

Training operator ................................................. 446

Training pipeline ................................ 342, 444, 446

deploy with Python SDK ................................ 447

execute ................................................................. 351

metrics .................................................................. 360

Training run .................................................. 419, 451

Transaction

/n/IWFND/GW_CLIENT ................................. 481

SICF ........................................................................ 479

SM30 ..................................................................... 475

STC01 ..................................................................... 478

Transformation history ..................................... 230

Transmission control .......................................... 642

Transparency .......................................................... 718

Trigger message ..................................................... 265

Troubleshooting ................................................... 164

T-shirt sizing approach ................................ 99, 389

TTL controller ............................................................ 83

U

Undeploy model ................................................... 421

Union view .................................................... 186, 606

Usage analytics ...................................................... 648

Use cases ................................................ 737, 741, 743

automatic invoice posting ........................... 744

finance .................................................................. 746

food storage and maintenance .................. 745

guided vendor onboarding .......................... 744

manufacturing .................................................. 749

optimize asset effectiveness ........................ 743

supply chain ....................................................... 747

User ......................................................... 100, 104, 185

acceptance testing ........................................... 708

assign .................................................................... 126

assign policies ................................................... 654

authentication .................................................. 642

create as tenant admin .................................. 642

groups ................................................................... 115

manage ................................................................. 604

permissions ......................................................... 115

policies .................................................................. 605

preferences .......................................................... 175

workspace ............................................................ 698

User account and authentication (UAA) ............................ 106, 644

User interface (UI) ................................................. 186

Utilities ...................................................................... 742

V

Validity check ......................................................... 107

VCTL ..................................... 596, 597, 663, 690, 697

commands ........................................................... 598

operating system .............................................. 596

Version ...................................................................... 350

control system .................................. 73, 673, 708

create ..................................................................... 353

history ................................................................... 354

version.json file ...................................................... 624

View ............................................................................ 530

additional functions ........................................ 533

catalog .................................................................. 532

import and export ............................................ 535

Virtual deployment ................................................ 71

Virtual machine ............................................. 71, 109

sizing ..................................................................... 128

Virtual private cloud (VPC) ................................ 659

Virtual private network (VPN) ............... 467, 659

Visual board ......................................... 360, 363, 420

Visual Studio Code ...................................... 170, 679

/vflow folder ....................................................... 687

access ..................................................................... 681

add .......................................................................... 680

integrate GitHub repository ........................ 685

Visualization .............................. 324, 363, 377, 383

SAP Data Warehouse Cloud ......................... 566

Volume controller ................................................... 77

Vora Tools ..................................... 89, 178, 520, 524

vrep command ....................................................... 599

vSystem ..................................................................... 610

W

Warm data ................................................................ 388

Whitelisting ............................................................. 475

collaborative ...................................................... 477

individual ............................................................. 475

Wholesale distribution ....................................... 742

Windows Azure Storage Blob (WASB) ........... 501

WinSCP ...................................................................... 141

Wiretap operator ................................................... 286

Worker node ...................................................... 69, 75

components ........................................................... 77

Workflow .................................................................. 261

Workflow Trigger operator ..................... 265, 484

Workload ............................................................. 80, 94

management ........................................................ 73

Workspace ............................................................... 186

Wrapper .................................................................... 451

Write File operator ................... 247, 415, 499, 504

Write Results File operator ............................... 239

Y

YAML Ain’t Markup Language (YAML) ........ 154

We hope you have enjoyed this reading sample. You may recommend or pass it on to others, but only in its entirety, including all pages. This reading sample and all its parts are protected by copyright law. All usage and exploitation rights are reserved by the author and the publisher.

Dharma Teja Atluri is an executive architect and artificial intelligence/machine learning evangelist at IBM. He has more than 18 years of experience working in advanced analytics with both SAP and non-SAP product lines.

Devraj Bardhan is an accomplished global leader for SAP Innovations at IBM. He has led several large transformation projects, driving the business growth agenda through innovation and digital efficiencies.

Santanu Ghosh is an SAP analytics practitioner working as a consultant for more than 15 years in the data warehouse space. He has worked with SAP Business Warehouse, SAP HANA, SAP BusinessObjects BI, and SAP Analytics Cloud.

Snehasish Ghosh is an enterprise information management (EIM) consultant and data engineer working at IBM Australia. He has more than 15 years of experience working in analytics and the information management portfolio.

Arindom Saha is an SAP business intelligence consultant with more than 11 years of experience working with the SAP analytics portfolio. He has extensive experience in SAP and non-SAP reporting and visualization products.