Browse the Book
In this chapter, you’ll see how data is governed and managed in SAP Data Intelligence. You’ll learn how to use the Metadata Explorer to discover, profile, and catalog your data, as well as create data quality rules, run rulebooks, and perform data lineage analysis.
Atluri, Bardhan, Ghosh, Ghosh, Saha
SAP Data Intelligence: The Comprehensive Guide
783 pages, 2022, $89.95, ISBN 978-1-4932-2162-2
www.sap-press.com/5369
First-hand knowledge.
“Metadata-Driven Data Governance”
Chapter 5
Metadata-Driven Data Governance
You’re now ready to learn about data governance and data quality
management. In this chapter, you’ll learn how to use SAP Data Intelligence
metadata governance to manage your data and generate data-driven
insights. We’ll guide you through each step in the process with
practical examples.
Governing your organization’s data and building a unified view of data stored in silos
across multiple systems are key activities in creating a consistent information
management ecosystem. A well-structured data governance framework provides
the following benefits:
■ A unified metadata catalog that gives visibility into the data assets in the enterprise
information management (EIM) landscape
■ Easy governance and management of metadata across disparate sources
■ The ability to explore, analyze, and consume information on your data assets, with
sharing, version management, and lineage assessment
■ Data quality monitoring and active data governance to improve the reliability and
trustworthiness of enterprise data
■ Better insight into privacy-related data
■ Quick turnaround on information requests, with easy access to the information in
the data and data models
■ Self-service and data-driven decision-making by business users
■ Support for nondomain experts and business users in relating IT data assets to
business terminology
In this chapter, we’ll walk you through how to use SAP Data Intelligence to discover and
curate your data, perform data quality assessments, and enrich your data with business
semantics using other data sets. We’ll first discuss using the Metadata Explorer, a key
data governance tool within SAP Data Intelligence, before we cover data profiling,
management via the data catalog, data quality rulebooks, and data lineages.
2162.book Seite 193 Mittwoch, 22. September 2021 8:49 20
Note
To use the exercises in this book, you’ll need to work with your system administrator to
ensure you have the correct authorizations and roles to access several features in the
Metadata Explorer in SAP Data Intelligence according to your user persona, whether
you’re a data engineer, data or information steward, or business user. For more
information related to roles and authorizations, refer to Chapter 17, Section 17.2.1.
5.1 Metadata Explorer for Data Governance
Once you log on to SAP Data Intelligence, tiles provide access to different sets of
activities. An advantage of this modularization is that access to tiles is controlled by
system administrators through roles and authorizations, which helps ensure that
data security and privacy are managed effectively. Users can access only what they
need. Refer to Chapter 17, Section 17.2.1, for more information on roles and user access
control. The tile we’ll explore in this chapter is the Metadata Explorer. Figure 5.1 shows
the homepage, which displays some cards to help you navigate to your desired area,
which we discussed in detail in Chapter 4, Section 4.2.2.
Figure 5.1 Metadata Explorer Homepage
In this section, we’ll show you how to extract or crawl metadata from the different
source and target systems in your information ecosystem, manage this metadata, and
also generate a complete picture of your various connected systems through the
intuitive Discovery Dashboard.
5.1.1 Intelligent Information Management with the Discovery Dashboard
Let’s begin with the Discovery Dashboard, shown in Figure 5.2, which can be accessed
from the Monitor tile in the Metadata Explorer homepage. This useful link provides a
set of metrics to assess the performance and usage of the Metadata Explorer. You’ll see
charts, graphs, tables, and hyperlinks to other tiles providing access to more details
about each metric, such as the following:
■ Memory Usage
Shows the memory utilization for data preparation, data cataloging, and data profiling.
The Metadata and Preparation charts show the amount of memory used versus
free memory and use different colors to indicate the level of utilization and alerts.
■ Dataset Distribution
Displays data set distribution across connections. You can click on each section of
the pie chart to display the number of data sets for a particular connection. You can also
use the Manage Publications link, which will take you to Manage Publications and
show you the published data sets.
■ Monitoring
Shows you the overall status of various tasks being performed in the Metadata
Explorer, like Profile, Publish, Rulebook, and Preparation, all classified by status (i.e.,
Error, Running, Completed, and Partial). Click Manage to go to the Monitoring page,
where you can filter by date range, task type, or task status.
■ Recently Run Rulebooks
Shows the number of available rulebooks and the statuses of the last five rulebooks.
You’ll also see the number of rules contained in each rulebook. Click the number to
see the rules and categories themselves.
■ Catalog Metrics
Displays the number of available data sets by connection. You can also see trends in
how the number of data sets has changed, by connection, in the last 7 days. This
information gives you an idea of the most used connections in your enterprise
information landscape.
■ Profiling Metrics
A graphical representation of the number of data sets that have been profiled, by
connection. Also displays the total number of successful fact sheets.
■ Recently Published
Shows five links to the catalog where the published data sets are stored.
■ Glossary Metrics
Shows up to five links to the user’s most recently created or
updated terms in the business glossary. The top of the tile shows the number of
terms and categories that have been created.
■ Tags Usage
Provides the number of tags created in a tag hierarchy and the usage of tags by object
(data sets or columns). You can search tags as well.
■ Tags Hierarchy
Shows the default hierarchy and the five most recently used hierarchies. With each
hierarchy, you can access further metadata about tags, like last changed or usage
statistics, by data set and column.
Figure 5.2 Sample Discovery Dashboard
5.1.2 Metadata Crawlers to Explore, Classify, and Label Data Assets
With SAP Data Intelligence’s Connection Management application, you can create
connections, view metadata, and preview data in real time to start understanding your
data set. This process is also referred to as crawling. You don’t need to stage the data set
physically in SAP Data Intelligence to view the metadata.
Via Browse Connections, you can view data profiling fact sheets, explore information
about the data set, review the column metadata, preview the data in real time, and
more. We’ll discuss these capabilities in detail in Section 5.2.3.
5.1.3 Managing Metadata across a Connected System Landscape
Data landscapes of organizations today are quite complex and disparate. For example,
in the same landscape you may find SAP ERP, SAP Business Warehouse (SAP BW), Amazon
Redshift as a data warehouse, clouds like Amazon Web Services (AWS) or Microsoft
Azure for storage, and Microsoft Power BI as a reporting solution. With Connection
Management in SAP Data Intelligence, SAP has provided connectivity options for various
technologies, making it easy to bring the metadata from these distributed
components into a central view. In this way, data managers can enjoy complete
transparency into data processes across all connected components.
As shown in Figure 5.3, SAP Data Intelligence provides connectivity to various types of
systems, both on-premise and in the cloud, as most data landscapes nowadays are hybrid
in nature. To arrive at the Connection Management application, go to Browse Connections
under the Catalog section. We’ll discuss how to create a connection in Chapter 6,
Section 6.2.
Figure 5.3 Various Connection Types in SAP Data Intelligence
5.2 Data Profiling to Understand Data
Data profiling is the process of analyzing and providing a detailed statistical report of
the data set in question. The Metadata Explorer has a built-in feature for data profiling
that provides additional information about the data stored in the object, including
minimum-maximum, average length, null values, blank values, and distinct values.
This information helps data engineers and data specialists assess the quality of the data
and identify the nature of data transformation or data preparation required before the
data set can be made available for reporting.
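To make these statistics concrete, the following Python sketch computes comparable figures for a small in-memory CSV. It is an illustration only and does not reflect how the Metadata Explorer computes its profile. The function name profile_columns and the sample data are hypothetical, and every value is treated as a string, so minimum and maximum are lexicographic.

```python
import csv
import io
from statistics import mean


def profile_columns(rows):
    """Return per-column statistics for a list of dict rows (values are strings)."""
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        values = [row[col] for row in rows]
        # A value is "null" when missing entirely and "blank" when empty or whitespace.
        nulls = sum(1 for v in values if v is None)
        blanks = sum(1 for v in values if v is not None and v.strip() == "")
        present = [v for v in values if v is not None and v.strip() != ""]
        report[col] = {
            "min": min(present) if present else None,  # lexicographic minimum
            "max": max(present) if present else None,  # lexicographic maximum
            "avg_length": mean(len(v) for v in present) if present else 0,
            "nulls": nulls,
            "blanks": blanks,
            "distinct": len(set(present)),
        }
    return report


# Hypothetical sample standing in for a file such as Items.csv.
SAMPLE = "ItemID,Name,Country\n1,Bolt,AU\n2,,AU\n3,Nut,DE\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
profile = profile_columns(rows)
```

In this sample, the Name column has one blank value, which is the kind of finding that signals data preparation is needed before reporting.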
This section will teach you how to profile your data and understand the nature of your
data using additional tools like fact sheets.
5.2.1 Profiling Data Sets from Connections
To profile your data set, follow these steps:
1. From the SAP Data Intelligence landing page, access the Metadata Explorer tile and
select Catalog • Browse Connections.
2. Select the connection (DI_DATA_LAKE, in this example) and navigate to the data
object to be profiled. For our scenario, we’ll profile Items.csv in
/shared/SAPDIMetadataExplorer, as shown on the left side of Figure 5.4.
3. Click on the icon for the data set, select Start Profiling, and confirm this action. The
profiling task should be initiated, as indicated by the message shown at the bottom of Figure 5.4.
Figure 5.4 Executing a Profiling Task
5.2.2 Profiling Actions and Monitor
Once the profiling is initiated, you can check the status of the profiling task by navigating
to Monitor • Monitor Tasks from the Metadata Explorer homepage. Figure 5.5 shows
the various statuses of a profiling task for Items.csv, as initiated in Section 5.2.1. The
top screen shows the profiling task in the Running status, and the bottom screen shows the
profiling task in the Completed status.
All profiled data sets can be seen from the Catalog • View Profiled Datasets option
within the Metadata Explorer, as shown in Figure 5.6. You can check the history of data
profiling executed on a data set from the Version History field, arranged in descending
order of runtimes.
Figure 5.5 Data Profiling Task Statuses
Figure 5.6 Displaying the Data Profiling Version History of a Data Set
5.2.3 Viewing Profile Fact Sheets
Fact sheets provide detailed information on the metadata of the data set after data
profiling has been completed successfully. They provide information on the data columns,
data types, tags, unique keys, and description of the data set as well as the connection
ID, type of data set, data set size, last modified, last published, and much more. Fact
sheets also show trends in the row count and size, including charts to provide a better
view of the data spread and the metadata of the data. The Data Preview tab provides sample
data from the data source.
You can access a fact sheet from Monitoring, Browse Connections, or Catalog. Choose
the data set for the fact sheet, click on the icon for the data set, and then select the View Fact Sheet option.
Note
By default, the Data Preview tab is set to show only 100 records. The limit can be
increased up to 1,000 records. This value can be changed in the Data Preview tab using
the Maximum number of rows preview dropdown list.
The fact sheet displays information in the following tabs and sections, each of which
shows different details about the data set:
■ Overview
The Overview tab, shown in Figure 5.7, is organized into the following three sections:
– Dataset Overview: Displays information on the Connection ID, Last Published,
Last Modified, Last Profiled, Number of Columns, Number of Rows, Delimiters, and
Owner.
– Dataset Metrics: Provides the distribution of columns by data type, trend analysis
of the count of records profiled, the number of assigned glossary terms, and any tags
or hierarchies associated.
– People and Reviews: Provides details of any rating, commentary, or discussion
associated with the data set that may have been provided by its users.
Figure 5.7 Fact Sheet Overview Tab
■ Columns
Displays metadata like Name, Type, Minimum-Maximum, Average Length, % of Null
or Blank Fields, Distinct Values, Uniqueness, and Number of Tags, as shown in
Figure 5.8.
Figure 5.8 Fact Sheet Columns Tab
■ Data Preview
Displays a set of records from the data set.
■ Reviews
Shows ratings and additional information like comments and the comment history
for the data set, as shown in Figure 5.9.
Figure 5.9 Fact Sheet Reviews Tab
■ Relationships
Displays the Business Glossary, Terms and Tags, and Associated Data Quality Rulebooks
for the data set. You can also assign tags from the Relationships tab, as we’ll
discuss in Section 5.3.2. This tab has three sections:
– Terms and Tags: Provides details on any glossary terms or tags used to
classify the data set or individual attributes or columns of the data set.
– Data Quality: Gives details of any data quality rulebooks that have been set up on
the data object.
– More Relationships: Displays any additional data objects that have been created
from it.
5.3 Managing Publications and Data Catalogs
In this section, we’ll take you through the various steps for creating and managing
metadata related to various source and target data sets available for your organization
to use via a method of publishing data sets. This section will also show you how to organize
data and related attributes or fields by tagging them and organizing these tags.
5.3.1 Catalog of Published Data Sets
Publishing a data set makes a local copy of the data set’s metadata in the Metadata
Explorer. A published data object, also known as a published data set, can be generated
from various source object types: a connection; a schema or folder on a connection; or
an object such as a view, table, or file. In this section, we’ll teach you, step by step, how
to browse a connection, publish a data set, and create tags to classify and label it.
Note
For our exercises, we’ll be using the DI_DATA_LAKE connection, which is configured with
Semantic Data Lake (SDL). For details on how to create this connection, refer to Chapter
6, Section 6.2.1. This option is available for Amazon Simple Storage Service (Amazon S3),
Google Cloud Storage, Hadoop Distributed File System (HDFS), Azure Data Lake, Microsoft
Windows Azure Storage Blob (WASB), and SDL. Make sure you have the right authorization
and roles to perform this activity.
Figure 5.10 and Figure 5.11 show you how to create folders and upload files to them.
Follow these steps:
1. Click on Browse Connections in the Metadata Explorer and select the connection
where you want to create the folder. Drill down to the location where you want to
create the folder. In this case, we’ll create a folder under DI_DATA_LAKE/shared.
2. Click on the New Folder icon, as shown in Figure 5.10 (1).
3. Provide a Folder Name and click OK (2). Once the folder is created successfully, a
message will be displayed.
Figure 5.10 Creating Folders in Metadata Explorer for Supported Systems
4. Click on the Upload Files icon.
5. Click on the icon to browse to the location where the files were saved, select a file you
want to upload, and click Upload, as shown in Figure 5.11.
Note
Before uploading a file, you can also edit the name of the data set, or you can rename
the data set once uploaded to the folder by clicking on the three dots icon shown in the
list view of the files.
As shown in Figure 5.11, you can rename a file by clicking on Edit (1). In our example, we
renamed this file to Contacts1.csv. Once you click the Upload button, the file is uploaded
(2) and can be found in the file list (3).
Figure 5.11 Renaming and Uploading Files in Metadata Explorer
Figure 5.12 shows the various actions you can perform on a data set. Now that we’ve
uploaded our data sets, let’s see how we can publish one. A data set is available to other
users for further analysis only after it has been published. Once an object has been published,
the metadata for the object is available for exploratory analysis under Catalog.
Figure 5.12 List of Actions You Can Perform on a Data Set
To publish a data set, follow these steps:
1. Go to Browse Connections and go to the location where the data set is located, in this
case, under DI_DATA_LAKE/shared/SAPDIMetadataExplorer.
2. Click on the icon for the data set.
3. Select + (New Publication) and provide a Name and Description, as shown on the
right side of Figure 5.13.
4. Click Publish.
Figure 5.13 Publishing a Data Set
Once published, the data set should be visible under Catalog and available for other
users to access, as shown in Figure 5.14. Also, the published data set, including its parent
folder and subfolders, will be displayed as Published in the Browse Connections screen
even if not all the data sets in the folder or subfolder are published, as displayed in the
top and bottom screens shown in Figure 5.15, respectively.
Figure 5.14 Published Data Set in Catalog
Note
When browsing connections, you can publish either a group of objects organized by
folders or an individual data set. You can explore more options in the documentation on the
Metadata Explorer, available at http://s-prs.co/v536910.
Figure 5.15 SAPDIMetadataExplorer Published as a Connection
Once the data set is published, you can view its metadata from the Catalog, as shown in
Figure 5.16.
Figure 5.16 Viewing the Metadata of a Published Object in Catalog View
Use the Browse Catalog feature in the Metadata Explorer under the Catalog. Click on
the icon beside the object and select View Metadata. Figure 5.16 shows two views
of the metadata:
(1) Properties, which shows generic information on the data set like Name, Description,
Type, Size, Last Modified, Owner, Connection ID, Schema, Folder, Status, Search Rank
Matched Terms, Last Profiled, and Last Published.
(2) Columns, which shows metadata for columns, like Name and Type.
As shown in Figure 5.17, all the published data sets shown earlier in Figure 5.16 are also
available in the Catalog. Since we’ve already profiled Items.csv in Section 5.2.1, this file
has the status of PROFILED.
Figure 5.17 List of Published Data Sets in the Catalog
5.3.2 Automatic Tags and Hierarchical Tagging
Once published, the data set is available in the corresponding connection folder under
Catalog. The Metadata Explorer in SAP Data Intelligence provides a hierarchical tagging
method for data sets and data elements or columns, which allows you to organize, manage,
and find relevant information. The Metadata Explorer includes a preexisting ContentType
hierarchical tagging structure. Two tagging methods are available: automatic
and manual.
When you profile a published data set, the Metadata Explorer identifies the type of
data elements in the data set and assigns tags automatically from the predefined
ContentType tags. For example, as shown in Figure 5.18,
when data profiling is run, automatic tags are assigned in the Number of Tags column
(1). By clicking the > icon in a row in the Columns preview in a fact sheet, you’ll open a
screen to view the associated tag (2). You can decide whether to delete that tag and then
assign a tag manually by clicking the Manage Tags button (3).
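The idea of deriving a tag from the values in a column can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Metadata Explorer’s actual detection logic, which this chapter does not describe. The tag names EMAIL and COUNTRY_CODE, the regular expressions, and the 80 percent threshold are all hypothetical.

```python
import re

# Hypothetical content-type patterns; the real ContentType tags and the
# detection rules behind them are not documented in this chapter.
PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "COUNTRY_CODE": re.compile(r"^[A-Z]{2}$"),
}


def suggest_tags(values, threshold=0.8):
    """Suggest tags whose pattern matches at least `threshold` of the non-empty values."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return []
    tags = []
    for tag, pattern in PATTERNS.items():
        share = sum(1 for v in present if pattern.match(v)) / len(present)
        if share >= threshold:
            tags.append(tag)
    return tags


email_tags = suggest_tags(["a@example.com", "b@example.org", ""])
country_tags = suggest_tags(["AU", "DE", "US"])
free_text_tags = suggest_tags(["hello", "world"])
```

A threshold below 100 percent tolerates a few malformed values in a column, which is one reason a data steward may still want to review and correct automatically assigned tags manually.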
Figure 5.18 Tags in a Fact Sheet
To assign a tag manually to a data set using the Manage Tags button (3), follow
these steps:
1. Browse for the connection and data set that you want to tag.
2. Click on the icon for the data set and select View Fact Sheet.
3. Go to the Relationships tab and click on Manage Tags, as shown in Figure 5.19.
4. In the Manage Tags window, select the tag you want to associate with the data. In this
case, we’re working with the Customer.csv file, to which we want to assign PERSONAL
INFORMATION.
5. To verify that the data set can be found by its tag, go back to the Catalog landing page
and click the icon next to PERSONAL INFORMATION. The Customer.csv file should appear,
as shown in Figure 5.20.
Figure 5.19 Manual Tagging of Published Data Set from Fact Sheet
Figure 5.20 Verifying the Manual Tag
To assign a tag manually to a column in the data set, as shown earlier in Figure 5.18,
follow these steps:
1. Browse the connection and data set that you want to tag.
2. Click on the icon next to the data set and select View Fact Sheet.
3. Go to the Columns view and click on the field row.
4. If automatic tagging was performed, you can delete these tags from the ContentType.
5. Go to Manage Tags and select the tag from the ContentType.
6. To delete a tag, go to the Manage Tags view and click the X icon beside the tag.
You can also create new tag hierarchies if the default ContentType hierarchy does not
suit the purpose, or you may want to create a new hierarchy to classify data sets and
data elements differently, for example, by functional area or business domain.
To create a new tag hierarchy, follow these steps:
1. Go to the Catalog view in the Metadata Explorer.
2. Click on the icon next to Select Tag Hierarchy and select More Actions.
3. Select Manage Tag Hierarchies, as shown on the left side of Figure 5.21, click on the +
sign, and provide a Name and Description.
4. Click Save and close.
Figure 5.21 Creating a New Tag Hierarchy
To add child tags to the new hierarchy, select the new hierarchy, click More Actions and
select Add Tag to Hierarchy. After maintaining the Name and Description fields, click
Save or Save and New to create the new tag.
Note
Using Add Tag to Hierarchy for a defined parent hierarchy, you can create tags that are
children and grandchildren. You can perform other actions on tags, like Edit Tag Properties,
Delete Tag from Hierarchy, and Add Tag as Search Filter on the child nodes of the
top-level tag hierarchy, that is, for the FunctionalDomain case, from ContentType under
Catalog.
5.3.3 Using Tags as Search Filters
You can use tags to search for data set(s) with a particular tag or set of tags. As shown in
Figure 5.22, you can search data elements using tags by clicking the filter icon next to a
tag (1). The search tag filter is added to the top (2). The list of data elements will update in
the results pane.
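As a rough mental model of hierarchical tags used as search filters, the following Python sketch represents a tag hierarchy as a tree and filters data sets by a tag. The Tag class, the child tags EMAIL and PHONE, and the tag assignments are hypothetical. Whether the Metadata Explorer also matches data sets tagged with child tags is an assumption made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    name: str
    children: list = field(default_factory=list)

    def descendants(self):
        """Return this tag's name plus the names of all tags below it."""
        names = {self.name}
        for child in self.children:
            names |= child.descendants()
        return names


def filter_by_tag(datasets, tag):
    """Keep data sets carrying the tag or any tag beneath it in the hierarchy."""
    wanted = tag.descendants()
    return [name for name, tags in datasets.items() if wanted & set(tags)]


# Hypothetical hierarchy and tag assignments for illustration only.
personal = Tag("PERSONAL INFORMATION", [Tag("EMAIL"), Tag("PHONE")])
datasets = {
    "Customer.csv": ["PERSONAL INFORMATION"],
    "Contacts1.csv": ["EMAIL"],
    "Items.csv": [],
}
matches = filter_by_tag(datasets, personal)
```

Here, filtering by the parent tag returns both Customer.csv and Contacts1.csv, while the untagged Items.csv is excluded.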
Figure 5.22 Searching Data Objects Using Tags
5.3.4 Managing Publications in the Catalog
To view all publications for a particular connection, follow these steps:
1. Go to the Metadata Explorer and select Catalog • Browse Connections.
2. If you want to view all publications under a connection, click on New Publication and
drag and drop the connection from the left side, as shown in Figure 5.23. Click on the
icon to see all folders, subfolders, and data sets that have been published, as shown in
Figure 5.24.
Figure 5.23 Browsing Connections for Published Data Sets
Figure 5.24 Displaying All Published Data Sets for a Connection
To update or delete a publication, after performing the previous steps, continue with
the following steps:
1. Click on the published data object you want to update or delete, as shown in Figure
5.25.
Figure 5.25 Navigating a Publication
2. To update the name or description of a publication without republishing the data
set, change either attribute. The Update Publication button is enabled once you change
one of them. Then click on Update Publication, as shown in Figure 5.26.
Figure 5.26 Modifying a Publication
3. If you want to include more files in a publication or select/deselect Include Subfolder
(which is available when you create a new publication), you’ll need to use Update and
Publish.
4. If you want to delete a publication, use the Delete option.
You can also manage publications from Data Intelligence Metadata Explorer • Administration
• Manage Publications. This screen displays a different view of the publications,
organized by connection, as shown in Figure 5.27. You can create a publication from this
view as well via the Create Publication button.
Figure 5.27 Creating Publications from Manage Publications
5.3.5 Lineage Depth Set in Publication Processing
This optional setting is available for lineage analysis and, when selected, shows the
source in a lineage graph. If you set the Lineage Depth to 0, no lineage analysis is
returned. You can set the value between 1 and 100 to show up to 100 levels of depth.
For example, if you set the depth to 50 and the lineage has 15 levels, you will see all 15
levels. However, if you set the lineage depth to 5 but the actual depth is 15, only the first
5 levels are shown.
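The behavior described above amounts to taking the smaller of the configured depth and the actual depth. The following Python sketch expresses that arithmetic; the function name visible_lineage_levels is hypothetical and not part of any SAP API.

```python
def visible_lineage_levels(lineage_depth, actual_levels):
    """Return how many lineage levels are shown for a given Lineage Depth setting."""
    if not 0 <= lineage_depth <= 100:
        raise ValueError("Lineage Depth must be between 0 and 100")
    # A depth of 0 returns no lineage; otherwise the setting caps the levels shown.
    return min(lineage_depth, actual_levels)


all_levels = visible_lineage_levels(50, 15)   # setting exceeds the actual depth
capped = visible_lineage_levels(5, 15)        # setting limits the levels shown
disabled = visible_lineage_levels(0, 15)      # no lineage analysis
```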
5.4 Defining Data Quality Rules and Running Rulebooks
In previous sections, you learned how to publish your data sets and organize them
using tags, how to profile your data sets to gather characteristics of your data attributes,
and how to make this information available for further analysis and usage. However,
you’ll need to continuously monitor the quality of your data from the start to ensure
your data is useful and provides the insights you’re expecting. Thus, you’ll require rules
created around data attributes or elements, and you’ll need to assess your data against
those rules and then quantify and present the outcome of the assessment using
dashboards. This section will show you how to execute all these tasks.
5.4.1 Rules Determining Business Data Compliance
As a data steward, you must ensure that your data follows the data quality standards
defined by your organization’s master data management and data governance
guidelines. End users and business users often need to assess or confirm the data used day to
day against specific business rules to improve its quality. A simple example would be
checking the completeness of contact information like the address or contact details of
a customer. A rule must be created and implemented to perform this check.
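To illustrate what such a completeness check evaluates, the following Python sketch tests whether required contact fields are populated and reports the share of passing records. The field names, the sample records, and the percentage score are assumptions for illustration; in SAP Data Intelligence, the rule is defined in the Rules tile rather than in code.

```python
# Hypothetical field names for the required contact information.
REQUIRED_FIELDS = ("address", "email", "contact_number")


def is_complete(record, required=REQUIRED_FIELDS):
    """A record passes when every required field holds a non-blank value."""
    return all(str(record.get(name) or "").strip() for name in required)


def completeness_score(records, required=REQUIRED_FIELDS):
    """Return the percentage of records that pass the completeness check."""
    if not records:
        return 0.0
    passed = sum(1 for record in records if is_complete(record, required))
    return 100.0 * passed / len(records)


# Hypothetical customer records.
customers = [
    {"address": "1 Main St", "email": "a@example.com", "contact_number": "123456789"},
    {"address": "", "email": "b@example.com", "contact_number": None},
]
score = completeness_score(customers)
```

A percentage like this is the kind of figure a scorecard can present once a rule has been bound to a data set and executed.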
You’ll need to follow a sequence of steps to successfully implement a business rule. In
this section, we’ll go through each step, showing you how to create a rule, create a
rulebook, bind a rule to a data set, and execute the rulebook. A dashboard can then be
created to reflect the outcome of the rules as scorecards. To work with data quality
rules, click the Rules tile, shown earlier in Figure 5.1, from the Metadata Explorer homepage.
When you access the Rules tile, you’ll arrive at the screen shown in Figure 5.28. With SAP
Data Intelligence’s Metadata Explorer, you can import existing SAP Information Steward
rules as well, which we’ll explain in Chapter 12, Section 12.5.1. As shown in Figure 5.28,
rules are usually organized under Rules Categories. SAP provides a predefined set of
categories, but you can also create new categories, which we’ll explain in detail in Section
5.4.2. A sample data set, shown in Figure 5.29, will be used for our example exercises.
Figure 5.28 Rules Page from Metadata Explorer
Figure 5.29 Sample Contact Data
Let’s say we would like to check the accuracy of the Country field to determine if the file
has the correct country code for Australia, as defined by the business, which in this case
is AU. To create the rule, follow these steps:
1. Click on the icon next to the Accuracy category, shown earlier in Figure 5.28, and
choose Create Rule. You can also click on the Create Rule icon.
2. On the Create Rule – Completeness screen, shown in Figure 5.30, provide a Rule ID,
Name, and Description and then click Save. The Rule ID is free text but should be consistent
with the defined data and information standards for your organization.
Figure 5.30 Creating a Data Quality Rule
3. On the next screen, you can add a parameter by clicking the icon to accept the
value and then clicking Save. In our example, we’ve chosen the parameter to check for case
sensitivity, as shown in Figure 5.31. You need to fill out necessary details like Name, Type,
whether it is case sensitive or not, and Description.
4. Only after you create a parameter can you add the condition to check by clicking
the icon. Assign the P_CC parameter we created earlier by selecting Operator
Condition from the Parameter Name dropdown list. Depending on the nature of the
operator condition being checked or validated, you may have to fill in additional
details. The Mode field has two options: User Entry, where you can define the values
or formats to be used in the condition, and Parameter Value, where you must identify
one or more additional parameters with the same data type as the selected parameter.
In our example, we’re checking that the value of the Country Code field is equal
to “AU.”
5. If the rule has been defined correctly, the Rule is valid message will be displayed at
the top of the screen.
Figure 5.31 Defining a Data Quality Rule
You can decide to apply the rule on a specific set of records from a data set using the
Filters option on the Rule Definition screen.
The next step in the process is to test the rule we just created to ensure that it is working
as expected. This test can be performed by clicking the Test Rule button. A new screen
will open where you can define some test cases and test the rule by clicking the + button,
as shown in Figure 5.32.
In this case, we’ve updated the rule to be case sensitive. In the top screen, shown in
Figure 5.33, you can enter test parameter values as inputs in the numbered rows. Then,
click the Run Tests button to review the results, as shown in the bottom screen. The only
test case that passes is where the value is “AU.”
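The outcome of these test cases can be reproduced with a small Python sketch of the same condition. The function country_is_au and the list of test values are hypothetical stand-ins for the rule and its test cases. The sketch only mirrors the logic of an equality check that is optionally case sensitive and does not call any SAP API.

```python
def country_is_au(value, case_sensitive=True):
    """Return True when the country code equals "AU" under the chosen case handling."""
    expected = "AU"
    if value is None:
        return False
    if case_sensitive:
        return value == expected
    return value.upper() == expected


# Hypothetical test inputs, similar to rows entered on a rule test screen.
test_values = ["AU", "au", "Au", "US", ""]
strict_results = {value: country_is_au(value) for value in test_values}
relaxed_results = {value: country_is_au(value, case_sensitive=False) for value in test_values}
```

With case sensitivity enabled, only the exact value "AU" passes. With it disabled, "au" and "Au" pass as well, which shows why the case sensitivity setting matters when testing a rule.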
Figure 5.32 Adding a Test Case
Figure 5.33 Creating Test Cases for Rule Validation
You can delete or edit header properties, parameters, conditions, and filters from the
Rule Definition dashboard. You can execute similar actions for these test cases.
5.4.2 Categories to Organize Business Rules
SAP provides a predefined set of categories for organizing business rules, as shown in
Table 5.1.
However, you may have rules that don’t fall into any of these categories. In this case,
you can create a new rule category from the Rule Overview screen, as shown in Figure
5.34. To create the category, click the + button, shown at the top of the screen. Then,
enter a Name for the category and a Description and then click the Save button. You can
then see the new category, in our example, Sensitivity, as shown in the bottom screen.
You can edit or delete the category by clicking on the icon next to the rule category.
Rule Category | Category Description | Example
Accuracy | Data has a standard value. | Country code is populated as standard value for all records.
Completeness | All necessary data is present. | Customer record should have address, email, and contact number.
Conformity | Confirm correctness of data type and format. | Contact number should be 9 digits.
Consistency | The data value is same across data sets. | If a record is inactive, the Inactive field is filled with X across data sets.
Integrity | Validate data relationships. | Check customer records have child records in customer contacts.
Timeliness | Validate data is current and available. | Report quarterly sales by a certain date.
Uniqueness | Check for duplicate records or primary keys. | Check that there is only one record for a product in a product data set.
Validity | Validate if data supports a policy or measurement. | A new product should be of a particular color.
Table 5.1 SAP-Defined Rule Categories
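As a rough illustration of what two of these categories mean in practice, the following Python sketch checks completeness and uniqueness on a small in-memory data set. The function names, field names, and sample records are hypothetical; in SAP Data Intelligence, such checks are defined as rules in the Metadata Explorer rather than written as code.

```python
def is_complete(record, required_fields):
    """Completeness: every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required_fields)


def duplicate_keys(records, key_field):
    """Uniqueness: return the key values that occur more than once."""
    seen, duplicates = set(), set()
    for record in records:
        key = record[key_field]
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


# Hypothetical customer records; the second has an empty address and a reused id.
customers = [
    {"id": 1, "address": "1 Main St", "email": "a@example.com", "contact": "123456789"},
    {"id": 2, "address": "", "email": "b@example.com", "contact": "987654321"},
    {"id": 2, "address": "3 High St", "email": "c@example.com", "contact": "555555555"},
]
required = ["address", "email", "contact"]
complete_flags = [is_complete(customer, required) for customer in customers]
print(complete_flags)
print(duplicate_keys(customers, "id"))
```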
Figure 5.34 Creating a New Rule Category
5.4.3 Using the Match Pattern Operator
In some scenarios, you may want to perform data quality checks for some specific pat-
terns for the values or strings in a particular field. For example, you may want to check
that a particular string has only alphanumeric characters or that the contact number
provided is a 9-digit phone number. The Match Pattern operator facilitates the imple-
mentation of similar data validation rules using the Metadata Explorer.
The example we’ll use in this section is validating a 9-digit phone number. As shown in
Figure 5.35, this rule has been defined using the steps described in Section 5.4.1. Once a
rule is defined, you can enter a test input value to ensure the rule works as expected.
The test results are shown in Figure 5.36.
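The pattern syntax used by the Match Pattern operator is defined in the rule editor and is not reproduced here. As an equivalent check outside the tool, the 9-digit validation could be sketched in Python with a regular expression; the function name and the choice of a regular expression are assumptions for illustration only.

```python
import re

# Exactly nine ASCII digits, nothing before or after.
NINE_DIGITS = re.compile(r"[0-9]{9}")


def is_nine_digit_number(value):
    """Return True only for strings made of exactly nine digits."""
    return isinstance(value, str) and NINE_DIGITS.fullmatch(value) is not None


for candidate in ["123456789", "12345678", "1234567890", "12345678a"]:
    print(candidate, is_nine_digit_number(candidate))
```

Using fullmatch rather than search ensures that longer strings containing nine digits somewhere inside them are rejected.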
Figure 5.35 Setting Up a Rule to Match the Pattern of a Contact Number
Figure 5.36 Test Result Showing the Correct Validation of Contact Numbers
5.4.4 Running and Monitoring Rulebooks
A rulebook is an object created in the Metadata Explorer to manage a set of rules that
can be run on one or more data sets. You may often want to run a set of data quality
checks on a specific set of data that is relevant for your department or domain of busi-
ness. The best approach for this goal is defining rules, creating rulebooks, binding rules
to various entities in your data sets, and then executing the rulebook. A rulebook can
have rules belonging to one or more categories, and rules may be bound to one or more
data sets in the same rulebook. In our example, two different rules are bound to
different data sets but included in the same rulebook.
First, let’s create a rulebook. Rulebook creation can be done from the rulebooks link on
the Rules tile, shown earlier in Figure 5.1. You’ll arrive at the Rulebook Overview screen,
shown in Figure 5.37. Click the + button to create a rulebook, enter a Name and Descrip-
tion, and click the Save button.
Figure 5.37 Creating a Rulebook
Once the rulebook is created, you’ll need to import the rules you want to execute in the
rulebook. Your new rulebook will appear as a tile on the Rulebook Overview screen.
Click the tile to arrive at the screen shown in Figure 5.38. Now, click the Import Rules
icon on the right to open the screen shown in Figure 5.39, where you can select the
required rules. Click Save.
Figure 5.38 Importing Rules into a Rulebook
Figure 5.39 Selecting and Adding Rules to a Rulebook
After your rules are imported, the next step is to bind the rules to data sets and col-
umns. In our example, we’ll bind our two rules to two different data sets. To bind a rule,
click on the icon next to the imported rule and select View Rule Bindings, as shown
in Figure 5.40.
Figure 5.40 Viewing Rule Bindings
Click on the + icon to open the Create Rule Binding screen, shown in Figure 5.41. In the
Qualified Name field, provide the full path of the data set to which you want to bind the
specific rule. The Binding Name is a unique identifier for the specific rule binding cre-
ated. You can also add a Description to set its context. Finally, map the field in the data
set to the parameter assigned to the rule to complete this step.
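Conceptually, a rule binding ties a rule parameter to a column in a specific data set. The following Python sketch models that idea as a small data structure; it is only a mental model, not the Metadata Explorer’s internal representation. The class, the binding name, the data set path, and the COUNTRY_CODE column name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RuleBinding:
    binding_name: str       # unique identifier for this binding
    qualified_name: str     # full path of the data set
    parameter_to_column: dict = field(default_factory=dict)
    description: str = ""


def resolve_parameters(binding, row):
    """Pick the bound column value for each rule parameter from one row."""
    return {
        parameter: row[column]
        for parameter, column in binding.parameter_to_column.items()
    }


binding = RuleBinding(
    binding_name="CC_ACCURACY_CUSTOMERS",               # hypothetical name
    qualified_name="/DEMO_CONNECTION/SALES/CUSTOMERS",  # hypothetical path
    parameter_to_column={"P_CC": "COUNTRY_CODE"},       # hypothetical column
    description="Country code must be AU",
)
row = {"CUSTOMER_ID": 42, "COUNTRY_CODE": "AU"}
print(resolve_parameters(binding, row))
```

Because the rule itself refers only to the parameter P_CC, the same rule can be bound to differently named columns in other data sets.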
Figure 5.41 Creating Rule Bindings
Figure 5.42 shows the first rule binding we created earlier, for checking the accuracy of
the country code, and Figure 5.43 shows our second rule binding, for validating the pat-
tern of the contact number.
Figure 5.42 First Rule Binding
Figure 5.43 Adding a Second Rule Binding
Once the rulebook is created and rules are bound, run the rulebook using the Run All
option. Once the execution is completed, click View Results to check the results, as
shown in Figure 5.44.
Figure 5.44 Displaying the Number of Records Passing the Criteria for Each Rule Binding
As shown in Figure 5.44, the percentage of rows that passed the data quality check is
60%. A list of rows for which validation failed is also provided.
Thresholds determine the passing and failure values for the rulebook. You can also
change thresholds in the rulebook by clicking the icon shown in Figure 5.45.
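The relationship between the pass percentage and the thresholds can be sketched as follows. This is a simplified model under stated assumptions: the two threshold values (75 and 50) are hypothetical, and treating the range between them as a warning zone is an assumption rather than documented product behavior.

```python
def pass_percentage(passed_rows, total_rows):
    """Share of rows that satisfied the rule, as a percentage."""
    if total_rows == 0:
        raise ValueError("total_rows must be greater than zero")
    return 100 * passed_rows / total_rows


def classify(percentage, pass_threshold, fail_threshold):
    """Map a pass percentage to a status using two thresholds (assumed model)."""
    if percentage >= pass_threshold:
        return "pass"
    if percentage < fail_threshold:
        return "fail"
    return "warning"


# Six of ten rows passing corresponds to the 60% result described above.
pct = pass_percentage(6, 10)
print(pct, classify(pct, pass_threshold=75, fail_threshold=50))
```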
Figure 5.45 Setting Rulebook Thresholds
You can see the status of all completed activities via Monitoring • Monitoring Tasks in
the Metadata Explorer, as shown in Figure 5.46.
Figure 5.46 Checking Task Statuses
You can further create a quality dashboard from Rules • View Rules Dashboard to
monitor rulebook results. Click the + button to create a new dashboard, which opens
the set of screens shown in Figure 5.47. First, add a dashboard Name and Description and
click Save (1). Then, click the + icon to add a new data quality scorecard (2) and use the
Scorecard Wizard to set up the dashboard (3). The wizard has five steps:
1. Select the rulebook for which you want to create the scorecard.
2. Select the type of reporting you would like to perform (i.e., reporting on rule
categories, data sets, or the rulebook itself).
3. Choose a scorecard type. For more details, refer to http://s-prs.co/v536911.
4. Select one or more data sets, depending on the Scorecard Type option.
5. Maintain the Title and Subtitle fields and click Save.
Figure 5.47 Creating a New Scorecard
Figure 5.48 shows our example dashboard comparing two rule categories.
Figure 5.48 Data Quality Dashboard
5.4.5 Business Glossary of Terms and Definitions
For every organization, maintaining a central repository of terms and what they mean
from a business context for sharing is an important activity. Defining a business glos-
sary ensures a consistent set of terminology is used to refer to data sets, entities, and
relationships. The main aim of using a glossary across the enterprise is to ensure a bet-
ter understanding of the information used across the organization.
A business glossary consists of three main areas:
• A term template defines additional information that is required or optional when the
terms are defined.
• A category groups various terms.
• The defined terms provide clarity for the business.
In the Metadata Explorer, a default glossary placeholder is provided by SAP. You’ll need
to define the necessary categories and terms in this placeholder. For starters, a group of
individuals within the organization should agree on a set of terms and their definitions
for the glossary. This set of individuals could be data stewards, business reps, or end
users.
You can define a new glossary category by going to the Business Glossary tile, shown
earlier in Figure 5.1, and clicking glossaries. Then, click the + icon beside Category and
click the Create Category button.
To create a new term in the glossary, click Create Term to arrive at the screen shown in
Figure 5.49, where you’ll provide a Name for the term, a Definition of the term for busi-
ness users, and Keywords to identify data sets or attributes that can be identified by the
term. Click Save when you’re done with these settings.
Figure 5.49 Defining a New Term
A term can be linked to other terms, rules, rulebooks, published data sets, and columns.
With term relationships, you can visualize related information in a graph. This graph
can provide a complete picture of a term’s relevance in the EIM landscape. If a
relationship is no longer relevant, you can remove the link. Likewise, when related
objects are removed from the catalog, the related objects listed for the associated terms
are updated automatically. For example, if the contacts table is removed from the
connection, then the links to that table or to the columns within it are removed from
the term’s Relationships tab.
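The cleanup behavior described above can be pictured with a small Python sketch that stores each term’s related objects and drops the links to a table and its columns when the table disappears. The class, the term name, and the object identifiers are hypothetical; this is a conceptual model, not the Metadata Explorer’s implementation.

```python
from collections import defaultdict


class TermRelationships:
    """Keeps, for each glossary term, the set of objects it is linked to."""

    def __init__(self):
        self._related = defaultdict(set)

    def link(self, term, related_object):
        self._related[term].add(related_object)

    def related(self, term):
        return set(self._related[term])

    def remove_table(self, table):
        """Drop links to the table itself and to any column written as table.column."""
        prefix = table + "."
        for term, objects in self._related.items():
            self._related[term] = {
                obj for obj in objects
                if obj != table and not obj.startswith(prefix)
            }


glossary = TermRelationships()
glossary.link("Customer Contact", "contacts")                 # table
glossary.link("Customer Contact", "contacts.phone")           # column
glossary.link("Customer Contact", "rule:nine_digit_contact")  # rule
glossary.remove_table("contacts")
print(glossary.related("Customer Contact"))
```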
Once a term is defined and saved, click Edit and go to the Relationships tab, where you
can click the Edit Related Objects button. Now, you can associate any Terms, Datasets/
Columns, Rules, or Rulebooks, as shown in Figure 5.50, by making selections and click-
ing Save Related Objects.
Figure 5.50 Editing Related Objects
Note
Data sets/columns are only available for association to terms when they have been
published to the catalog.
A graphical representation of the relationships created between terms is shown in
Figure 5.51.
Figure 5.51 Viewing Relationships
You can set a business glossary as default or delete or edit the glossary. Terms should
be reviewed regularly to ensure they are up to date with the latest definitions and
associations relevant to your organization or industry. If a term is no longer needed, it
can be deleted from the glossary. You can also create categories and associate them with
terms in the glossary. These categories can then be used as filter conditions.
5.5 Data Lineage from Transformation History
By now, you’ve seen how to bring in data from various systems, extract metadata,
assess data quality, and make data sets available to end users. However, one important
aspect of all these activities is the ability to quickly identify the root causes of issues
that may impact reports and end users. To perform this triage, you’ll need to under-
stand the relationships between these organized data sets, which is where data lineage
analysis comes into play. This section will help you understand how to work with rela-
tionships between data sets.
5.5.1 Lineage Analyses for Tracing Data Sets to Sources
Let’s consider a scenario where you’re using a report developed from a reporting tool
for end-of-the-month reporting, but you realize the data doesn’t look correct. Or you
open a report you used successfully yesterday, but today, it stops working because of an
issue with a field on a data set you’ve used. Finding out which source system the data is
coming from or what transformation has been done on the data can be a painstakingly
difficult analysis. With lineage analysis in the Metadata Explorer, you can quickly
identify the source data set, and the turnaround for fixing data could be greatly reduced.
You can see where your data is coming from when using multiple source systems and
complex transformations in your graphs. For example, Figure 5.52 shows the data lin-
eage for an SAP Business Explorer (SAP BEx) query for an SAP BW system, which might
be used to build a report in SAP Analytics Cloud, SAP BusinessObjects, or another
reporting tool. The data lineage of the query shows the associated SAP BW InfoProvider
object as the source, the intermediate SAP BW objects, and the transformation steps
that finally resulted in the output of the SAP BEx query.
Figure 5.52 SAP BEx Query Data Lineage
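Tracing a data set back to its sources amounts to walking a dependency graph upstream. The following Python sketch shows the idea on a hand-written lineage map; the object names are hypothetical, and the Metadata Explorer derives this information from extracted metadata rather than from a dictionary like this one.

```python
def trace_sources(upstream, node):
    """Return the objects upstream of node that have no inputs of their own.

    A node without any recorded inputs is treated as its own source.
    """
    sources, visited, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        parents = upstream.get(current, [])
        if not parents:
            sources.add(current)
        stack.extend(parents)
    return sources


# Hypothetical lineage: each key lists the objects it is built from.
upstream = {
    "BEx query": ["Composite provider"],
    "Composite provider": ["DataStore object A", "DataStore object B"],
    "DataStore object A": ["Source table ORDERS"],
    "DataStore object B": ["Source table CUSTOMERS"],
}
print(sorted(trace_sources(upstream, "BEx query")))
```

The visited set keeps the walk from repeating work when several intermediate objects share the same source.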
5.5.2 Lineage Extraction and Supported Sources
Data lineage information can be extracted for several types of sources, such as the fol-
lowing:
• SAP BW: Data stores, InfoProviders, and SAP BW queries
• SAP HANA: SQL views, column views, and synonyms
• SAP Vora: Data source tables and views
Lineages can also be extracted from operators in Modeler graph tasks that use a data
set referenced through a connection defined in the Connection Management application.
Notes
Some lineage extraction limitations exist with some operators in graphs. The list of
operators supporting graph extractions can be found at http://s-prs.co/v536912.
Lineages can be extracted using two different methods, which we’ll discuss in the fol-
lowing sections.
Extracting Lineage while Publishing Data Sets
To extract the lineage of a data set, enable the Lineage option (if supported) while cre-
ating a publication or when publishing a data set, as shown in Figure 5.53. You can refer
back to Section 5.3.1 for more details on publication of data sets. To extract lineage from
Modeler graphs, you can use similar steps.
Figure 5.53 Lineage Toggle for a Supported Data Set
Often, when a lineage extraction is enabled on a data set used in a graph, a lineage
extraction graph is automatically triggered, as shown in Figure 5.54. In this example, an
SAP HANA table has been used as a target system in the graph, and an SAP HANA data-
base metadata extractor graph has been initiated.
Figure 5.54 Lineage Extractor Graph Initiated
Automatic Lineage Extraction
You can enable automatic lineage extraction on Modeler graphs and during Metadata
Explorer data preparation. With automatic lineage enabled, you can create a history of
lineage analysis, showing details on how the graph has changed with respect to sources
or targets that have been added or removed as well as the transformations that have
been performed. A few additional settings must be enabled in the System Management
application for SAP Data Intelligence. Under the General tab, the options shown in
Figure 5.55 should be configured.
For Metadata Explorer: Automatic lineage extraction of Modeler Graphs and Metadata
Explorer: Automatic lineage extraction of Data Preparations, you must choose one of
the following options:
• enabled_and_publish_datasets
Extracts lineage and publishes the data set to the catalog in the Metadata Explorer.
You can access this lineage by browsing the connection or the catalog.
• enabled_and_do_not_publish_datasets
Extracts lineage but does not publish the data set to the catalog. You can access this
lineage by browsing the connection in the Metadata Explorer.
• disabled
Does not extract lineage automatically.
For the Metadata Explorer: Days until deletion of automatic lineage option, set the
value to “-1” to ensure all automatic lineage is maintained.
For the Metadata Explorer: Automatic lineage extraction frequency setting, ensure the
number of minutes is set for the extraction interval.
Figure 5.55 Enabling Automatic Lineage Extraction
5.5.3 Understanding and Configuring the Lineage View
The view for data lineage can be configured to suit your needs through the settings
under the Lineage tab in the Catalog. Click the Settings icon, as shown in Figure 5.56,
and select from the options described in Table 5.2.
Figure 5.56 Lineage View Settings
Setting | Description
Fixed Node Width | Adjusts how object names are displayed (on or off).
Orientation | Changes the orientation in which nodes are displayed.
Node Placement | Adjusts the number of straight edges and the placement of nodes. Three options are available: Brandes-Koepf, Linear Segments, and Simple.
Node Spacing | Adjusts the distance between the nodes.
Line Types | Manages how the lines connecting the nodes are displayed. Merge combines lines that go in the same direction and splits them when necessary. Split separates each line.
Table 5.2 Lineage View Settings: Options
5.6 Summary
This chapter explored SAP Data Intelligence’s Metadata Explorer in depth, including its
features that you can implement to develop your organization’s data governance
framework. We discussed features like data quality assessment, data lineage tracking,
and cataloging and showed you how to implement these capabilities.
In the next chapter, you’ll learn how to create data pipelines and ingest, cleanse,
transform, and store data.
Contents
Preface ....................................................................................................................................................... 21
Part I Getting Started
1 The Data Fabric for the Intelligent Enterprise 33
1.1 Data Fabric ................................................................................................................................ 34
1.1.1 Trends ......................................................................................................................... 35
1.1.2 Benefits ...................................................................................................................... 37
1.2 Data Orchestration ............................................................................................................... 38
1.3 SAP Business Technology Platform ............................................................................... 40
1.4 SAP Data Intelligence .......................................................................................................... 43
1.5 Summary ................................................................................................................................... 50
2 Architecture and Capabilities 51
2.1 Genesis of SAP Data Intelligence .................................................................................... 52
2.1.1 Features from SAP Leonardo Machine Learning Foundation .................. 54
2.1.2 Evolution from SAP Data Hub to SAP Data Intelligence ............................ 58
2.2 SAP Data Intelligence Architecture ............................................................................... 60
2.3 Deployment Options and Bring Your Own License Model .................................. 63
2.4 Kubernetes Cluster and Containers .............................................................................. 68
2.4.1 Overview of Kubernetes ....................................................................................... 68
2.4.2 Kubernetes Cluster Architecture ....................................................................... 75
2.4.3 Container Runtimes ............................................................................................... 78
2.4.4 Pods and Workloads .............................................................................................. 79
2.4.5 Resources and Policies .......................................................................................... 81
2.4.6 Kubernetes and SAP Data Intelligence ............................................................ 83
2.5 SAP Data Intelligence Launchpad .................................................................................. 86
2.5.1 Persona-Based Application ................................................................................. 86
2.5.2 Overview of Applications ..................................................................................... 88
2.6 Summary ................................................................................................................................... 91
3 Setup and Installation 93
3.1 Landscape Sizing .................................................................................................................... 93
3.1.1 Sizing Various SAP Data Intelligence Components .................................... 94
3.1.2 Minimum Sizing and Initial Sizing for SAP Data Intelligence .................. 95
3.1.3 Understanding the T-Shirt Sizing Approach ................................................. 99
3.2 SAP Cloud Appliance Library ............................................................................................. 99
3.2.1 Getting Started with SAP Cloud Appliance Library ..................................... 101
3.2.2 Deploying SAP Solutions in the Cloud ............................................................. 103
3.2.3 Activating and Creating Solution Instances .................................................. 105
3.2.4 Security Considerations for SAP Cloud Appliance Library ......................... 106
3.3 On-Demand Cloud Provisioning and Instance Sizing ............................................ 107
3.3.1 Sizing with SAP Cloud Appliance Library ........................................................ 108
3.3.2 Supported Cloud Providers for SAP Cloud Appliance Library ................... 109
3.3.3 Understanding Costs and Payments ............................................................... 109
3.3.4 Backing Up, Restoring, and Terminating an Instance ................................ 112
3.4 Setting Up SAP Data Intelligence on SAP Cloud Appliance Library ................. 113
3.4.1 Prerequisites for Cloud Provider Account ...................................................... 114
3.4.2 Connecting to SAP Cloud Appliance Library .................................................. 122
3.4.3 Creating and Accessing the Solution ............................................................... 124
3.4.4 Accessing the Jump Box for Monitoring and Troubleshooting .............. 136
3.4.5 Running the Solution ............................................................................................ 145
3.4.6 Access through Browser Using Local Hosts File ........................................... 148
3.4.7 Personalization ........................................................................................................ 149
3.5 SAP Data Intelligence 3.0 Installation On-Premise ................................................ 150
3.5.1 Planning and Prerequisites for an On-Premise Installation ..................... 150
3.5.2 Modular Deployment with SLC Bridge ............................................................ 151
3.5.3 Installing SAP Data Intelligence with the Maintenance Planner and SLC Bridge ......................................................................................................... 154
3.6 Summary ................................................................................................................................... 168
4 Using SAP Data Intelligence Applications 169
4.1 SAP Data Intelligence Launchpad Applications ....................................................... 169
4.2 Applications for Data Engineers ..................................................................................... 172
4.2.1 Connection Management ................................................................................... 172
4.2.2 Metadata Explorer ................................................................................................. 174
4.2.3 Modeler ...................................................................................................................... 175
4.2.4 Customer Data Export .......................................................................................... 176
4.3 Applications for Data Scientists ...................................................................................... 177
4.3.1 ML Scenario Manager ........................................................................................... 177
4.3.2 Vora Tools ................................................................................................................. 178
4.4 Applications for Modelers and Auditors ..................................................................... 179
4.4.1 Monitoring Applications ...................................................................................... 180
4.4.2 Audit and System Logs ......................................................................................... 181
4.5 Applications for System Administrators ..................................................................... 182
4.5.1 Policy Management ............................................................................................... 182
4.5.2 Handling Privileges ................................................................................................ 184
4.5.3 System Management ............................................................................................ 184
4.5.4 License Management ............................................................................................ 188
4.6 Summary ................................................................................................................................... 189
Part II Data Management, Orchestration, and Machine Learning
5 Metadata-Driven Data Governance 193
5.1 Metadata Explorer for Data Governance .................................................................... 194
5.1.1 Intelligent Information Management with the Discovery Dashboard ............................................................................................ 195
5.1.2 Metadata Crawlers to Explore, Classify, and Label Data Assets ............. 196
5.1.3 Managing Metadata Data across a Connected System Landscape ...... 196
5.2 Data Profiling to Understand Data ................................................................................ 197
5.2.1 Profiling Data Sets from Connections ............................................................. 198
5.2.2 Profiling Actions and Monitor ............................................................................ 198
5.2.3 Viewing Profile Fact Sheets ................................................................................. 199
5.3 Managing Publications and Data Catalogs ................................................................ 202
5.3.1 Catalog of Published Data Sets ......................................................................... 202
5.3.2 Automatic Tags and Hierarchical Tagging ..................................................... 207
5.3.3 Using Tags as Search Filters ................................................................................ 211
5.3.4 Managing Publications in the Catalog ............................................................ 211
5.3.5 Lineage Depth Set in Publication Processing ................................................ 214
5.4 Defining Data Quality Rules and Running Rulebooks .......................................... 214
5.4.1 Rules Determining Business Data Compliance ............................................ 215
5.4.2 Categories to Organize Business Rules ........................................................... 219
5.4.3 Using the Match Pattern Operator ................................................................... 220
5.4.4 Running and Monitoring Rulebooks ................................................................ 221
5.4.5 Business Glossary of Terms and Definitions ................................................. 228
5.5 Data Lineage from Transformation History .............................................................. 230
5.5.1 Lineage Analyses for Tracing Data Sets to Sources ..................................... 230
5.5.2 Lineage Extraction and Supported Sources ................................................... 231
5.5.3 Understanding and Configuring the Lineage View .................................... 234
5.6 Summary ................................................................................................................................... 235
6 Modeling Data Processing Pipelines 237
6.1 Using the SAP Data Intelligence Modeler ................................................................... 237
6.1.1 Flow-Based Paradigm as a Network of Information .................................. 238
6.1.2 Data Pipeline Engine in the Flow-Based Modeler ....................................... 239
6.1.3 Navigating the Modeler Panes and Toolbars ................................................ 240
6.1.4 Built-In Operators ................................................................................................... 242
6.1.5 Creating and Validating Graphs ........................................................................ 244
6.2 Creating and Managing Connections ........................................................................... 250
6.2.1 Creating Connections ........................................................................................... 250
6.2.2 Connecting to Cloud Foundry ............................................................................ 251
6.2.3 Managing Certificates .......................................................................................... 253
6.2.4 Authorizations for Connections ........................................................................ 254
6.3 Self-Service Data Preparation with the Metadata Explorer ............................... 255
6.3.1 Preparing Data for Accurate Results and Better Insights ......................... 255
6.3.2 Self-Service Data Preparation with the Metadata Explorer ..................... 255
6.3.3 Transforming Structured Data Sets ................................................................. 256
6.3.4 Managing Data Preparation Actions ............................................................... 258
6.3.5 Processing Data Preparation Actions .............................................................. 259
6.4 Integrating, Processing, and Orchestrating Workflows ....................................... 261
6.4.1 Graph Snippets as a Group of Operators ....................................................... 262
6.4.2 Working with Data Workflow Operators ....................................................... 264
6.4.3 Integrating SAP Cloud Applications ................................................................. 266
6.4.4 Change Data Capture Graph .............................................................................. 267
6.4.5 Custom Operators .................................................................................................. 267
6.5 Scheduling and Monitoring Data Pipelines ............................................................... 270
6.5.1 Scheduling and Monitoring Data Pipelines ................................................... 270
6.5.2 Trace Messages ....................................................................................................... 272
6.5.3 Tracking Model Metrics ........................................................................................ 273
6.5.4 Kubernetes Dashboard and Cluster Logs ....................................................... 273
6.6 Summary ................................................................................................................................... 273
7 Creating Operators and Data Types 275
7.1 Creating Custom Operators .............................................................................................. 276
7.1.1 Visibility of Events .................................................................................................. 277
7.1.2 Compatibility of Port Types ................................................................................. 277
7.1.3 Creating and Editing Operators ......................................................................... 281
7.2 Implementing Runtime Operators ................................................................................ 288
7.2.1 Subengines in SAP Data Intelligence Modeler .............................................. 288
7.2.2 Working with Subengines to Create Operators ........................................... 289
7.3 Creating Data Types ............................................................................................................. 290
7.3.1 Predefined Global Scalar Types .......................................................................... 291
7.3.2 Defining Your Own Custom Data Types ......................................................... 292
7.3.3 Leveraging Data Types in Graphs ...................................................................... 293
7.4 Summary ................................................................................................................................... 293
8 Building Docker Images 295
8.1 Containers in Pods and Pods in Clusters ..................................................................... 295
8.1.1 Delivery of Data-Driven Applications .............................................................. 295
8.1.2 Helm: Package Manager for Kubernetes ........................................................ 296
8.1.3 Dockerfiles: Predefined Runtime Environments .......................................... 297
8.2 Assembling a Docker Image ............................................................................................. 298
8.2.1 Building Docker Images through Dockerfiles ............................................... 298
8.2.2 Enhancing Docker Images with Different Package Managers ................ 302
8.3 Dockerfile Inheritance ......................................................................................................... 303
8.4 Using Docker with Python ................................................................................................. 305
8.5 Summary ................................................................................................................................... 308
9 Machine Learning 309
9.1 Machine Learning with SAP .............................................................................................. 310
9.1.1 Machine Learning Solutions in the SAP Landscape .................................... 311
9.1.2 TEI Methodology in Machine Learning ........................................................... 313
9.1.3 Transforming Business Use Cases with Machine Learning ..................... 318
9.1.4 Data-Driven Approach versus Traditional Rule-Based Approach ........... 319
9.1.5 Machine Learning Tasks in Enterprise Contexts .......................................... 321
9.1.6 Architectural Principles for Machine Learning ............................................. 325
9.2 Machine Learning with SAP Data Intelligence ......................................................... 328
9.2.1 Scalable Data Pipelines in Complex Data Landscapes ............................... 329
9.2.2 Data and Algorithms as Assets for Machine Learning ............................... 331
9.2.3 Leveraging Open-Source Environments and Skills ...................................... 331
9.3 Using the ML Scenario Manager ..................................................................................... 333
9.3.1 ML Scenario Manager Overview ........................................................................ 333
9.3.2 Setting Up a Scenario in ML Scenario Manager ........................................... 334
9.3.3 Integrating Hyperscale Data and Targets ...................................................... 339
9.3.4 Leveraging Scenario Templates for Machine Learning .............................. 340
9.3.5 Dockerfile Building and Grouping .................................................................... 345
9.3.6 Implementing TensorFlow Pipelines ............................................................... 347
9.3.7 Training and Deploying Models with New Versions .................................. 350
9.3.8 Metrics Explorer and Machine Learning Tracking SDK .............................. 360
9.3.9 Run Collection and Run Performance .............................................................. 363
9.3.10 Visualizing SAP Data Intelligence Metrics with SAP Analytics Cloud ... 363
9.4 ML Data Manager in Data Workspaces and Data Collections ........................... 365
9.4.1 Data Workspaces and Data Collections .......................................................... 365
9.4.2 Organizing Data Sets in Data Lakes ................................................................. 367
9.4.3 Curating a Data Collection .................................................................................. 368
9.4.4 Registering a Data Set ........................................................................................... 369
9.5 Summary ................................................................................................................................... 371
10 Jupyter Notebook 373
10.1 Jupyter Notebook Fundamentals ................................................................................... 374
10.1.1 Interactive Tool for Data Science Projects ...................................................... 374
10.1.2 Jupyter Notebook Dashboard and User Interface ....................................... 379
10.1.3 Data Analysis in Jupyter Notebook ................................................................... 381
10.2 Working with SAP HANA Cloud ...................................................................................... 386
10.2.1 SAP HANA Cloud: Cloud Database as a Service ............................................ 387
10.2.2 Exploring SAP HANA Cloud on an SAP BTP Trial Account ......................... 389
10.2.3 Understanding the SAP HANA Cockpit and SAP HANA Database Explorer .................................................................................................. 391
10.2.4 Using Jupyter Notebook in SAP BTP and Integration with SAP HANA Cloud .............................................................................................................. 393
10.2.5 SAP Data Intelligence Connection .................................................................... 402
10.3 Data Science Experiments with Jupyter Notebook ................................................ 405
10.3.1 SAP HANA Embedded Machine Learning ....................................................... 406
10.3.2 Machine Learning Core Operators .................................................................... 413
10.3.3 SAP HANA ML Training Operator ...................................................................... 423
10.3.4 SAP HANA ML Inference Operator .................................................................... 425
10.4 JupyterLab as the Next-Gen Jupyter Notebook ....................................................... 430
10.4.1 JupyterLab: The Next-Gen User Interface with Built-In Libraries .......... 431
10.4.2 Accessing Jupyter Notebook Artifacts from JupyterLab ............................ 434
10.4.3 SAP HANA Python Client API .............................................................................. 436
10.5 Summary ................................................................................................................................... 437
11 SAP Data Intelligence Python SDK 439
11.1 Using SAP Data Intelligence Python SDK .................................................................... 440
11.1.1 Setting a Context in Jupyter Notebook ........................................................... 440
11.1.2 Data Lake API for SDL ............................................................................................ 441
11.1.3 Retrieving Machine Learning Scenario Metadata ....................................... 443
11.1.4 Training Container Using the SDK .................................................................... 444
11.1.5 Executing and Deploying Pipelines .................................................................. 447
11.2 Accessing Artifacts Using Methods ............................................................................... 448
11.3 Machine Learning Tracking SDK ..................................................................................... 450
11.3.1 Initializing Run for an Experiment .................................................................... 451
11.3.2 Grouping Runs in Run Collections .................................................................... 451
11.3.3 Analyzing Metrics and Logs ................................................................................ 454
11.4 Summary ................................................................................................................................... 454
Part III Integration
12 Integrating with ABAP Systems 459
12.1 Integration Scenarios ........................................................................................................... 459
12.1.1 Scenarios and Use Cases for Integration ........................................................ 460
12.1.2 ABAP Metadata in the Metadata Explorer ..................................................... 461
12.2 Provisioning Data from ABAP Systems ........................................................................ 465
12.2.1 Exposing the CDS View ........................................................................................ 465
12.2.2 Connection Prerequisites for Data Extraction .............................................. 466
12.2.3 Connecting On-Premise Systems with the Cloud Connector .................. 467
12.3 Using Operators to Trigger Execution in an ABAP System ................................. 472
12.3.1 ABAP Operators to Trigger Function Modules or BAPIs ............................. 472
12.3.2 Prerequisites for ABAP Operators in Remote Systems ............................... 474
12.4 SAP BW/4HANA and SAP Data Intelligence Hybrid Data Virtualization ...... 478
12.4.1 Prerequisites in SAP Business Warehouse ..................................................... 478
12.4.2 Using Connection Type HANA_DB ................................................................... 480
12.4.3 Authorization Check for Services ...................................................................... 481
12.4.4 SAP BW Operator for Pipeline ............................................................................ 484
12.5 Additional Connectivity ...................................................................................................... 485
12.5.1 SAP Information Steward .................................................................................... 485
12.5.2 SAP HANA for SQL Data Warehousing ............................................................ 489
12.6 Summary ................................................................................................................................... 495
13 Integrating with Non-SAP Systems 497
13.1 Non-SAP Cloud System Connectivity ............................................................................ 497
13.1.1 Amazon S3 ................................................................................................................ 498
13.1.2 Amazon Redshift .................................................................................................... 500
13.1.3 Windows Azure Storage Blob ............................................................................. 501
13.1.4 Microsoft Azure SQL Data Warehouse ............................................................ 502
13.1.5 Microsoft Azure Data Lake .................................................................................. 503
13.1.6 Google Cloud Storage ........................................................................................... 506
13.1.7 Google BigQuery ..................................................................................................... 508
13.1.8 IBM Cloud Storage ................................................................................................. 509
13.2 Non-SAP On-Premise System Connectivity ............................................................... 510
13.2.1 Oracle Relational Database Management System ..................................... 510
13.2.2 Microsoft SQL Server ............................................................................................. 512
13.3 Summary ................................................................................................................................... 513
14 Integrating Big Data Workloads with SAP Vora 515
14.1 SAP Vora in Kubernetes Framework ............................................................................. 516
14.1.1 System Management ............................................................................................ 516
14.1.2 SAP Vora Engine Architecture ............................................................................ 517
14.1.3 Accessing SAP Vora User Interface ................................................................... 520
14.1.4 SAP Vora Data Preview ......................................................................................... 521
14.1.5 Using SQL Editor ..................................................................................................... 522
14.1.6 Using SQL Scripts .................................................................................................... 523
14.2 Data Modeling in SAP Vora ............................................................................................... 524
14.2.1 Creating Database Schemas ............................................................................... 524
14.2.2 Creating Partition Schemes ................................................................................ 525
14.2.3 Creating Tables and Views .................................................................................. 527
14.2.4 Creating Calculated Columns ............................................................................ 532
14.2.5 Additional Functions for Views .......................................................................... 533
14.3 Hierarchies in SAP Vora ...................................................................................................... 536
14.3.1 SAP Vora SQL for Hierarchical Data Analysis ................................................. 537
14.3.2 Using Adjacency Table to Render a Hierarchy .............................................. 539
14.3.3 Caching Hierarchies with Materialized Views .............................................. 539
14.4 Full-Text Search in SAP Vora ............................................................................................. 540
14.4.1 Text Analysis Graphs in Modeler ...................................................................... 540
14.4.2 Linguistic and Semantic Analysis ...................................................................... 541
14.4.3 Full-Text Search on a Document Collection .................................................. 542
14.5 Summary ................................................................................................................................... 542
15 Integrating with SAP Data Warehouse Cloud 543
15.1 Overview of SAP Data Warehouse Cloud ................................................................... 543
15.1.1 SAP Cloud Services Ecosystem ........................................................................... 544
15.1.2 Setting Up the Trial Tenant ................................................................................. 546
15.2 Understanding Spaces ......................................................................................................... 549
15.2.1 Spaces as Virtual Workspaces ............................................................................ 549
15.2.2 Development in a Space ....................................................................................... 554
15.2.3 Managing Spaces ................................................................................................... 556
15.3 Exploring Connections and Using the Data Builder ............................................... 561
15.3.1 Available Connection Types ................................................................................ 561
15.3.2 Data Builder: Model to Business Catalog ....................................................... 562
15.3.3 Space-Aware Integrated Story Builder ............................................................ 566
15.4 Data Builder in SAP Data Warehouse Cloud versus Pipelines in SAP Data Intelligence .......................................................................................................... 570
15.5 Summary ................................................................................................................................... 570
16 Integrating with SAP Analytics Cloud 571
16.1 Overview of SAP Analytics Cloud ................................................................................... 571
16.1.1 Solution to Analyze, Plan, Predict, and Collaborate .................................... 572
16.1.2 Fundamental Components: Data, Models, and Stories ............................ 574
16.2 Use Operators: Read File, Formatter, and Producer .............................................. 582
16.2.1 Read File Operator .................................................................................................. 583
16.2.2 Decode Table Operator ......................................................................................... 584
16.2.3 SAP Analytics Cloud Formatter .......................................................................... 585
16.2.4 SAP Analytics Cloud Producer ............................................................................ 586
16.3 Pipelines to Train, Predict, and Visualize Data ......................................................... 587
16.3.1 Using the Dataset API ........................................................................................... 587
16.3.2 Data Set Provision and Consumption ............................................................. 589
16.4 Summary ................................................................................................................................... 591
Part IV System Management, Security, and Operations
17 Administration 595
17.1 System Management Command-Line Client Reference ...................................... 595
17.1.1 Command-Line Client for SAP Data Intelligence ......................................... 596
17.1.2 Using the VCTL Tool: JavaScript Utility ........................................................... 597
17.1.3 Useful Commands for Command-Line Client ............................................... 598
17.2 Administration Applications ............................................................................................ 599
17.2.1 Administrator Access ............................................................................................ 600
17.2.2 System Management ............................................................................................ 600
17.2.3 License Management ............................................................................................ 611
17.2.4 Connection Management ................................................................................... 613
17.3 Monitoring the SAP Data Intelligence Modeler ....................................................... 616
17.3.1 Monitoring the Status of Graph Execution ................................................... 616
17.3.2 Tracing Messages to Isolate Problems and Errors ....................................... 621
17.3.3 Downloading Diagnostic Information for Graphs ...................................... 623
17.4 SAP Data Intelligence System Logging ........................................................................ 626
17.4.1 Kubernetes Cluster-Level Logging Mechanism ............................................ 627
17.4.2 Browsing Application Logs in the Diagnostics Kibana Web User Interface .......................................................................................................... 629
17.4.3 Aggregating Logs in External Logging Service .............................................. 630
17.5 System Diagnostics ............................................................................................................... 631
17.5.1 SAP Data Intelligence Diagnostics: Diagnostics Grafana ......................... 631
17.5.2 Kubernetes Cluster Metrics ................................................................................ 633
17.5.3 Integrating Diagnostics with External APM Solution ................................ 635
17.6 Summary ................................................................................................................................... 637
18 Security 639
18.1 Approach to Data Protection ............................................................................................ 639
18.1.1 Business Semantics for Industry-Specific Legislations .............................. 640
18.1.2 Functions for Data Privacy Compliance .......................................................... 641
18.1.3 Security Features for Data Protection and Privacy ...................................... 641
18.2 Authenticating Services and Users ................................................................................ 642
18.2.1 Roles and Scope-Driven User Access Control ................................................ 642
18.2.2 SAP BTP User Account and Authentication ................................................... 644
18.2.3 Self-Signed Certificate Authority and TLS ...................................................... 649
18.2.4 Leveraging Policy Management for Access Control .................................... 649
18.2.5 Enabling Security Features on Kubernetes Cluster ..................................... 657
18.3 Securely Connecting On-Premise Systems ................................................................. 658
18.3.1 Cloud Connector ..................................................................................................... 658
18.3.2 Site-to-Site Virtual Private Network ................................................................ 659
18.3.3 Virtual Private Cloud Peering ............................................................................. 659
18.4 Summary ................................................................................................................................... 659
19 Maintenance 661
19.1 Understanding Operational Modes or Run Levels .................................................. 661
19.2 Switching the Platform to Maintenance Mode ....................................................... 662
19.2.1 Enabling or Disabling Maintenance Mode .................................................... 663
19.2.2 Restarting SAP Data Intelligence Services ...................................................... 664
19.2.3 Setting Up a Remote Connection to SAP ........................................................ 664
19.3 Increasing System Management Persistent Volume Size ................................... 665
19.3.1 Persistent Volume Error Handling .................................................................... 665
19.3.2 Changing the Persistent Storage Size of the SAP Vora Disk Engine ...... 667
19.3.3 Changing the Buffer and File Size of the SAP Vora Disk Engine ............. 668
19.4 Performing Backups ............................................................................................................. 668
19.5 Summary ................................................................................................................................... 671
20 Application Lifecycle Management 673
20.1 Version Control System ...................................................................................................... 673
20.2 Git ................................................................................................................................................. 674
20.2.1 Git Basics and Terminology ................................................................................. 675
20.2.2 Git Integration and CI/CD Process .................................................................... 678
20.2.3 Setting Up Your Environment for Git Workflows ........................................ 697
20.3 Continuous Integration and Continuous Delivery ................................................. 707
20.3.1 Continuous Integration Best Practices ............................................................ 707
20.3.2 Leveraging SAP Solutions for CI/CD ................................................................. 712
20.4 DevOps Fundamentals and Tools ................................................................................... 713
20.4.1 The Core Tenets of DevOps ................................................................................. 715
20.4.2 Implement Tooling for DevOps ......................................................................... 718
20.4.3 DevOps for Hybrid Architectures ...................................................................... 719
20.5 SAP Data Intelligence as the MLOps Platform .......................................................... 723
20.5.1 Production Lifecycle of Machine Learning Models ...................................... 724
20.5.2 MLOps Challenges .................................................................................................. 726
20.5.3 MLOps Capabilities ................................................................................................ 727
20.6 Migrating from SAP Leonardo Machine Learning Foundation ......................... 730
20.6.1 Bring Your Own Model ......................................................................................... 731
20.6.2 Migrating the Training Data ............................................................................... 733
20.6.3 Adding the Training Data to a Data Lake ....................................................... 734
20.7 Summary ................................................................................................................................... 734
21 Business Content and Use Cases 737
21.1 Digital Transformation and SAP Data Intelligence ................................................ 737
21.2 Business Content by Industry .......................................................................................... 740
21.3 Finance Use Cases .................................................................................................................. 746
21.4 Supply Chain Use Cases ...................................................................................................... 747
21.5 Manufacturing Use Cases .................................................................................................. 749
21.6 Summary ................................................................................................................................... 751
Appendices 753
A Outlook and Roadmap ........................................................................................................ 753
B The Authors .............................................................................................................................. 763
Index .......................................................................................................................................................... 765
Index
/vflow directory ..................................................... 687
/vhome folder ........................................................ 699
A
ABAP .......................................................................... 459
best practices ..................................................... 710
CDS views ............................................................ 465
certificate ............................................................ 466
connect with SAP BTP ..................................... 470
connect with SAP Data Intelligence ......... 471
connection prerequisites ............................... 466
data provisioning ............................................ 465
execute functions ............................................. 460
operator prerequisites .................................... 474
operators ............................................................. 472
use cases .............................................................. 460
ABAP CDS Reader operator ............ 465, 473, 477
ABAP Converter operator .................................. 472
ABAP integration ..................................................... 60
ABAP ODP operator ............................................. 474
Access control ........................................................ 642
manage policies ................................................ 649
Access control list (ACL) ..................................... 107
Access key ................................................................ 118
Access point ............................................................ 129
Account ........................................................... 100, 104
active ..................................................................... 106
assign users ........................................................ 126
choose ................................................................... 125
create .................................................................... 122
owner .................................................................... 100
user ........................................................................ 114
Adam algorithm .................................................... 348
Adjacency list ......................................................... 539
Administration ...................................................... 595
applications ........................................................ 599
monitoring ......................................................... 616
system diagnostics .......................................... 631
system logging .................................................. 626
tile ........................................................................... 175
Administrative service ....................................... 266
Administrator ........................... 100, 182, 600, 661
Algorithm ....................................................... 320, 331
APL ......................................................................... 410
data-driven approach .................................... 320
deep learning ..................................................... 348
embedded in SAP HANA ................................ 425
examples ............................................................. 409
PAL ......................................................................... 407
personalize ......................................................... 385
Alias ........................................................................... 534
Amazon Elastic Container Registry (Amazon ECR) ................................................... 144
Amazon Redshift .................................................. 500
Amazon Simple Storage Service (Amazon S3) .............................................. 498, 734
Amazon Web Services (AWS) ........................... 109
connect ................................................................ 122
console URL ........................................................ 114
monitor ................................................................ 144
policies ................................................................. 115
quota error ......................................................... 110
register as cloud provider ............................. 114
sizing .................................................................... 108
Analytics .................................................. 42, 618, 739
processing ........................................................... 326
SAP Analytics Cloud ....................................... 572
stories ................................................................... 567
usage .................................................................... 648
Anonymization ..................................................... 392
Apache Kafka ................................................. 332, 381
API server .................................................................... 76
Appliance ................................................................. 100
Application development and integration ... 42
Application development machine learning ............................................................... 327
Application Function Library (AFL) ............... 407
Application instance ........................................... 188
Application integration ........................................ 34
Application lifecycle management ............... 673
CI/CD .................................................................... 707
DevOps ................................................................. 713
Git .......................................................................... 674
MLOps .................................................................. 723
Application log ...................................................... 629
Application management ................................. 608
properties ............................................................ 610
Application management services (AMS) .................................................................... 722
Application performance management (APM) .................................................................... 635
Application programming interface (API) .... 56
Google Cloud Platform .................................. 120
public .................................................................... 758
Architecture ....................................................... 51, 60
decision points ..................................................... 64
Kubernetes ............................................................. 73
Kubernetes clusters ............................................ 75
microservices ........................................................ 74
Artifact ...................................................................... 415
class ............................................................. 439, 448
Artifact Consumer operator ................... 416, 428
inputs and outputs .......................................... 417
Artifact Producer operator ............. 276, 368, 414
configuration parameters ............................ 415
inputs and outputs .......................................... 416
Artificial intelligence (AI) ..................... 39, 53, 309
Attribute ................................................................... 555
Auditing .................................................................... 181
Auditor ............................................................ 179, 182
Authentication ...................................................... 642
Authorization ............................................... 107, 359
check ...................................................................... 481
connections ........................................................ 254
Google Cloud Storage ..................................... 507
OAuth .................................................................... 589
SAP BW users ..................................................... 484
SAP HANA users ................................................ 483
scenarios .............................................................. 555
type .............................................................. 120, 123
Automated acceptance testing ....................... 708
Automated Predictive Library (APL) ... 311, 410
prerequisites ....................................................... 410
Automatic invoice posting ............................... 744
Automatic lineage extraction .......................... 233
Automation ............................................................. 718
AutoML .................................................... 62, 312, 315
Autoscaling ............................................................. 755
B
Backup .................................................... 112, 167, 668
files ......................................................................... 669
Banking ..................................................................... 741
Base operator .......................................................... 282
Base strategy ........................................................... 603
Benchmark .............................................................. 319
Best practices .......................................................... 707
Bias ................................................................... 321, 384
Big data .................................................... 46, 178, 515
Binary Large Object (BLOB) file ....................... 412
Binary target ........................................................... 410
Blocking .................................................................... 640
Bokeh ......................................................................... 377
Box plot ..................................................................... 378
Brainstorming workshop .................................. 318
Branch ........................................................................ 701
Bring your own license (BYOL) .......................... 63
Bring your own model (BYOM) ....................... 731
Bugfix branch ......................................................... 704
Build server .................................................... 680, 690
Build step .................................................................. 695
Build trigger ............................................................. 694
Business Builder .......................................... 552, 565
artifacts ................................................................ 555
Business catalog ..................................................... 562
Business content ......................................... 737, 740
Business entity ....................................................... 555
Business Entity Recognition ............................... 58
Business glossary ........................................ 175, 228
create new term ................................................ 228
Business model innovation .............................. 739
Business purpose .................................................. 640
Business user .......................................................... 100
C
Caching ...................................................................... 539
Calculated column ...................................... 532, 533
Calendar .................................................................... 573
Canvas ........................................................................ 580
Capabilities ................................................................ 51
Cash flow analysis ................................................. 317
Catalog ............................................................. 175, 205
browse connections ......................................... 197
manage publications ...................................... 211
metrics .................................................................. 195
view metadata ................................................... 207
Certificate authority (CA) ................................... 642
self-signed ............................................................ 649
Certificates ............................................................... 253
ABAP ...................................................................... 466
import ................................................................... 254
manage ................................................................. 615
self-signed CAs ................................................... 649
Change data capture (CDC) ...................... 267, 461
Chart ................................................................. 296, 378
create ........................................................... 361, 581
SAP Vora .............................................................. 521
stories .................................................................... 568
Chemicals ................................................................. 741
Classification ........................................................... 425
model ..................................................................... 418
Client-server architecture .................................... 75
Client-side library .................................................. 436
Cloud application .................................................. 266
Cloud connector ................................ 251, 467, 710
access ..................................................................... 468
connect with SAP BTP ..................................... 469
exposed backend systems ............................. 470
features ................................................................ 468
security ................................................................. 658
use .......................................................................... 658
Cloud data integration .......................................... 60
API .......................................................................... 266
Cloud deployment ........................................ 66, 103
Cloud Foundry ............................................. 251, 713
enable ................................................................... 644
Cloud integration ................................................. 710
Cloud Native Computing Foundation (CNCF) ............................................................ 68, 296
Cloud provider ................................................ 44, 109
account ................................................................ 100
AWS ........................................................................ 114
costs ............................................................. 102, 109
Google Cloud Platform .................................. 120
Microsoft Azure ................................................ 119
monitoring ......................................................... 144
prerequisites ....................................................... 114
register ................................................................. 114
select ...................................................................... 122
sizing ..................................................................... 108
Cloud provisioning .............................................. 107
Cloud services ........................................................ 544
architecture ........................................................ 545
Cloud vendor ............................................................. 43
Cloud-enabled profile ......................................... 721
Cloud-native profile ............................................. 720
Cluster ................................................................ 68, 295
admin .................................................................... 604
IPython ................................................................. 380
manage ................................................................ 602
metrics .................................................................. 633
security ................................................................. 657
storage .................................................................. 665
subnet ................................................................... 129
view ........................................................................ 601
Cluster Overview dashboard ............................ 633
Clustering ................................................................. 425
Cluster-level logging ............................................ 627
Code management .................................................. 73
Cold data ................................................................... 388
Collaboration .......................................................... 573
Column ..................................................................... 532
transform ............................................................ 576
Command-line interface (CLI) ...... 186, 296, 595
commands .......................................................... 152
Communication scenario .................................. 467
Communication protocol ................................. 107
Communication security .................................. 642
Complex materials .............................................. 750
Concept drift .......................................................... 730
Configuration Manager ..................................... 329
Configuration pane ....... 240, 244, 284, 346, 414
Configuration type .............................................. 241
Connection Management ......... 61, 88, 172, 250, 402, 613
ABAP systems .................................................... 471
authorizations .................................................. 254
connect to Cloud Foundry ............................ 251
create connections .......................................... 250
manage certificates ............................... 253, 615
manage connections ...................................... 613
metadata crawling ......................................... 196
non-SAP cloud systems ................................. 497
options ................................................................. 173
SAP HANA ........................................................... 480
WASB .................................................................... 501
Connection Manager .......................................... 179
Connection type ............. 172, 250, 251, 497, 614
ABAP ..................................................................... 461
ADLS ...................................................................... 503
Amazon S3 .......................................................... 498
AZURE_SQL_DB ............................................... 502
cloud connector gateway ............................. 471
GCP_BIGQUERY ............................................... 508
HANA_DB ........................................ 402, 480, 489
MSSQL .................................................................. 512
Oracle ................................................................... 511
SAP BW ................................................................. 478
SAP Data Warehouse Cloud ........................ 561
SDL ......................................................................... 509
tables .................................................................... 528
TLS ......................................................................... 403
Consistency check ................................................ 669
Constant Generator operator ................ 286, 342, 417, 428
Consumer products ............................................. 741
Consumption model ........................................... 555
Container ............................ 68, 71, 78, 83, 295, 297
create .................................................................... 691
Docker .................................................................. 298
images ..................................................................... 75
registry ........................................................ 152, 167
runtimes ................................................................. 79
Container-based deployment ..................... 69, 71
containerd .................................................................. 79
ContentType tag ................................................... 207
Contextual AI ............................................................ 62
Continuous integration/continuous delivery (CI/CD) ............ 72, 315, 673, 678, 707
best practices ..................................................... 707
pipelines ............................................................... 690
SAP solutions ..................................................... 712
Controller manager ................................................ 77
Core data services (CDS) view ................ 461, 465
expose ................................................................... 465
operator ............................................................... 473
Cost calculator .......................................................... 66
Cost Explorer API .................................................. 111
Cost forecast ........................................................... 126
Crawling .................................................................... 196
CRI-O ............................................................................. 79
Cron job ....................................................................... 82
Cross channel integration ................................. 739
Cross industry ........................................................ 742
Custom ABAP operator ...................................... 474
Custom operator ................................................... 275
add ports ............................................................. 283
base ........................................................................ 282
configuration ........................................... 284, 285
create .......................................................... 276, 281
deploy ................................................................... 286
documentation ................................................. 286
edit ......................................................................... 287
output probability ........................................... 287
script ...................................................................... 284
subengines .......................................................... 289
Custom resource ................................................... 667
Customer Data Export ........................................ 176
D
DaemonSet ....................................................... 82, 630
Data analysis ........................................................... 381
statistical modeling .............................. 384, 386
Data Attribute Recommendation ..................... 58
Data Builder ......................................... 552, 561, 570
artifacts ................................................................ 554
connection types .............................................. 561
create graphical view ..................................... 563
create SQL view ................................................. 564
create table ......................................................... 563
import files .......................................................... 552
Data category .......................................................... 369
Data collection ............................................. 323, 365
create .................................................................... 366
curate .................................................................... 368
Data composition .................................................... 39
Data consumption ............................ 174, 482, 559
Data crawling .......................................................... 461
Data deletion ........................................................... 640
Data democratization ............................................ 38
Data drift ................................................................... 730
Data engineer .................................................. 59, 374
applications ........................................................ 172
Data exploration ......................................... 374, 375
Data fabric .................................................... 33, 34, 37
benefits ................................................................... 37
trends ....................................................................... 35
Data flow ......................................................... 239, 562
Data Frame API ....................................................... 312
Data governance .............................. 39, 61, 94, 193
sizing ....................................................................... 97
Data ingestion ................................................ 45, 574
Data integration ............................................ 60, 326
Data lake ...................................... 367, 388, 439, 441
access ..................................................................... 442
add training data ............................................. 734
SAP HANA ............................................................ 757
storage system ................................................... 367
Data Lake API ................................................ 440, 441
Data lineage ............................................................. 230
extract ......................................................... 231, 233
view ........................................................................ 234
Data modeler .......................................................... 179
Data orchestration ................... 34, 38, 48, 61, 237
connections ......................................................... 172
Data pipeline .......... 48, 49, 61, 89, 175, 239, 288, 325, 329, 413
best practices ...................................................... 709
CI/CD ..................................................................... 690
create ..................................................................... 340
schedule ................................................................ 270
sizing ....................................................................... 98
Data platform ........................................................... 36
Data preparation ................................................... 255
actions .............................................. 256, 258, 259
manage tasks ..................................................... 258
monitor ................................................................. 260
Data preview ................................................. 199, 404
SAP Vora .................................................... 521, 528
Data privacy ............................................................. 641
Data profiling ................................................ 197, 255
actions and monitor ....................................... 198
Data protection ...................................................... 639
Data provider service ........................................... 266
Data provisioning ................................................. 460
ABAP ...................................................................... 465
Data quality rule .................................................... 214
Data science ............................................................. 309
experiments ........................................................ 405
projects ....................................................... 318, 374
Data scientist ....... 49, 56, 59, 310, 316, 319, 323, 365, 374, 406
applications ........................................................ 177
approaches ......................................................... 320
Data serialization .................................................. 154
Data set ........................................................... 193, 333
ABAP ...................................................................... 461
actions ........................................................ 257, 259
balance ................................................................. 385
create collection ............................................... 366
distribution ......................................................... 195
document ............................................................ 519
exploratory analysis ....................................... 381
extract lineage ........................................ 232, 233
fact sheet ............................................................. 199
hierarchies .......................................................... 536
import ................................................................... 575
inference .............................................................. 425
manage tags ...................................................... 210
metadata ............................................................. 462
metrics .................................................................. 200
organize in data lakes .................................... 367
outliers ................................................................. 384
profile ................................................. 198, 207, 255
publish ........................................................ 202, 204
register ................................................................. 369
trace ....................................................................... 230
train and test ..................................................... 384
transform ............................................................ 256
view fact sheet ................................................... 464
view metadata .................................................. 206
visualize ............................................................... 383
Data source ....................................................... 46, 323
inference .............................................................. 425
streaming ............................................................ 340
tables ..................................................................... 529
Data sprawl ................................................................. 36
Data steward ................................................. 174, 215
Data tiering ............................................................. 544
Data Transfer operator ....................................... 484
Data Transform operator ................................... 492
Data Transport operator .................................... 482
Data type .................. 241, 242, 275, 278, 290, 382
create .................................................................... 292
leverage ................................................................ 293
Data visualization ................................................. 377
Data volume .............................................................. 94
Data workflow ........................................................ 264
Data workspace ...................................................... 365
Data wrangling ............................................. 324, 576
Database and data management ....................... 42
Database schema .................................................. 524
Database view ........................................................ 530
catalog ................................................................. 532
Data-driven application ..................................... 295
Data-driven approach ...................... 310, 319, 321
benefits ................................................................ 320
Dataset API .............................................................. 587
Debugging ............................................................... 709
Decode Table operator .............................. 364, 584
Deep learning ......................................................... 348
Default branch .............................................. 702, 703
Delivery team ........................................................ 708
Delta load ................................................................. 473
Deploy model ........................................................ 421
Deployment .............................................................. 63
cloud ..................................................................... 103
controller ............................................................... 81
custom operators ............................................ 286
decision making .................................................. 64
evolution ................................................................ 70
Kubernetes ..................................................... 73, 85
machine learning models ............................. 354
modular ............................................................... 151
on-premise ......................................................... 150
pipeline ................................................................ 448
pods .......................................................................... 79
stack.xml ............................................................. 162
traditional to container-based ...................... 70
URL ........................................................................ 429
version control system .................................. 674
Develop branch ..................................................... 703
Development environment ................................ 96
DevOps ........................................... 85, 327, 713, 715
design ................................................................... 716
hybrid architecture ......................................... 719
phases .................................................................. 716
six pillars ............................................................. 714
tools ............................................................. 718, 721
versus MLOps .................................................... 724
DI_DATA_LAKE .................................. 202, 367, 441
Diagnostic report ................................................. 621
download ............................................................ 623
structure .............................................................. 624
Diagnostics Grafana ..................................... 89, 631
cluster metrics ................................................... 633
dashboard .......................................................... 632
Diagnostics Kibana ........... 89, 181, 628, 631, 719
features ................................................................ 629
Digital transformation .............................. 737, 739
Dimension .............................................................. 577
Discovery dashboard .......................................... 195
Disk engine ............................................................. 667
sizing .................................................................... 668
Distributed data management ........................... 98
Distributed Logs (DLogs) ................................... 664
Docker .................................................. 46, 77, 79, 297
containers ................................................. 295, 298
create container ............................................... 691
images .............. 74, 80, 295, 297, 298, 302, 691
registry ................................................................. 152
use with Python ................................................ 305
Dockerfiles ............................................ 295, 297, 298
add to Python operator ................................. 346
best practices ..................................................... 711
build ....................................................................... 345
create ................................................. 298, 300, 345
create tags .......................................................... 301
inheritance .......................................................... 303
library installation .......................................... 345
Document Classification ...................................... 58
Document Information Extraction ........ 58, 747
Document store engine ........................... 518, 519
tables ..................................................................... 520
E
Eclipse ........................................................................ 396
ECMAScript ............................................................. 597
Elasticsearch ........................................................... 628
End of purpose ....................................................... 640
Endpoint ................................................ 134, 145, 643
Enterprise data .......................................................... 35
Enterprise information management (EIM) ............ 51, 544
Enterprise platform ............................................. 327
Enterprise strategy .................................................. 67
Entitlements ........................................................... 646
Entity relationship model ................................. 554
create .................................................................... 564
Epoch ......................................................................... 348
etcd ................................................................................ 76
Event .......................................................................... 277
Evolution ............................................................. 52, 58
Execution log .......................................................... 523
execution.json file ................................................ 625
Experience data ........................................................ 42
Expert sizing .............................................................. 95
Exploratory data analysis .................................. 381
input data set .................................................... 381
Export/import ....................................................... 186
Extended Machine Learning Library (EML) ............ 311
Extended strategy ................................................. 603
eXtensible Access Control Markup Language (XACML) ............ 649
Extension manager .............................................. 432
External logging service ..................................... 630
F
Fact model ................................................................ 555
Fact sheet .................................................................. 199
view .............................................................. 404, 464
Feasibility study ..................................................... 319
Feature branch ....................................................... 704
Feature release cycle .............................................. 67
File browser ............................................................. 432
File system ............................................................... 605
FileHandler class .................................................... 448
Files ................................................................... 186, 379
engine .................................................................... 667
Finance ...................................................................... 746
Flow-based programming ................................. 238
Fluentd ............................................................ 628, 630
Food storage and maintenance ....................... 745
Force build ............................................................... 302
Freestyle project .................................................... 692
Full-text search ....................................................... 542
G
Garbage collection ........................................ 82, 710
Git ................................................................................ 674
/vflow folder ....................................................... 687
best practices ...................................................... 677
branching .................................................. 701, 702
commands ................................................. 675, 676
enable client ....................................................... 680
environment ....................................................... 679
file statuses ......................................................... 678
generate token ................................................... 683
integration ...................................... 678, 698, 728
repository ..................... 682, 685, 688, 700, 705
set up environment .......................................... 697
workflows .................................................. 697, 699
GitHub ................................................... 333, 682, 719
Jupyter ................................................................... 374
repository ............................................................ 685
trigger .................................................................... 695
GitOps .......................................................................... 85
approach ................................................................ 73
Global account ........................................................ 390
Glossary category .................................................. 228
Glossary metrics .................................................... 195
Google BigQuery .................................................... 508
Google Cloud Platform ....................................... 109
connect ................................................................. 124
console URL ........................................................ 114
quota error .......................................................... 110
register ................................................................. 120
sizing ..................................................................... 109
Google Cloud Storage .......................................... 506
Governance ............................................... 39, 61, 193
GPU support ........................................................... 167
Gradient boost classifier .................................... 410
Graph ............................................ 180, 239, 244, 248
categories ............................................................ 246
create .......................................................... 245, 413
dead instance ..................................................... 619
diagnostics ................................................ 621, 623
editor ..................................................................... 240
engine ......................................................... 518, 519
execute ................................................................. 248
inference .............................................................. 357
leverage data types ......................................... 293
monitor ................................................................ 616
operators ................................................... 239, 247
process logs ........................................................ 249
Push to SAP Analytics Cloud ....................... 582
reuse ...................................................................... 261
section .................................................................. 241
statuses ................................... 248, 287, 617, 618
templates ................................................... 330, 415
training ................................................................ 352
validate ................................................................ 248
Graph snippet ......................................................... 262
create .................................................................... 264
operators ............................................................. 262
types ...................................................................... 262
Graph Terminator operator .......... 239, 245, 446, 494, 710
graphs.json file ....................................................... 624
Grid search .............................................................. 321
Guided vendor onboarding .............................. 744
H
Hadoop ............................................. 43, 45, 178, 518
hana_ml .................................................................... 399
Handshake ............................................................... 123
Hash partitioning ................................................. 526
hdbcli ......................................................................... 398
Helm .......................................................................... 296
install .................................................................... 296
Hibernation ............................................................. 755
Hierarchical tagging ............................................ 207
Hierarchy ....................................................... 535, 536
build using adjacency tables ....................... 539
caching ................................................................ 539
SQL for data analysis ..................................... 537
Hold out data set .................................................. 349
Home page .............................................................. 170
Horizontal scaling ................................................... 85
Hot data .................................................................... 388
Hotfix branch ........................................................ 704
development ...................................................... 705
Human resources ................................................. 742
Hybrid data processing ......................................... 34
Hybrid data virtualization ................................ 478
Hybrid landscape .......................................... 36, 720
Hyperparameter grid search technique ...... 321
Hyperscale data ..................................................... 339
Hyperscaler ............................................. 47, 387, 389
Hypervisor ................................................................. 71
I
IBM Storage ............................................................ 509
Ideate phase ............................................................ 717
Image composer ...................................................... 85
Implicit access ....................................................... 440
Industry .................................................................... 740
use cases .............................................................. 741
Inference model .................................................... 421
Inference pipeline ....................................... 169, 355
deploy ................................................................... 357
Information Access (InA) protocol ................ 478
Ingress ....................................................................... 165
controller ............................................................ 165
Inheritance .............................................................. 303
Initial load ............................................................... 473
Initial sizing ........................................................ 95, 97
Inner join ................................................................. 537
Innovations ............................................................ 754
Installation ................................................ 64, 94, 154
download and initialize SLC Bridge ......... 155
on-premise ......................................................... 150
postinstallation configuration .................. 165
prepare Kubernetes environment ............. 154
prerequisites ...................................................... 150
run maintenance planner ............................ 158
test ......................................................................... 167
troubleshooting ............................................... 164
use SLC Bridge Base ........................................ 160
Instance .................................................................... 103
active .................................................................... 106
backup ................................................................. 112
basic versus advanced modes .................... 125
create ........................................................... 105, 124
details ................................................................... 126
monitoring ............................................... 144, 618
restore ................................................................... 113
SAP BTP cockpit ................................................ 390
sizing ..................................................................... 107
status .................................................................... 134
terminate ............................................................. 113
Integration .................................................................. 34
ABAP ...................................................................... 459
cloud best practices ......................................... 710
Git ........................................................................... 678
Google Cloud Storage ..................................... 506
non-SAP systems .................................... 497, 510
SAP Analytics Cloud ........................................ 571
SAP BW/4HANA ................................................ 478
SAP Data Warehouse Cloud ......................... 543
SAP HANA for SQL data warehousing ..... 489
SAP Information Steward ................... 485, 488
SAP Vora .............................................................. 515
third-party .......................................................... 757
use cases .............................................................. 460
Integrity constraint .............................................. 524
Intelligent enterprise ..................................... 33, 41
Intelligent information management ............ 38
Intelligent robotic process automation (iRPA) .............................................. 55
Intelligent suite ........................................................ 41
Intelligent technologies ................................ 40, 43
Interactive Python (IPython) ................. 373, 374
clusters ................................................................. 380
widgets ................................................................. 375
Interquartile range ..................................... 378, 383
Invoice Object Recommendation ........... 58, 311
IP address ................................................................. 127
IPython Parallel package .................................... 380
IT personnel ............................................................ 322
J
JavaScript ................................................................. 597
Jenkins ....................................................................... 690
access .................................................................... 691
build ....................................................................... 695
build output ....................................................... 695
create freestyle project ................................... 692
Jira ............................................................................... 719
Job .................................................................................. 82
manage ................................................................ 620
Jump box .................................................................. 128
access .................................................................... 139
external IP address .......................................... 135
import session .................................................... 142
set up ..................................................................... 136
status ..................................................................... 134
Jupyter Notebook ............... 44, 46, 311, 328, 332, 347, 373
access artifacts from JupyterLab ................ 434
basics ..................................................................... 374
connect to SAP HANA Cloud .............. 398, 400
create ........................................................... 336, 405
create file ............................................................. 380
dashboard ........................................................... 379
data analysis ...................................................... 381
data science experiments .............................. 405
install IPython widgets .................................. 375
optimizer and loss functions ....................... 348
run via SAP Business Application Studio ..................................................... 396, 398
set the context ................................................... 440
start ........................................................................ 379
working with SAP HANA Cloud .................. 386
write data into SAP HANA Cloud ............... 400
JupyterLab ......................... 336, 343, 430, 431, 441
access Jupyter Notebook artifacts ............. 434
completer ............................................................. 435
create experiment ............................................ 405
discover extensions ......................................... 433
features ................................................................. 431
output views ....................................................... 435
web interface ...................................................... 430
K
Kafka Consumer operator ................................... 61
Kafka Producer operator ...................................... 61
Keras ........................................................................... 347
Kernel ......................................................................... 432
Kibana Query Language (KQL) ......................... 629
kube .............................................................................. 77
kube-apiserver .......................................................... 76
kubeconfig ............................................................... 154
commands ........................................................... 155
Kubectl ...................................................... 76, 154, 662
Kubelet ........................................................................ 77
Kube-proxy ................................................................ 78
Kubernetes .................................. 45, 46, 68, 83, 719
advantages ........................................................... 85
best practices ...................................................... 711
cluster metrics ................................................... 633
cluster-level logging ........................................ 627
clusters ...... 68, 75, 85, 108, 129, 151, 295, 516
critical factors ...................................................... 72
dashboard ........................................................... 273
distributions ....................................................... 151
features ................................................................... 74
get nodes ............................................................. 140
overview .................................................................. 68
package managers .......................................... 296
prepare environment ..................................... 154
SAP Vora .............................................................. 516
security ................................................................. 657
sizing ..................................................................... 108
supported versions ............................................. 64
upgrade ................................................................ 144
L
Label ........................................................................... 453
Landscape sizing ...................................................... 93
Launchpad .................................................................. 86
access .................................................................... 147
add applications ............................................... 170
applications ........................................................ 169
home screen ....................................................... 170
personalize ............................................................. 87
Legislation ............................................................... 640
Library ....................................................................... 337
client-side ............................................................ 436
external ................................................................ 433
import ......................................................... 339, 399
install .................................................................... 345
Jupyter Notebook ............................................. 375
JupyterLab ........................................................... 431
machine learning tracking SDK ................. 361
plotting ................................................................ 377
License key .................................................... 189, 611
install .................................................................... 167
permanent .......................................................... 612
License Management ................................ 188, 611
system licenses .................................................. 612
Licensing ................................................. 64, 189, 611
Limit range ................................................................. 83
Line chart ................................................................. 377
Lineage analysis .................................................... 230
extract lineage .................................................. 232
view ........................................................................ 234
Lineage depth ......................................................... 214
Log ............................................................................... 627
aggregate ............................................................ 630
browse .................................................................. 629
message ............................................................... 628
metrics .................................................................. 452
Log file ...................................................................... 141
copy ....................................................................... 141
M
Machine learning ......... 40, 46, 55, 309, 310, 328
approaches ......................................................... 319
architectural principles ................................. 325
artifacts ...................................................... 178, 368
business use cases ........................................... 318
content .................................................................... 62
core operators ................................................... 413
data and algorithms ...................................... 331
data-driven ........................................................ 320
embedded applications .................................... 52
embedded in SAP HANA ...................... 406, 437
features ................................................................ 314
framework .......................................................... 327
migrate models ................................................ 732
model lifecycle .................................................. 724
models .................................................................. 324
object storage ................................................... 367
open-source environments .......................... 331
operations ....................................... 325, 328, 723
personas .............................................................. 322
solutions .............................................................. 311
tasks ............................................................. 321, 323
techniques .......................................................... 386
TEI methodology ............................................. 313
train and deploy models ............................... 350
workflow ............................................................. 430
Machine learning scenario ............................... 440
associate Dockerfiles ...................................... 345
create .................................................................... 335
create version .................................................... 353
display history .................................................. 354
Python SDK ........................................................ 444
retrieve metadata ............................................ 443
templates ............................................................ 340
upload data sets ............................................... 366
versions ................................................................ 350
Machine learning tracking SDK ............ 417, 419, 439, 450
collect metrics ................................................... 361
functions ............................................................. 452
use as a wrapper .............................................. 451
Main engine ............................................................ 289
Maintenance .......................................... 64, 331, 661
backups ................................................................ 668
Kubernetes ............................................................ 73
models .................................................................. 325
persistent volume size ................................... 665
restart services .................................................. 664
switch to maintenance mode ...................... 662
Maintenance planner ................ 64, 154, 158, 160
manifest.json file ........................................ 697, 700
Manufacturing ....................................................... 742
use case ............................................. 743, 748, 749
Markdown ................................................................ 380
command ............................................................ 374
Master branch ........................................................ 703
Master node ....................................................... 69, 75
components ........................................................... 75
Match Pattern operator ...................................... 220
Materialized view .................................................. 539
Measure .......................................................... 555, 577
Medical supply ordering .................................... 750
Memory calculator .................................................. 64
Memory usage ....................................................... 195
Metadata Catalog ..................................................... 45
Metadata Explorer ........................ 61, 89, 174, 194
ABAP ............................................................ 460, 461
browse connections .............................. 197, 256
business glossary ............................................. 228
connections ........................................................ 367
create folders ..................................................... 202
data profiling ..................................................... 197
data set actions ................................................ 204
import rules ........................................................ 485
lineage analysis ................................................ 231
manage preparation tasks ........................... 258
manage publications ............................ 202, 211
manage tags ...................................................... 208
rulebooks ............................................................. 221
rules ....................................................................... 215
self-service data preparation ...................... 255
tiles ......................................................................... 174
upload data ........................................................ 366
view fact sheet ................................ 199, 404, 464
view SAP HANA table ..................................... 403
Metric Overview dashboard ................... 273, 633
Metrics
graph ..................................................................... 249
history ................................................................... 453
Metadata Explorer .......................................... 195
run .......................................................................... 451
Metrics Explorer .............. 273, 360, 419, 439, 451
access .................................................................... 360
dashboard ........................................................... 360
notebook experiment ..................................... 362
Metrics Tracking API ........................................... 312
Microservices ............................................................ 74
Microsoft Azure ........................................... 109, 502
connect ................................................................. 123
console URL ......................................................... 114
data access .......................................................... 506
quota error .......................................................... 110
register .................................................................. 119
sizing ..................................................................... 109
Microsoft Azure Data Lake Storage (ADLS) ........ 503
Microsoft Azure SQL Data Warehouse ......... 502
Microsoft SQL Server ........................................... 512
Microsoft Visual Studio ...................................... 396
Migration .................................................................. 730
models ................................................................... 731
training data ...................................................... 733
Minimum sizing ............................................... 95, 96
Kubernetes clusters .......................................... 108
MinIO ......................................................................... 734
Missing value .......................................................... 384
analysis ................................................................. 324
ML Data Manager ........................................ 365, 441
ML Scenario Manager ..... 62, 177, 311, 328, 333, 405, 440
deploy pipelines ................................................ 357
executions ........................................................... 352
integrate data sources .................................... 340
metrics .................................................................. 353
overview ............................................................... 333
register data sets ............................................... 370
set up a scenario ............................................... 334
templates ................................................... 340, 428
test machine learning models ..................... 359
training pipeline execution .......................... 351
use case ................................................................. 334
ML Training operator .......................................... 420
MLOps ....................................................... 85, 328, 723
capabilities .......................................................... 727
challenges ............................................................ 726
stages of maturity ............................................ 727
versus DevOps .................................................... 724
Model ......................................................................... 578
deploy .................................................................... 354
deployment service .......................................... 732
drift ......................................................................... 730
execute .................................................................. 352
ideate phase ........................................................ 717
import ................................................................... 562
maintenance ...................................................... 325
metrics .................................................................. 273
migration ............................................................. 732
name ...................................................................... 352
production lifecycle ......................................... 724
repository ..................................................... 56, 732
tab .......................................................................... 333
test ......................................................................... 359
train ............................................................. 349, 411
Model Serving operator ..................................... 421
inputs and outputs .......................................... 422
Modeler ...................... 45, 61, 84, 89, 175, 237, 413
ABAP integration ............................................. 472
configuration ..................................................... 240
container registry ............................................ 167
Dockerfiles .......................................................... 298
download diagnostics .................................... 623
graphs ................................................................... 244
monitoring ............................................... 180, 616
navigate ............................................................... 240
operators .......................................... 242, 276, 413
push data to SAP Analytics Cloud ............. 363
SAP Analytics Cloud ........................................ 578
SAP BW operators ............................................ 484
schedule data pipelines ................................. 270
subengines .......................................................... 288
text analysis ....................................................... 540
trace messages ........................................ 272, 622
ModelStorage library ........................................... 412
Modular deployment .......................................... 151
Monitor tile ................................................... 175, 195
Monitoring ...................................... 73, 89, 180, 617
data pipelines .................................................... 270
data preparation .............................................. 260
diagnostics .......................................................... 624
graphs ......................................................... 352, 357
instance ................................................................ 144
Metadata Explorer .......................................... 195
Modeler ...................................................... 180, 616
profiling ............................................................... 198
rulebooks ............................................................. 226
Multicloud hybrid deployment ......................... 63
Multicontainer pod ................................................. 80
Multitier landscape ................................................. 73
N
Native SAP HANA storage extension ............ 388
Nested applications ............................................. 610
Nested policy .......................................................... 651
Net present value (NPV) ..................................... 314
Network and communication security ....... 107
Networked workforce ......................................... 739
Node controller ........................................................ 77
Node Overview dashboard ...................... 273, 634
NodeJS Multiplexer operator ........................... 290
Non-SAP system ................................................... 497
cloud connectivity ........................................... 497
on-premise ......................................................... 510
Notebook .............................................. 333, 362, 434
create .................................................................... 336
NOTROOT installer .............................................. 396
O
Object store type .................................................. 509
OData services ....................................................... 266
On-premise deployment ............................... 63, 64
installation ......................................................... 150
Open Database Connectivity (ODBC) driver ........ 509
Open Policy Agent (OPA) ................................... 649
Open VSX Registry ............................................... 396
OpenAPI Servlow operator ............................... 355
Open-source environment ............................... 331
Open-source programming language ......... 332
Operational data ...................................................... 42
Operational data provisioning (ODP) ............. 474
Operational mode ................................................ 661
Operator ......................................... 56, 100, 239, 242
ABAP ..................................................................... 472
add ports ............................................................. 277
built-in .............................................. 242, 275, 330
categories ........................................................... 243
cloud services .................................................... 266
compatibility check ....................................... 279
configuration ........................................... 243, 284
connectivity ....................................................... 329
create ................................................. 267, 281, 282
custom .............................................. 267, 275, 282
data workflow ................................................... 264
documentation ................................................. 276
edit ......................................................................... 287
events ................................................................... 277
graph snippets .................................................. 262
groups .................................................................. 262
hyperscale data ................................................ 340
machine learning ................................... 328, 413
ports ...................................................................... 283
runtime ......................................................... 84, 288
SAP Analytics Cloud .............................. 363, 582
SAP BW ................................................................. 484
section .................................................................. 241
tags ........................................................................ 284
Oracle ........................................................................ 510
Orchestration ............................................................ 61
Outer join ................................................................ 538
Outlier ....................................................................... 384
Outlook ........................................................... 753, 759
Overall equipment effectiveness (OEE) ....... 743
P
Package manager ............................... 296, 302, 375
pandas .......................................... 303, 306, 381, 399
Parent Strategy ...................................................... 603
Partition scheme ......................................... 525, 528
Password .................................................................. 130
Performance metrics ........................ 324, 353, 411
Permissions ............................................................. 115
Google Cloud Platform .................................. 121
Persist run ................................................................ 453
Persistence layer ................................................... 326
Persistent volume ................................................ 665
error handling ................................................... 665
scale up ................................................................ 666
Persona ....................................................... 54, 86, 172
machine learning ............................................. 322
Personal access token ......................................... 682
generate ............................................................... 683
Personal data .......................................................... 640
privacy .................................................................. 641
Personalization ........................................ 73, 87, 149
applications ........................................................ 172
pip ............................................................................... 302
Pipeline ................ 49, 89, 175, 237, 239, 288, 325, 329, 413
advantages ......................................................... 329
best practices ..................................................... 709
CI/CD ..................................................................... 690
create .................................................................... 340
create with template ....................................... 428
data transfer ...................................................... 484
deploy with Python SDK ................................ 447
engine ...................................................................... 85
improvements ................................................... 758
inference .................................................... 169, 354
interact via APIs ............................................... 344
modeling ................................................................. 61
runtime behavior ................................................ 84
schedule ............................................................... 270
tab .......................................................................... 333
TensorFlow ......................................................... 347
test ......................................................................... 359
training ...................................................... 351, 444
versus Data Builder ......................................... 570
Planning ................................................................... 572
Platform
core ........................................................................ 151
extended .............................................................. 151
full stack ............................................................... 151
Plotly .......................................................................... 377
Pod .......................... 68, 69, 76, 78, 79, 82, 295, 665
deployment options .......................................... 79
security ................................................................... 83
troubleshooting ................................................ 165
Policy ........................................................... 81, 83, 650
assign .................................................................... 654
AWS ........................................................................ 115
categories ............................................................ 652
create ..................................................................... 655
custom ........................................................ 184, 655
list ........................................................................... 183
manage ................................................................. 649
nested .................................................................... 651
predelivered policies ........................................ 650
users ....................................................................... 605
Policy decision point (PDP) ............................... 649
Policy Management ......................... 170, 182, 649
assign policies .................................................... 654
create custom policies .................................... 655
Port type ............................................... 277, 279, 283
Position hierarchy ................................................ 536
Postman .......................................................... 355, 359
Prediction ................................................................. 573
result ...................................................................... 356
Predictive Analysis Library (PAL) .......... 311, 407
output tables ...................................................... 408
prerequisites ....................................................... 407
procedures ........................................................... 408
Predictive pricing .................................................. 750
Privacy-Enhanced Mail (PEM) .......................... 132
Private cloud deployment ................................... 63
Private key ...................................................... 132, 137
Privileges .................................................................. 184
select ............................................................ 480, 483
Process Chain operator ....................................... 481
Process Data operator ......................................... 239
Process Executor operator ................................ 289
Process ID-based limits and reservations ..... 83
Production environment ........................... 96, 701
Profile ......................................................................... 172
fact sheet .............................................................. 199
Profiling .................................................................... 198
metrics .................................................................. 195
Progress flow ........................................................... 352
Project ........................................................................ 712
Prometheus ............................................................. 631
expose data ......................................................... 636
federation ............................................................ 636
third-party integration .................................. 636
Public APIs ............................................................... 758
Public hyperscaler deployment ......................... 63
Public sector ............................................................ 742
Public user ID ......................................................... 101
Published data set ...................................... 202, 205
Pull request ............................................................. 678
PuTTY ............................................................... 136, 138
PuTTY Secure Copy client (PSCP) .................... 144
PuTTYgen
files ......................................................................... 137
Python ................................. 305, 332, 337, 374, 375
execute script in terminal ............................. 397
install packages ................................................ 398
libraries ................................................................ 377
operator ............................................................... 290
set up in SAP Business Application Studio ....... 396
Python API ............................................................... 312
Python Client API ................................................. 436
Python Consumer template ................... 354, 416
Python Producer operator ................................ 364
Python Producer template ............ 341, 415, 418
Python SDK ................................................... 312, 439
create pipelines ................................................. 444
execute and deploy pipelines ...................... 447
Jupyter Notebook ............................................. 440
methods ............................................................... 448
read data ............................................................. 442
templates ............................................................. 444
Python3 operator ........... 305, 307, 346, 415, 418
configure ............................................................. 356
Q
Quick link .................................................................... 88
Quick Sizer tool ......................................................... 95
R
R ................................................................................... 332
R Client operator ................................ 305, 308, 333
Range partitioning ............................................... 526
Raw NBConvert ...................................................... 380
Read File operator .......... 243, 247, 428, 443, 499, 504, 583
Recipe ........................................................................ 257
Red Hat OpenShift ......................................... 46, 332
Redshift SQL Consumer operator .................. 501
Redshift Table Consumer operator ............... 501
Reference object .................................................... 440
Regression ............................................................... 425
Relational database management system (RDBMS) ... 511
Relational disk engine ............................... 518, 667
Relational in-memory engine ................ 518, 667
Relationship ........................................................... 201
terms ..................................................................... 229
Release management ......................................... 754
Remote connection ............................................. 664
Remote function call (RFC) ............................... 467
Remote table .......................................................... 561
ReplicaSet ..................................................... 78, 81, 82
Replication controller ............................................ 82
Repository ............................................... 84, 241, 290
ABAP ..................................................................... 460
create folders .................................. 245, 281, 299
Repository-based shipment channel (RBSC) ........ 152
Resampling ............................................................. 385
Resource ..................................................... 81, 85, 649
management ........................................................ 85
quota .......................................... 83, 183, 655, 709
sizing ....................................................................... 96
types ............................................................. 649, 656
REST API ................................................................... 636
Retail .......................................................................... 742
Retention period ......................................... 640, 669
Return on investment (ROI) ................... 314, 317
Reusability ................................................................. 73
Roadmap .................................................................. 753
explorer ............................................................... 758
Roadmap Explorer ............................................... 753
Route controller ....................................................... 77
Rule ............................................................................ 214
bind .............................................................. 223, 488
categories ........................................ 215, 219, 487
create .................................................................... 216
create new categories .................................... 219
dashboards ........................................................ 226
import .................................................................. 222
parameters ......................................................... 217
SAP Information Steward ............................ 485
test ......................................................................... 217
Rule-based selection ........................................... 319
Rulebook ......................................................... 214, 221
create .................................................................... 222
monitor ................................................................ 226
recently run ........................................................ 195
rule bindings ...................................................... 223
run ......................................................................... 225
thresholds ........................................................... 225
Rules tile .......................................................... 175, 215
Run collection ............................ 360, 363, 419, 451
Run level ................................................................... 661
Runtime environment ....................................... 297
Runtime operator ................................................. 288
S
SAP ABAP operator ............................................... 474
SAP Agile Data Preparation ................................. 45
SAP AI Business Services ........ 46, 53, 55, 57, 311
evolution ................................................................ 56
ready-to-use scenarios ...................................... 58
SAP Analytics Cloud ....... 363, 544, 566, 571, 719
add OAuth client .............................................. 588
connectivity ........................................................ 587
create connections .......................................... 575
data import ........................................................ 576
data modeling ................................................... 577
functions .............................................................. 572
operators ............................................................. 582
SAP Data Warehouse Cloud ......................... 566
stories ................................................................... 579
SAP Analytics Cloud Formatter operator ..... 364, 585
SAP Analytics Cloud Producer operator ...... 364, 586, 589
SAP BTP cockpit .................................. 388, 389, 644
services ....................................................... 390, 394
SAP Business Application Studio ......... 388, 393
create a project ................................................. 395
extend Python to Jupyter Notebook ......... 398
open ....................................................................... 395
run Jupyter Notebook ..................................... 396
set up Python ..................................................... 396
SAP Business Technology Platform (SAP BTP) ........ 40, 103, 251, 373, 387, 389, 393, 467, 756
connect with the cloud connector ............. 469
connectors ............................................................. 60
explore .................................................................. 389
features ................................................................... 42
on-premise connection .................................. 470
user account authentication ....................... 644
SAP Business Warehouse (SAP BW) ........ 47, 478
prerequisites ....................................................... 478
user authorization ........................ 481, 483, 484
SAP BW Process Chain operator ..................... 484
SAP BW/4HANA ..................................................... 478
data consumption ........................................... 482
prerequisites ....................................................... 478
user authorization ........................................... 481
SAP Cash Application .......................................... 747
SAP Cloud Appliance Library ......... 59, 93, 97, 99
backup .................................................................. 112
cloud providers .................................................. 109
connect ................................................................. 122
costs ....................................................................... 102
create instances ................................................ 105
deploy solutions ................................................ 103
prerequisites ....................................................... 114
register .................................................................. 101
run solution ........................................................ 145
security ................................................................. 106
set up SAP Data Intelligence ........................ 113
sizing ..................................................................... 108
SAP Community .................................................... 104
SAP Continuous Integration and Delivery ......... 712
SAP Conversational AI .................................... 52, 53
SAP Data Hub ..................................................... 52, 58
SAP Data Intelligence ............. 39, 45, 51, 55, 311, 545, 600
access through browser ................................. 148
add Visual Studio Code .................................. 680
administration .................................................. 595
administrator access ....................................... 600
application lifecycle management ............ 673
applications ................................................ 88, 169
architecture ........................................................... 60
capabilities ............................................................ 48
connect to SAP HANA Cloud ........................ 402
core components ................................................ 61
create solution instance ................................ 124
data sources .......................................................... 47
deployment options .......................................... 63
evolution ......................................................... 56, 58
features ................................................................... 44
genesis ..................................................................... 52
installation ................................................ 154, 160
integrate with business processes ............... 46
Kubernetes ............................................................ 83
launchpad ........................................... 86, 147, 169
libraries ................................................................. 337
log on ..................................................................... 146
machine learning ............................................. 328
maintenance ...................................................... 661
migration ............................................................. 731
objectives ............................................................... 48
on-premise .......................................................... 150
outlook .................................................................. 753
overview ................................................................. 43
personalize .......................................................... 149
recent innovations ........................................... 754
restart services ................................................... 664
security ................................................................. 639
setup ............................................................... 93, 113
trial edition ............................................................ 59
versus SAP Data Warehouse Cloud ........... 570
SAP Data Intelligence Cloud ............... 46, 64, 600
SAP Data Services ................................. 47, 754, 760
SAP Data Warehouse Cloud ........... 543, 545, 760
connection types .............................................. 561
create connections .......................................... 558
create database user ....................................... 559
create spaces ...................................................... 550
data visualization ............................................ 566
develop artifacts ............................................... 554
generate password .......................................... 547
landing page ...................................................... 549
SAP Analytics Cloud ........................................ 566
set up trial tenant ............................................ 546
SAP Distribution for Hadoop .............................. 45
SAP Fiori ................................................................... 395
SAP Gateway ........................................................... 481
SAP HANA ..................... 46, 47, 311, 373, 386, 515
access external view ....................................... 483
connection type ................................................ 480
create data frame ............................................ 401
data lakes ............................................................ 757
embedded machine learning .... 406, 425, 437
engine ................................................................... 388
machine learning libraries ........................... 407
Python API .......................................................... 312
Python Client API ............................................. 436
Python libraries ................................................ 328
SAP Vora .............................................................. 518
smart data integration .................................. 558
table .................................................... 342, 401, 404
tools ....................................................................... 388
user authorization ........................................... 483
Wire protocol ..................................................... 167
SAP HANA Client operator ................................ 342
SAP HANA Cloud ..... 41, 335, 373, 386–389, 544
architecture ........................................................ 388
central tool ......................................................... 391
connect ................................................................. 339
connect to Jupyter Notebook ............ 398, 400
connect to SAP Data Intelligence .............. 402
data sources ....................................................... 389
enable script server ......................................... 407
instance ................................................................ 388
preview data ...................................................... 336
read data into Jupyter Notebook ............... 402
trial account ....................................................... 390
SAP HANA cockpit ...................................... 388, 391
SAP HANA data warehousing
foundation ......................................................... 489
SAP HANA database explorer ....... 388, 391, 392
catalog ................................................................. 392
extract properties ............................................ 399
show tables ........................................................ 401
SAP HANA for SQL data warehousing .......... 489
prerequisites ...................................................... 489
transfer data from ........................................... 493
transfer data into ............................................ 490
SAP HANA ML Inference operator ....... 425, 428
configuration parameters ........................... 426
inputs and outputs ......................................... 427
SAP HANA ML Training operator ................... 423
SAP HANA Wire protocol ......................... 520, 643
SAP Information Steward ..... 215, 485, 754, 760
use case ................................................................ 745
SAP Intelligent Robotic Process
Automation (SAP Intelligent RPA) .............. 52
SAP Landscape Transformation
Replication Server .................................. 460, 465
operator ............................................................... 474
SAP Leonardo Artificial Intelligence ................ 55
SAP Leonardo Machine Learning
Foundation .......................................... 52–55, 730
evolution ................................................................ 56
feature comparison ........................................ 731
features ................................................................... 56
models .................................................................. 731
training data ..................................................... 733
SAP Model Company .......................................... 102
SAP NetWeaver ...................................................... 478
SAP S/4HANA .................................. 46, 54, 461, 467
ABAP system ...................................................... 472
SAP S/4HANA Cloud ........................................... 467
SAP Service Marketplace ................................... 101
SAP Vora ......................... 45, 46, 178, 340, 515, 518
access .................................................................... 520
application ............................................................ 89
create tables ............................................. 527, 529
create views ....................................................... 530
dashboard .......................................................... 634
data modeling .................................................. 524
data preview ...................................................... 521
DLog ...................................................................... 530
engines .............................................. 517, 518, 667
full-text search .................................................. 540
hierarchies ................................................. 536, 537
partition tables ................................................. 526
persistent storage ............................................ 667
sizing ...................................................... 96, 99, 668
transaction coordinator ...................... 167, 643
use SQL Editor .................................................... 522
SAP Vora Client operator ................................... 541
SAP Vora Deployment operator ..................... 667
sapdi library ............................................................ 361
SAProuter ................................................................. 664
Scalability ................................................... 73, 85, 330
Scalar type ...................................................... 278, 291
Scatter plot .................................................... 375, 383
Schedule details ..................................................... 131
Schedule-based retraining ................................ 730
Scheduled job ......................................................... 620
Scheduled publication ........................................ 574
Scheduler .................................................................... 77
scikit-learn ............................................................. 386
scipy.stats ................................................................. 286
Scorecard wizard ................................................... 226
Secret ......................................................................... 608
Secret key ................................................................. 118
Secure communication channel ....................... 56
Security ..................................................................... 639
data protection and privacy .............. 639, 641
on-premise connectivity ................................ 658
SAP Cloud Appliance Library ....................... 106
user authentication ......................................... 642
Segmentation ......................................................... 541
Selenium .................................................................. 719
Semantic analysis ....................................... 541, 552
Semantic Data Lake (SDL) ..... 202, 250, 367, 509
Separation by purpose ........................................ 642
Sequential class ..................................................... 349
Sequential neural network ............................... 348
Service account ............................................ 121, 124
Service controller ..................................................... 77
Service provider ....................................................... 56
Service Ticket Intelligence ................................... 58
Service user ............................................................. 114
Service-level agreement (SLA) ......................... 102
Shared access signature (SAS) token ............. 506
Shared tenant ......................................................... 756
Single container pod .............................................. 79
Single exponential smoothing ........................ 409
Sizing ........................................................... 64, 93, 124
calculator ....................................................... 65, 67
installation ............................................................ 94
instances .............................................................. 107
minimum ............................................................... 96
persistent volume ............................................ 665
SAP HANA Cloud .............................................. 389
SAP Vora .............................................................. 668
System Management ...................................... 610
t-shirt approach ................................................... 99
virtual machines ............................................... 128
sklearn ....................................................................... 384
SLC Bridge ................................................................. 151
deploy stack.xml ............................................... 161
expert mode configuration .......................... 156
initialize ................................................................ 155
run modes ............................................................ 156
SLC Bridge Base .................................. 152, 158, 164
SLT Connector operator ...................................... 474
Smart discovery ..................................................... 581
Smart predict .......................................................... 573
Snapshot ................................................................... 675
Software development kit (SDK) ..................... 439
Solution ................................................. 100, 103, 697
activated .............................................................. 106
develop .................................................................. 699
file ........................................................................... 602
password .............................................................. 130
run .......................................................................... 145
type ............................................................... 101, 103
Source code management ................................. 693
Space ................................................................. 390, 549
add users .................................................... 551, 556
auditing ................................................................ 559
classification ...................................................... 560
create ........................................................... 550, 556
create database users ..................................... 559
develop artifacts ............................................... 554
manage ................................................................. 556
priority .................................................................. 556
security ................................................................. 557
Spark ........................................................................... 518
SQL Console ................................................... 391, 392
SQL Editor ....................................................... 178, 522
SQL scripts ................................................................ 523
Stable branch .......................................................... 703
stack.xml ................................................................... 158
deploy .......................................................... 161, 164
Stakeholders meeting .......................................... 318
Standard connector ................................................ 60
StatefulSet .................................................................. 82
Statistical modeling ................................... 384, 386
Statistics .................................................................... 382
Stemming ................................................................. 541
Story ........................................................................... 579
add objects .......................................................... 581
builder ................................................................... 566
Strategy ........................................................... 384, 602
Streaming ................................................................... 61
table ....................................................................... 530
Structure type ............................................... 278, 291
Structured File Consumer operator .... 491, 504
Structured File Producer operator ................. 504
Subaccount .............................................................. 390
configure entitlements .................................. 646
create .................................................................... 644
mapping .............................................................. 469
Subengine ............................................. 275, 285, 288
advantages ......................................................... 289
create custom operators ............................... 289
Subject matter expert ......................................... 322
Submit Metrics API .............................................. 419
Submit Metrics operator ................................... 417
inputs and outputs .......................................... 418
Subnet ....................................................................... 130
Subscription .................................................. 104, 119
ID .................................................................. 119, 123
Sub-select ................................................................. 534
Supervised technique ......................................... 386
Supply chain ................................................. 742, 747
SUSE Linux Enterprise Server (SLES) ............. 302
System administrator ...................... 182, 184, 186
access .................................................................... 600
commands .......................................................... 598
maintenance ...................................................... 661
System diagnostics .............................................. 631
System logging ............................................ 181, 626
System Management ......... 46, 91, 184, 595, 600
access control .................................................... 642
applications .............................................. 187, 608
CI/CD ..................................................................... 690
cluster admin view .......................................... 601
command-line client ....................................... 595
expose ................................................................... 165
files ......................................................................... 605
login ....................................................................... 597
my workspace .................................................... 605
persistent volume size .................................... 665
SAP Vora .............................................................. 516
services ................................................................. 516
sizing ..................................................................... 610
tasks ...................................................................... 601
tenants ................................................................. 163
users ................................................... 604, 642, 698
System tenant ........................................................ 163
T
Table ................................................................. 527, 563
catalog .................................................................. 528
create in-memory ............................................ 527
create using disk engine ................................ 529
details ................................................................... 527
partitioning ....................................................... 525
types ............................................................. 278, 529
Table Consumer operator ................................. 493
Table Producer operator .................................... 492
Table-based replication ...................................... 460
Tag .............................................................................. 101
automatic ........................................................... 207
automatic inheritance .................................. 305
create .................................................................... 346
Dockerfiles .......................................................... 301
hierarchy .......................................... 196, 207, 210
manual ................................................................ 208
operators ............................................................. 284
search filters ...................................................... 211
set ........................................................................... 453
usage .................................................................... 196
Target
column ................................................................. 411
value ..................................................................... 411
Template
create pipeline .................................................. 428
graphs .................................................................. 415
inference .............................................................. 428
pipelines .............................................................. 341
Python SDK ........................................................ 444
Temporary branch ...................................... 702, 704
Tenant ....................................................................... 185
ID ............................................................................ 103
manage ................................................................ 599
shared ................................................................... 756
types ...................................................................... 163
workspace .................................................. 186, 607
Tenant admin ..................................... 187, 602, 604
create users ........................................................ 642
view ....................................................................... 600
TensorFlow .......................................... 311, 443, 445
inception model ............................................... 422
pipelines .............................................................. 347
Term template ....................................................... 228
Termination date ................................................. 131
Termination protection .................. 113, 129, 135
remove ................................................................. 148
Test cycle .................................................................. 705
Test Drive Center (TDC) ..................................... 144
Testing environment .......................................... 701
Text analysis ........................................................... 540
linguistics ............................................................ 541
operator ............................................................... 540
Threshold ................................................................. 225
Tiller ........................................................................... 296
Time series algorithm ........................................ 409
Time series engine ............................................... 518
Time to live (TTL) ..................................................... 83
Tokenization ........................................................... 541
Tooltip ....................................................................... 378
Total economic impact (TEI) .................. 309, 313
benefits ................................................................. 315
components ........................................................ 313
costs ....................................................................... 317
framework .......................................................... 314
Trace message ........................... 180, 272, 617, 621
Trace publisher ...................................................... 622
Traditional deployment ........................................ 70
Training data .......................................................... 733
add to data lake ................................................ 734
Training operator ................................................. 446
Training pipeline ................................ 342, 444, 446
deploy with Python SDK ................................ 447
execute ................................................................. 351
metrics .................................................................. 360
Training run .................................................. 419, 451
Transaction
/n/IWFND/GW_CLIENT ................................. 481
SICF ........................................................................ 479
SM30 ..................................................................... 475
STC01 ..................................................................... 478
Transformation history ..................................... 230
Transmission control .......................................... 642
Transparency .......................................................... 718
Trigger message ..................................................... 265
Troubleshooting ................................................... 164
T-shirt sizing approach ................................ 99, 389
TTL controller ............................................................ 83
U
Undeploy model ................................................... 421
Union view .................................................... 186, 606
Usage analytics ...................................................... 648
Use cases ................................................ 737, 741, 743
automatic invoice posting ........................... 744
finance .................................................................. 746
food storage and maintenance .................. 745
guided vendor onboarding .......................... 744
manufacturing .................................................. 749
optimize asset effectiveness ........................ 743
supply chain ....................................................... 747
User ......................................................... 100, 104, 185
acceptance testing ........................................... 708
assign .................................................................... 126
assign policies ................................................... 654
authentication .................................................. 642
create as tenant admin .................................. 642
groups ................................................................... 115
manage ................................................................. 604
permissions ......................................................... 115
policies .................................................................. 605
preferences .......................................................... 175
workspace ............................................................ 698
User account and
authentication (UAA) ............................ 106, 644
User interface (UI) ................................................. 186
Utilities ...................................................................... 742
V
Validity check ......................................................... 107
VCTL ..................................... 596, 597, 663, 690, 697
commands ........................................................... 598
operating system .............................................. 596
Version ...................................................................... 350
control system .................................. 73, 673, 708
create ..................................................................... 353
history ................................................................... 354
version.json file ...................................................... 624
View ............................................................................ 530
additional functions ........................................ 533
catalog .................................................................. 532
import and export ............................................ 535
Virtual deployment ................................................ 71
Virtual machine ............................................. 71, 109
sizing ..................................................................... 128
Virtual private cloud (VPC) ................................ 659
Virtual private network (VPN) ............... 467, 659
Visual board ......................................... 360, 363, 420
Visual Studio Code ...................................... 170, 679
/vflow folder ....................................................... 687
access ..................................................................... 681
add .......................................................................... 680
integrate GitHub repository ........................ 685
Visualization .............................. 324, 363, 377, 383
SAP Data Warehouse Cloud ......................... 566
Volume controller ................................................... 77
Vora Tools ..................................... 89, 178, 520, 524
vrep command ....................................................... 599
vSystem ..................................................................... 610
W
Warm data ................................................................ 388
Whitelisting ............................................................. 475
collaborative ...................................................... 477
individual ............................................................. 475
Wholesale distribution ....................................... 742
Windows Azure Storage Blob (WASB) ........... 501
WinSCP ...................................................................... 141
Wiretap operator ................................................... 286
Worker node ...................................................... 69, 75
components ........................................................... 77
Workflow .................................................................. 261
Workflow Trigger operator ..................... 265, 484
Workload ............................................................. 80, 94
management ........................................................ 73
Workspace ............................................................... 186
Wrapper .................................................................... 451
Write File operator ................... 247, 415, 499, 504
Write Results File operator ............................... 239
Y
YAML Ain’t Markup Language (YAML) ........ 154
We hope you have enjoyed this reading sample. You may recommend or pass it on to others, but only in its entirety, including all pages. This reading sample and all its parts are protected by copyright law. All usage and exploitation rights are reserved by the author and the publisher.
Dharma Teja Atluri is an executive architect and artificial intelligence/machine learning evangelist at IBM. He has more than 18 years of experience working in advanced analytics with both SAP and non-SAP product lines.
Devraj Bardhan is an accomplished global leader for SAP Innovations at IBM. He has led several large transformation projects, driving the business growth agenda through innovation and digital efficiencies.
Santanu Ghosh is an SAP analytics practitioner working as a consultant for more than 15 years in the data warehouse space. He has worked with SAP Business Warehouse, SAP HANA, SAP BusinessObjects BI, and SAP Analytics Cloud.
Snehasish Ghosh is an enterprise information management (EIM) consultant and data engineer working at IBM Australia. He has more than 15 years of experience working in analytics and the information management portfolio.
Arindom Saha is an SAP business intelligence consultant with more than 11 years of experience working with the SAP analytics portfolio. He has extensive experience in SAP and non-SAP reporting and visualization products.