
Data Lake Governance Center

FAQs

Issue 01

Date 2021-03-29

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: https://www.huawei.com

Email: [email protected]

Issue 01 (2021-03-29) Copyright © Huawei Technologies Co., Ltd. i


Contents

1 Product Consulting
1.1 Regions and AZs
1.2 How Is Data Integrated into DGC?
1.3 What Is the Relationship Between DGC and ROMA?
1.4 What Is the Relationship Between DGC and Huawei Horizon Digital Platform?
1.5 Does DGC Support Private Cloud?
1.6 How Do I Create a Fine-Grained Permission Policy in IAM?
1.7 How Do Enterprises Prevent Data Leakage During Data Governance?

2 Billing and Usage
2.1 Can I Try DGC for Free?
2.2 How Do I Renew My Instance When It Is About to Expire?
2.3 Why Are the DIS Service Fees Included in DGC?
2.4 What Can I Do If I Cannot Select an IAM Project When I Buy a DGC Instance?

3 Management Center
3.1 What Data Connections Are Supported?
3.2 What Are the Precautions for Creating Data Connections?
3.3 Why Do DWS/Hive/HBase Data Connections Fail to Obtain the Information About Databases or Tables?
3.4 Why Are MRS Hive/HBase Clusters Not Displayed on the Page for Creating Data Connections?
3.5 What Should I Do If the Connection Test Fails When I Enable the SSL Connection During the Creation of a DWS Data Connection?

4 Data Integration
4.1 Is Field Conversion Supported?
4.2 What Data Formats Are Supported When the Data Source Is Hive?
4.3 Can I Synchronize Jobs to Other Clusters?
4.4 Does CDM Support Incremental Data Migration?
4.5 Can I Create Jobs in Batches?
4.6 Can I Back Up Jobs?
4.7 How Do I Connect the On-Premises Intranet or a Third-Party Private Network to CDM?
4.8 What Is the Migration Performance in the Same VPC and Different VPCs?
4.9 Why Is Error ORA-01555 Reported During Migration from Oracle to DWS?
4.10 What Should I Do If the MongoDB Connection Migration Fails?


4.11 Why Does the Migration Fail When the Source End Keeps Changing During the Migration in the HBase Scenario?

5 Data Design
5.1 What Is the Relationship Between Lookup Tables and Data Standards?
5.2 What Is the Difference Between ER Modeling and Dimensional Modeling?
5.3 What Data Modeling Methods Are Supported by Data Design?
5.4 How Can I Use Standardized Data?
5.5 Does Data Design Support Database Reverse?

6 Data Development
6.1 How Many Jobs Can Be Created in Data Development? Is There a Limit on the Number of Nodes in a Job?
6.2 How Can I Quickly Rectify a Deleted CDM Cluster Associated with a Job?
6.3 Why Is There a Large Difference Between Job Execution Time and Start Time of a Job?
6.4 Will Subsequent Jobs Be Affected If a Job Fails to Be Executed During Scheduling of Dependent Jobs? What Should I Do?
6.5 What Do I Do If Node Error Logs Cannot Be Viewed When a Job Fails?
6.6 What Should I Do If the Agency List Fails to Be Obtained During Agency Configuration?
6.7 How Do I Locate Job Scheduling Nodes with a Large Number?
6.8 Why Cannot Specified Peripheral Resources Be Selected When a Data Connection Is Created in Data Development?
6.9 Why Cannot I Receive a Job Failure Alarm Notification After SMN Is Configured?
6.10 Why Is There No Job Running Scheduling Log on the Monitor Instance Page After Periodic Scheduling Is Configured for a Job?
6.11 Why Does the GUI Display Only the Failure Result but Not the Specific Error Cause After Hive SQL and Spark SQL Scripts Fail to Be Executed?
6.12 What Do I Do If the Token Is Invalid During the Running of a Data Development Node?
6.13 Why Cannot I View the Existing Workspaces After I Have the Required Policy?
6.14 How Do I View Run Logs After a Job Is Tested?
6.15 Why Does a Job Scheduled by Month Start Running Before the Job Scheduled by Day Is Complete?
6.16 How Do I Execute Presto SQL in Data Development?
6.17 What Should I Do If Invalid Authentication Is Reported When I Run a DLI Script?
6.18 Why Cannot I Select the Desired CDM Cluster in Proxy Mode When Creating a Data Connection?

7 Data Assets
7.1 What Are the Functions of the Data Assets Module?
7.2 What Assets Can Be Collected?
7.3 What Is Data Lineage?
7.4 How Can Data Lineage Be Displayed on a Data Map?

8 Data Lake Mall
8.1 What Languages Do Data Lake Mall SDKs Support?

9 Data Security
9.1 Why Is Data in a Data Table Not Masked Based on Rules After a Data Masking Task Is Executed?


9.2 Why Does the System Display a Message Indicating that Some Data Identification Rules Are in Use When They Are Deleted Although No Task Is Using Them?
9.3 What Should I Do If Authentication Audit Logging Is Not Enabled?


1 Product Consulting

1.1 Regions and AZs

1.2 How Is Data Integrated into DGC?

1.3 What Is the Relationship Between DGC and ROMA?

1.4 What Is the Relationship Between DGC and Huawei Horizon Digital Platform?

1.5 Does DGC Support Private Cloud?

1.6 How Do I Create a Fine-Grained Permission Policy in IAM?

1.7 How Do Enterprises Prevent Data Leakage During Data Governance?

1.1 Regions and AZs

Concept

A region and availability zone (AZ) identify the location of a data center. You can create resources in a specific region and AZ.

● Regions are divided based on geographical location and network latency. Public services, such as Elastic Cloud Server (ECS), Elastic Volume Service (EVS), Object Storage Service (OBS), Virtual Private Cloud (VPC), Elastic IP (EIP), and Image Management Service (IMS), are shared within the same region. Regions are classified as universal regions and dedicated regions. A universal region provides universal cloud services for common tenants. A dedicated region provides services of the same type only or for specific tenants.

● An AZ contains one or more physical data centers. Each AZ has independent cooling, fire extinguishing, moisture-proof, and electricity facilities. Within an AZ, computing, network, storage, and other resources are logically divided into multiple clusters. AZs within a region are interconnected using high-speed optical fibers, allowing you to build cross-AZ high-availability systems.

Figure 1-1 shows the relationship between regions and AZs.


Figure 1-1 Regions and AZs

HUAWEI CLOUD provides services in many regions around the world. You can select a region and AZ as needed. For more information, see HUAWEI CLOUD Global Regions.

Region Selection

When selecting a region, consider the following factors:

● Location
You are advised to select a region close to you or your target users. This reduces network latency and improves the access rate. However, Chinese mainland regions provide basically the same infrastructure, BGP network quality, and operations and configurations on resources. Therefore, if you or your target users are in the Chinese mainland, you do not need to consider the network latency differences when selecting a region.
Regions outside the Chinese mainland, such as CN-Hong Kong and AP-Bangkok, provide services for users outside the Chinese mainland. If you or your target users are in the Chinese mainland, these regions are not recommended due to high access latency.
– If you or your target users are in Asia Pacific excepting the Chinese mainland, select the CN-Hong Kong, AP-Bangkok, or AP-Singapore region.
– If you or your target users are in Africa, select the AF-Johannesburg region.
– If you or your target users are in Europe, select the EU-Paris region.
● Resource price
Resource prices may vary in different regions. For details, see Product Pricing Details.

AZ Selection

When determining whether to deploy resources in the same AZ, consider your applications' requirements for disaster recovery (DR) and network latency.

● For high DR capability, deploy resources in different AZs in the same region.
● For low network latency, deploy resources in the same AZ.


Regions and Endpoints

Before using an API to call resources, specify its region and endpoint. For details on HUAWEI CLOUD regions and endpoints, see Regions and Endpoints.

1.2 How Is Data Integrated into DGC?

DGC supports batch data migration and real-time data ingestion. It provides efficient access to more than 20 heterogeneous data sources, wizard-guided configuration and management, and integration of single tables, entire databases, data that fluctuates periodically according to a certain rule, and data added in each time segment.

1.3 What Is the Relationship Between DGC and ROMA?

Real-time Open Multi-Cloud Agile (ROMA) serves as a channel linking various systems. It does not govern or plan the ingested data. DGC analyzes the structure of ingested data and re-models the data to eliminate data silos and help enterprises build unified data models.

1.4 What Is the Relationship Between DGC and Huawei Horizon Digital Platform?

DGC is a data enablement module of Huawei Horizon Digital Platform. It helps you better manage and use data.

1.5 Does DGC Support Private Cloud?

DGC can be connected to HCS Online, providing high-quality services for customers.

For more information about HCS Online, see HCS Online.

1.6 How Do I Create a Fine-Grained Permission Policy in IAM?

Currently, fine-grained permission policies are not supported in DGC.

1.7 How Do Enterprises Prevent Data Leakage During Data Governance?

1. Manage data hierarchically. Enterprises should take national security, public rights and interests, personal privacy, and legitimate enterprise interests into full account when formulating data classification standards, and create global data asset catalogs.


2. Perform fine-grained authorization. Enterprises can adopt differentiated control measures for data of different levels to implement refined data management. Permission authorization must comply with the principle of "minimum user authorization and full-process protection".

3. Properly manage data sharing. Enterprises can standardize the data sharing process to ensure that data users apply for data based on business needs, on the premise of legal compliance and security assurance. The data owner reviews and determines the data usage scope and sharing mode based on rules, for the purpose of orderly data transfer and secure application through the data exchange mechanism.

4. Audit and identify risks. Enterprises need to record data usage to ensure that data user actions can be traced and audited throughout the process. In addition, enterprises should fully evaluate potential risks, establish a dynamic and real-time risk identification and alarm mechanism, and detect and handle risks in a timely manner.

5. Implement security management throughout the data lifecycle. Enterprises need to enhance data lifecycle security management to prevent user data leakage, tampering, and abuse. For example, enterprises need to honestly inform users of the purpose, method, and scope of data collection and usage, and collect data only after obtaining user authorization. In the storage phase, technologies such as feature extraction and labeling are used to anonymize original information, separate it from sensitive information, and store it independently to implement strict access control and reduce data leakage risks. In the data usage phase, technologies such as model computing and multi-party secure computation are used to provide only anonymized calculation results without collecting or sharing raw data.


2 Billing and Usage

2.1 Can I Try DGC for Free?

2.2 How Do I Renew My Instance When It Is About to Expire?

2.3 Why Are the DIS Service Fees Included in DGC?

2.4 What Can I Do If I Cannot Select an IAM Project When I Buy a DGC Instance?

2.1 Can I Try DGC for Free?

Yes, you can. For new users, DGC Basic provides a 30-day free trial period.

You can log in to the DGC management console and click Try Basic to apply for a free trial of DGC instances. Select a region with caution because resources in different regions cannot communicate with each other.

Precautions for free trial are as follows:

1. Each account can experience the free trial only once.

2. A basic DGC instance for free trial does not include CDM clusters. To use CDM, buy a CDM incremental package. For details, see Buying DAYU Incremental Packages.

3. DGC Basic is available during the 30-day free trial period. After the trial period expires, you need to buy an instance of one of the official versions provided.

4. During the trial period, you can buy any official version of DGC. The trial period then expires immediately.

5. DGC instances created during the trial period cannot be accessed when the trial period expires. After the trial period expires, the instances are retained for seven days and then automatically deleted. You shall take full responsibility for any losses resulting from not buying a DGC instance in time.


2.2 How Do I Renew My Instance When It Is About to Expire?

When a free trial instance is about to expire, you can buy an instance of any official version provided. Log in to the DGC management console, find the free trial instance that is about to expire, and click Buy.

If you want to retain the resources and data of the original instance when buying a new one, pay attention to the following points:

● The region of the bought instance must be the same as that of the free trial instance.
● You need to buy an instance of the basic or a later version.
● By default, the resources of the trial instance are moved to the first instance you bought.

NOTE

If you buy a basic instance, the resources and data of the Data Development and Data Integration modules in the original free trial instance will be retained, but other resources will be deleted.

When you buy an instance, the system automatically creates a CDM cluster in it. After the instance is bought, the CDM cluster will show up on the Cluster Management page in Data Integration.

2.3 Why Are the DIS Service Fees Included in DGC?

On February 18, 2020, HUAWEI CLOUD notified users via emails, SMS messages, and internal messages that "HUAWEI CLOUD plans to put the advanced DIS stream into commercial use at 00:00 (Beijing time) on March 1, 2020." In addition, the advanced DIS stream has been integrated into DGC. Therefore, the fees of the advanced DIS stream displayed to customers are included in DGC.

If you no longer need to use an advanced DIS stream, delete related resources to avoid additional fees.

2.4 What Can I Do If I Cannot Select an IAM Project When I Buy a DGC Instance?

Check whether the current account has enabled the enterprise project function.

The enterprise project function and IAM projects cannot be enabled at the same time. If the enterprise project function is enabled, you can buy only one instance in this enterprise project.


Figure 2-1 Buying a DGC instance


3 Management Center

3.1 What Data Connections Are Supported?

3.2 What Are the Precautions for Creating Data Connections?

3.3 Why Do DWS/Hive/HBase Data Connections Fail to Obtain the Information About Databases or Tables?

3.4 Why Are MRS Hive/HBase Clusters Not Displayed on the Page for Creating Data Connections?

3.5 What Should I Do If the Connection Test Fails When I Enable the SSL Connection During the Creation of a DWS Data Connection?

3.1 What Data Connections Are Supported?

For details on the data connections supported by DGC, see Data Sources.

3.2 What Are the Precautions for Creating Data Connections?

When creating a DWS, MRS Hive, RDS, or SparkSQL data connection, you must bind an agent provided by a CDM cluster. CDM cluster versions earlier than 1.8.6 are not supported.

3.3 Why Do DWS/Hive/HBase Data Connections Fail to Obtain the Information About Databases or Tables?

This is probably because the associated CDM cluster is disabled.

3.4 Why Are MRS Hive/HBase Clusters Not Displayed on the Page for Creating Data Connections?

Possible causes are as follows:


● Hive/HBase components were not selected during MRS cluster creation.
● The network between the CDM cluster and the MRS cluster was disconnected when the MRS data connection was created. The CDM cluster functions as a network agent, so the MRS data connections that you create need to communicate with CDM.

3.5 What Should I Do If the Connection Test Fails When I Enable the SSL Connection During the Creation of a DWS Data Connection?

Question

What should I do if the connection test fails when I enable the SSL connection during the creation of a DWS data connection?

Answer

On the DWS console, click the corresponding cluster, choose Security Settings, and disable Rights Separation.

Figure 3-1 Disabling Rights Separation for a DWS cluster


4 Data Integration

4.1 Is Field Conversion Supported?

4.2 What Data Formats Are Supported When the Data Source Is Hive?

4.3 Can I Synchronize Jobs to Other Clusters?

4.4 Does CDM Support Incremental Data Migration?

4.5 Can I Create Jobs in Batches?

4.6 Can I Back Up Jobs?

4.7 How Do I Connect the On-Premises Intranet or a Third-Party Private Network to CDM?

4.8 What Is the Migration Performance in the Same VPC and Different VPCs?

4.9 Why Is Error ORA-01555 Reported During Migration from Oracle to DWS?

4.10 What Should I Do If the MongoDB Connection Migration Fails?

4.11 Why Does the Migration Fail When the Source End Keeps Changing During the Migration in the HBase Scenario?

4.1 Is Field Conversion Supported?

Yes. CDM supports the following field converters:

● Anonymization
● Trim
● Reverse String
● Replace String
● Expression Conversion

You can create a field converter on the Map Field tab page when creating a table/file migration job. See Figure 4-1.


Figure 4-1 Creating a field converter

Anonymization

This converter is used to hide key information in a character string. For example, to convert 12345678910 to 123****8910, configure the parameters as follows:
● Set Reserve Start Length to 3.
● Set Reserve End Length to 4.
● Set Replace Character to *.
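The masking rule described above can be sketched in plain Java. This is an illustration only, not CDM's actual implementation; the method name and signature are hypothetical:

```java
// Illustrative sketch of the anonymization rule: keep the first reserveStart
// and last reserveEnd characters, and replace everything in between.
public class AnonymizeDemo {
    static String anonymize(String value, int reserveStart, int reserveEnd, char replace) {
        if (value == null || value.length() <= reserveStart + reserveEnd) {
            return value; // nothing left to mask
        }
        StringBuilder sb = new StringBuilder(value.substring(0, reserveStart));
        for (int i = reserveStart; i < value.length() - reserveEnd; i++) {
            sb.append(replace); // mask the middle characters
        }
        sb.append(value.substring(value.length() - reserveEnd));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Matches the example above: 12345678910 -> 123****8910
        System.out.println(anonymize("12345678910", 3, 4, '*'));
    }
}
```

With Reserve Start Length 3, Reserve End Length 4, and Replace Character *, the sketch reproduces the documented result.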


Figure 4-2 Anonymization

Trim

This converter is used to automatically delete the spaces before and after a string. No parameters need to be configured.

Reverse String

This converter is used to automatically reverse a string. For example, ABC is reversed into CBA. No parameters need to be configured.

Replace String

This converter is used to replace a character string. You need to configure the object to be replaced and the new value.
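The three converters above behave like standard string operations; a minimal plain-Java sketch of the same transformations (illustrative only, not CDM's implementation):

```java
public class SimpleConverters {
    public static void main(String[] args) {
        // Trim: delete the spaces before and after a string
        System.out.println("  abc  ".trim());                   // abc
        // Reverse String: ABC -> CBA
        System.out.println(new StringBuilder("ABC").reverse()); // CBA
        // Replace String: replace the configured substring with a new value
        System.out.println("2018/01/05".replace("/", "-"));     // 2018-01-05
    }
}
```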

Expression Conversion

This converter uses the JSP Expression Language (EL) to convert the current field or a row of data. The JSP EL is used to create arithmetic and logical expressions. Within a JSP EL expression, you can use integers, floating-point numbers, strings, the built-in boolean constants true and false, and null.

The expression supports the following environment variables:
● value: indicates the current field value.
● row: indicates the current row, which is an array.

The expression supports the following tool classes:
● StringUtils: string processing tool class. For details, see org.apache.commons.lang.StringUtils in the Java SDK code.


● DateUtils: date tool class
● CommonUtils: common tool class
● NumberUtils: string-to-number conversion class
● HttpsUtils: network file read class

Application examples:

1. Set a string constant for the current field, for example, VIP.
   Expression: "VIP"

2. If the field is of the string type, convert all characters to lowercase, for example, convert aBC to abc.
   Expression: StringUtils.lowerCase(value)

3. Convert all characters of the current field to uppercase.
   Expression: StringUtils.upperCase(value)

4. If the field value is a date string in yyyy-MM-dd format, extract the year from it, for example, extract 2017 from 2017-12-01.
   Expression: StringUtils.substringBefore(value,"-")

5. If the field value is numeric, convert it to twice the original value:
   Expression: value*2

6. Convert the field value true to Y and other field values to N.
   Expression: value=="true"?"Y":"N"

7. If the field value is of the string type and is empty, convert it to Default; otherwise, leave it unchanged.
   Expression: empty value? "Default":value

8. If the first and second fields are numeric, convert the field to the sum of the first and second field values.
   Expression: row[0]+row[1]

9. If the field is of the date or timestamp type, return the current year after conversion. The data type is int.
   Expression: DateUtils.getYear(value)

10. If the field is a date and time string in yyyy-MM-dd format, convert it to the date type:
    Expression: DateUtils.format(value,"yyyy-MM-dd")

11. Convert date format 2018/01/05 15:15:05 to 2018-01-05 15:15:05:
    Expression: DateUtils.format(DateUtils.parseDate(value,"yyyy/MM/dd HH:mm:ss"),"yyyy-MM-dd HH:mm:ss")

12. Obtain a 36-character universally unique identifier (UUID):
    Expression: CommonUtils.randomUUID()

13. If the field is of the string type, capitalize the first letter, for example, convert cat to Cat.
    Expression: StringUtils.capitalize(value)

14. If the field is of the string type, convert the first letter to lowercase, for example, convert Cat to cat.
    Expression: StringUtils.uncapitalize(value)


15. If the field is of the string type, pad the string with spaces to the specified length and center the original string. If the string is not shorter than the specified length, leave it unchanged. For example, center ab within the specified length 4.
Expression: StringUtils.center(value,4)

16. Delete one newline (\n, \r, or \r\n) at the end of the string. For example, convert abc\r\n\r\n to abc\r\n.
Expression: StringUtils.chomp(value)

17. If the string contains the specified string, true is returned; otherwise, false is returned. For example, abc contains a, so true is returned.
Expression: StringUtils.contains(value,"a")

18. If the string contains any character of the specified string, true is returned; otherwise, false is returned. For example, zzabyycdxx contains z and a, so true is returned.
Expression: StringUtils.containsAny(value,"za")

19. If the string contains none of the specified characters, true is returned; if any specified character is contained, false is returned. For example, abz contains the character z of xyz, so false is returned.
Expression: StringUtils.containsNone(value,"xyz")

20. If the string contains only the specified characters, true is returned; if any other character is contained, false is returned. For example, abab contains only characters among abc, so true is returned.
Expression: StringUtils.containsOnly(value,"abc")

21. If the string is empty or null, convert it to the specified string; otherwise, leave it unchanged. For example, convert an empty string to null.
Expression: StringUtils.defaultIfEmpty(value,null)

22. If the string ends with the specified suffix (case sensitive), true is returned; otherwise, false is returned. For example, abcdef does not end with the suffix null, so false is returned.
Expression: StringUtils.endsWith(value,null)

23. If the string is the same as the specified string (case sensitive), true is returned; otherwise, false is returned. For example, when abc and ABC are compared, false is returned.
Expression: StringUtils.equals(value,"ABC")

24. Obtain the first index of the specified string in a string. If no index is found, -1 is returned. For example, the first index of ab in aabaabaa is 1.
Expression: StringUtils.indexOf(value,"ab")

25. Obtain the last index of the specified string in a string. If no index is found, -1 is returned. For example, the last index of k in aFkyk is 4.
Expression: StringUtils.lastIndexOf(value,"k")

26. Obtain the first index of the specified string starting from the given position in a string. If no index is found, -1 is returned. For example, searching aabaabaa for b from index 3 returns 5.
Expression: StringUtils.indexOf(value,"b",3)

27. Obtain the first index of any of the specified characters in a string. If no index is found, -1 is returned. For example, the first index of z or a in zzabyycdxx is 0.
Expression: StringUtils.indexOfAny(value,"za")

28. If the string contains only Unicode letters, true is returned; otherwise, false is returned. For example, ab2c contains the digit 2, which is not a letter, so false is returned.
Expression: StringUtils.isAlpha(value)

29. If the string contains only Unicode letters and digits, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumeric(value)

30. If the string contains only Unicode letters, digits, and spaces, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumericSpace(value)

31. If the string contains only Unicode letters and spaces, true is returned; otherwise, false is returned. For example, ab2c contains the digit 2, so false is returned.
Expression: StringUtils.isAlphaSpace(value)

32. If the string contains only printable ASCII characters, true is returned; otherwise, false is returned. For example, for !ab-c~, true is returned.
Expression: StringUtils.isAsciiPrintable(value)

33. If the string is empty or null, true is returned; otherwise, false is returned.
Expression: StringUtils.isEmpty(value)

34. If the string contains only Unicode digits, true is returned; otherwise, false is returned.
Expression: StringUtils.isNumeric(value)

35. Obtain the leftmost characters of the specified length. For example, obtain the leftmost two characters ab from abc.
Expression: StringUtils.left(value,2)

36. Obtain the rightmost characters of the specified length. For example, obtain the rightmost two characters bc from abc.
Expression: StringUtils.right(value,2)

37. Pad the left of the current string with the specified string up to the specified total length. If the current string is not shorter than the specified length, leave it unchanged. For example, padding bat on the left with yz to length 8 yields yzyzybat.
Expression: StringUtils.leftPad(value,8,"yz")

38. Pad the right of the current string with the specified string up to the specified total length. If the current string is not shorter than the specified length, leave it unchanged. For example, padding bat on the right with yz to length 8 yields batyzyzy.
Expression: StringUtils.rightPad(value,8,"yz")

39. If the field is of the string type, obtain the length of the current string. If the string is null, 0 is returned.
Expression: StringUtils.length(value)

40. If the field is of the string type, delete all occurrences of the specified string from it. For example, delete ue from queued to obtain qd.
Expression: StringUtils.remove(value,"ue")

41. If the field is of the string type, remove the specified substring from the end of the field. If the substring is not at the end, leave the field unchanged. For example, remove .com from the end of www.domain.com.
Expression: StringUtils.removeEnd(value,".com")

42. If the field is of the string type, remove the specified substring from the beginning of the field. If the substring is not at the beginning, leave the field unchanged. For example, remove www. from the beginning of www.domain.com.
Expression: StringUtils.removeStart(value,"www.")

43. If the field is of the string type, replace all occurrences of the specified string in the field. For example, replace a in aba with z to obtain zbz.
Expression: StringUtils.replace(value,"a","z")

44. If the field is of the string type, replace multiple characters in the string at a time. For example, replace h in hello with j and o with y to obtain jelly.
Expression: StringUtils.replaceChars(value,"ho","jy")

45. If the field is of the string type, split the text into an array using the specified delimiter. For example, use : to split ab:cd:ef into ["ab","cd","ef"].
Expression: StringUtils.split(value,":")

46. If the string starts with the specified prefix (case sensitive), true is returned; otherwise, false is returned. For example, abcdef starts with abc, so true is returned.
Expression: StringUtils.startsWith(value,"abc")

47. If the field is of the string type, strip all of the specified characters from both ends of the field. For example, strip x, y, and z from abcyx to obtain abc.
Expression: StringUtils.strip(value,"xyz")

48. If the field is of the string type, delete all of the specified characters at the end of the field; for example, delete all spaces at the end of the field.
Expression: StringUtils.stripEnd(value,null)

49. If the field is of the string type, delete all of the specified characters at the beginning of the field; for example, delete all spaces at the beginning of the field.
Expression: StringUtils.stripStart(value,null)

50. If the field is of the string type, obtain the substring after the specified number of characters (excluding the character at the specified position). If the specified position is negative, count from the end of the string. For example, obtain the substring of abcde after the first two characters, that is, cde.
Expression: StringUtils.substring(value,2)

51. If the field is of the string type, obtain the substring within the specified range [start, end) of the string. If a bound is negative, count from the end of the string. For example, obtain the substring of abcde between indexes 2 and 5, that is, cde.
Expression: StringUtils.substring(value,2,5)

52. If the field is of the string type, obtain the substring after the first occurrence of the specified character. For example, obtain the substring after the first b in abcba, that is, cba.
Expression: StringUtils.substringAfter(value,"b")

53. If the field is of the string type, obtain the substring after the last occurrence of the specified character. For example, obtain the substring after the last b in abcba, that is, a.
Expression: StringUtils.substringAfterLast(value,"b")

54. If the field is of the string type, obtain the substring before the first occurrence of the specified character. For example, obtain the substring before the first b in abcba, that is, a.
Expression: StringUtils.substringBefore(value,"b")

55. If the field is of the string type, obtain the substring before the last occurrence of the specified character. For example, obtain the substring before the last b in abcba, that is, abc.
Expression: StringUtils.substringBeforeLast(value,"b")

56. If the field is of the string type, obtain the substring nested within the specified string. If no substring is found, null is returned. For example, obtain the substring between the two occurrences of tag in tagabctag, that is, abc.
Expression: StringUtils.substringBetween(value,"tag")

57. If the field is of the string type, delete the control characters (char ≤ 32) at both ends of the string; for example, delete the spaces at both ends of the string.
Expression: StringUtils.trim(value)

58. Convert the string to a value of the byte type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toByte(value)

59. Convert the string to a value of the byte type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toByte(value,1)

60. Convert the string to a value of the double type. If the conversion fails, 0.0d is returned.
Expression: NumberUtils.toDouble(value)

61. Convert the string to a value of the double type. If the conversion fails, the specified value, for example, 1.1d, is returned.
Expression: NumberUtils.toDouble(value,1.1d)

62. Convert the string to a value of the float type. If the conversion fails, 0.0f is returned.
Expression: NumberUtils.toFloat(value)

63. Convert the string to a value of the float type. If the conversion fails, the specified value, for example, 1.1f, is returned.
Expression: NumberUtils.toFloat(value,1.1f)

64. Convert the string to a value of the int type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toInt(value)

65. Convert the string to a value of the int type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toInt(value,1)

66. Convert the string to a value of the long type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toLong(value)

67. Convert the string to a value of the long type. If the conversion fails, the specified value, for example, 1L, is returned.
Expression: NumberUtils.toLong(value,1L)

68. Convert the string to a value of the short type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toShort(value)

69. Convert the string to a value of the short type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toShort(value,1)

70. Convert an IP address string to a value of the long type, for example, convert 10.78.124.0 to 172915712.
Expression: CommonUtils.ipToLong(value)

71. Read a mapping file of IP addresses to physical addresses from the network and load it into a map collection. url indicates the address where the IP mapping file is stored, for example, http://10.114.205.45:21203/sqoop/IpList.csv.
Expression: HttpsUtils.downloadMap("url")

72. Cache the IP address and physical address mappings and specify a key, for example, ipList, for later retrieval.
Expression: CommonUtils.setCache("ipList",HttpsUtils.downloadMap("url"))

73. Obtain the cached IP address and physical address mappings.
Expression: CommonUtils.getCache("ipList")

74. Check whether the IP address and physical address mappings are cached.
Expression: CommonUtils.cacheExists("ipList")

75. Based on the specified offset type (month/day/hour/minute/second) and offset (a positive number indicates an increase and a negative number indicates a decrease), convert the time in the specified format to a new time, for example, add 8 hours to 2019-05-21 12:00:00.
Expression: DateUtils.getCurrentTimeByZone("yyyy-MM-dd HH:mm:ss",value,"hour", 8)
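The expressions above delegate to Apache Commons Lang utilities and CDM built-ins. As a rough illustration of the documented input/output behavior, the following JDK-only stand-ins mirror a few of the listed conversions; the class and method bodies below are our own sketches, not the CDM implementation:

```java
/** JDK-only sketches of a few conversion behaviours from the expression list.
 *  The real CDM expressions use Apache Commons Lang (StringUtils, NumberUtils)
 *  and CommonUtils; these stand-ins only mirror the documented semantics. */
public class ExpressionSketch {

    // No. 4: substringBefore("2017-12-01", "-") -> "2017"
    public static String substringBefore(String s, String sep) {
        int i = s.indexOf(sep);
        return i < 0 ? s : s.substring(0, i);
    }

    // No. 37: leftPad("bat", 8, "yz") -> "yzyzybat"
    public static String leftPad(String s, int size, String pad) {
        if (s.length() >= size) return s;
        StringBuilder sb = new StringBuilder();
        while (sb.length() < size - s.length()) sb.append(pad);
        return sb.substring(0, size - s.length()) + s;
    }

    // No. 65: toInt("abc", 1) -> 1 (default value on conversion failure)
    public static int toInt(String s, int defaultValue) {
        try { return Integer.parseInt(s); } catch (NumberFormatException e) { return defaultValue; }
    }

    // No. 70: ipToLong("10.78.124.0") -> 172915712
    public static long ipToLong(String ip) {
        long v = 0;
        for (String part : ip.split("\\.")) v = (v << 8) | Long.parseLong(part);
        return v;
    }
}
```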


4.2 What Data Formats Are Supported When the Data Source Is Hive?

CDM can read and write data in SequenceFile, TextFile, ORC, or Parquet format from the Hive data source.

4.3 Can I Synchronize Jobs to Other Clusters?

CDM does not support direct job migration across clusters. However, you can use the batch job import and export function to migrate jobs across clusters indirectly as follows:

1. Export all jobs from CDM cluster 1 and save their JSON files to a local PC.

For security purposes, link passwords are not exported with jobs. All passwords are replaced by the placeholder Add password here.

2. Edit each JSON file on the local PC, replacing Add password here with the actual password of the corresponding link.

3. Import the edited JSON files to CDM cluster 2 in batches to complete job migration from cluster 1 to cluster 2.
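Step 2 above can be scripted. A minimal sketch in Java, assuming every link in the export uses the same password (with several links, edit each link entry individually); the class and method names are our own:

```java
/** Sketch: restore the link passwords that CDM replaced with the
 *  "Add password here" placeholder in an exported job JSON, so the file
 *  can be imported into the target cluster. */
public class FillPasswords {
    public static String fill(String exportedJson, String linkPassword) {
        // CDM writes this literal placeholder in place of every link password on export.
        return exportedJson.replace("Add password here", linkPassword);
    }
}
```

In practice you would read the exported JSON file, run it through a replacement like this, and write it back before the batch import.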

4.4 Does CDM Support Incremental Data Migration?

CDM supports incremental data migration. With scheduled jobs and macro variables of date and time, CDM provides incremental data migration in the following scenarios:

● Incremental file migration

● Incremental migration of relational databases

● Incremental migration of HBase/CloudTable

● Incremental synchronization using the macro variables of date and time

For details, see Cloud Data Migration.
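The core idea behind the date and time macro variables is that a scheduled job resolves a relative date at run time and migrates only that slice of data. The sketch below illustrates this with a daily directory layout; the path convention and method are our own example, not CDM's macro syntax:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Sketch of date-based incremental migration: each scheduled run resolves
 *  a relative date and reads only that day's directory or partition. */
public class DateMacro {
    /** For example, the run on 2021-03-29 with offsetDays = -1 migrates
     *  the data written on 2021-03-28. */
    public static String dailyPath(String prefix, LocalDate runDate, int offsetDays) {
        LocalDate target = runDate.plusDays(offsetDays);
        return prefix + "/" + target.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
    }
}
```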

4.5 Can I Create Jobs in Batches?

Yes. CDM supports batch job creation with the help of the batch import function. You can create jobs in batches as follows:

1. Create a job manually.

2. Export the job and save the job's JSON file to a local PC.

3. Edit the JSON file and replicate more jobs in it according to the job configuration.

4. Import the JSON file to a cluster to implement batch job creation.
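Step 3 can be sketched as a template expansion: take the JSON of the one job created manually and stamp out one job per source table. The ${TABLE} and ${JOB_NAME} placeholders below are our own convention, not part of the CDM export format:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: replicate a manually created job definition into many jobs,
 *  varying the job name and source table before the batch import. */
public class JobReplicator {
    public static List<String> replicate(String template, List<String> tables) {
        List<String> jobs = new ArrayList<>();
        for (String table : tables) {
            jobs.add(template.replace("${TABLE}", table)
                             .replace("${JOB_NAME}", "migrate_" + table));
        }
        return jobs;
    }
}
```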


4.6 Can I Back Up Jobs?

Yes. If you do not need to use the CDM cluster for a long time, you can stop or delete it.

Before the deletion, you can use the batch export function of CDM to save all job scripts to a local PC. Then, you can create a cluster and import the jobs again when necessary.

4.7 How Do I Connect the On-Premises Intranet or Third-Party Private Network to CDM?

Many enterprises deploy key data sources, such as databases and file servers, on the intranet. CDM runs on the cloud. To migrate intranet data to the cloud using CDM, connect the intranet to the cloud in any of the following ways:

1. Bind Internet IP addresses to the intranet data source nodes so that CDM can access the data directly from the Internet.

2. Establish a VPN between the on-premises data center and the VPC where the service resides.

3. Use Direct Connect to connect the data center to the cloud service.

4. Use Network Address Translation (NAT) or port forwarding to access the network in proxy mode.

The following describes how to use a port forwarding tool to access intranet data. The process is as follows:

1. Use a Windows computer as the gateway. The computer must be able to access both the Internet and the intranet.

2. Install the port mapping tool IPOP on the computer.

3. Configure port mapping using the tool.

NOTICE

If the intranet database is exposed to the public network for a long time, security risks exist. Therefore, stop the port mapping after data migration is complete.

Scenario

Suppose that a MySQL database on the intranet is to be migrated to DWS. Figure 4-3 shows the network topology.

In the figure, the intranet can be either an enterprise's data center or the intranet of a virtual data center on a third-party cloud.


Figure 4-3 Network topology example

Procedure

Step 1 Use a Windows computer as the gateway. Configure both intranet and Internet IP addresses on the computer. Conduct the following test to check whether the gateway computer can fulfill service needs.

1. Run the ping command on the computer to check whether the intranet address of the MySQL database is pingable. For example, run ping 192.168.1.8.

2. Run the ping command on another computer that can access the Internet to check whether the public network address of the gateway computer is pingable. For example, run ping 202.xx.xx.10.

Step 2 Download the port mapping tool IPOP and install it on the gateway computer.

Step 3 Run the port mapping tool and select PORT Map. See Figure 4-4.

● Local IP and Local Port: Set these two parameters to the public network address and port number of the gateway computer. They must be entered when you create MySQL links on CDM.

● Mapping IP and Map Port: Set these two parameters to the IP address and port number of the MySQL database on the intranet.

Figure 4-4 Configuring port mapping


Step 4 Click ADD to add a port mapping relationship.

Step 5 Click START to start mapping and receive data packets.

Then, you can use the EIP to read data from the MySQL database on the intranet through CDM and import the data to DWS.

NOTE

1. To access the on-premises data source, you must also bind an EIP to the CDM cluster.
2. Generally, DWS is accessible within the same VPC. When creating a CDM cluster, ensure that the VPC of the CDM cluster is the same as that of DWS. In addition, it is recommended that CDM and DWS be in the same intranet and security group. If their security groups are different, you also need to enable data access between the security groups.
3. Port mapping can also be used to migrate data between databases on the intranet or from SFTP servers.
4. For Linux computers, port mapping can also be implemented using iptables.
5. When the FTP server on the intranet is mapped to the public network using port mapping, check whether PASV mode is enabled. In PASV mode, the client and server are connected through a random port. Therefore, in addition to mapping port 21, you also need to map the port range used in PASV mode. For example, you can specify the vsftpd port range by configuring pasv_min_port and pasv_max_port.

----End
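What a port-mapping tool such as IPOP does can be sketched in a few lines: bytes arriving on the public listening socket are relayed to the intranet host, and replies are relayed back. The class below is our own single-connection illustration, not the tool's implementation:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

/** Minimal single-connection TCP port forwarder, illustrating the port
 *  mapping idea: Local IP/Port corresponds to 'listener', Mapping IP/Port
 *  to targetHost:targetPort. */
public class MiniPortMap {

    /** Accept one client on 'listener' and relay it to targetHost:targetPort. */
    public static void relayOnce(ServerSocket listener, String targetHost, int targetPort)
            throws IOException, InterruptedException {
        try (Socket client = listener.accept();
             Socket target = new Socket(targetHost, targetPort)) {
            Thread up = pipe(client, target);   // client -> intranet service
            Thread down = pipe(target, client); // intranet service -> client
            up.join();
            down.join();
        }
    }

    private static Thread pipe(Socket from, Socket to) {
        Thread t = new Thread(() -> {
            try {
                from.getInputStream().transferTo(to.getOutputStream());
                to.shutdownOutput(); // propagate end-of-stream to the peer
            } catch (IOException ignored) { /* connection closed */ }
        });
        t.start();
        return t;
    }
}
```

A production forwarder would loop over accept() and handle many connections; this sketch keeps only the relaying logic.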

4.8 What Is the Migration Performance in the Same VPC and Different VPCs?

The transmission rate depends on the bandwidth and file read/write speed.

4.9 Why Is Error ORA-01555 Reported During Migration from Oracle to DWS?

Symptom

When CDM is used to migrate Oracle data to DWS, an error is reported, as shown in Figure 4-5.


Figure 4-5 Symptom

Cause Analysis

1. During data migration, if the entire table is queried and the table contains a large amount of data, the query takes a long time.
2. During the query, other users frequently perform the commit operation.
3. The RBS (the tablespace used for rollback) of Oracle is small. As a result, the migration task is not complete before the source database is updated, and the rollback times out.

Summary and Suggestions

1. Reduce the data volume queried each time.
2. Modify the database configurations to increase the RBS of the Oracle database.

4.10 What Should I Do If the MongoDB Connection Migration Fails?

By default, the userAdmin role has only the permissions to manage roles and users; it does not have the read and write permissions on a database.

If the MongoDB connection fails to be migrated, view the user permission information in the MongoDB connection to ensure that the user has the read and write permissions on the specified database.


4.11 Why Does the Migration Fail When the Source End Keeps Changing During the Migration in the HBase Scenario?

In the HBase scenario, the migration is performed based on the snapshot file of the HBase table. After the snapshot file is generated, if the source data changes, the snapshot file cannot be found. As a result, the migration fails.

When an error occurs for the first time, CDM migrates the files that have not changed. For files that have changed, retry the migration task.


5 Data Design

5.1 What Is the Relationship Between Lookup Tables and Data Standards?

5.2 What Is the Difference Between ER Modeling and Dimensional Modeling?

5.3 What Data Modeling Methods Are Supported by Data Design?

5.4 How Can I Use Standardized Data?

5.5 Does Data Design Support Database Reverse?

5.1 What Is the Relationship Between Lookup Tables and Data Standards?

A lookup table consists of the names, codes, and data types of multiple table fields. The fields in a lookup table can be associated with data standards, and those data standards are then applied to the fields in a model table.

5.2 What Is the Difference Between ER Modeling and Dimensional Modeling?

ER modeling complies with 3NF modeling. Dimensional modeling mainly refers to the design of fact tables and dimension tables, and is mainly used to implement multi-angle, multi-layer data query and analysis.

DGC is a data lake operations platform, where dimensional modeling is used more frequently.

5.3 What Data Modeling Methods Are Supported by Data Design?

DGC Data Design supports entity-relationship (ER) modeling and dimensional modeling:

● ER modeling


ER modeling describes the business activities within an enterprise. Compliant with the third normal form (3NF), ER modeling is designed for data integration and is used for combining and merging data with similarities by subject. ER modeling results cannot be used directly for decision-making, but they are a useful tool.

ER modeling can be divided into three levels of abstraction: conceptual models, logical models, and physical models.

– Conceptual model: A conceptual model is a representation of the business processes and business data involved in different activities. It can be used to represent the relationships between business entities.

– Logical model: A logical model is more detailed than a conceptual model. It is used to outline the entities, attributes, and relationships of a business, and it enables communication between IT and business staff. A logical model is a set of standardized logic table structures. Determined by business rules, a logical model outlines business objects, data items of the business objects, and relationships between business objects.

– Physical model: A physical model is based on logical models and is used to design the database architecture for data storage, with a range of technical factors taken into account. For example, the selected data warehouse could be DWS or DLI.

● Dimensional modeling

Dimensional modeling is the construction of models based on analysis and decision-making requirements. It is mainly used for data analysis. Dimensional modeling focuses on how to quickly analyze user requirements and respond rapidly to complicated large-scale queries.

A multidimensional model is a fact table that consists of numeric measurement metrics. The fact table is associated, through primary or foreign keys, with a group of dimension tables that contain description attributes.

Typical dimensional models include the star model and, in some special scenarios, the snowflake model.

In DGC Data Design, you construct a bus matrix, a dimension model, and a fact model by extracting facts and dimensions, as well as a summary model incorporating metrics abstracted from BI analysis.

5.4 How Can I Use Standardized Data?

Standardized data can be used as basic BI information, as source data of upper-layer applications, and in visualized reports of various data.

5.5 Does Data Design Support Database Reverse?

Yes. Currently, database reverse can be performed on Data Warehouse Service (DWS), Data Lake Insight (DLI), and MapReduce Service (MRS) Hive.


6 Data Development

6.1 How Many Jobs Can Be Created in Data Development? Is There a Limit on the Number of Nodes in a Job?

6.2 How Can I Quickly Rectify a Deleted CDM Cluster Associated with a Job?

6.3 Why Is There a Large Difference Between Job Execution Time and Start Time of a Job?

6.4 Will Subsequent Jobs Be Affected If a Job Fails to Be Executed During Scheduling of Dependent Jobs? What Should I Do?

6.5 What Do I Do If Node Error Logs Cannot Be Viewed When a Job Fails?

6.6 What Should I Do If the Agency List Fails to Be Obtained During Agency Configuration?

6.7 How Do I Locate Job Scheduling Nodes with a Large Number?

6.8 Why Cannot Specified Peripheral Resources Be Selected When a Data Connection Is Created in Data Development?

6.9 Why Cannot I Receive a Job Failure Alarm Notification After SMN Is Configured?

6.10 Why Is There No Job Running Scheduling Log on the Monitor Instance Page After Periodic Scheduling Is Configured for a Job?

6.11 Why Does the GUI Display Only the Failure Result but Not the Specific Error Cause After Hive SQL and Spark SQL Scripts Fail to Be Executed?

6.12 What Do I Do If the Token Is Invalid During the Running of a Data Development Node?

6.13 Why Cannot I View the Existing Workspaces After I Have the Required Policy?

6.14 How Do I View Run Logs After a Job Is Tested?

6.15 Why Does a Job Scheduled by Month Start Running Before the Job Scheduled by Day Is Complete?

6.16 How Do I Execute Presto SQL in Data Development?


6.17 What Should I Do If Invalid Authentication Is Reported When I Run a DLI Script?

6.18 Why Cannot I Select the Desired CDM Cluster in Proxy Mode When Creating a Data Connection?

6.1 How Many Jobs Can Be Created in Data Development? Is There a Limit on the Number of Nodes in a Job?

By default, each user can create a maximum of 10,000 jobs, and each job can contain a maximum of 1,000 nodes.

In addition, the system allows you to adjust the maximum quota as required. If you have any requirements, submit a service ticket.

6.2 How Can I Quickly Rectify a Deleted CDM Cluster Associated with a Job?

After a CDM cluster is deleted, the association information in the data development job remains intact. You only need to create a cluster and a job with the same names on CDM. Before using the newly created cluster and job, the data development job will remind you that they will replace the original ones.

6.3 Why Is There a Large Difference Between Job Execution Time and Start Time of a Job?

On the Running History page, there is a large difference between Job Execution Time and Start Time, as shown in the figure below. Job Execution Time is the time when the job is expected to be executed. Start Time is the time when the job actually starts to be executed.

Figure 6-1 Running History page

In Data Development, a maximum of five instances of a job can be executed concurrently. If the Start Time of a job is later than its Job Execution Time, the job instances in the subsequent batch will be queued.

If you find that the difference between Job Execution Time and Start Time is becoming large, adjust Job Execution Time accordingly.
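The queueing effect can be modeled with a semaphore: instances beyond the concurrency limit wait for a slot, which is why their actual start drifts past the planned execution time. Only the five-instance limit below comes from this FAQ; the timings and class are our own toy illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of per-job instance queueing under a fixed concurrency limit. */
public class InstanceQueueDemo {
    public static int maxObservedConcurrency(int instances, int limit) throws InterruptedException {
        Semaphore slots = new Semaphore(limit);      // at most 'limit' running instances
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        for (int i = 0; i < instances; i++) {
            pool.execute(() -> {
                try {
                    slots.acquire();                 // queue here when all slots are busy
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(20);                // simulated instance run time
                    running.decrementAndGet();
                    slots.release();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }
}
```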


6.4 Will Subsequent Jobs Be Affected If a Job Fails to Be Executed During Scheduling of Dependent Jobs? What Should I Do?

The subsequent jobs may be suspended, continued, or terminated, depending on the configuration.

Figure 6-2 Job dependencies

In this case, do not stop the job. You can rerun the failed job instance, or stop the abnormal instance and then run it again. After the instance failure is rectified, subsequent operations will continue. If you handle the failure outside Data Development rather than in it, you can force the job instance to succeed after the failure is rectified, and subsequent jobs will then continue to run properly.

6.5 What Do I Do If Node Error Logs Cannot Be Viewed When a Job Fails?

Error logs are stored in OBS. The current account must have the OBS read permissions to view logs. You can check the OBS permissions and OBS bucket policies in IAM.

NOTE

When you create a job, a bucket named dlf-log-{projectID} will be created by default. If the bucket already exists, you do not need to create it again.

6.6 What Should I Do If the Agency List Fails to Be Obtained During Agency Configuration?

When a workspace-level or job-level agency is configured, the following error is reported when the agency list is viewed:

Policy doesn't allow iam:agencies:listAgencies to be performed.

Add the View Agency List policy for the current user.

You can create a custom policy (to query the agency list based on specified conditions) and assign it to a user group for fine-grained access control.

Step 1 Log in to HUAWEI CLOUD and click Console in the upper right corner.

Step 2 On the management console, hover the mouse pointer over the username in the upper right corner, and choose Identity and Access Management from the drop-down list.

Step 3 In the navigation pane, choose Permissions. Then, click Create Custom Policy.

Step 4 Enter a policy name.

Step 5 Set Scope to Global services. The scope is where the custom policy takes effect. In this example, the custom policy grants the permission required to view the agency list based on specified conditions.

Step 6 Set Policy View to Visual editor.

Step 7 Configure a policy in Policy Content.

1. Select Allow.
2. Select Identity and Access Management (IAM) for Select service.
3. Select iam:agencies:listAgencies for Select action.

Step 8 Click OK.

Step 9 Add the policy defined in Step 7 to the group to which the current user belongs. For details, see Creating a User Group and Granting Permissions.

The current user can log out of the system and then log in again to obtain the agency list.

----End
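For reference, the policy assembled in the visual editor above corresponds to roughly the following JSON document, built here as a Python dict. The "Version": "1.1" value follows Huawei Cloud IAM custom-policy syntax as an assumption; verify it against the JSON view of your policy on the console before use.

```python
import json

# Approximate JSON equivalent of the visual-editor policy configured in Step 7.
# Assumption: Huawei Cloud IAM custom policies use Version "1.1"; confirm on
# the console's JSON view.
policy = {
    "Version": "1.1",
    "Statement": [
        {
            "Effect": "Allow",                          # Step 7, item 1
            "Action": ["iam:agencies:listAgencies"],    # Step 7, item 3
        }
    ],
}

print(json.dumps(policy, indent=4))
```

Switching Policy View to JSON in Step 6 lets you paste a document like this directly instead of using the visual editor.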

6.7 How Do I Locate Jobs That Generate a Large Number of Scheduling Nodes?

If the number of nodes executed daily exceeds the upper limit, frequent job scheduling may be the cause. Perform the following operations:

1. In the left navigation pane of Data Development, choose Monitoring > Monitor Instance, select the current day, and view the jobs that are frequently scheduled.

2. In the left navigation pane of Data Development, choose Monitoring > Monitor Job to check whether the scheduling period of the frequently scheduled jobs is set properly. If the scheduling period is inappropriate, adjust it or stop the scheduling. Typically, it is minute-level scheduling jobs that push the number of nodes executed every day over the upper limit.
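As a rough arithmetic check (the function below is illustrative, not a DGC API), you can estimate how many instances a minute-level schedule produces per day from its interval, which makes it easy to see why such jobs dominate the daily node count:

```python
# Back-of-the-envelope estimate of daily instance counts per scheduling interval.

def daily_instances(interval_minutes: int) -> int:
    """Number of instances a job generates in 24 hours at the given interval."""
    return (24 * 60) // interval_minutes

# A single job scheduled every 5 minutes generates 288 instances a day,
# while an hourly job generates only 24.
print(daily_instances(5))   # 288
print(daily_instances(60))  # 24
```

If each instance contains several nodes, a handful of minute-level jobs can exhaust the daily node quota on their own.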

Figure 6-3 Viewing the scheduling period

6.8 Why Cannot Specified Peripheral Resources Be Selected When a Data Connection Is Created in Data Development?

Ensure that the current instance and the peripheral resources are in the same region and IAM project. If the enterprise project function is enabled for your account, they must also be in the same enterprise project.

6.9 Why Cannot I Receive a Job Failure Alarm Notification After SMN Is Configured?

What can I do if the SMN notification for job exceptions or failures is configured but no alarm notification is received?

Figure 6-4 Notification management

To solve the problem, perform the following steps:

Step 1 Check whether the failed job is being scheduled. No notification is sent for jobs in the test-running state. SMN notifications are sent only for jobs in the scheduling state.

Step 2 On the Data Development page, choose Monitoring > Manage Notification to check whether the notification function is enabled.

Step 3 Log in to the SMN console and check whether the SMN topic has been subscribed to.

Step 4 Check whether the subscription endpoint of the SMN topic has its own name and whether the subscription has been confirmed.

Step 5 Check whether the SMN channel is normal. You can send a message to your topic on the SMN console to check whether you can receive notifications from SMN.

----End

6.10 Why Is There No Job Running Scheduling Log on the Monitor Instance Page After Periodic Scheduling Is Configured for a Job?

1. On the Data Development page, choose Monitoring > Monitor Job to check whether the target job is being scheduled. A job can be scheduled only within its scheduling period.

Figure 6-5 Viewing the job scheduling status

2. If a job depends on other jobs, choose Monitoring > Monitor Instance to view the running status of the dependent jobs. If the job is self-dependent, expand the search time range to check whether the job is waiting to run due to the failure of a historical job instance.

6.11 Why Does the GUI Display Only the Failure Result but Not the Specific Error Cause After Hive SQL and Spark SQL Scripts Fail to Be Executed?

Check whether the data connection used by the Hive SQL and Spark SQL scripts is a direct connection or a proxy connection.

In direct connection mode, DGC submits the scripts to MRS through APIs and then checks whether the scripts are executed successfully. MRS does not return the specific error cause to DGC. Therefore, the GUI displays only the execution result (success or failure) but not the error cause.

If you want to view the error cause, go to the job management page of MRS.

6.12 What Do I Do If the Token Is Invalid During the Running of a Data Development Node?

Check whether the permissions of the current user have been changed in IAM, whether the user has been removed from the user group, or whether the permission policy of the user group to which the user belongs has been changed.

If any of these has changed, log in to the system again.

6.13 Why Cannot I View the Existing Workspaces Even Though I Have the Required Policy?

Log in to the system as a user who has the permissions required by the current workspace and check whether the required permissions have been assigned to all users in the workspace.

If they have not, assign the permissions to the users.

Figure 6-6 Viewing workspace members

6.14 How Do I View Run Logs After a Job Is Tested?

Method 1: After the node test is complete, right-click the current node and choose View Log from the shortcut menu.

Method 2: Click Monitor in the upper part of the canvas, expand the job instance on the Monitor Instance page, and view the node logs.

6.15 Why Does a Job Scheduled by Month Start Running Before the Job Scheduled by Day Is Complete?

Jobs scheduled by month depend on jobs scheduled by day. Why does a job scheduled by month start running before the jobs scheduled by day are complete?

Figure 6-7 Viewing the job scheduling period and dependency attributes

Although jobs scheduled by month depend on jobs scheduled by day, whether the jobs scheduled by month in the current month are executed depends on whether all the jobs scheduled by day in the previous month are complete, not those in the current month.

For example, whether the monthly scheduled jobs run in November depends on whether the daily scheduled jobs were completed in October.
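The month arithmetic can be made concrete with a short sketch. This is illustrative only (the function name is made up, not DGC internals): a monthly instance for a given month waits on the daily instances of the previous month.

```python
from datetime import date, timedelta

def dependency_window(month_start):
    """Return the first and last day of the month a monthly instance waits on."""
    last_of_prev = month_start.replace(day=1) - timedelta(days=1)  # end of previous month
    return last_of_prev.replace(day=1), last_of_prev               # its first and last day

# The November 2021 monthly run waits on the daily runs of October 1-31:
print(dependency_window(date(2021, 11, 1)))
# (datetime.date(2021, 10, 1), datetime.date(2021, 10, 31))
```

So a monthly job starting on November 1 can legitimately run while the November daily jobs are still in progress, because its dependency window closed with October.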

6.16 How Do I Execute Presto SQL in Data Development?

For details, see Using Presto to Dump Data in DLF.

6.17 What Should I Do If Invalid Authentication Is Reported When I Run a DLI Script?

Check whether the current user has the DLI Service User or DLI Service Admin permissions in IAM.

6.18 Why Cannot I Select the Desired CDM Cluster in Proxy Mode When Creating a Data Connection?

Check whether the CDM cluster is stopped. If it is stopped, restart it.

7 Data Assets

7.1 What Are the Functions of the Data Assets Module?

7.2 What Assets Can Be Collected?

7.3 What Is Data Lineage?

7.4 How Can Data Lineage Be Displayed on a Data Map?

7.1 What Are the Functions of the Data Assets Module?

The Data Assets module displays enterprise data assets in the form of data maps, including all metadata information and data lineage.

7.2 What Assets Can Be Collected?

Currently, the following assets can be collected: DWS, DLI, MRS HBase, MRS Hive, MySQL, RDS MySQL, and RDS PostgreSQL.

7.3 What Is Data Lineage?

In the era of big data, various types of data are rapidly generated due to explosive data growth. The massive and complex data is converged, transformed, and transferred to generate new data, which aggregates into an ocean of data.

During this process, relationships form between the data, and these relationships are their lineages. They are analogous to the genetic relationships between people. However, in contrast to human lineages, data lineages have the following distinct features:

● Belongingness: Specific data belongs to a specific organization or individual.

● Multi-source: One piece of data can have multiple sources. A piece of data may be generated by processing multiple pieces of data, and there may be multiple such processes.

● Traceability: A data lineage is traceable. It reflects the data lifecycle, the entire process from data generation to data disappearance.

● Hierarchy: A data lineage is hierarchical. Data classification and summarization form new data, and different levels of description result in data layers.
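The multi-source and traceability properties above can be illustrated by modeling lineage as a directed graph and walking upstream. The table names below are invented for the example:

```python
from collections import defaultdict

# Lineage edges (upstream, downstream) produced by hypothetical jobs.
edges = [
    ("src_orders", "dwd_orders"),
    ("src_users", "dwd_orders"),      # multi-source: two inputs, one output
    ("dwd_orders", "dm_daily_sales"),
]

upstream = defaultdict(set)
for src, dst in edges:
    upstream[dst].add(src)

def trace_sources(table):
    """Traceability: walk upstream until reaching the original source tables."""
    if table not in upstream:
        return {table}                # no parents: this is an original source
    found = set()
    for parent in upstream[table]:
        found |= trace_sources(parent)
    return found

print(sorted(trace_sources("dm_daily_sales")))  # ['src_orders', 'src_users']
```

The data map renders the same structure visually: each edge is drawn by a job, and following edges backward answers "where did this table's data come from?"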

Figure 7-1 Data lineage example

7.4 How Can Data Lineage Be Displayed on a Data Map?

To display data lineage, you need to collect metadata first and then schedule the related jobs in Data Development.

8 Data Lake Mall

8.1 What Languages Do Data Lake Mall SDKs Support?

8.1 What Languages Do Data Lake Mall SDKs Support?

Data Lake Mall SDKs support C#, Python, Go, JavaScript, PHP, C++, C, Android, and Java.

9 Data Security

9.1 Why Is Data in a Data Table Not Masked Based on Rules After a Data Masking Task Is Executed?

9.2 Why Does the System Display a Message Indicating that Some Data Identification Rules Are in Use When They Are Deleted Although No Task Is Using Them?

9.3 What Should I Do If Authentication Audit Logging Is Not Enabled?

9.1 Why Is Data in a Data Table Not Masked Based on Rules After a Data Masking Task Is Executed?

This is because the masking task depends on the sensitive data discovery task. You must create a sensitive data discovery task first. After sensitive fields are discovered, they are masked based on the rules.

9.2 Why Does the System Display a Message Indicating that Some Data Identification Rules Are in Use When They Are Deleted Although No Task Is Using Them?

This is because Data Security and Data Assets use the same set of data identification rules. Although a data identification rule is not used in any data security task, it may be used in Data Assets. Therefore, when you delete the identification rule, the system displays a message indicating that it is in use.

9.3 What Should I Do If Authentication Audit Logging Is Not Enabled?

Authentication audit logging is unavailable in this version.
