
Data Warehouse Release Notes

Release Notes
Date of publish: 2020-02-20

https://docs.cloudera.com/


Legal Notice

© Cloudera Inc. 2020. All rights reserved.

The documentation is and contains Cloudera proprietary information protected by copyright and other intellectual property rights. No license under copyright or any other intellectual property right is granted herein.

Copyright information for Cloudera software may be found within the documentation accompanying each component in a particular release.

Cloudera software includes software from various open source or other third party projects, and may be released under the Apache Software License 2.0 (“ASLv2”), the Affero General Public License version 3 (AGPLv3), or other license terms. Other software included may be released under the terms of alternative open source licenses. Please review the license and notice files accompanying the software for additional licensing information.

Please visit the Cloudera software product page for more information on Cloudera software. For more information on Cloudera support services, please visit either the Support or Sales page. Feel free to contact us directly to discuss your specific needs.

Cloudera reserves the right to change any products at any time, and without notice. Cloudera assumes no responsibility nor liability arising from the use of products, except as expressly agreed to in writing by Cloudera.

Cloudera, Cloudera Altus, HUE, Impala, Cloudera Impala, and other Cloudera marks are registered or unregistered trademarks in the United States and other countries. All other trademarks are the property of their respective owners.

Disclaimer: EXCEPT AS EXPRESSLY PROVIDED IN A WRITTEN AGREEMENT WITH CLOUDERA, CLOUDERA DOES NOT MAKE NOR GIVE ANY REPRESENTATION, WARRANTY, NOR COVENANT OF ANY KIND, WHETHER EXPRESS OR IMPLIED, IN CONNECTION WITH CLOUDERA TECHNOLOGY OR RELATED SUPPORT PROVIDED IN CONNECTION THEREWITH. CLOUDERA DOES NOT WARRANT THAT CLOUDERA PRODUCTS NOR SOFTWARE WILL OPERATE UNINTERRUPTED NOR THAT IT WILL BE FREE FROM DEFECTS NOR ERRORS, THAT IT WILL PROTECT YOUR DATA FROM LOSS, CORRUPTION NOR UNAVAILABILITY, NOR THAT IT WILL MEET ALL OF CUSTOMER’S BUSINESS REQUIREMENTS. WITHOUT LIMITING THE FOREGOING, AND TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, CLOUDERA EXPRESSLY DISCLAIMS ANY AND ALL IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, QUALITY, NON-INFRINGEMENT, TITLE, AND FITNESS FOR A PARTICULAR PURPOSE AND ANY REPRESENTATION, WARRANTY, OR COVENANT BASED ON COURSE OF DEALING OR USAGE IN TRADE.


Contents

What's new
    May 14, 2020
    April 7, 2020
    March 13, 2020
    February 20, 2020
    November 14, 2019
    September 23, 2019
    August 22, 2019

Known issues and limitations


What's new

This section lists major features and updates for the Data Warehouse service.

May 14, 2020

This release of the Cloudera Data Warehouse service introduces the following new features and improvements:

Simplified private networking deployments on AWS

Private networking deployments on AWS for Data Warehouse service have been simplified. When using the Private Load Balancer, Private Worker Nodes deployment mode, you no longer need both public and private subnets in your AWS VPC. This deployment mode now requires only three private subnets. For more information, see Supported deployment modes for private networking in AWS.

Generate and download Impala diagnostic bundles

You can now generate and download diagnostic bundles of log files for Impala Virtual Warehouses on AWS environments. For more information, see Downloading Impala diagnostic bundles from AWS.

Single Sign-On (SSO) support for Hue with Hive Virtual Warehouses

You can now access Hue to query Hive Virtual Warehouses from the Virtual Warehouse tile.


DAS updates

In this release, DAS has been updated as follows:

• You can now specify administrative groups for the DAS web app. For more information, see Specifying adminUsers and adminGroups for DAS.

• Multi-DAG support for a single Hive query.

April 7, 2020

This release of the Data Warehouse service introduces the following new features and improvements:

Hue single-sign-on (SSO) support using SAML

In this release, Hue now supports SSO by using SAML. For more information, see Authenticating users with SAML in the Hue documentation.

Private networking feature now available for AWS environments when only private subnets are available

CDP Data Warehouse service now supports private networking when you cannot use public subnets in your AWS VPC. To configure private networking for AWS environments where only private subnets are available, register a CDP environment in AWS with three private subnets. Network administrators must then make sure that the private subnets have outbound internet access by way of the transit gateway or by another means. For information about configuring private networking with these AWS environments, see Step 4 in Activating an AWS environment with private subnet support.

Azure support improvements

• Additional regions are now supported, including:

    • Central US
    • East US
    • East US 2
    • West US
    • West US 2
    • Australia East

• Improved identity management.

Improved resiliency for Impala Virtual Warehouses

Resiliency for Impala Virtual Warehouses has been improved by reducing the amount of ephemeral storage used on compute nodes.

March 13, 2020

This release of the Data Warehouse service introduces the following new features and improvements:

Azure support for Data Warehouse service

This release of Data Warehouse service now supports Microsoft Azure as a public cloud provider. For more details about this added support, see Azure support for Data Warehouse service.

Impala Virtual Warehouse improvements

Impala Virtual Warehouses now include the following improvements:


• Improved resiliency for client connections.

• Added more health checks to more accurately indicate the start of a Virtual Warehouse. This helps with query timeouts due to provisioning lags.

February 20, 2020

This release of the Data Warehouse service introduces the following new features and improvements:

Data Warehouse service security improvements

Continued improvements that include general hardening of the service platform.

Improved auto-scaling in Impala Virtual Warehouses

In this release, the following configuration options have been added to Impala auto-scaling:

• Enable or disable high availability (having one or two coordinators) to save on cloud resource consumption.

• Enable or disable AutoSuspend, which gives you the ability to accelerate the system's response to queries after idle periods.

• Scale Up Delay, which sets the length of time in seconds that the system waits before adding more executors when it detects queries waiting in the queue to execute.

• Scale Down Delay, which sets the length of time in seconds that the system waits before it removes executors when it detects idle executor groups.

For details, see Impala auto-scaling overview.
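The Scale Up Delay and Scale Down Delay options described above can be read as two independent timers. The following Python sketch illustrates that reading only. It is not the service's implementation, and the names AutoScaleSettings, scaling_decision, queued_for_s, and idle_for_s are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoScaleSettings:
    # Hypothetical container for the two delay options, both in seconds.
    scale_up_delay_s: float    # how long queries may wait in the queue before executors are added
    scale_down_delay_s: float  # how long an executor group may sit idle before it is removed


def scaling_decision(settings: AutoScaleSettings,
                     queued_for_s: Optional[float],
                     idle_for_s: Optional[float]) -> str:
    """Return "scale_up", "scale_down", or "hold".

    queued_for_s: how long the oldest queued query has waited, or None if nothing is queued.
    idle_for_s: how long an executor group has been idle, or None if no group is idle.
    """
    # Waiting queries take priority: add executors once the scale-up delay has elapsed.
    if queued_for_s is not None and queued_for_s >= settings.scale_up_delay_s:
        return "scale_up"
    # Otherwise remove executors once a group has been idle for the scale-down delay.
    if idle_for_s is not None and idle_for_s >= settings.scale_down_delay_s:
        return "scale_down"
    return "hold"
```

In this sketch, a longer Scale Up Delay tolerates short queueing bursts without adding executors, and a longer Scale Down Delay keeps idle executor groups around for a later burst.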

Dedicated DWUser role added

The DWUser role has been added, which grants a CDP user/group the ability to view Cloudera Data Warehouse clusters within a CDP environment.

For details, see the "CDP resource roles" section of Understanding roles and resource roles in the Management Console documentation set.

Support for private deployments in AWS to set up private networking

Cloudera Data Warehouse service now supports private deployments in AWS, which use private subnets. In AWS, a public subnet is connected to an internet gateway, which can send and receive traffic directly to and from the internet. Private subnets send outbound traffic from nodes to the internet by using a network address translation (NAT) gateway, which then forwards the traffic to an internet gateway. Private subnets receive no direct inbound connections from the internet. This provides private network connectivity for workload endpoints in Data Warehouse service.

For details, see Set up private networking in AWS.

Support for restricting access to endpoints in AWS

In this release, you can now whitelist IP CIDRs on your network so they can access Kubernetes endpoints and endpoints for the services such as Hive, Impala, Data Analytics Studio (DAS), or Hue in environments that use AWS. For more details, see Restricting access to endpoints in AWS.
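To make the effect of a CIDR whitelist concrete, the following Python sketch checks whether a client address falls inside any listed CIDR block. It only illustrates CIDR matching with the standard ipaddress module. The function name is_allowed and the sample addresses (taken from documentation-reserved ranges) are assumptions, and the sketch says nothing about how the service enforces the list.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip belongs to at least one CIDR block in allowed_cidrs."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False accepts entries such as "203.0.113.7/24" whose host bits are set.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs)


# Hypothetical whitelist: one office range and one single host.
whitelist = ["203.0.113.0/24", "198.51.100.42/32"]
print(is_allowed("203.0.113.25", whitelist))
print(is_allowed("192.0.2.10", whitelist))
```

A /32 entry admits exactly one IPv4 address, which is a common way to list a single NAT or VPN egress address.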

New section in documentation on "Managing environments"

A new How To section has been added to the documentation set called Managing environments. In this section, learn how to configure CDP environments specifically to support the Data Warehouse service.


November 14, 2019

This release of the Data Warehouse service introduces the following new features and improvements:

Hive 3 stability improvements

Continued improvements that include:

• HiveServer2 cleanup of existing DAGs from Tez session pools when HiveServer2 restarts.
• Improved memory use for Hive LLAP Virtual Warehouses.
• Improved accuracy of table statistics for partitioned tables.

Support for using Impala shell to connect to Virtual Warehouses from remote computers

You can download and install the Impala shell to your local computer and connect to Impala Virtual Warehouses in CDP. For details, see Using Impala shell.

Improved auto-scaling in Impala Virtual Warehouses

In this release, there is greater fault-tolerance and improved coordination between the processes used to auto-scale Impala Virtual Warehouses. For details, see Impala auto-scaling overview.

September 23, 2019

This release of the Data Warehouse service introduces the following new features:

• Impala is now available as an execution engine for Cloudera Data Warehouse service Virtual Warehouses. For more information, see Tuning Impala data marts.

August 22, 2019

This release of the Data Warehouse service introduces the following new features:

This is the first release of the Data Warehouse service. For an overview of Data Warehouse functionality, refer to the "Related information."

Related Information

Data Warehouse Overview

Known issues and limitations

This section lists known issues and limitations that you might run into while using the Data Warehouse service.

General Known Issues in Data Warehouse service

DWX-3420: For Azure environments, the ShowKubeconfig option does not work

Problem: The more menu option used to display the kubeconfig file for Azure environments is grayed out and does not work.


Workaround: Use the Azure CLI az aks get-credentials command, and then use the Kubernetes kubectl config view command to view the kubeconfig file for your AKS cluster. For more information, see Get and verify the configuration information in the Microsoft Azure documentation.

DWX-2049: For the private networking feature on AWS environments, only the default DHCP option set created when the VPC is created in AWS is supported.

Problem: When the VPC is created in AWS, a default DHCP option set is created. For this default DHCP option set, the domain name option is set to <REGION>.compute.internal, where <REGION> is the AWS region where the VPC was created. The Data Warehouse service sets up an nginx ingress controller (the LoadBalancer service) with externalTrafficPolicy set to Local (externalTrafficPolicy=Local) for better performance because it means there is one less network hop.

Workaround: For this feature to work correctly, the domain name in the DHCP option set cannot be changed. Only the default DHCP option set is supported. Changing the domain name to a custom domain or having multiple domain names causes the kube-proxy to not start correctly. If the Kubernetes network proxy (kube-proxy) does not start correctly, the Amazon ELB (load balancer) does not have healthy targets. This causes workload endpoints, such as Data Analytics Studio (DAS), JDBC, or Hue, to return 503 errors. This is a known issue in Kubernetes and is yet to be fixed.
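As a quick pre-check before activating an environment, the domain name value of a VPC's DHCP option set can be compared with the default pattern described above. The Python sketch below assumes that you have already retrieved that value as a string and that multiple domain names, if present, are separated by spaces. The function name uses_default_dhcp_domain is hypothetical, and the check follows only the <REGION>.compute.internal pattern stated in this note.

```python
def uses_default_dhcp_domain(domain_name_option: str, region: str) -> bool:
    """Return True if the DHCP domain name option holds only <region>.compute.internal.

    domain_name_option: the raw domain name value from the DHCP option set
                        (assumed to be space-separated when it lists several names).
    region: the AWS region in which the VPC was created, for example "us-west-2".
    """
    names = domain_name_option.split()
    # A custom domain, or more than one domain name, is the unsupported case.
    # Regions whose AWS default differs from this pattern (AWS documents ec2.internal
    # for us-east-1) are not handled by this sketch.
    return names == [f"{region}.compute.internal"]


print(uses_default_dhcp_domain("us-west-2.compute.internal", "us-west-2"))
print(uses_default_dhcp_domain("us-west-2.compute.internal corp.example.com", "us-west-2"))
```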

Data Analytics Studio (DAS) in Data Warehouse service

DWX-4020: Add column functionality via upload table option doesn't work.

Problem: You may not be able to add or delete columns or change the table schema after creating a new table using the upload table feature.

Workaround: N/A

DWX-929: DAS UI displays the internal JDBC URL.

Problem: DAS displays the internal JDBC URL on its About page instead of the correct JDBC URL to use to connect to the data warehouse.

Workaround: To copy the correct JDBC URL to use to connect to the data warehouse, in the Data Warehouse service Overview page, go to Virtual Warehouse > [icon], and then click Copy JDBC URL.

DWX-2592: DAS cannot parse certain characters in strings and comments.

Problem: DAS cannot parse semicolons (;) and double hyphens (--) in strings and comments. For example, if you have a semicolon in a query such as the following, the query might fail:

SELECT * FROM properties WHERE prop_value = "name1;name2";

Queries with double hyphens (--) might also fail. For example:

SELECT * FROM test WHERE option = '--name';

Workaround: If a semicolon is present in a comment, then remove the semicolon before running the query or remove the comment entirely. For example:

SELECT * FROM test; -- SELECT * FROM test;

Should be changed to:

SELECT * FROM test; /* comment; comment */

In the same manner, remove any double hyphens before running queries to avoid failure in DAS.
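Before pasting a query into DAS, you can scan it for the patterns shown in the examples above. The Python sketch below is a simplified scanner, not DAS's parser: it flags semicolons and double hyphens inside quoted strings and inside -- line comments. Block comments (/* ... */) are skipped without being flagged, because the corrected example above uses one. The function name find_risky_tokens is hypothetical, and escaped quotes inside strings are not handled.

```python
from typing import List, Tuple


def find_risky_tokens(sql: str) -> List[Tuple[str, str]]:
    """Return (token, context) pairs for ';' or '--' found in strings or line comments."""
    hits: List[Tuple[str, str]] = []

    def scan(body: str, context: str) -> None:
        # Record every occurrence of each risky token inside the given span.
        for token in (";", "--"):
            pos = body.find(token)
            while pos != -1:
                hits.append((token, context))
                pos = body.find(token, pos + len(token))

    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch in ("'", '"'):
            # Quoted string: runs to the next matching quote (or to the end of the text).
            end = sql.find(ch, i + 1)
            end = n if end == -1 else end
            scan(sql[i + 1:end], "string")
            i = end + 1
        elif sql.startswith("--", i):
            # Line comment: the marker itself is a double hyphen; the body runs to end of line.
            end = sql.find("\n", i)
            end = n if end == -1 else end
            hits.append(("--", "line comment"))
            scan(sql[i + 2:end], "line comment")
            i = end
        elif sql.startswith("/*", i):
            # Block comment: skipped, not flagged (see the assumption stated above).
            end = sql.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            i += 1
    return hits


print(find_risky_tokens('SELECT * FROM properties WHERE prop_value = "name1;name2";'))
print(find_risky_tokens("SELECT * FROM test; -- SELECT * FROM test;"))
```

An empty result means only that none of these specific patterns were found. It does not guarantee that DAS will accept the query.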

Older versions of Google Chrome browser might cause issues.

Problem: You might experience problems while using faceted search in older versions of the Google Chrome browser.

Workaround: Use the latest version (71.x or later) of Google Chrome.

BUG-94611: Visual Explain for the same query shows different graphs.

Problem: Visual Explain for the same query shows different graphs on the Compose page and the Query Details page.

Workaround: N/A

Database Catalog

There are no known issues.


Hive 3 in Data Warehouse service

Result caching: This feature is limited to 10 GB.

Data caching: This feature is limited to 200 GB per compute node, multiplied by the total number of compute nodes.
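The data caching limit is a simple product, which the following Python sketch spells out. The constant and function names are hypothetical, and the node count in the usage line is an arbitrary example.

```python
DATA_CACHE_GB_PER_NODE = 200  # per-node limit stated in this note


def max_data_cache_gb(compute_nodes: int) -> int:
    """Return the total data cache limit in GB for the given number of compute nodes."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must not be negative")
    return DATA_CACHE_GB_PER_NODE * compute_nodes


# Example: a Virtual Warehouse with 10 compute nodes can cache up to 10 x 200 GB.
print(max_data_cache_gb(10))  # 2000
```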

DWX-3443: ANALYZE TABLE…COMPUTE STATISTICS fails with NullPointerException on Virtual Warehouse version 7.1.1.0-236

Problem: The ANALYZE TABLE…COMPUTE STATISTICS statement is run to gather statistics on a table for writing to the metastore. For example:

ANALYZE TABLE <table_name> PARTITION(<partition_name>) COMPUTE STATISTICS;

However, if you run this statement against a table in a Hive Virtual Warehouse version 7.1.1.0-236, a NullPointerException (NPE) might be returned.

To determine the version of the Virtual Warehouse:

1. In the Data Warehouse service UI, select Virtual Warehouses in the left navigation menu.

2. On the Virtual Warehouses page, locate the Virtual Warehouse that is returning the error, and click on its Name.

3. On the details page for the Virtual Warehouse, the version is listed at the top.

Workaround: Upgrade to a later version of Cloudera Runtime for the Virtual Warehouse:

1. In the Data Warehouse service UI, select Overview in the left navigation menu.

2. In the Overview page, click More… in the Environments column to expand it and search for the environment that is being used for the Virtual Warehouse which is returning the error.

3. After you locate the environment, click the delete icon in the upper right corner of the environment tile.


Clicking this icon launches the Action dialog box, but it does not delete the environment.

4. In the Action dialog box, click OK.

Clicking OK in the Action dialog box de-activates the environment.

5. After the environment has been de-activated, an activation icon appears on the tile. Click the activation icon to re-activate the environment.

When you re-activate the environment, it automatically refreshes the Cloudera Runtime version for the Virtual Warehouse and you should no longer get the NPE error.

DWX-2690: Older versions of Beeline return SSLPeerUnverifiedException when submitting a query

Problem: When submitting queries to Virtual Warehouses that use Hive, older Beeline clients return an SSLPeerUnverifiedException error:

javax.net.ssl.SSLPeerUnverifiedException: Host name ‘ec2-18-219-32-183.us-east-2.compute.amazonaws.com’ does not match the certificate subject provided by the peer (CN=*.env-c25dsw.dwx.cloudera.site) (state=08S01,code=0)

Workaround: Only use Beeline clients from CDP Runtime version 7.0.1.0 or later.
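The error text above reports that the host name the client checked does not match the wildcard subject in the certificate. The Python sketch below shows a simplified form of that kind of comparison, in which a wildcard covers the leftmost label only. It is an illustration under that assumption, not Beeline's or Java's actual verification logic. The function name hostname_matches_cn and the matching host name in the usage lines are hypothetical.

```python
def hostname_matches_cn(hostname: str, cn: str) -> bool:
    """Return True if hostname matches the certificate CN, allowing '*' as the leftmost label."""
    host_labels = hostname.lower().split(".")
    cn_labels = cn.lower().split(".")
    # A wildcard stands for exactly one label, so the label counts must agree.
    if len(host_labels) != len(cn_labels):
        return False
    if cn_labels[0] == "*":
        # Compare everything to the right of the wildcard; the wildcard label must be non-empty.
        return host_labels[0] != "" and host_labels[1:] == cn_labels[1:]
    return host_labels == cn_labels


cn = "*.env-c25dsw.dwx.cloudera.site"
# The host name from the error message does not fall under the certificate's domain.
print(hostname_matches_cn("ec2-18-219-32-183.us-east-2.compute.amazonaws.com", cn))
# A hypothetical endpoint name under the certificate's domain does.
print(hostname_matches_cn("hs2-example.env-c25dsw.dwx.cloudera.site", cn))
```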

DWX-1952: Cloned Hive Virtual Warehouses do not have query executors or query coordinators

Problem: When you clone an existing Hive Virtual Warehouse, it is created with only HiveServer and Data Analytics Studio (DAS) application container groups (Kubernetes pods). This means that the cloned Virtual Warehouse cannot execute queries.


Workaround:

To manually add query executors and query coordinators to the cloned Hive Virtual Warehouse:

1. Click the options menu on the cloned Virtual Warehouse, and then select Edit.

2. In the Virtual Warehouse edit page, change a value, such as the AutoSuspend Timeout setting, and then click Apply.

This causes the Data Warehouse service to create query executors and query coordinators so you can execute queries on the cloned Virtual Warehouse.

Impala in Data Warehouse service

DWX-3914: Collect Diagnostic Bundle option does not work on older environments

The Collect Diagnostic Bundle menu option in Impala Virtual Warehouses does not work for older environments.

Data caching: This feature is limited to 200 GB per compute node, multiplied by the total number of compute nodes.

Sessions with Impala continue to run for 15 minutes after the connection is disconnected.

When a connection to Impala is disconnected, the session continues to run for 15 minutes so that the user or client can reconnect to the same session by presenting the session_token. After 15 minutes, the client must re-authenticate to Impala to establish a new connection.
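The 15-minute window can be pictured as a grace period measured from the moment of disconnection. The Python sketch below models only that timing rule from the client's point of view. It does not call any Impala API, the names SESSION_GRACE_SECONDS and can_resume_session are hypothetical, and treating exactly 15 minutes as expired is an assumption.

```python
SESSION_GRACE_SECONDS = 15 * 60  # the 15-minute window described in this note


def can_resume_session(disconnected_at: float, now: float,
                       grace_seconds: float = SESSION_GRACE_SECONDS) -> bool:
    """Return True if a reconnect at time `now` still falls inside the grace window.

    Both timestamps are seconds on the same clock (for example, values from time.time()).
    """
    elapsed = now - disconnected_at
    return 0 <= elapsed < grace_seconds


# Reconnecting 5 minutes after the drop can reuse the session in this model;
# reconnecting after 20 minutes requires re-authentication.
print(can_resume_session(disconnected_at=0, now=5 * 60))
print(can_resume_session(disconnected_at=0, now=20 * 60))
```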
