Concepts Guide
HP Vertica Analytic Database
Software Version: 7.0.x
Document Release Date: 5/2/2018




Legal Notices

Warranty
The only warranties for Micro Focus products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Micro Focus shall not be liable for technical or editorial errors or omissions contained herein.

The information contained herein is subject to change without notice.

Restricted Rights Legend
Confidential computer software. Valid license from Micro Focus required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Copyright Notice
© Copyright 2006 - 2014 Micro Focus International plc.

Trademark Notices
Adobe® is a trademark of Adobe Systems Incorporated.

Microsoft® and Windows® are U.S. registered trademarks of Microsoft Corporation.

UNIX® is a registered trademark of The Open Group.

This document may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the document is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.

Micro Focus Vertica Analytic Database (7.0.x) Page 2 of 61

Contents

Contents 3

The Vertica Approach 7

Column Storage 7

Compression 7

Clustering 8

Continuous Performance 8

Vertica Components 11

Column Store Architecture with FlexStore 11

How FlexStore™ Enhances Your Column-Based Architecture 11

Architecture of the Vertica Cluster 12

Terminology 12

Data Encoding and Compression 13

Encoding 13

Compression 13

K-Safety 13

K-Safety Example 14

K-Safety Requirements 16

Determining K-Safety 16

Monitoring K-Safety 17

Finding Critical Nodes 17

High Availability and Recovery 17

K-Safety Requirements 17

Determining K-Safety 18

Monitoring K-Safety 18

Loss of K-Safety 18

High Availability With Projections 19

Replication (Unsegmented Projections) 19

Buddy Projections (Segmented Projections) 19

High Availability With Fault Groups 20


Automatic fault groups 21

User-defined fault groups 21

Example cluster topology 21

How to create fault groups 22

Hybrid Storage Model 22

Logical Schema 24

Physical Schema 24

How Projections Are Created 25

Anatomy of a Projection 25

Column List and Encoding 26

Base Query 26

Sort Order 26

Segmentation 27

Projection Concepts 27

Projection Performance 27

Encoding and Compression 27

Sort Order 27

Segmentation 28

Projection Segmentation 28

Hash Segmentation 28

Projection Naming 28

Database Setup 29

Prepare SQL Scripts and Data Files 29

Create the Database 30

Test the Empty Database 30

Test the Partially-Loaded Database 30

Complete the Fact Table Load 31

Set up Security 31

Set up Incremental Loads 31

Database Connections 31

The Administration Tools 32


Running the Administration Tools 32

First Time Only 33

Between Dialogs 34

Management Console 35

What You Can Do with Management Console 35

How to Get MC 35

What You Need to Know 36

Management Console Architecture 36

MC Components 36

Application/web Server 37

MC Agents 37

Management Console Security 38

OAuth and SSL 38

User Authentication and Access 38

Management Console Home Page 39

Database Designer 39

Database Security 40

Data Loading and Modification 41

Workload Management 41

SQL Overview 43

Vertica Support for ANSI SQL Standards 43

Support for Historical Queries 43

Joins 43

Transactions 43

About Query Execution 45

Snapshot Isolation Mode 45

Historical Queries 45

Transactions 47

Implementation Details 47

Automatic Rollback 47

Savepoints 48


READ COMMITTED Isolation 48

SERIALIZABLE Isolation 52

International Languages and Character Sets 55

Unicode Character Encoding 55

Locales 55

String Functions 55

Character String Literals 56

Extending Vertica 57

User-Defined SQL Functions 57

User Defined Extensions and User Defined Functions 57

Get Started 59

We appreciate your feedback! 61


The Vertica Approach

Vertica is built from the ground up on the 4 C's:

Column Storage

Stores data the way it is typically queried for best performance. Column storage is ideal for read-intensive workloads because it can dramatically reduce disk I/O.

Compression

Stores more data, provides more views, and uses less hardware, which lets you keep much more historical data in physical storage.


- When similar data is grouped, you have even more compression options.

- Vertica applies over twelve compression schemes:

  - Dependent on data

  - System chooses which to apply

  - NULLs take virtually no space

- Typically see 50% - 90% compression.

- Vertica queries data in encoded form.

Clustering

Lets you scale out your database cluster easily by adding more hardware.

- Columns are duplicated across cluster nodes. If one machine goes down, you still have a copy:

  - Data warehouse log-based recovery is impractical

  - Instead, store enough projections for K-safety

- A new cluster node queries existing nodes for the data it needs:

  - Rebuilds missing objects from other nodes

  - Another benefit of multiple sort orders

Continuous Performance

Queries and loads data 24x7 with virtually no database administration.


- Concurrent loading and querying means that you get real-time views and eliminate nightly load windows.

- On-the-fly schema changes mean that you can add columns and projections without database downtime.

- Automatic data replication, failover, and recovery provide active redundancy, which increases performance. Nodes recover automatically by querying the system.


Vertica Components

This section describes the unique components that make up Vertica.

Column Store Architecture with FlexStore

Traditionally, databases were designed for OLTP and used a row-store architecture. To process a query, a row store reads all of the columns in all of the tables named in the query, regardless of how wide the tables might be or how many columns are actually needed. Often, analytic queries access only two or three columns from tables containing up to several hundred columns, resulting in a lot of unnecessary data retrieval.

Unlike other RDBMSs, Vertica reads the columns from database objects called projections, which are described in the Physical Schema section of this guide. No resources are wasted by reading large numbers of unused columns; every byte of data is used by the execution engine. For example, consider this simple two-table schema:

Suppose you want to run this query:

SELECT A, C, N
FROM Table1 JOIN Table2 ON H = J;

A row store must read 16 columns (A through H and J through Q) from physical storage for each record in the result set. A column store with a query-specific projection reads only three columns: A, C, and N.
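A minimal sketch of such a query-specific projection follows. The projection name is hypothetical, and a pre-join projection like this also assumes the appropriate primary-key/foreign-key constraints exist on the joined tables:

```sql
-- Hypothetical projection covering only the three columns the query needs.
CREATE PROJECTION table1_table2_join_p (A, C, N)
AS SELECT Table1.A, Table1.C, Table2.N
   FROM Table1 JOIN Table2 ON Table1.H = Table2.J
   ORDER BY Table1.A;
```

With this projection in place, the optimizer can satisfy the query without touching the other thirteen columns.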

How FlexStore™ Enhances Your Column-Based Architecture

FlexStore™ is a combination of physical design, database storage, and query execution techniques that Vertica applies to the database to optimize it for the analytic workload it supports at the time. These techniques include:

- Column grouping. Refers to a technique for storing column data together to optimize I/O during query processing. Such groupings can be advantageous for correlated columns and for columns that are always accessed together for projecting, but not for filtering or joining. Grouped columns also benefit from special compression and retrieval techniques. An example might be bid and ask prices in a TickStore database. Column grouping is described in the CREATE PROJECTION statement's GROUPED clause.

- Intelligent disk use. Lets you optimize performance by placing frequently needed disk resources on faster media. This includes mixing solid-state and rotating "disk" storage in the database nodes. You can prioritize disk use for:

  - data versus temporary storage

  - storage for columns in a projection

  See Working With Storage Locations in the Administrator's Guide for details.

l Fast deletes. Refers to projection design techniques to speed up delete processing, togetherwith the function EVALUATE_DELETE_PERFORMANCE() to help identify potential deleteproblems. SeeOptimizing Deletes and Updates for Performance in the Administrator's Guide fordetails.

Architecture of the Vertica Cluster

Terminology

In Vertica, the physical architecture is designed to distribute physical storage and to allow parallel query execution over a potentially large collection of computing resources.

The most important terms to understand are host, instance, node, cluster, and database:

Host —
A computer system with a 32-bit (non-production use only) or 64-bit Intel or AMD processor, RAM, hard disk, and TCP/IP network interface (IP address and hostname). Hosts share neither disk space nor main memory with each other.

Instance —
An instance of Vertica consists of the running Vertica process and disk storage (catalog and data) on a host. Only one instance of Vertica can be running on a host at any time.

Node —
A host configured to run an instance of Vertica. It is a member of a database cluster. For a database to be able to recover from the failure of a node, it requires at least three nodes. Micro Focus recommends that you use a minimum of four nodes.

Cluster —
Refers to a collection of hosts (nodes) bound to a database. A cluster is not part of a database definition and does not have a name.

Database —
A cluster of nodes that, when active, can perform distributed data storage and SQL statement execution through administrative, interactive, and programmatic user interfaces.


Data Encoding and Compression

Encoding

The process of converting data into a standard format. In Vertica, encoded data can be processed directly, while compressed data cannot. Vertica uses a number of different encoding strategies, depending on column data type, table cardinality, and sort order.

The query executor in Vertica operates on the encoded data representation whenever possible to avoid the cost of decoding. It also passes encoded values to other operations, saving memory bandwidth. In contrast, row stores and most other column stores typically decode data elements before performing any operation.

Compression

The process of transforming data into a compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.

The size of a database is often limited by the availability of storage resources. Typically, when a database exceeds its size limitations, the administrator archives data that is older than a specific historical threshold.

The extensive use of compression allows a column store to occupy substantially less storage than a row store. In a column store, every value stored in a column of a projection has the same data type. This greatly facilitates compression, particularly in sorted columns. In a row store, each value of a row can have a different data type, resulting in a much less effective use of compression.

Vertica's efficient storage allows the database administrator to keep much more historical data in physical storage. In other words, the archiving threshold can be set to a much earlier date than in a less efficient store.

K-Safety

K-safety is a measure of fault tolerance in the database cluster. The value K represents the number of replicas of the data in the database that exist in the database cluster. These replicas allow other nodes to take over for failed nodes, allowing the database to continue running while still ensuring data integrity. If more than K nodes in the database fail, some of the data in the database may become unavailable. In that case, the database is considered unsafe and automatically shuts down.

It is possible for a Vertica database to have more than K nodes fail and still continue running safely, because the database continues to run as long as every data segment is available on at least one functioning cluster node. Potentially, up to half the nodes in a database with a K-safety level of 1 could fail without causing the database to shut down. As long as the data on each failed node is available from another active node, the database continues to run.


Note: If half or more of the nodes in the database cluster fail, the database will automatically shut down even if all of the data in the database is technically available from replicas. This behavior prevents issues due to network partitioning.

In Vertica, the value of K can be zero (0), one (1), or two (2). The physical schema design must meet certain requirements. To create designs that are K-safe, Micro Focus recommends using the Database Designer.

K-Safety Example

The diagram above shows a 5-node cluster that has a K-safety level of 1. Each of the nodes contains buddy projections for the data stored in the next higher node (node 1 has buddy projections for node 2, node 2 has buddy projections for node 3, and so on). Any of the nodes in the cluster could fail, and the database would still be able to continue running (although with lower performance, since one of the nodes has to handle its own workload and the workload of the failed node).


If node 2 fails, node 1 handles requests on its behalf using its replica of node 2's data, in addition to performing its own role in processing requests. The fault tolerance of the database falls from 1 to 0, since a single node failure could now cause the database to become unsafe. In this example, if either node 1 or node 3 fails, the database becomes unsafe because not all of its data would be available. If node 1 fails, node 2's data is no longer available. If node 3 fails, its data is no longer available either, since node 2 is also down and cannot fill in for it. In this case, nodes 1 and 3 are considered critical nodes. In a database with a K-safety level of 1, the node that contains the buddy projection of a failed node, and the node whose buddy projections were on the failed node, always become critical nodes.

With node 2 down, either node 4 or node 5 in the cluster could fail and the database would still have all of its data available. For example, if node 4 fails, node 3 is able to use its buddy projections to fill in for it. In this situation, any further loss of nodes would result in a database shutdown, since all of the nodes in the cluster are now critical nodes. (In addition, if one more node were to fail, half or more of the nodes would be down, requiring Vertica to shut down automatically, no matter whether all of the data were available.)

In a database with a K-safety level of 2, any node in the cluster could fail after node 2 and the database would be able to continue running. For example, if in the 5-node cluster each node contained buddy projections for both of its neighbors (for example, node 1 contained buddy projections for both node 5 and node 2), then nodes 2 and 3 could fail and the database could continue running. Node 1 could fill in for node 2, and node 4 could fill in for node 3. Due to the requirement that half or more of the nodes in the cluster be available in order for the database to continue running, the cluster could not continue running if node 5 were to fail as well, even though nodes 1 and 4 both have buddy projections for its data.

K-Safety Requirements

When you create projections with the Database Designer, projection definitions that meet K-safe design requirements are recommended and marked with the K-safety level. Note the output from running the optimized design script generated by the Database Designer in the following example:

=> \i VMart_Schema_design_opt_1.sql

CREATE PROJECTION
CREATE PROJECTION
  mark_design_ksafe
----------------------
 Marked design 1-safe
(1 row)

Determining K-Safety

To determine the K-safety state of a running database, execute the following SQL command:

=> SELECT current_fault_tolerance FROM system;


 current_fault_tolerance
-------------------------
                       1
(1 row)

Monitoring K-Safety

Monitoring tables can be accessed programmatically to enable external actions, such as alerts. You monitor the K-safety level by polling the SYSTEM table column and checking the value. See SYSTEM in the SQL Reference Manual.

Finding Critical Nodes

You can view a list of critical nodes in your database by querying the v_monitor.critical_nodes table:

=> SELECT * FROM v_monitor.critical_nodes;

      node_name
----------------------
 v_exampleDB_node0001
 v_exampleDB_node0003
(2 rows)

High Availability and Recovery

Vertica's unique approach to failure recovery is based on the distributed nature of a database. A Vertica database is said to be K-safe if any node can fail at any given time without causing the database to shut down. When the lost node comes back online and rejoins the database, it recovers its lost objects by querying the other nodes. For more information, see the Managing Nodes and Monitoring Recovery sections in the Administrator's Guide of the Vertica documentation.

K-Safety Requirements

Your database must have a minimum number of nodes to be able to have a K-safety level greater than zero, as shown in the following table:

K-level   Number of Nodes Required
-------   ------------------------
0         1+
1         3+
2         5+
K         2K+1

Note: Vertica does not officially support values of K higher than 2.


The value of K can be 1 or 2 only when the physical schema design meets certain redundancy requirements. To create designs that are K-safe, Micro Focus recommends that you use the Database Designer.

By default, Vertica creates K-safe superprojections when the database has a K-safety greater than 0. When you create projections with Database Designer, projection definitions that meet K-safe design requirements are recommended and marked with the K-safety level. Database Designer creates a script that uses the MARK_DESIGN_KSAFE function to set the K-safety of the physical schema to 1:

=> SELECT MARK_DESIGN_KSAFE(1);
  MARK_DESIGN_KSAFE
----------------------
 Marked design 1-safe
(1 row)

Determining K-Safety

To determine the K-safety state of a running database, run the following SQL command:

=> SELECT current_fault_tolerance FROM system;
 current_fault_tolerance
-------------------------
                       1
(1 row)

Monitoring K-Safety

Monitoring tables can be accessed programmatically to enable external actions, such as alerts. You monitor the K-safety level by polling the SYSTEM table and checking the value.

Loss of K-Safety

When K nodes in your cluster fail, your database continues to run, although performance is affected. Further node failures could potentially cause the database to shut down if the failed node's data is not available from another functioning node in the cluster.


High Availability With Projections

To ensure high availability and recovery for database clusters of three or more nodes, Vertica:

- Replicates small, unsegmented projections

- Creates buddy projections for large, segmented projections

Replication (Unsegmented Projections)

When it creates projections, Database Designer does not segment projections for small tables; rather, it replicates them, creating and storing duplicates of these projections on all nodes within the database.

Replication ensures:

- Distributed query execution across multiple nodes.

- High availability and recovery. In a K-safe database, replicated projections serve as buddy projections. This means that a replicated projection on any node can be used for recovery.

Note: We recommend that you use Database Designer to create your physical schema. If you choose not to, be sure to segment all large tables across all database nodes, and replicate small, unsegmented table projections on all database nodes.

The following illustration shows two projections, B and C, replicated across a three-node cluster.

Buddy Projections (Segmented Projections)

Vertica creates buddy projections, which are copies of segmented projections that are distributed across database nodes. (See Projection Segmentation.) Vertica ensures that segments that contain the same data are distributed to different nodes. This ensures that if a node goes down, all the data is available on the remaining nodes. Vertica distributes segments to different nodes by using offsets. For example, segments that comprise the first buddy projection (A_BP1) would be offset from projection A by one node, and segments from the second buddy projection (A_BP2) would be offset from projection A by two nodes.
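The two projection types can be sketched as follows (table names are hypothetical; KSAFE 1 asks Vertica to create one buddy projection for the segmented case):

```sql
-- Large table: segment rows across all nodes by a hash of the key,
-- with one buddy projection created at a node offset (KSAFE 1).
CREATE PROJECTION sales_p AS
SELECT * FROM sales
SEGMENTED BY HASH(sale_id) ALL NODES KSAFE 1;

-- Small table: replicate the whole projection on every node.
CREATE PROJECTION region_p AS
SELECT * FROM region
UNSEGMENTED ALL NODES;
```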


The following illustration shows the segmentation for a projection called A and its buddy projections, A_BP1 and A_BP2, for a three-node cluster.

The following illustration shows how Vertica uses offsets to ensure that every node has a full set of data for the projection.

This example illustrates how one projection and its buddies are segmented across nodes. However, each node can store a collection of segments from various projections.

High Availability With Fault Groups

Fault groups let you configure Vertica for your physical cluster layout in order to reduce the risk of correlated failures inherent in your environment. Correlated failures occur when two or more nodes fail as a result of a single failure event. These failures often occur due to shared resources, such as power, networking, or storage.

Although correlated failure scenarios cannot always be avoided, Vertica helps you minimize the risk of failure by letting you define fault groups on your cluster. Vertica then uses your fault group definitions to distribute data segments across the cluster, so the database stays up if a single failure event occurs.

The following list describes some of the causes of correlated failures:


- Servers in the same data center machine rack:

  - Power loss to a machine rack could cause all nodes on those servers to fail

  - User error during maintenance could affect an entire machine rack

- Multiple virtual machines that reside on a single VM host server

- Nodes that use other shared infrastructure that could cause correlated failures of a subset of nodes

Note: If your cluster layout is managed by a single network switch, a switch failure is a single point of failure. Fault groups cannot help with single-point failures.

Vertica supports complex, hierarchical fault groups of different shapes and sizes. Fault groups are integrated with elastic cluster and large cluster arrangements to provide added cluster flexibility and reliability.

Automatic fault groups

When you configure a cluster of 120 nodes or more, Vertica automatically creates fault groups around control nodes. Control nodes are a subset of cluster nodes that manage spread (control messaging). Vertica places nodes that share a control node in the same fault group. See Large Cluster in the Administrator's Guide for details.

User-defined fault groups

If your cluster layout has the potential for correlated failures, or if you want to influence which cluster hosts manage control messaging, you should define your own fault groups.

Example cluster topology

The following image provides an example of hierarchical fault groups configured on a single cluster:

- Fault group FG-A contains nodes only.

- Fault group FG-B (parent) contains child fault groups FG-C and FG-D. Each child fault group also contains nodes.

- Fault group FG-E (parent) contains child fault groups FG-F and FG-G. The parent fault group FG-E also contains nodes.


How to create fault groups

Before you define fault groups, you must have a thorough knowledge of your physical cluster layout. Fault groups require careful planning.

The simplest way to define fault groups is to create an input file of your cluster arrangement. You pass the file to a Vertica-supplied script, which returns to the console the SQL statements you need to run. See Fault Groups in the Administrator's Guide for details.
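The generated statements look roughly like the following sketch (the group and node names here are hypothetical):

```sql
-- Define a fault group for one rack, then assign its nodes to it.
CREATE FAULT GROUP rack_a;
ALTER FAULT GROUP rack_a ADD NODE v_exampledb_node0001;
ALTER FAULT GROUP rack_a ADD NODE v_exampledb_node0002;
```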

Hybrid Storage Model

To support Data Manipulation Language (DML) commands (INSERT, UPDATE, and DELETE) and bulk load operations (COPY), intermixed with queries in a typical data warehouse workload, Vertica implements the storage model shown in the illustration below. This model is the same on each Vertica node.


The Write Optimized Store (WOS) is a memory-resident data structure for storing INSERT, UPDATE, DELETE, and COPY (without DIRECT hint) actions. Like the Read Optimized Store (ROS), the WOS is arranged by projection. To support very fast data load speeds, the WOS stores records without data compression or indexing. A projection in the WOS is sorted only when it is queried. It remains sorted as long as no further data is inserted into it. The WOS organizes data by epoch and holds both committed and uncommitted transaction data.

The Tuple Mover (TM) is the Vertica database optimizer component that moves data from memory (WOS) to disk (ROS). The TM also combines small ROS containers into larger ones and purges deleted data. During moveout operations, the TM is also responsible for adhering to any storage policies that are in effect for the storage location. The Tuple Mover runs in the background, performing some tasks automatically (ATM) at time intervals determined by its configuration parameters. For information about changing the TM configuration parameters, see Tuple Mover Parameters in the Administrator's Guide.


The Read Optimized Store (ROS) is a highly optimized, read-oriented, disk storage structure, organized by projection. The ROS makes heavy use of compression and indexing. You can use the COPY...DIRECT and INSERT (with /*+direct*/ hint) statements to load data directly into the ROS.

Note: Vertica allows optional spaces before and after the plus sign in direct hints (between the /* and the +).

A grouped ROS is a highly optimized, read-oriented physical storage structure organized by projection. A grouped ROS makes heavy use of compression and indexing. Unlike a ROS, however, a grouped ROS stores data for two or more grouped columns in one disk file.

The COPY command is designed for bulk load operations and can load data into the WOS or the ROS.
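For example (hypothetical table name and file path), the load target differs with the DIRECT option or hint:

```sql
-- Trickle load: staged in memory (WOS); the Tuple Mover moves it to ROS later.
COPY sales FROM '/data/sales.dat' DELIMITER '|';

-- Bulk load: written directly to disk (ROS), bypassing the WOS.
COPY sales FROM '/data/sales.dat' DELIMITER '|' DIRECT;

-- The same choice is available for INSERT via the direct hint.
INSERT /*+ direct */ INTO sales VALUES (101, 'example');
```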

Logical Schema

Designing a logical schema for a Vertica database is no different than designing one for any other SQL database. A logical schema consists of objects such as schemas, tables, views, and referential integrity constraints that are visible to SQL users. Vertica supports any relational schema design of your choice.

For more information, see Designing a Logical Schema in the Administrator's Guide.

Physical Schema

In traditional database architectures, data is primarily stored in tables. Additionally, secondary tuning structures such as indexes and materialized views are created for improved query performance. In contrast, tables do not occupy any physical storage at all in Vertica. Instead, physical storage consists of collections of table columns called projections.

Projections store data in a format that optimizes query execution. They are similar to materialized views in that they store result sets on disk rather than compute them each time they are used in a query. The result sets are automatically refreshed whenever data values are inserted or loaded.

Using projections provides the following benefits:

- Projections compress and encode data to greatly reduce the space required for storing data. Additionally, Vertica operates on the encoded data representation whenever possible to avoid the cost of decoding. This combination of compression and encoding optimizes disk space while maximizing query performance. See Projection Performance.

- Projections can be segmented or replicated across database nodes depending on their size. For instance, projections for large tables can be segmented and distributed across all nodes. Unsegmented projections for small tables can be replicated across all nodes in the database. See Projection Performance.

- Projections are transparent to end users of SQL. The Vertica query optimizer automatically picks the best projections to use for any query.


- Projections also provide high availability and recovery. To ensure high availability and recovery, Vertica duplicates table columns on at least K+1 nodes within the cluster. Thus, if one machine fails in a K-safe environment, the database continues to operate normally using the duplicate data on the remaining nodes. Once the node resumes its normal operation, it automatically recovers its data and lost objects by querying other nodes. See High Availability and Recovery for an overview of this feature and High Availability With Projections for an explanation of how Vertica uses projections to ensure high availability and recovery.

How Projections Are Created

For each table in the database, Vertica requires a minimum of one projection, called a superprojection. A superprojection is a projection for a single table that contains all the columns in the table.

To get your database up and running quickly, Vertica automatically creates a superprojection when you load or insert data into an existing table created using the CREATE TABLE or CREATE TEMPORARY TABLE statement.

By creating a superprojection for each table in the database, Vertica ensures that all SQL queries can be answered. Default superprojections do not exploit the full power of Vertica. Therefore, Vertica recommends loading a sample of your data and then running the Database Designer to create optimized projections. Database Designer creates new projections that optimize your database based on its data statistics and the queries you use. The Database Designer:

1. Analyzes your logical schema, sample data, and sample queries (optional)

2. Creates a physical schema design (projections) in the form of a SQL script that can bedeployed automatically or manually

In most cases, the designs created by the Database Designer provide excellent query performancewithin physical constraints. The Database Designer uses sophisticated strategies to provideexcellent ad-hoc query performance while using disk space efficiently. If you prefer, you can designcustom projections.

For more information about creating projections, see Designing a Physical Schema in theAdministrator's Guide.

Anatomy of a Projection

The CREATE PROJECTION statement defines the individual elements of a projection.


A CREATE PROJECTION statement contains the following significant elements:

Column List and Encoding

Lists every column in the projection and defines the encoding for each column. Unlike traditional database architectures, Vertica operates on encoded data representations. Therefore, Micro Focus recommends that you use data encoding because it results in less disk I/O.

Base Query

Identifies all the columns to incorporate in the projection through column name and table name references. The base query for large table projections can contain PK/FK joins to smaller tables.

Sort Order

The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously.

You can also optimize a query by matching the projection's sort order to the query's GROUP BY clause. If you do not specify a sort order, Vertica uses the order in which columns are specified in the column definition as the projection's sort order.

The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once. For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns.
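The sort-order behavior described above can be sketched as follows (the table, projection, and column names are illustrative):

```sql
-- Projection sorted on (x, y) to serve queries that filter on both columns.
CREATE PROJECTION points_xy (x, y, label)
AS SELECT x, y, label FROM points
ORDER BY x, y;

-- This query reads one contiguous region of the sorted projection:
SELECT label FROM points WHERE x = 1 AND y = 2;
```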


Segmentation

The segmentation clause determines whether a projection is segmented across nodes within the database. Segmentation distributes contiguous pieces of projections, called segments, for large and medium tables across database nodes. Segmentation maximizes database performance by distributing the load. Use SEGMENTED BY HASH to segment large table projections.

For small tables, use the UNSEGMENTED keyword to direct Vertica to replicate these tables, rather than segment them. Replication creates and stores identical copies of projections for small tables across all nodes in the cluster. Replication ensures high availability and recovery.

For maximum performance, do not segment projections on LONG VARBINARY and LONG VARCHAR columns.
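A hedged sketch of both clauses (the table, projection, and column names are illustrative):

```sql
-- Large table: hash-segment the projection across all nodes.
CREATE PROJECTION sales_p (sale_id, customer_key, amount)
AS SELECT sale_id, customer_key, amount FROM sales
ORDER BY customer_key
SEGMENTED BY HASH(sale_id) ALL NODES;

-- Small dimension table: replicate an identical copy on every node.
CREATE PROJECTION date_dim_p (date_key, calendar_date)
AS SELECT date_key, calendar_date FROM date_dim
ORDER BY date_key
UNSEGMENTED ALL NODES;
```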

Projection Concepts

For each table in the database, Vertica requires a projection, called a superprojection. A superprojection is a projection for a single table that contains all the columns in the table. By creating a superprojection for each table in the database, Vertica ensures that all SQL queries can be answered.

In addition to superprojections, you can optimize your queries by creating one or more projections that contain only the subset of table columns required to process the query. These projections are called query-specific projections.

Projections can contain joins between tables that are connected by PK/FK constraints. These projections are called pre-join projections. Pre-join projections can have only inner joins between tables on their primary and foreign key columns. Outer joins are not allowed. Pre-join projections provide a significant performance advantage over joining tables at query run-time.
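A pre-join projection might be sketched as follows (the table and column names are illustrative, assuming a PK/FK constraint between sales.store_key and store_dim.store_key):

```sql
-- Pre-join projection: an inner join between a fact table and a
-- dimension table on their PK/FK columns, materialized ahead of query time.
CREATE PROJECTION sales_by_store (store_name, sale_date, amount)
AS SELECT s.store_name, f.sale_date, f.amount
   FROM sales f JOIN store_dim s ON f.store_key = s.store_key
ORDER BY s.store_name;
```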

Projection Performance

Vertica provides the following methods for maximizing the performance of all projections:

Encoding and Compression

Vertica operates on encoded data representations. Therefore, Micro Focus encourages you to use data encoding whenever possible because it results in less disk I/O and requires less disk space. For a description of the available encoding types, see encoding-type in the SQL Reference Manual.

Sort Order

The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. For example, if the WHERE clause of a query is (x=1 AND y=2) and a projection is sorted on (x, y), the query runs almost instantaneously. Sorting a projection is also useful for optimizing a GROUP BY query: simply match the projection's sort order to the query's GROUP BY clause.


Segmentation

Segmentation distributes contiguous pieces of projections, called segments, for large tables across database nodes. This maximizes database performance by distributing the load. See Projection Segmentation.

In many cases, the performance gain for superprojections provided through these methods is sufficient that creating additional query-specific projections is unnecessary.

Projection Segmentation

Projection segmentation splits individual projections into chunks of data of similar size, called segments. One segment is created for and stored on each node. Projection segmentation provides high availability and recovery and optimizes query execution. Specifically, it:

- Ensures high availability and recovery through K-Safety.

- Spreads the query execution workload across multiple nodes.

- Allows each node to be optimized for different query workloads.

Vertica segments large tables to spread the query execution workload across multiple nodes. Vertica does not segment small tables; instead, Vertica replicates small projections, creating a duplicate of each unsegmented projection on each node.

Hash Segmentation

Vertica uses hash segmentation to segment large projections. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet the criteria could be an excellent choice for hash segmentation.

Projection Naming

Vertica uses a standard naming convention for projections. The first part of the projection name is the name of the associated table, followed by characters that Vertica appends to the table name; this string is called the projection's base name. All buddy projections have the same base name so they can be identified as a group.

Vertica then appends a suffix that indicates the projection type. The projection type suffix, described in the following table, can be:

- _super

- _<node_name>


- _b<offset>

Projection Type Suffix Examples

Projection type: Unsegmented or segmented (when only one auto projection was created with the table)
Suffix: _super
Example:
customer_dimension_vmart_super
Unique name example:
customer_dimension_vmart_super_v1

Projection type: Replicated (unsegmented) on all nodes
Suffix: _<node_name>
Example:
customer_dimension_vmart_node01
customer_dimension_vmart_node02
customer_dimension_vmart_node03
Unique name example:
customer_dimension_vmart_v1_node01
customer_dimension_vmart_v1_node02
customer_dimension_vmart_v1_node03

Projection type: Segmented (when multiple buddy projections were created with the table)
Suffix: _b<offset>
Example:
customer_dimension_vmart_b0
customer_dimension_vmart_b1
Unique name example:
customer_dimension_vmart_v1_b0
customer_dimension_vmart_v2_b1

If the projection-naming convention will result in a duplicate name, Vertica automatically appends v1 or v2 to the projection name. Vertica uses this naming convention for projections created by the CREATE TABLE statement or by the Database Designer.

Note: If the projection name exceeds the maximum length, Vertica truncates the projection name.

Database Setup

The process of setting up a Vertica database is described in detail in the Administrator's Guide. It involves the following tasks:

Prepare SQL Scripts and Data Files

The first part of the setup procedure can be done well before Vertica is installed. It consists of preparing the following files:


- Logical schema script

- Loadable data files

- Load scripts

- Sample query script (training set)

Create the Database

This part requires that Vertica be installed on at least one host. The following tasks are not in sequential order.

- Use the Administration Tools to:

  - Create a database

  - Connect to the database

- Use the Database Designer to design the physical schema.

- Use the vsql interactive interface to run SQL scripts that:

  - Create tables and constraints

  - Create projections

Test the Empty Database

- Test for sufficient projections using the sample query script

- Test the projections for K-safety

Test the Partially-Loaded Database

- Load the dimension tables

- Partially load the fact table

- Check system resource usage

- Check query execution times

- Check projection usage


Complete the Fact Table Load

- Monitor system usage

- Complete the fact table load

Set up Security

For security-related tasks, see Implementing Security.

- [Optional] Set up SSL

- [Optional] Set up client authentication

- Set up database users and privileges

Set up Incremental Loads

Set up periodic ("trickle") loads.

Database Connections

You can connect to a Vertica database in the following ways:

- Interactively using the vsql client, as described in Using vsql in the Administrator's Guide.

vsql is a character-based, interactive, front-end utility that lets you type SQL statements and see the results. It also provides a number of meta-commands and various shell-like features that facilitate writing scripts and automating a variety of tasks.

You can run vsql on any node within a database. To start vsql, use the Administration Tools or the shell command described in Using vsql.

- Programmatically using the JDBC driver provided by Vertica, as described in Programming JDBC Client Applications in the Programmer's Guide.

An abbreviation for Java Database Connectivity, JDBC is a call-level application programming interface (API) that provides connectivity between Java programs and data sources (SQL databases and other non-relational data sources, such as spreadsheets or flat files). JDBC is included in the Java 2 Standard and Enterprise editions.

- Programmatically using the ODBC driver provided by Vertica, as described in Programming ODBC Client Applications in the Programmer's Guide.

An abbreviation for Open DataBase Connectivity, ODBC is a standard application programming interface (API) for access to database management systems.


- Programmatically using the ADO.NET driver provided by Vertica, as described in Programming ADO.NET Applications in the Programmer's Guide.

The Vertica driver for ADO.NET allows applications written in C# and Visual Studio to read data from, update, and load data into Vertica databases. It provides a data adapter that facilitates reading data from a database into a data set, and then writing changed data from the data set back to the database. It also provides a data reader (VerticaDataReader) for reading data and autocommit functionality for committing transactions automatically.

- Programmatically using Perl and the DBI driver, as described in Programming Perl Client Applications in the Programmer's Guide.

Perl is a free, stable, open source, cross-platform programming language licensed under its Artistic License, or the GNU General Public License (GPL).

- Programmatically using Python and the pyodbc driver, as described in Programming Python Client Applications in the Programmer's Guide.

Python is a free, agile, object-oriented, cross-platform programming language designed to emphasize rapid development and code readability.

Micro Focus recommends that you deploy Vertica as the only active process on each machine in the cluster and connect to it from applications on different machines. Vertica expects to use all available resources on the machine, and to the extent that other applications are also using these resources, suboptimal performance could result.

The Administration Tools

Vertica provides a set of tools that allows you to perform administrative tasks quickly and easily. Most of the database administration tasks in Vertica can be done using the Administration Tools.

Always run the Administration Tools using the Database Administrator account on the Administration host, if possible. Make sure that no other Administration Tools processes are running.

If the Administration host is unresponsive, run the Administration Tools on a different node in the cluster. That node permanently takes over the role of Administration host.

A man page is available for admintools. If you are running as the dbadmin user, simply type: man admintools. If you are running as a different user, type: man -M /opt/vertica/man admintools.

Running the Administration Tools

At the Linux command line:

$ /opt/vertica/bin/admintools [ -t | --tool ] toolname [ options ]

toolname: One of the tools described in the Administration Tools Reference.


options:

-h, --help: Shows a brief help message and exits.

-a, --help_all: Lists all command-line subcommands and options, as described in Writing Administration Tools Scripts.

If you omit the toolname and options parameters, the Main Menu dialog box appears inside your console or terminal window with a dark blue background and a title on top. The screen captures used in this documentation set are cropped down to the dialog box itself.

If you are unfamiliar with this type of interface, read Using the Administration Tools Interface before you do anything else.

First Time Only

The first time you log in as the Database Administrator and run the Administration Tools, the user interface displays.

1. In the EULA (end-user license agreement) window, type accept to proceed.

   A window displays, requesting the location of the license key file you downloaded from the Micro Focus Web site. The default path is /tmp/vlicense.dat.

2. Type the absolute path to your license key (for example, /tmp/vlicense.dat) and click OK.


Between Dialogs

While the Administration Tools are working, you see the command-line processing in a console window. Do not interrupt the processing.


Management Console

Management Console (MC) is a database management tool that provides a unified view of your Vertica cluster. Through a single point of access—a browser connection—you can create, import, manage, and monitor multiple databases on one or more clusters. You can also create and manage MC users that you map to a Vertica database and then manage on the MC interface.

What You Can Do with Management Console

- Create a database cluster on hosts that do not have Vertica installed

- Create, import, and monitor multiple Vertica databases on one or more clusters from a single point of control

- Create MC users and grant them access to MC and MC-managed databases

- Manage user information and monitor their activity on MC

- Configure database parameters and user settings dynamically

- Access a single message box of alerts for all managed databases

- Export all database messages or log/query details to a file

- View license usage and conformance

- Diagnose and resolve MC-related issues through a browser

- Access a quick link to recent databases and clusters

- View dynamic metrics about your database cluster

Management Console provides some, but not all, of the functionality that the Administration Tools provides. In addition, MC provides extended functionality not available in the Administration Tools, such as a graphical view of your Vertica database and detailed monitoring charts and graphs, described in Monitoring Vertica Using MC. See Administration Tools and Management Console in the Administrator's Guide.

How to Get MC

Download the Vertica server rpm and the MC package from the myVertica portal. You then have two options:

- Install Vertica and MC at the command line and import one or more Vertica database clusters into the MC interface

- Install Vertica through the MC itself

See the Installation Guide for details.


What You Need to Know

If you plan to use MC, review the following topics in the Administrator's Guide:

- To create a new, empty Vertica database, see Create a Database on a Cluster.

- To import an existing Vertica database cluster into MC, see Managing Database Clusters on MC.

- To understand how MC users are different from database users, see About MC Users.

- To read about the MC privilege model, see About MC privileges and roles.

- To create new MC users, see Creating an MC user.

- To grant MC users privileges on one or more MC-managed Vertica databases, see Granting database access to MC users.

- To use Vertica functionality through the MC interface, see Using Management Console.

- To monitor MC and MC-managed Vertica databases, see Monitoring Vertica Using Management Console.

Management Console Architecture

MC accepts HTTP requests from a client web browser, gathers information from the Vertica database cluster, and returns that information back to the browser for monitoring.

MC Components

The primary components that drive Management Console are an application/web server and agents that get installed on each node in the Vertica cluster.

The following diagram is a logical representation of MC, the MC user's interface, and the database cluster nodes.


Application/web Server

The application server hosts MC's web application and uses port 5450 for node-to-MC communication and to perform the following jobs:

- Manage one or more Vertica database clusters

- Send rapid updates from MC to the web browser

- Store and report MC metadata, such as alerts and events, current node state, and MC users, on a lightweight, embedded (Derby) database

- Retain workload history

MC Agents

MC agents are internal daemon processes that run on each Vertica cluster node. The default agent port, 5444, must be available for MC-to-node and node-to-node communications. Agents monitor MC-managed Vertica database clusters and communicate with MC to provide the following functionality:

- Provide local access, command, and control over database instances on a given node, using functionality similar to Administration Tools

- Report log-level data from the Administration Tools and Vertica log files

- Cache details from long-running jobs—such as create/start/stop database operations—that you can view through your browser

- Track changes to data-collection and monitoring utilities and communicate updates to MC

- Communicate between all cluster nodes and MC through a webhook subscription, which automates information sharing and reports on cluster-specific issues like node state, alerts, events, and so on



Management Console Security

Through a single point of control, the Management Console (MC) platform is designed to manage multiple Vertica clusters, all of which might have differing levels and types of security, such as user names and passwords and LDAP authentication. You can also manage MC users who have varying levels of access across these components.

OAuth and SSL

MC uses a combination of OAuth (Open Authorization), Secure Sockets Layer (SSL), and locally encrypted passwords to secure HTTPS requests between a user's browser and MC, as well as between MC and the agents. Authentication occurs through MC and between agents within the cluster. Agents also authenticate and authorize jobs.

The MC configuration process sets up SSL automatically, but you must have the openssl package installed on your Linux environment first.

See the following topics in the Administrator's Guide for more information:

- SSL Prerequisites

- Implementing SSL

- Generating certificates and keys for MC

- Importing a new certificate to MC

User Authentication and Access

MC provides two authentication schemes for users: LDAP or MC. You can use only one method at a time. For example, if you choose LDAP, all MC users will be authenticated against your organization's LDAP server.

You set up LDAP authentication through MC Settings > Authentication on the MC interface.

Note: MC uses LDAP data for authentication purposes only—it does not modify user information in the LDAP repository.

The MC authentication method stores MC user information internally and encrypts passwords. These MC users are not system (Linux) users; they are accounts that have access to MC and, optionally, to one or more MC-managed Vertica databases through the MC interface.


Management Console also has rules for what users can see when they sign in to MC from a client browser. These rules are governed by access levels, each of which is made up of a set of roles.

See the following topics in the Administrator's Guide for more information:

- About MC users

- About MC privileges and roles

- Creating an MC user

Management Console Home Page

The MC Home page is the entry point to all MC-managed Vertica database clusters and MC users. Information on this page, as well as throughout the MC interface, will appear or be hidden, based on the signed-on user's permissions (access levels). Layout and navigation are described in Using Management Console.

Database Designer

Vertica's Database Designer is a tool that:

- Analyzes your logical schema, sample data, and, optionally, your sample queries.

- Creates a physical schema design (a set of projections) that can be deployed automatically (or manually).


- Can be used by anyone without specialized database knowledge (even business users can run Database Designer).

- Can be run and re-run any time for additional optimization without stopping the database.

There are three ways to run Database Designer:

- Using the Management Console, as described in Using Management Console to Create a Design

- Programmatically, using the steps described in About Running Vertica Programmatically.

- Using the Administration Tools by selecting Configuration Menu > Run Database Designer. For details, see Using the Administration Tools to Create a Design

There are two types of designs you can create with Database Designer:

- A comprehensive design, which allows you to create new projections for all tables in your database.

- An incremental design, which creates projections for all tables referenced in the queries you supply.

Some of the benefits that Database Designer provides:

- Accepts up to 100 queries in the query input file for an incremental design.

- Accepts unlimited queries for a comprehensive design.

- Produces higher-quality designs by considering UPDATE and DELETE statements.

In most cases, the designs created by Database Designer provide excellent query performance within physical constraints. Database Designer uses sophisticated strategies to provide excellent ad-hoc query performance while using disk space efficiently.

See Also

- Physical Schema

- Creating a Database Design

Database Security

Vertica secures access to the database and its resources by enabling you to control who has access to the database and what they are authorized to do with database resources once they have gained access. See Implementing Security.


Data Loading and Modification

SQL data manipulation language (DML) commands INSERT, UPDATE, and DELETE perform the same functions in Vertica as they do in row-oriented databases. These commands follow the SQL-92 transaction model and can be intermixed.

In Vertica, the COPY statement is designed for bulk loading data into the database. COPY reads data from text files or data pipes and inserts it into the WOS (memory) or directly into the ROS (disk). COPY automatically commits itself and any current transaction but is not atomic; some rows could be rejected. Note that COPY does not automatically commit when copying data into temporary tables.

Note: You can use the COPY statement's NOCOMMIT option to prevent COPY from committing a transaction when it finishes copying data. You often want to use this option when sequentially running several COPY statements to ensure the data in the bulk load is either committed or rolled back at the same time. Also, combining multiple smaller data loads into a single transaction allows Vertica to more efficiently load the data. See the entry for the COPY statement in the SQL Reference Manual for more information.
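For example, a sketch of the NOCOMMIT pattern described in the note (the table and file paths are illustrative):

```sql
-- Stage several loads in one transaction instead of committing each COPY.
COPY orders FROM '/data/orders_jan.csv' DELIMITER ',' NOCOMMIT;
COPY orders FROM '/data/orders_feb.csv' DELIMITER ',' NOCOMMIT;
COPY orders FROM '/data/orders_mar.csv' DELIMITER ',' NOCOMMIT;

-- All three loads commit (or roll back) together.
COMMIT;
```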

You can use multiple, simultaneous database connections to load and/or modify data.

Workload Management

Vertica provides a sophisticated resource management scheme that allows diverse, concurrent workloads to run efficiently on the database. For basic operations, the built-in GENERAL pool is pre-configured based on RAM and machine cores, but you can customize this pool to handle specific concurrency requirements.

You can also define new resource pools that you configure to limit memory usage, concurrency, and query priority. You can then optionally restrict each database user to use a specific resource pool, which controls the memory resources used by their requests.

User-defined pools are useful if you have competing resource requirements across different classes of workloads. Example scenarios include:

- A large batch job takes up all server resources, leaving small jobs that update a web page to starve, which can degrade user experience.

  In this scenario, you can create a resource pool to handle web page requests and ensure users get the resources they need. Another option is to create a limited resource pool for the batch job, so the job cannot use up all system resources.

- A certain application has lower priority than other applications, and you would like to limit the amount of memory and number of concurrent users for the low-priority application.

  In this scenario, you could create a resource pool with an upper limit on the query's memory and associate the pool with users of the low-priority application.
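A hedged sketch of the second scenario (the pool name, user name, and limits are illustrative):

```sql
-- A limited pool for the low-priority application.
CREATE RESOURCE POOL reporting_pool
    MEMORYSIZE '2G'
    MAXMEMORYSIZE '4G'
    MAXCONCURRENCY 5
    PRIORITY -10;

-- Restrict a user of the low-priority application to this pool.
ALTER USER report_user RESOURCE POOL reporting_pool;
```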


For more information, best practices, and additional scenarios, see Managing Workload Resources in the Administrator's Guide.


SQL Overview

An abbreviation for Structured Query Language, SQL is a widely used, industry-standard data definition and data manipulation language for relational databases.

Note: In Vertica, use a semicolon to end a statement or to combine multiple statements on one line.

Vertica Support for ANSI SQL Standards

Vertica SQL supports a subset of ANSI SQL-99. See BNF Grammar for SQL-99.

Support for Historical Queries

Unlike most databases, the DELETE command in Vertica does not delete data; it marks records as deleted. The UPDATE command performs an INSERT and a DELETE. This behavior is necessary for historical queries. See Historical (Snapshot) Queries in the Programmer's Guide.

Joins

Vertica supports typical data warehousing query joins. For details, see Joins in the Programmer's Guide.

Vertica also provides the INTERPOLATE predicate, which allows for a special type of join. The event series join is a Vertica SQL extension that lets you analyze two event series when their measurement intervals don't align precisely—such as when timestamps don't match. These joins provide a natural and efficient way to query misaligned event data directly, rather than having to normalize the series to the same measurement interval. See Event Series Joins in the Programmer's Guide for details.
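An event series join might be sketched as follows (the tables and columns are illustrative):

```sql
-- Join two tick streams whose timestamps do not line up exactly;
-- for each bid row, interpolate the most recent ask at or before it.
SELECT b.ts, b.bid, a.ask
FROM bids b
LEFT OUTER JOIN asks a
  ON b.symbol = a.symbol
 AND b.ts INTERPOLATE PREVIOUS VALUE a.ts;
```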

Transactions

Session-scoped isolation levels determine transaction characteristics for transactions within a specific user session. You set them through the SET SESSION CHARACTERISTICS command. Specifically, they determine what data a transaction can access when other transactions are running concurrently. See Transactions in the Concepts Guide.


About Query Execution

When you submit a query, the initiator chooses the projections to use, optimizes and plans the query execution, and logs the SQL statement to its log. Planning and optimization are quick, requiring at most a few milliseconds.

Based on the tables and projections chosen, the query plan that the optimizer produces is decomposed into "mini-plans." These mini-plans are distributed to the other nodes, known as executors, to handle, for example, other segments of a segmented fact table. (The initiator node typically does executor work as well.) The nodes process the mini-plans in parallel, interspersed with data movement operations.

The query execution proceeds in data-flow style, with intermediate result sets (rows) flowing through network connections between the nodes as needed. Some, but not all, of the tasks associated with a query are recorded in the executors' log files.

In the final stages of executing a query plan, some wrap-up work is done at the initiator, such as:

- Combining results in a grouping operation

- Merging multiple sorted partial result sets from all the executors

- Formatting the results to return to the client

The initiator has a little more work to do than the other nodes, but if the projections are well designed for the workload, the nodes of the cluster share most of the work of executing expensive queries.

Some small queries, for example, queries on replicated dimension tables, can be executed locally. In these types of queries, the query planning avoids unnecessary network communication.

For detailed information about writing and executing queries, see Writing Queries in the Programmer's Guide.

Snapshot Isolation Mode

Vertica can run any SQL query in snapshot isolation mode in order to obtain the fastest possible execution. To be precise, snapshot isolation mode is actually a form of a historical query. The syntax is:

AT EPOCH LATEST SELECT...

The command queries all data in the database up to but not including the current epoch, without holding a lock or blocking write operations. As a result, the query can miss rows loaded by other users up to (but no more than) a specific number of minutes before execution.
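For example (the table name is illustrative):

```sql
-- Run a query in snapshot isolation mode: read committed data up to,
-- but not including, the current epoch, without taking any locks.
AT EPOCH LATEST SELECT COUNT(*) FROM orders;
```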

Historical Queries

Historical queries, also known as snapshot queries, are useful because they access data in past epochs only. Historical queries do not need to hold table locks or block write operations because


they do not return the absolute latest data. Their content is private to the transaction and valid onlyfor the length of the transaction.

Historical queries behave in the samemanner regardless of transaction isolation level. Historicalqueries observe only committed data, even excluding updates made by the current transaction,unless those updates are to a temporary table.
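A sketch of this behavior, using a hypothetical table:

```sql
INSERT INTO t1 VALUES (100);              -- not yet committed
AT EPOCH LATEST SELECT COUNT(*) FROM t1;  -- excludes the uncommitted row above
COMMIT;
SELECT COUNT(*) FROM t1;                  -- an ordinary query now sees the row
```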

Be aware that there is only one snapshot of the logical schema. This means that any changes you make to the schema are reflected across all epochs. If, for example, you add a new column to a table and you specify a default value for the column, all historical epochs display the new column and its default value.

The DELETE command in Vertica does not actually delete data; it marks records as deleted. (The UPDATE command is actually a combined INSERT and DELETE.) Thus, you can control how much deleted data is stored on disk. For more information, see Managing Disk Space in the Administrator's Guide.

Concepts Guide: About Query Execution

Micro Focus Vertica Analytic Database (7.0.x) Page 46 of 61

Transactions

Session-scoped isolation levels determine transaction characteristics for transactions within a specific user session. You set them through the SET SESSION CHARACTERISTICS command. Specifically, they determine what data a transaction can access when other transactions are running concurrently.

A transaction retains its isolation level until it completes, even if the session's transaction isolation level changes mid-transaction. Vertica internal processes (such as the Tuple Mover and refresh operations) and DDL operations are always run at the SERIALIZABLE isolation level to ensure consistency.

Although the Vertica query parser understands all four standard SQL isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE) for a user session, internally Vertica uses only READ COMMITTED and SERIALIZABLE. Vertica automatically translates READ UNCOMMITTED to READ COMMITTED and REPEATABLE READ to SERIALIZABLE. Therefore, the isolation level Vertica uses could be more strict than the isolation level you choose.

By default, Vertica uses the READ COMMITTED isolation level. However, you can change the isolation level for the database or for individual transactions. See Change Transaction Isolation Levels.
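For example, to run the current session at SERIALIZABLE isolation, you can use the SET SESSION CHARACTERISTICS command mentioned above (a sketch; see Change Transaction Isolation Levels for the full options):

```sql
-- Raise the isolation level for this session:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Return to the default:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED;
```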

The following table highlights the behaviors of transaction isolation. For specific information, see SERIALIZABLE Isolation and READ COMMITTED Isolation.

Isolation Level    Dirty Read     Non-Repeatable Read    Phantom Read
READ COMMITTED     Not Possible   Possible               Possible
SERIALIZABLE       Not Possible   Not Possible           Not Possible

Implementation Details

Vertica supports conventional SQL transactions with standard ACID properties:

- ANSI SQL-92 style implicit transactions. You do not need to run a BEGIN or START TRANSACTION command.

- No redo/undo log or two-phase commits.

- The COPY command automatically commits itself and any current transaction (except when loading temporary tables). Micro Focus recommends that you COMMIT or ROLLBACK the current transaction before you use COPY. Note that DDL statements are autocommitted.
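Following the recommendation above, a bulk load might look like this (the table name and file path are illustrative):

```sql
-- End any pending work explicitly first, because COPY
-- commits the current transaction as a side effect.
COMMIT;

COPY store_sales_fact FROM '/data/sales.dat' DELIMITER '|';
```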

Automatic Rollback

A rollback reverts data in a database to an earlier state by discarding any changes to the database state that were performed by a transaction's statements. In addition, it releases any locks that the transaction might have held. A rollback can occur automatically in response to an error or through an explicit ROLLBACK statement.

Vertica supports transaction-level and statement-level rollbacks. A transaction-level rollback discards all modifications made by a transaction. A statement-level rollback reverses just the effects made by a particular statement. Most errors caused by a statement result in a statement-level rollback to undo the effects of the erroneous statement. Vertica uses ERROR messages to indicate this type of error. DDL errors, systemic failures, deadlocks, and resource constraints result in transaction-level rollback. Vertica uses ROLLBACK messages to indicate this type of error.

To implement automatic, statement-level rollbacks in response to errors, Vertica automatically inserts an implicit savepoint after each successful statement. This marker allows the next statement, and only the next statement, to be rolled back if it results in an error. If the statement is successful, the marker automatically rolls forward. Implicit savepoints are available to Vertica only and cannot be referenced directly.
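The effect of implicit savepoints can be sketched as follows (hypothetical table):

```sql
CREATE TABLE t (x INT NOT NULL);
INSERT INTO t VALUES (1);     -- succeeds; the implicit savepoint rolls forward
INSERT INTO t VALUES (NULL);  -- fails with an ERROR; only this statement is rolled back
INSERT INTO t VALUES (2);     -- the transaction continues
COMMIT;                       -- t contains rows 1 and 2
```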

To explicitly roll back an entire transaction, use the ROLLBACK statement. To explicitly roll backindividual statements, use explicit savepoints.

Savepoints

A savepoint is a special marker inside a transaction that allows all commands run after it to be rolled back, restoring the transaction to the state it was in when the savepoint was established.

Savepoints are useful when creating nested transactions. For example, a savepoint could be created at the beginning of a subroutine. That way, the result of the subroutine could be rolled back, if necessary.

Vertica supports using savepoints.

Use the SAVEPOINT statement to establish a savepoint, the RELEASE SAVEPOINT statement to destroy it, or the ROLLBACK TO SAVEPOINT statement to roll back all operations that occurred within the transaction after the savepoint was established.
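A minimal sketch of these three statements working together (hypothetical table):

```sql
INSERT INTO t VALUES (1);
SAVEPOINT my_savepoint;
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT my_savepoint;  -- undoes only the second INSERT
RELEASE SAVEPOINT my_savepoint;      -- destroys the marker
COMMIT;                              -- only row 1 is committed
```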

READ COMMITTED Isolation

A SELECT query sees a snapshot of the committed data at the start of the transaction. It also sees the results of updates run within its transaction, even if they have not been committed. This is standard ANSI SQL semantics for ACID transactions. Any SELECT query within a transaction should see the transaction's own changes regardless of isolation level.

DML statements acquire write locks to prevent other READ COMMITTED transactions from modifying the same data. SELECT statements do not acquire locks, which prevents read and write statements from conflicting.

READ COMMITTED is the default isolation level used by Vertica. For most general querying purposes, READ COMMITTED isolation effectively balances database consistency and concurrency. However, data can be changed by other transactions between individual statements within the current transaction. This can result in non-repeatable and phantom reads. Applications that require complex queries and updates might need a more consistent view of the database. If this is the case, use SERIALIZABLE isolation.

The following example illustrates reads and writes using READ COMMITTED isolation.

Session A:

    SELECT C1 FROM T1;
     C1
    ----
    (0 rows)

    COMMIT;

The SELECT statement in Session A reads committed data from T1; the COMMIT ends the transaction.

Session A:

    INSERT INTO T1 (C1) VALUES (1);
     OUTPUT
    --------
          1
    (1 row)

Session A inserts a row, but does not yet commit.

Session A:

    SELECT C1 FROM T1;
     C1
    ----
      1
    (1 row)

Session B:

    SELECT C1 FROM T1;
     C1
    ----
    (0 rows)

Session A reads the inserted row because it was inserted during the same transaction. However, Session B does not see the inserted value because it can only read committed data.

Session A:

    COMMIT;

Session A commits the INSERT and ends the transaction.

Session B:

    SELECT C1 FROM T1;
     C1
    ----
      1
    (1 row)

The SELECT statement in Session B now observes the insert that Session A committed. This is an example of a non-repeatable read.

Session A:

    SELECT C1 FROM T1;
     C1
    ----
      1
    (1 row)

    COMMIT;

The SELECT statement in Session A begins a new transaction. It sees the previously inserted value because it was committed.

READ COMMITTED isolation uses exclusive (X) write locks that are maintained until the end of the transaction. The following example illustrates this.

Session A:

    INSERT INTO T1 (C1) VALUES (2);
     OUTPUT
    --------
          1
    (1 row)

The transaction in Session A acquires an insert (I) lock to insert row 2 into table T1.

Session B:

    DELETE FROM T1 WHERE C1 > 1;

The DELETE statement in Session B is blocked because the transaction cannot acquire an exclusive (X) lock until the entire transaction in Session A is completed and the insert (I) lock is released.

Session A:

    INSERT INTO T1 (C1) VALUES (3);
     OUTPUT
    --------
          1
    (1 row)

The transaction in Session A inserts row 3. (It already has an insert (I) lock.)

Session A:

    COMMIT;

The COMMIT statement ends the transaction in Session A and releases its insert (I) lock.

Session B:

    (2 rows)

The transaction in Session B obtains its X lock; the DELETE proceeds and deletes rows 2 and 3.

See Also

- LOCKS

- SET SESSION CHARACTERISTICS

- Configuration Parameters

SERIALIZABLE Isolation

SERIALIZABLE is the strictest level of SQL transaction isolation. Although this isolation level permits transactions to run concurrently, it creates the effect that transactions are running in serial order. It acquires locks for both read and write operations, which ensures that successive SELECT commands within a single transaction always produce the same results. SERIALIZABLE isolation establishes the following locks:

- Table-level read locks are acquired on selected tables and released at the end of the transaction. This prevents a transaction from modifying rows that are currently being read by another transaction.

- Table-level write locks are acquired on update and are released at the end of the transaction. This prevents a transaction from reading uncommitted changes to rows made within another transaction.

A SELECT sees, in effect, a snapshot of the committed data at the start of the transaction. It also sees the results of updates run within its transaction, even if they have not been committed.

The following example illustrates locking within concurrent transactions running with SERIALIZABLE isolation.

Session A:

    SELECT C1 FROM T1;
     C1
    ----
    (0 rows)

Session B:

    SELECT C1 FROM T1;
     C1
    ----
    (0 rows)

Transactions in Sessions A and B acquire shared locks. Both transactions can read from table T1.

Session B:

    COMMIT;

The COMMIT statement in Session B ends the transaction and releases its read lock.

Session A:

    INSERT INTO T1 (C1) VALUES (1);

The transaction in Session A acquires an exclusive (X) lock and inserts a new row.

Session A:

    COMMIT;

The COMMIT statement in Session A ends the transaction and releases its X lock.

Sessions A and B:

    SELECT C1 FROM T1;
     C1
    ----
      1
    (1 row)

New transactions in Sessions A and B use a SELECT statement to see the new row previously created in Session A. They acquire shared locks.

Session A:

    INSERT INTO T1 (C1) VALUES (2);

The transaction in Session A is blocked from inserting a row because it cannot upgrade to an X lock.

The advantage of SERIALIZABLE isolation is that it provides a consistent view. This is useful for applications that require complex queries and updates. However, it reduces concurrency. For example, it is not possible to perform queries during a bulk load.

Additionally, applications using SERIALIZABLE must be prepared to retry transactions due to serialization failures. Serialization failures can occur due to deadlocks. When a deadlock occurs, the transaction that is waiting for the lock automatically times out after five (5) minutes. The following example illustrates a condition that can create a deadlock.

Session A:

    SELECT C1 FROM T1;
     C1
    ----
    (0 rows)

Session B:

    SELECT C1 FROM T1;
     C1
    ----
    (0 rows)

Transactions in Sessions A and B acquire shared locks on table T1. Both transactions can read from T1.

Session A:

    INSERT INTO T1 (C1) VALUES (1);

The transaction in Session A is blocked because it cannot upgrade to an exclusive (X) lock on T1 unless the transaction in Session B releases its lock on T1.

Session B:

    INSERT INTO T1 (C1) VALUES (2);

The transaction in Session B is blocked because it cannot acquire an exclusive (X) lock unless the transaction in Session A releases its lock on T1. Neither session can proceed because each one is waiting for the other.

Session B:

    ROLLBACK

Vertica automatically breaks the deadlock by rolling back the transaction in Session B and releasing its locks.

Session A:

     OUTPUT
    --------
          1
    (1 row)

    COMMIT;

The transaction in Session A is now able to upgrade to an X lock on T1 and insert the row.

Note: SERIALIZABLE isolation does not acquire locks on temporary tables, which are isolated by their transaction scope.
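For example, a SERIALIZABLE transaction can stage intermediate results in a temporary table without lock contention (a sketch; the table names are illustrative):

```sql
-- Private to this session/transaction, so no table lock is needed:
CREATE LOCAL TEMPORARY TABLE scratch (c1 INT) ON COMMIT PRESERVE ROWS;
INSERT INTO scratch SELECT c1 FROM t1 WHERE c1 > 0;
```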

International Languages and Character Sets

This section describes how Vertica handles internationalization and character sets.

Unicode Character Encoding

UTF-8 is an abbreviation for Unicode Transformation Format-8 (where 8 equals 8-bit) and is a variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. UTF-8 can represent any universal character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is coincident with ASCII (requiring little or no change for software that handles ASCII but preserves other values).

All input data received by the database server is expected to be in UTF-8, and all data output by Vertica is in UTF-8. The ODBC API operates on data in UCS-2 on Windows systems, and normally UTF-8 on Linux systems. (A UTF-16 ODBC driver is available for use with the DataDirect ODBC manager.) The JDBC and ADO.NET APIs operate on data in UTF-16. The client drivers automatically convert data to and from UTF-8 when sending data to and receiving data from Vertica using API calls. The drivers do not transform data loaded by executing a COPY or COPY LOCAL statement.

See Implement Locales for International Data Sets in the Administrator's Guide for details.

Locales

The locale is a parameter that defines the user's language, country, and any special variant preferences, such as collation. Vertica uses the locale to determine the behavior of various string functions as well as collation for various SQL commands that require ordering and comparison; for example, GROUP BY, ORDER BY, joins, the analytic ORDER BY clause, and so forth.

By default, the locale for the database is en_US@collation=binary (English US). You can establish a new default locale that is used for all sessions on the database, as well as override individual sessions with different locales. Additionally, the locale can be set through ODBC, JDBC, and ADO.NET.
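For example, a session-level override might look like this (syntax sketch; the locale values are illustrative, and the full set of options is covered in Implement Locales for International Data Sets):

```sql
-- Override the locale for the current session:
SET LOCALE TO 'en_GB';

-- Locales can also carry a collation variant keyword:
SET LOCALE TO 'de_DE@collation=phonebook';
```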

See the following topics in the Administrator's Guide for details:

- Implement Locales for International Data Sets

- Supported Locales in the Appendix

String Functions

Vertica provides string functions to support internationalization. Unless otherwise specified, these string functions can optionally specify whether VARCHAR arguments should be interpreted as octet (byte) sequences or as (locale-aware) sequences of characters. You specify this by adding the parameter "USING OCTETS" or "USING CHARACTERS" (the default) to the function.
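For example, SUBSTR is one function that accepts this clause. Because a character such as 'è' occupies two bytes in UTF-8, the two interpretations can return different results (a sketch; check each function's entry in the SQL Reference Manual for whether it accepts the clause):

```sql
SELECT SUBSTR('caffè latte', 1, 5 USING CHARACTERS);  -- first five characters
SELECT SUBSTR('caffè latte', 1, 5 USING OCTETS);      -- first five bytes only
```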

See String Functions in the SQL Reference Manual for details.

Character String Literals

By default, string literals ('...') treat backslashes literally, as specified in the SQL standard.

Tip: If you have used previous releases of Vertica and you do not want string literals to treat backslashes literally (for example, you are using a backslash as part of an escape sequence), you can turn off the StandardConformingStrings configuration parameter. See Internationalization Parameters in the Administrator's Guide. You can also use the EscapeStringWarning parameter to locate backslashes that have been incorporated into string literals, in order to remove them.
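To illustrate the default behavior described above: an ordinary string literal keeps the backslash as a literal character, while an extended string literal (E'...') interprets escape sequences regardless of the setting:

```sql
SELECT 'a\nb';   -- four characters: a, backslash, n, b
SELECT E'a\nb';  -- three characters: a, newline, b
```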

See Character String Literals in the SQL Reference Manual for details.

Extending Vertica

Vertica lets you extend its capabilities through several different features:

- User-Defined SQL Functions let you define a function using Vertica SQL statements.

- User Defined Extensions and User Defined Functions are high-performance extensions to Vertica's capabilities that you develop using the Vertica Software Development Kit (SDK).

- External Procedures let you pipe data from Vertica through external programs or shell scripts to perform some form of processing on it.

User-Defined SQL Functions

User-Defined SQL Functions let you define and store commonly used SQL expressions as a function. User-Defined SQL Functions are useful for executing complex queries and combining Vertica built-in functions. You simply call the function name you assigned in your query.

A User-Defined SQL Function can be used anywhere in a query where an ordinary SQL expression can be used, except in the table partition clause or the projection segmentation clause.
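For example, a simple User-Defined SQL Function might wrap a commonly used CASE expression (the function name, logic, and query are illustrative):

```sql
CREATE FUNCTION zero_if_null(x INT) RETURN INT
AS BEGIN
    RETURN (CASE WHEN x IS NULL THEN 0 ELSE x END);
END;

-- Call it anywhere an ordinary SQL expression is allowed:
SELECT zero_if_null(annual_income) FROM customer_dimension;
```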

User Defined Extensions and User Defined Functions

User Defined Extension (UDx) refers to all extensions to Vertica developed using the APIs in the Vertica SDK. UDxs encompass functions such as User Defined Scalar Functions (UDSFs), and utilities such as the User Defined Load (UDL) feature that lets you create custom data load routines.

Thanks to their tight integration with Vertica, UDxs usually have better performance than User-Defined SQL Functions or External Procedures.

User Defined Functions (UDFs) are a specific type of UDx. You use them in SQL statements to process data, similarly to Vertica's own built-in functions. They give you the power to create your own functions that run only slightly slower than Vertica's built-in functions.

The Vertica SDK uses the term UDx extensively, even for APIs that deal exclusively with developing UDFs.

Get Started

To get started using Vertica, follow the steps presented in the Getting Started Guide. The tutorial requires that you install Vertica on one or more hosts, as described in the Installation Guide.

We appreciate your feedback!

If you have comments about this document, you can contact the documentation team by email. If an email client is configured on this system, click the email link and an email window opens with the following information in the subject line:

Feedback on Concepts Guide (Vertica Analytic Database 7.0.x)

Just add your feedback to the email and click send.

If no email client is available, copy the information above to a new message in a webmail client,and send your feedback to [email protected].
