View
253
Download
2
Category
Preview:
Citation preview
www.flydata.com
How to Extract Data
From Amazon Redshift
Copyright © 2014 FlyData Inc. All rights reserved. www.flydata.com
How to Extract Data From Amazon Redshift
Copyright © 2014 FlyData Inc. All rights reserved.
After using FlyData to load your data into Amazon
Redshift, you may want to extract data from your
Redshift Tables to S3.
Here are some reasons that you might want to do this:
● You want to load data from your Redshift tables to
some other data source
● You want to load some unused data to S3
● Extract transformed data to S3
● You want to copy data from one Redshift cluster to
another
www.flydata.com
Whatever the reason, Redshift provides you with the
UNLOAD SQL command to accomplish this.
www.flydata.com
How to Use
UNLOAD
Copyright © 2014 FlyData Inc. All rights reserved. www.flydata.com
Copyright © 2014 FlyData Inc. All rights reserved. www.flydata.com
Before we start, let’s create a sample table “customer” as
defined below and insert a few records into it.
To unload all the records in this table to S3 at
location ‘s3://flydata-test-unload/unload-folder’, we
can run the following command..
This generates the following files:
www.flydata.com
Why did the command generate multiple files
even though there were few records in the table?
The reason is that I was running UNLOAD on a
single node cluster that has two slices.
By default, UNLOAD writes one or more files per
slice. Also, as you can see, each file takes the
prefix that was specified in the command
(customer_).
www.flydata.com
Let’s say, you want to copy data in these files back
to the cluster (or a different cluster). You can use
the below COPY command to do that:
www.flydata.com
Things to be Aware
of When Using
UNLOAD
Copyright © 2014 FlyData Inc. All rights reserved. www.flydata.com
Below is a list of things to be aware of when using
UNLOAD:
● Pipe (|) is the default delimiter. You can override this by
using DELIMITER AS ‘delimiter_char’ option.
● The S3 bucket specified in the command should be in
the same region as your cluster. If not, you’ll most likely
see an error when trying to UNLOAD.
● If there are quotes in the query, they need to be
escaped.
● You cannot use the LIMIT clause in the outer SELECT
statement of your query.
www.flydata.com
● (Cont. from previous page) So, if you try to execute an UNLOAD
command that has SELECT query like this:
● ...you will see an error like below!
● To work around this limitation, you can use a nested LIMIT
clause, the above query can be rewritten as:
www.flydata.com
● If the destination folder already contains unloaded data,
UNLOAD will fail with with “ERROR: Specified unload
destination on S3 is not empty”.
○ You should either manually delete the files or
specify ALLOWOVERWRITE option.
● If the data you are unloading includes the delimiter you
are using, then you should use ESCAPE option with the
UNLOAD command.
○ If you don’t, subsequent copy operations may fail
or they will give undesirable results
● To compress the unloaded data, you can use the GZIP
option with the command
www.flydata.com
Some More Tips
Copyright © 2014 FlyData Inc. All rights reserved. www.flydata.com
● Problem: you have limited data in your table and would like to
unload the data into a single S3 file and not one file per slice
○ Solution: UNLOAD provides the PARALLEL option for
this.
● By default, PARALLEL is ON, you can set it to OFF using the
PARALLEL OFF option.
○ When you turn it off, Redshift writes to S3 serially and will
not create multiple files according to the number of slices
in the cluster.
○ Please be aware that the maximum size of a data file is
6.2GB. If you UNLOAD data exceeding that limit, you will
still see multiple files on S3.
www.flydata.com
● Problem: You would like to unload data with column headers
○ Solution: there is no direct option for this. But, you can
tweak the query to output column names along with the
data.
● For the customers table, below will be the query:
www.flydata.com
Conclusion
● Though it’s easy to use the UNLOAD command, there
are a few things you need to take care of, as explained
in this article.
○ First, try the command on sample data and make
sure that all your options are set right before
trying to extract large amounts of data!
● There are also a few other options that UNLOAD
supports which we have not covered here. You can
find them in AWS documentation!
www.flydata.com
www.flydata.com www.flydata.com
Check us out!
-> http://flydata.com
sales@flydata.com
Toll Free: 1-855-427-9787
http://flydata.com
We are an official data integration
partner of Amazon Redshift
Recommended