
TOOLS - Fulcrum and... · TOOLS & TECHNIQUES FOR WORKING WITH DATA. TL;DR 1. There are many tools & techniques to work with data. 2. Know about alternatives and try them. 3. Not every


TOOLS & TECHNIQUES FOR WORKING WITH DATA


TL;DR

1. There are many tools & techniques to work with data.

2. Know about alternatives and try them.

3. Not every data management task is the same.

4. You might need different tools for separate parts of larger tasks.


Tools we will use working with our example data

1. csvkit

a. csvcut

b. csvstat

c. csvsql

d. csvsort

e. in2csv

2. Unix/Mac commands

a. wc

b. head

c. pbcopy

d. ‘piping’

3. DB Browser for SQLite


A brief ode to CSV:

● Possibly the most widely supported structured data format in the world.

● One of the simplest possible structured formats for data.

● Strikes a delicate balance, remaining readable by both machines & humans.

Source: http://frictionlessdata.io/guides/csv/


csvkit

● csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.

● It is inspired by pdftk, gdal and the original csvcut tool.


Row count

● wc (short for word count) is a command on Unix-like systems. wc -l prints the line count, which includes the header line.

● csvstat --count outputs the total row count, not counting the header. That is why the two numbers below differ by one.

~ fulcrum-live

$ wc -l Fire_Inspections.csv
213210 Fire_Inspections.csv

$ csvstat --count Fire_Inspections.csv
Row count: 213209
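The difference between a line count and a row count can be reproduced with a minimal Python sketch. The sample text and the function names `line_count` and `row_count` are hypothetical stand-ins, not part of csvkit or of the original slides.

```python
import csv
import io

# Hypothetical three-line sample standing in for Fire_Inspections.csv.
SAMPLE = "Inspection Number,Battalion\n220,B01\n221,B02\n"


def line_count(text: str) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    return text.count("\n")


def row_count(text: str) -> int:
    """Count data rows the way a CSV parser sees them, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # discard the header row
    return sum(1 for _ in reader)


print(line_count(SAMPLE))  # 3
print(row_count(SAMPLE))   # 2
```

The gap is exactly one here because the sample has a single header line and no quoted fields that span several lines.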


csvstat -n, --names

Display column names and indices.

~ fulcrum-live

$ csvstat -n Fire_Inspections.csv
  1: Inspection Number
  2: Inspection Type
  3: Inspection Type Description
  4: Address
  5: Inspection Address Zipcode
  6: Battalion
  7: Station Area
  8: Fire Prevention District
  9: Billable Inspection
 10: Inspection Start Date
 11: Inspection End Date
 12: Inspection Status
 13: Return Date
 14: Corrective Action Date
 ...


Preview a subset

● head is a program on Unix-like systems used to display the beginning of a text file.

● head -n 101 keeps the header line plus the first 100 data rows, so csvstat reports 100 rows.

~ fulcrum-live

$ head -n 101 Fire_Inspections.csv > subset.csv

$ csvstat --count subset.csv
Row count: 100
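For readers without a Unix shell, the same header-plus-N-rows subset can be sketched in Python. The function name `head_rows` and the generated sample data are assumptions made for illustration only.

```python
import csv
import io
from itertools import islice


def head_rows(text: str, n_rows: int) -> str:
    """Return the header plus the first n_rows data rows as CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(islice(reader, n_rows + 1))  # +1 keeps the header
    return out.getvalue()


# Hypothetical sample data; the real file is Fire_Inspections.csv.
sample = "id,zip\n" + "".join(f"{i},94110\n" for i in range(1, 251))
subset = head_rows(sample, 100)
print(subset.count("\n"))  # 101 lines: 1 header + 100 rows
```

Unlike head, this version counts parsed rows rather than raw lines, so a quoted field containing a line break would not be cut in half.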


Preview the data

This can also be done on the command line with csvlook.


Heads up!

We’ll need to split the Location column into separate columns for latitude and longitude.


csvstat

● Prints descriptive statistics for all columns in a CSV file.

● Infers the type of each column and then prints an analysis suited to that type.

~ fulcrum-live

$ csvstat Fire_Inspections.csv

  1. "Inspection Number"

        Type of data: Number
        Contains null values: False
        Unique values: 213209
        Smallest value: 220
        Largest value: 333,298
        Sum: 29,621,444,707
        Mean: 138,931.493
        Median: 120,958
        StDev: 89,204.887
        Most common values: 234,121 (1x)
                            234,119 (1x)
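The idea of inferring a column type and then summarizing it can be illustrated with a small Python sketch. It is not csvkit's implementation: the two-type rule in `infer_type`, the `describe` function and the sample rows are all simplifying assumptions.

```python
import csv
import io
import statistics


def infer_type(values):
    """Classify non-empty values as Number or Text (a deliberately tiny rule set)."""
    try:
        for value in values:
            float(value)
        return "Number"
    except ValueError:
        return "Text"


def describe(text):
    """Return per-column summaries loosely modeled on what csvstat prints."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        present = [v for v in values if v != ""]
        info = {
            "type": infer_type(present),
            "nulls": len(present) < len(values),
            "unique": len(set(values)),
        }
        if info["type"] == "Number" and present:
            numbers = [float(v) for v in present]
            info["min"], info["max"] = min(numbers), max(numbers)
            info["mean"] = statistics.mean(numbers)
        summary[name] = info
    return summary


# Hypothetical sample; the last row has an empty Battalion cell.
sample = "Inspection Number,Battalion\n220,B01\n221,B02\n225,\n"
for column, info in describe(sample).items():
    print(column, info)
```

csvstat recognizes more types than this sketch does, including the Boolean and Date types that appear later in the deck.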


csvstat

csvstat returning Boolean and Date types.

~ fulcrum-live

  9. "Billable Inspection"

        Type of data: Boolean
        Contains null values: False
        Unique values: 2
        Most common values: False (202888x)
                            True (10321x)

 10. "Inspection Start Date"

        Type of data: Date
        Contains null values: False
        Unique values: 4523
        Smallest value: 2004-01-01
        Largest value: 2017-07-26
        Most common values: 2016-07-01 (661x)


Gotcha!

When we imported our data subset, we discovered that its date format was not one Fulcrum would accept.

Don’t worry, csvkit has got your back!
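The slides do not show the original date format or how it was converted. Purely as an illustration, the sketch below assumes the source dates look like MM/DD/YYYY and that the target is ISO 8601 (YYYY-MM-DD), the form csvstat reports for the Date column. The function name `to_iso_date` and its default format are hypothetical.

```python
from datetime import datetime


def to_iso_date(value, source_format="%m/%d/%Y"):
    """Convert a date string to ISO 8601; source_format is an assumed input layout."""
    if not value:
        return value  # leave empty cells untouched
    return datetime.strptime(value, source_format).date().isoformat()


print(to_iso_date("07/26/2017"))  # 2017-07-26
```

If the real data uses a different layout, only `source_format` needs to change.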


csvstat

csvstat returning a Text type and listing the most common values.

~ fulcrum-live

 14. "Neighborhood District"

        Type of data: Text
        Contains null values: True
        Unique values: 42
        Longest value: 30 characters
        Most common values: Financial District (2034x)
                            Tenderloin (727x)
                            South of Market (644x)
                            Mission (622x)
                            Nob Hill (525x)


csvsql

Run SQL queries directly on your CSV!

~ fulcrum-live

$ csvsql --query 'SELECT DISTINCT "Neighborhood District" FROM "Fire_Inspections";' Fire_Inspections.csv

Mission
Lone Mountain/USF
Noe Valley
Haight Ashbury
Nob Hill
Lakeshore
Tenderloin
Russian Hill
Chinatown
Mission Bay
Financial District/South Beach
South of Market
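Querying a CSV with SQL comes down to loading the rows into a table and running the statement against it. A minimal sketch of that idea with Python's built-in sqlite3 module follows; the helper `query_csv`, the all-TEXT column types and the sample rows are assumptions, and this is not how csvsql itself is implemented.

```python
import csv
import io
import sqlite3


def query_csv(text, table, sql):
    """Load CSV text into an in-memory SQLite table and run one query."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Hypothetical sample rows standing in for Fire_Inspections.csv.
sample = (
    "Inspection Number,Neighborhood District\n"
    "220,Mission\n"
    "221,Tenderloin\n"
    "222,Mission\n"
)
rows = query_csv(
    sample,
    "Fire_Inspections",
    'SELECT DISTINCT "Neighborhood District" FROM "Fire_Inspections" '
    'ORDER BY "Neighborhood District";',
)
print([r[0] for r in rows])  # ['Mission', 'Tenderloin']
```

The ORDER BY clause is added here only to make the output order predictable.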


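Putting it all together

~ fulcrum-live

$ csvcut -C 2,13-24,26-32 Fire_Inspections.csv > fire-inspections-subset.csv

$ csvsort -r -c 9 fire-inspections-subset.csv | head -n 10001 | in2csv -f csv > fire-inspections-subset-sorted.csv

● csvcut -C drops the listed columns (2, 13-24 and 26-32) and writes the rest to fire-inspections-subset.csv.

● csvsort -r -c 9 sorts the subset in descending order on its column 9, which is Inspection Start Date once column 2 has been removed.

● head -n 10001 keeps the header line plus the first 10,000 rows.

● in2csv -f csv reads that stream as CSV and writes it back out as CSV.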
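The cut, sort and trim steps of this pipeline can be sketched in plain Python to show what each stage contributes. The function `cut_sort_head`, its parameters and the sample rows are hypothetical; the sketch compares values as strings, which works for ISO dates, whereas csvsort infers column types.

```python
import csv
import io


def cut_sort_head(text, drop, sort_col, limit):
    """Drop 1-based columns, sort descending on a 1-based column, keep `limit` rows."""
    rows = list(csv.reader(io.StringIO(text)))
    keep = [i for i in range(len(rows[0])) if i + 1 not in drop]
    header, *data = [[row[i] for i in keep] for row in rows]
    # sort_col refers to the column position after the cut, as in the pipeline.
    data.sort(key=lambda row: row[sort_col - 1], reverse=True)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + data[:limit])
    return out.getvalue()


# Hypothetical sample: three rows, with the middle column to be dropped.
sample = (
    "id,type,start\n"
    "1,a,2016-07-01\n"
    "2,b,2017-07-26\n"
    "3,c,2004-01-01\n"
)
result = cut_sort_head(sample, drop={2}, sort_col=2, limit=2)
print(result)
```

With the sample above, the two most recent rows survive, newest first.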


DB Browser for SQLite

csvkit could not do it all (for me), so we turn to the SQLite Browser tool for our last task.


Steps in SQLite Browser:

● Create new database

● Import CSV

● Create new columns for latitude & longitude

● Execute UPDATE statement

UPDATE "fire-inspections-subset-sorted"
SET lat = replace(substr(Location, 1, instr(Location, ', ') - 1), '(', ''),
    lon = replace(substr(Location, instr(Location, ', ') + 2), ')', '');

● Export CSV containing our lat & lon columns

● Import into Fulcrum

● Profit!

Location column, sample value:

(37.751914, -122.421305)

instr(Location, ', ') finds the comma that separates the two numbers. lat is everything before that comma with the opening parenthesis removed. lon is everything after the comma and the space that follows it, with the closing parenthesis removed.
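To check the string handling without opening DB Browser, the statement can be run against the sample value using Python's built-in sqlite3 module. This is a minimal sketch: the three-column table with TEXT columns is an assumption, and the longitude starts two characters past the comma so that the value carries no leading space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real table has many more columns.
conn.execute(
    'CREATE TABLE "fire-inspections-subset-sorted" '
    "(Location TEXT, lat TEXT, lon TEXT)"
)
conn.execute(
    'INSERT INTO "fire-inspections-subset-sorted" (Location) VALUES (?)',
    ("(37.751914, -122.421305)",),
)
conn.execute(
    """
    UPDATE "fire-inspections-subset-sorted"
    SET lat = replace(substr(Location, 1, instr(Location, ', ') - 1), '(', ''),
        lon = replace(substr(Location, instr(Location, ', ') + 2), ')', '');
    """
)
lat, lon = conn.execute(
    'SELECT lat, lon FROM "fire-inspections-subset-sorted"'
).fetchone()
print(lat, lon)  # 37.751914 -122.421305
conn.close()
```

For the sample value, instr returns 11, so lat is taken from characters 1 to 10 and lon from character 13 onward.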


The new lat & lon columns look good!


Some things to note:

● csvstat and csvsql can be SLOW! csvkit is written in Python, so keep that in mind. If it’s too slow for a specific task, you’re probably better off pulling the data into SQLite for querying.

● pbcopy is a great little macOS tool that sends your command-line output to your clipboard.

- Instead of copying and pasting the results of a csvsql --query 'SELECT DISTINCT…' command from the command line, you can pipe them to pbcopy and paste them directly into your Choice field list in the Fulcrum app builder.

- For example:

$ csvsql --query 'SELECT DISTINCT "Neighborhood District" FROM "Fire_Inspections";' Fire_Inspections.csv | pbcopy
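The same hand-off can be scripted. The sketch below builds a one-value-per-line list and pipes it to pbcopy only when that command is present; the helper names `to_choice_list` and `copy_to_clipboard` and the sample values are hypothetical.

```python
import shutil
import subprocess


def to_choice_list(values):
    """Join distinct, non-empty values one per line, preserving first-seen order."""
    seen = dict.fromkeys(v.strip() for v in values if v and v.strip())
    return "\n".join(seen)


def copy_to_clipboard(text):
    """Pipe text to pbcopy when it exists (macOS); report whether it ran."""
    pbcopy = shutil.which("pbcopy")
    if pbcopy is None:
        return False
    subprocess.run([pbcopy], input=text.encode("utf-8"), check=True)
    return True


choices = to_choice_list(["Mission", "Tenderloin", "Mission", "", "Nob Hill"])
print(choices)
```

Calling `copy_to_clipboard(choices)` on a Mac would place the list on the clipboard; on systems without pbcopy it simply returns False.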


Thank you very much for your time

But before we go... let’s enjoy the fruits of our labor and IMPORT THIS DATA!