SQL for Crime ANALYSTS BACIAA Session Thursday, 22 March 2012 James G. Beldock

Today’s Agenda
• Introductions
• Preliminaries
• Databases, Structured Data, and Tables
• Demo 1: Exploring Tables
• How Databases Are Structured (& Why)
• Demo 2: Lots of Tables
• Break
• A Sample CAD Database
• SQL SELECT, part 1
• Using database data in Excel
• Lunch
• Joins
• SQL SELECT, part 2
• Joining
• “Saving” Joins to a View
• Break
• Views
• Other SQL Commands


PRELIMINARIES

1. Databases, Database Varieties, and SQL
2. How Databases Are Structured (& Why)


Databases
• Store data permanently
  • Sometimes called “persistent storage”
• Data can be
  • Structured data
    • A Person has: First Name; Last Name; Social Security Number; Photo.JPG
  • Unstructured data
    • examples: Moby Dick; an entire website; email messages (sometimes)
• Sizes
  • Databases can be small (100KB, 1MB, etc.) or
  • Quite large (the UK Land Registry is 23TB; that’s ~1.1 Libraries of Congress¹, or 1.84 × 100,000,000,000,000 bits!²)
  • RIDICULOUSLY LARGE (Google’s index of the web; Facebook’s profiles database)

1 DB2 - the secret database (http://www.theregister.co.uk/2006/01/18/db2_neglected/)
2 Wolfram Alpha is great for this sort of thing: http://www.wolframalpha.com/input/?i=23+terabytes
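The bit count for the Land Registry can be checked in a couple of lines. This sketch assumes decimal units (1 TB = 10^12 bytes) and 8 bits per byte:

```python
# Assumes decimal units: 1 TB = 10**12 bytes, and 8 bits per byte.
BYTES_PER_TB = 10**12
BITS_PER_BYTE = 8

land_registry_bits = 23 * BYTES_PER_TB * BITS_PER_BYTE
print(f"{land_registry_bits:.2e}")  # 1.84e+14, i.e. 1.84 x 100,000,000,000,000
```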


[silicon valley moment]
• Recently, SQL-running databases have fallen somewhat out of fashion
  • SQL was never cool
  • Now it’s officially “uncool” for some purposes, like building Netflix
    • Highly scalable (thousands of servers?)
    • Very flexible data structures
• Today’s session is all about SQL, and SQL is (usually) used with relational databases, which, if you ask the cool people, are not as cool as they used to be.
• SQL is still the world’s most widely used database language, and SQL databases certainly store more structured data than any other environment ever built.


Structured Data
• SQL deals with structured data³
• Structured Data
  • Keeps track of one or more types of things, called Entities (or TABLEs in SQL)
  • Knows certain, specific, structured pieces of information about those entities, called Attributes (or COLUMNs in SQL)

3 Well, nearly always. But not always always: Storing Unstructured Data in SQL Server 2008 – Microsoft
4 Source: SqlCourse2.com, http://www.sqlcourse2.com/index.html

Sample Structured Data: a TABLE of Customers⁴

Note: SQL keywords will be in blue. They are traditionally written in ALL CAPS.
Names of Tables or Columns will appear in Brown or Orange, respectively. They are traditionally Capitalized (but not ALL CAPS).
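To connect Entities and Attributes to actual SQL, here is a minimal sketch that builds the sample customers TABLE. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the Microsoft SQL Server used in this session (an assumption). The column types are guesses from the slide descriptions, and the four ROWs are the sample customers that appear in the SELECT examples.

```python
import sqlite3

# In-memory SQLite database, standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# The Entity "customer" becomes a TABLE; each Attribute becomes a COLUMN.
# Column types are assumptions based on the slide descriptions.
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER,
        firstname  TEXT,
        lastname   TEXT,
        city       TEXT,
        state      TEXT
    )
""")

# Each individual customer becomes one ROW.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (10101, "John", "Gray", "Lynden", "Washington"),
        (10298, "Leroy", "Brown", "Pinetop", "Arizona"),
        (10299, "Elroy", "Keller", "Snoqualmie", "Washington"),
        (10315, "Lisa", "Jones", "Oshkosh", "Washington"),
    ],
)

# PRAGMA table_info returns one row per COLUMN; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
row_count = conn.execute("SELECT count(*) FROM customers").fetchone()[0]
print(columns)    # ['customerid', 'firstname', 'lastname', 'city', 'state']
print(row_count)  # 4
```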


Database TABLEs
[Figure: a sample TABLE, labeled with the Name of the TABLE, its ROWs, its COLUMNs, and the Names of its COLUMNs]
Question: What’s the name of a ROW?


[Figure: a ROW]
• 5 COLUMNs (also called Fields):
  • customerid: some type of number; probably a Unique Identifier
  • firstname: text (called a String); probably not unique
  • lastname: string; probably not unique
  • city: string; probably not unique
  • state: string; probably not unique
Unique IDs are called KEYs.
The KEY used to name a ROW is called the PRIMARY KEY.
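A PRIMARY KEY is more than a naming convention: the database enforces it. In this sketch (SQLite via Python's sqlite3 module standing in for SQL Server; customer 10400, "John Smith", is made up), a second John is accepted, but a second ROW with customerid 10101 is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Declaring customerid as the PRIMARY KEY makes the database enforce uniqueness.
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER PRIMARY KEY,
        firstname  TEXT,
        lastname   TEXT
    )
""")
conn.execute("INSERT INTO customers VALUES (10101, 'John', 'Gray')")

# A second "John" is fine: firstname is not a KEY (10400 / Smith are made up).
conn.execute("INSERT INTO customers VALUES (10400, 'John', 'Smith')")

# Reusing an existing PRIMARY KEY value is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (10101, 'Someone', 'Else')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)

row_count = conn.execute("SELECT count(*) FROM customers").fetchone()[0]
print(row_count)  # 2
```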


Before We Go Further: SQL
• That is why you’re here, right?
• Structured Query Language (SQL) is:
  • A language for asking a database for information (“querying”)
  • A language for changing information in a database
    → Data Manipulation Language (DML): Create, Read, Update, Delete
  • Changing the structure of a database
  • Adjusting security, performance, and deployment of databases
  • Destroying everything in the database…but don’t worry :-)
    → Often called: DANGEROUS (seriously: called admin functionality, or Data Definition Language, DDL)
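The four DML operations map onto four SQL statements: Create is INSERT, Read is SELECT, Update is UPDATE, and Delete is DELETE. A minimal sketch, again using SQLite through Python's sqlite3 module as a stand-in for SQL Server (the city 'Seattle' is a made-up value):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: changes the structure of the database (creates a TABLE).
conn.execute(
    "CREATE TABLE customers (customerid INTEGER PRIMARY KEY, firstname TEXT, city TEXT)"
)

# DML, Create: add a ROW.
conn.execute("INSERT INTO customers VALUES (10101, 'John', 'Lynden')")

# DML, Read: query a ROW.
city_before = conn.execute(
    "SELECT city FROM customers WHERE customerid = 10101"
).fetchone()[0]

# DML, Update: change a value ('Seattle' is a made-up new city).
conn.execute("UPDATE customers SET city = 'Seattle' WHERE customerid = 10101")
city_after = conn.execute(
    "SELECT city FROM customers WHERE customerid = 10101"
).fetchone()[0]

# DML, Delete: remove the ROW. (A DELETE without WHERE would remove every ROW.)
conn.execute("DELETE FROM customers WHERE customerid = 10101")
remaining = conn.execute("SELECT count(*) FROM customers").fetchone()[0]

print(city_before, city_after, remaining)  # Lynden Seattle 0
```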


SQL’s SELECT Statement
• The single most important SQL statement. Period.
• “Selects” data out of a database, or performs a calculation on a column, value, table, etc.
• Really simple examples:
  • SELECT 'hello' → hello
  • SELECT 1 + 3 → 4
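Both examples can be run as-is. Here they are executed against an in-memory SQLite database through Python's sqlite3 module (a stand-in for SQL Server; like SQL Server, SQLite allows a SELECT with no FROM clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A SELECT with no FROM clause simply evaluates its expressions.
greeting = conn.execute("SELECT 'hello'").fetchone()[0]
total = conn.execute("SELECT 1 + 3").fetchone()[0]

print(greeting)  # hello
print(total)     # 4
```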


SELECT Statement, continued
• More commonly, the basic SELECT statement returns ROWs from a TABLE:
  • SELECT firstname FROM customers →
      John
      Leroy
      Elroy
      Lisa
  • SELECT firstname, city FROM customers →
      John Lynden
      Leroy Pinetop
      Elroy Snoqualmie
      Lisa Oshkosh
  • SELECT * FROM customers →
      10101 John Gray Lynden Washington
      10298 Leroy Brown Pinetop Arizona
      10299 Elroy Keller Snoqualmie Washington
      10315 Lisa Jones Oshkosh Washington

A special COLUMN name: * means “all COLUMNs”
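The same three queries, sketched against the sample data in SQLite (via Python's sqlite3 module, as a stand-in for SQL Server). Without an ORDER BY clause the database is free to return ROWs in any order, so the listing order of these results should not be relied on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER PRIMARY KEY,
        firstname  TEXT,
        lastname   TEXT,
        city       TEXT,
        state      TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (10101, "John", "Gray", "Lynden", "Washington"),
        (10298, "Leroy", "Brown", "Pinetop", "Arizona"),
        (10299, "Elroy", "Keller", "Snoqualmie", "Washington"),
        (10315, "Lisa", "Jones", "Oshkosh", "Washington"),
    ],
)

# One COLUMN.
first_names = [row[0] for row in conn.execute("SELECT firstname FROM customers")]
# Two COLUMNs, returned in the order they are listed.
names_and_cities = conn.execute("SELECT firstname, city FROM customers").fetchall()
# * means "all COLUMNs".
everything = conn.execute("SELECT * FROM customers").fetchall()

print(first_names)
print(names_and_cities)
print(everything)
```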


SELECT Statement: the Important Options⁵ (for one table)

SELECT list of columns, functions on columns, or *

FROM name of table

WHERE list of conditions to include (called “predicates”)

GROUP BY list of columns

ORDER BY list of columns and direction of sort (ascending/descending)

5 The full definition of the SQL SELECT statement syntax is much longer and, to some extent, specific to the database software. See the definition of Microsoft SQL Server 2008 R2’s SELECT statement at http://msdn.microsoft.com/en-us/library/ms189499.aspx
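A sketch of these options working together, using SQLite via Python's sqlite3 module as a stand-in for SQL Server. It counts customers per state among customerids above 10200 (an arbitrary cutoff chosen for illustration) and shows that the clauses must appear in the order SELECT, FROM, WHERE, GROUP BY, ORDER BY; putting ORDER BY before GROUP BY is rejected as a syntax error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER PRIMARY KEY,
        firstname  TEXT,
        lastname   TEXT,
        city       TEXT,
        state      TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (10101, "John", "Gray", "Lynden", "Washington"),
        (10298, "Leroy", "Brown", "Pinetop", "Arizona"),
        (10299, "Elroy", "Keller", "Snoqualmie", "Washington"),
        (10315, "Lisa", "Jones", "Oshkosh", "Washington"),
    ],
)

# Clauses in the order SQL requires: SELECT, FROM, WHERE, GROUP BY, ORDER BY.
result = conn.execute("""
    SELECT [state], count(*) AS customers_in_state
    FROM customers
    WHERE customerid > 10200
    GROUP BY [state]
    ORDER BY [state]
""").fetchall()
print(result)  # [('Arizona', 1), ('Washington', 2)]

# Writing ORDER BY before GROUP BY is a syntax error.
try:
    conn.execute(
        "SELECT [state], count(*) FROM customers ORDER BY [state] GROUP BY [state]"
    )
    misordered_rejected = False
except sqlite3.OperationalError as err:
    misordered_rejected = True
    print("Error:", err)
```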


SELECT…ORDER BY
• Use ORDER BY to sort by one or more columns, in ascending or descending order

[Figure: effect of ORDER BY clause]
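A small sketch of ORDER BY's effect on the sample customers (SQLite via Python's sqlite3 module, standing in for SQL Server). DESC sorts descending, ASC (the default) ascending, and listing several columns sorts by the first, then breaks ties with the next:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER PRIMARY KEY,
        firstname  TEXT,
        lastname   TEXT,
        city       TEXT,
        state      TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (10101, "John", "Gray", "Lynden", "Washington"),
        (10298, "Leroy", "Brown", "Pinetop", "Arizona"),
        (10299, "Elroy", "Keller", "Snoqualmie", "Washington"),
        (10315, "Lisa", "Jones", "Oshkosh", "Washington"),
    ],
)

# One column, descending.
by_last_name = conn.execute(
    "SELECT lastname FROM customers ORDER BY lastname DESC"
).fetchall()

# Two columns: state ascending, then lastname descending within each state.
by_state_then_name = conn.execute(
    "SELECT [state], lastname FROM customers ORDER BY [state] ASC, lastname DESC"
).fetchall()

print(by_last_name)
# [('Keller',), ('Jones',), ('Gray',), ('Brown',)]
print(by_state_then_name)
# [('Arizona', 'Brown'), ('Washington', 'Keller'), ('Washington', 'Jones'), ('Washington', 'Gray')]
```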


SELECT…WHERE
• Use WHERE to filter
  • based on one criterion:
  • or more than one:

Why the [square brackets]?

The word state is a reserved SQL keyword. When it is used as a column name, it must be [bracketed] to avoid confusion.
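A sketch of filtering with one and with two criteria. The specific criteria below are illustrative assumptions, not the slide's own queries. SQLite, used here through Python's sqlite3 module as a stand-in for SQL Server, accepts SQL Server-style [brackets] around names (standard SQL uses "double quotes" for the same purpose), and multiple criteria are combined with AND (or OR):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER PRIMARY KEY,
        firstname  TEXT,
        lastname   TEXT,
        city       TEXT,
        state      TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (10101, "John", "Gray", "Lynden", "Washington"),
        (10298, "Leroy", "Brown", "Pinetop", "Arizona"),
        (10299, "Elroy", "Keller", "Snoqualmie", "Washington"),
        (10315, "Lisa", "Jones", "Oshkosh", "Washington"),
    ],
)

# One criterion; [state] is bracketed as the slide recommends for SQL Server.
in_washington = [
    row[0]
    for row in conn.execute(
        "SELECT firstname FROM customers WHERE [state] = 'Washington'"
    )
]

# More than one criterion, combined with AND.
keller = conn.execute(
    "SELECT firstname FROM customers "
    "WHERE [state] = 'Washington' AND lastname = 'Keller'"
).fetchall()

print(sorted(in_washington))  # ['Elroy', 'John', 'Lisa']
print(keller)                 # [('Elroy',)]
```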


FUNCTIONS
• You can add functions to a SQL SELECT statement to perform various analyses.
• The most common⁶ are
  • Aggregate functions
    • count(), which returns the number of somethings, and
    • sum(), which adds up the somethings
    • Also: min(), max(), avg(), stdev(), var()
  • Math, Date and String (text) Manipulation functions
    • Math: abs(), ceiling(), power(), sqrt(), others
    • String: len(), substring(), replace(), upper(), lower(), left(), right(), others
    • Date: dateadd(), datediff(), datepart(), getdate(), day(), month(), year(), others

6 The full list is quite long. For SQL Server, see http://msdn.microsoft.com/en-us/library/aa258899(v=sql.80).aspx.
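A sketch of a few of these functions in SQLite (via Python's sqlite3 module, standing in for SQL Server). Function names differ between products: SQLite uses length() where SQL Server uses len(), uses substr() for substring(), and has no built-in stdev(), var(), dateadd(), or datediff(), so only functions available in both are shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER PRIMARY KEY,
        firstname  TEXT,
        lastname   TEXT,
        city       TEXT,
        state      TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (10101, "John", "Gray", "Lynden", "Washington"),
        (10298, "Leroy", "Brown", "Pinetop", "Arizona"),
        (10299, "Elroy", "Keller", "Snoqualmie", "Washington"),
        (10315, "Lisa", "Jones", "Oshkosh", "Washington"),
    ],
)

# Aggregate functions collapse many ROWs into a single result ROW.
summary = conn.execute(
    "SELECT count(*), min(customerid), max(customerid) FROM customers"
).fetchone()

# String functions work ROW by ROW.
per_row = conn.execute("""
    SELECT upper(lastname), length(city), substr(firstname, 1, 1)
    FROM customers
    ORDER BY customerid
""").fetchall()

print(summary)  # (4, 10101, 10315)
print(per_row)
# [('GRAY', 6, 'J'), ('BROWN', 7, 'L'), ('KELLER', 10, 'E'), ('JONES', 7, 'L')]
```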


[DEMO] Using Functions, WHERE, and ORDER BY
• Summary:
  • count(*) gives you the count of rows resulting from your query
  • You can SELECT any combination of columns
    • Unless you GROUP BY, in which case you are limited to the GROUPed BY columns and aggregate functions applied to other columns
• Gotchas
  • sum(*) doesn’t make sense, but sum(columnname) does (for columns of numbers)
  • GROUP BY is finicky: the list of columns you select is limited
  • Some things aren’t easy: for example, finding the percent of total
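One way to get a percent of total is a subquery that counts the whole TABLE. A sketch in SQLite via Python's sqlite3 module, standing in for SQL Server. Two cautions apply. First, multiply by 100.0 rather than 100, because dividing one integer count by another truncates (3 / 4 is 0) in both SQLite and SQL Server. Second, SQLite is more permissive than SQL Server about GROUP BY (it accepts non-grouped, non-aggregated columns in the SELECT list), so a query that runs here may still fail on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customerid INTEGER PRIMARY KEY,
        firstname  TEXT,
        lastname   TEXT,
        city       TEXT,
        state      TEXT
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (10101, "John", "Gray", "Lynden", "Washington"),
        (10298, "Leroy", "Brown", "Pinetop", "Arizona"),
        (10299, "Elroy", "Keller", "Snoqualmie", "Washington"),
        (10315, "Lisa", "Jones", "Oshkosh", "Washington"),
    ],
)

# Each group's count divided by a subquery over the whole TABLE.
# 100.0 (not 100) forces decimal arithmetic instead of integer division.
rows = conn.execute("""
    SELECT [state],
           count(*) AS n,
           100.0 * count(*) / (SELECT count(*) FROM customers) AS pct_of_total
    FROM customers
    GROUP BY [state]
    ORDER BY [state]
""").fetchall()
print(rows)  # [('Arizona', 1, 25.0), ('Washington', 3, 75.0)]

# The integer-division trap: 3 / 4 truncates to 0.
naive = conn.execute("""
    SELECT count(*) / (SELECT count(*) FROM customers)
    FROM customers
    WHERE [state] = 'Washington'
""").fetchone()[0]
print(naive)  # 0
```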


Terminology: DBMS
• “Database” is a generic term; it can refer to:
  • A specific set of data running on a Database Server
  • A Database Server itself (not really the right term)
  • A large body of information kept by a human being (“my recipe database”)
• Databases generally run on a Database Server
  • A computer running a Database Management System (DBMS)
  • Accepts connections from many client computers, which send it requests (“queries”)
  • Returns a response (“result set”) to each client in response to each query
  • Can be distributed onto lots of servers (Facebook: 1,800+ MySQL servers)
• A DBMS handles multiple databases
  • Each Database is stored in one or more “database files”
  • Database Files can sometimes be loaded/viewed/edited by other software


Names You Might Encounter (in the Database World)
• SQL Server, from Microsoft (also “Microsoft SQL Server”)
• Oracle
• DB2, from IBM
• Less common:
  • Microsoft Access, dBase, Sybase


Database Structures
• Most databases have many TABLEs
  • 10 would be “few”; 50 would be normal; 150 would be many
• There is a method to this madness
  • Different TABLEs contain different categories of information
  • Example:
    • Customers: contains lots of customers
    • Products: contains lots of products
    • Orders: combines customers and products (and quantities, etc.)
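The Customers / Products / Orders example, sketched as three SQLite TABLEs via Python's sqlite3 module (a stand-in for SQL Server). The column names, the product IDs, and the REFERENCES (foreign key) declarations are assumptions for illustration; the customers, products, and quantities come from the order data on the following slides. Orders "combines" customers and products by storing their KEYs, and the final query previews the joins covered after lunch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One TABLE per category of information (names and IDs are assumptions).
conn.executescript("""
    CREATE TABLE customers (
        customerid   INTEGER PRIMARY KEY,
        customername TEXT
    );
    CREATE TABLE products (
        productid   INTEGER PRIMARY KEY,
        productname TEXT
    );
    -- Orders combines customers and products by storing their KEYs.
    CREATE TABLE orders (
        orderid    INTEGER PRIMARY KEY,
        customerid INTEGER REFERENCES customers (customerid),
        productid  INTEGER REFERENCES products (productid),
        quantity   INTEGER
    );
    INSERT INTO customers VALUES (1, 'James'), (2, 'George');
    INSERT INTO products  VALUES (1, 'Orange'), (2, 'Apple'), (3, 'Fork');
    INSERT INTO orders    VALUES (1000, 1, 1, 3), (1001, 1, 2, 4), (1002, 2, 3, 1);
""")

# A join reassembles the pieces into one result set.
rows = conn.execute("""
    SELECT o.orderid, c.customername, o.quantity, p.productname
    FROM orders AS o
    JOIN customers AS c ON c.customerid = o.customerid
    JOIN products  AS p ON p.productid  = o.productid
    ORDER BY o.orderid
""").fetchall()
print(rows)
# [(1000, 'James', 3, 'Orange'), (1001, 'James', 4, 'Apple'), (1002, 'George', 1, 'Fork')]
```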


Why So Many Tables?
• Imagine a world with just 1 table
  • The problem of duplicate data:

OrderID CustomerName CustomerAddress Quantity ProductName

1000 James 123 Main Street, Arcadia, CA, 95000 3 Orange

1001 James 123 Main Street, Arcadia, CA, 95000 4 Apple

1002 George 444 1st Avenue, Sacramento, CA 97000 1 Fork

1003 James 123 Main Street, Arcadia, CA 95000 6 Pear

• Adding a new order is easy:

• But what happens when James changes his address?

• Answer: need to update every ROW where 'James' is the CustomerName (ugh!)
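The one-big-table problem, sketched in SQLite via Python's sqlite3 module (a stand-in for SQL Server) using the four ROWs above; order 1004, 'Spoon', and James's new address are made up. Note that ROW 1003's copy of the address already lacks the comma after CA, so even before the move the table holds two different versions of James's address:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One big TABLE: customer details are repeated on every order ROW.
conn.execute("""
    CREATE TABLE orders (
        orderid         INTEGER PRIMARY KEY,
        customername    TEXT,
        customeraddress TEXT,
        quantity        INTEGER,
        productname     TEXT
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1000, "James", "123 Main Street, Arcadia, CA, 95000", 3, "Orange"),
        (1001, "James", "123 Main Street, Arcadia, CA, 95000", 4, "Apple"),
        (1002, "George", "444 1st Avenue, Sacramento, CA 97000", 1, "Fork"),
        (1003, "James", "123 Main Street, Arcadia, CA 95000", 6, "Pear"),
    ],
)

count_james_addresses = (
    "SELECT count(DISTINCT customeraddress) FROM orders WHERE customername = 'James'"
)
before = conn.execute(count_james_addresses).fetchone()[0]

# Adding a new order is a single INSERT (order 1004 is made up).
conn.execute(
    "INSERT INTO orders VALUES "
    "(1004, 'George', '444 1st Avenue, Sacramento, CA 97000', 2, 'Spoon')"
)

# But James's move must touch every one of his ROWs (new address is made up).
cur = conn.execute(
    "UPDATE orders SET customeraddress = '9 Elm Street, Arcadia, CA, 95000' "
    "WHERE customername = 'James'"
)
updated = cur.rowcount
after = conn.execute(count_james_addresses).fetchone()[0]

print(before, updated, after)  # 2 3 1
```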


Solution: Divide and Conquer
• Divide data into Entities (TABLEs), specific to a given purpose:

CustomerID CustomerName CustomerAddress

1 James 123 Main Street, Arcadia, CA, 95000

2 George 444 1st Avenue, Sacramento, CA 97000

OrderID CustomerID Quantity ProductName

1000 1 3 Orange

1001 1 4 Apple

1002 2 1 Fork