Data Journalism 101 by Michael J. Berens - DeKalb, Illinois, NewsTrain - Oct. 29-30, 2015

DATA JOURNALISM 101

MICHAEL J. BERENS | CHICAGO TRIBUNE | @MJBERENS1

The Death Ray

No piece of information is insignificant

How to request data

1 - Fixed-width

2 - Delimited

Preferred format:

Comma-delimited text file

COMMA DELIMITED “RAW” DATA
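
As a rough illustration of why comma-delimited text is easy to work with, the sketch below reads a few raw lines with Python's standard csv module. The column names and rows are made up for this example and do not come from the presentation. Note that a value containing a comma must be wrapped in quotes for the file to parse correctly.

```python
import csv
import io

# Made-up "raw" comma-delimited data; the columns are hypothetical.
raw = (
    "incident_date,county,injury_type\n"
    "2014-11-15,DeKalb,Fall from tree stand\n"
    '2014-11-16,"Ogle","Gunshot, self-inflicted"\n'
)

# DictReader uses the first line as field names and returns one dict per row.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["incident_date"], "|", row["county"], "|", row["injury_type"])
```

With a real file, `io.StringIO(raw)` would be replaced by `open(path, newline="")`.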

File layout (secret weapon to finding stories)

FIELDS, POSITION, TYPE, LENGTH

Field
Number  Variable  Type  Format  Label

1       SEQ_NO    Char  $10.    Sequence Number
        Comment: Unique sequence number assigned to each record within a year. First four digits are the year of discharge.

2       REC_KEY   Num   11.     Record Key
        Comment: Unique number assigned to each CHARS record. Added in 2003.

3       STAYTYPE  Char  $1      Type of Stay
        Comment:
          1 = Inpatient
          2 = Observation patient

4       HOSPITAL  Char  $4      Hospital Number
        Comment: DOH assigned hospital number. Fourth character describes the Medicare certified unit type with:
          blank = acute care
          R = Rehabilitation unit
          P = Psychiatric unit
          S = Swing bed unit
          - - - - - - - - - - - - - - - - - - - -
          A = Alcohol (discontinued after 1992)
          B = Bone marrow transplants (discontinued after 2000)
          E = Extended care (discontinued after 2001)
          H = Tacoma General & Group Health combined (discontinued after 1992)
          I = Group Health only at Tacoma General (discontinued after 1992)

5       LINENO    Num   3.      Number of Reported Revenue Items Codes

6       ZIPCODE   Char  $5      Patient's Zip Code
        Comment:
          99999 indicates the zip code is unknown.
          99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location).
          Blanks indicate non-U.S. residence.

7       STATERES  Char  $2      State of Residence
        Comment: State abbreviation used by U.S. Postal Service. This is assigned from the zip code.
          Residents with zip code 99998 are assigned to Washington.
          XX = invalid zip code or a non-U.S. residence.
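
A file layout like the one above can be turned into a small parser. The sketch below is only an illustration. It assumes the seven fields sit side by side in the order listed, with widths taken from the Format column (10, 11, 1, 4, 3, 5, 2); the slide does not give start positions, so that is an assumption. The sample record is invented.

```python
# Widths come from the Format column; contiguous positions are an ASSUMPTION,
# because the layout shown does not list start columns.
LAYOUT = [
    ("SEQ_NO", 10),
    ("REC_KEY", 11),
    ("STAYTYPE", 1),
    ("HOSPITAL", 4),
    ("LINENO", 3),
    ("ZIPCODE", 5),
    ("STATERES", 2),
]

# Code keys copied from the layout's comments (current codes only).
STAY_TYPES = {"1": "Inpatient", "2": "Observation patient"}
UNIT_TYPES = {
    " ": "acute care",
    "R": "Rehabilitation unit",
    "P": "Psychiatric unit",
    "S": "Swing bed unit",
}


def parse_record(line):
    """Slice one fixed-width line into a dict keyed by variable name."""
    record = {}
    start = 0
    for name, width in LAYOUT:
        record[name] = line[start:start + width]
        start += width
    return record


def describe_zip(zipcode):
    """Translate the special ZIPCODE values described in the layout."""
    if zipcode.strip() == "":
        return "non-U.S. residence"
    if zipcode == "99999":
        return "unknown"
    if zipcode == "99998":
        return "homeless"
    return "zip " + zipcode


def decode(record):
    """Apply the code keys to a parsed record."""
    hospital = record["HOSPITAL"].ljust(4)
    return {
        "year_of_discharge": record["SEQ_NO"][:4],
        "stay_type": STAY_TYPES.get(record["STAYTYPE"], "unknown code"),
        "unit_type": UNIT_TYPES.get(hospital[3], "other or discontinued code"),
        "zip_status": describe_zip(record["ZIPCODE"]),
    }


# Invented record, built piece by piece so each field has the right width.
sample = (
    "2003" + "1".zfill(6)   # SEQ_NO: year of discharge plus a sequence
    + "42".zfill(11)        # REC_KEY
    + "1"                   # STAYTYPE
    + "123R"                # HOSPITAL, fourth character is the unit type
    + "7".rjust(3)          # LINENO
    + "99998"               # ZIPCODE
    + "WA"                  # STATERES
)

decoded = decode(parse_record(sample))
print(decoded)
```

The code keys are where stories hide: once ZIPCODE 99998 is decoded, for example, records for homeless patients can be counted or filtered.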

FINDING STORIES THAT LURK IN CODE

KEYS

REPORTING TIP

Make a master copy

REPORTING TIP

Keep a log
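
One possible way to follow both tips in code is sketched below: work only on a copy of the original file and append a timestamped note for every step. The file names and the helper functions `make_working_copy` and `log_step` are hypothetical, chosen for this example only.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def make_working_copy(master: Path, workdir: Path) -> Path:
    """Copy the untouched master file into workdir and return the copy's path."""
    workdir.mkdir(parents=True, exist_ok=True)
    working = workdir / ("working_" + master.name)
    shutil.copy2(master, working)  # copy2 also preserves file timestamps
    return working


def log_step(logfile: Path, note: str) -> None:
    """Append one timestamped line describing what was done to the data."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with logfile.open("a", encoding="utf-8") as handle:
        handle.write(f"{stamp}  {note}\n")


# Demonstration in a throwaway directory with a made-up file.
base = Path(tempfile.mkdtemp())
master = base / "hospital_discharges.csv"
master.write_text("SEQ_NO,ZIPCODE\n2003000001,99998\n", encoding="utf-8")

working = make_working_copy(master, base / "work")
log = base / "data_log.txt"
log_step(log, f"Copied {master.name} to {working.name}; master left untouched")
log_step(log, "Sorted working copy by ZIPCODE")

print(log.read_text(encoding="utf-8"))
```

A plain text file or a notebook page serves the same purpose; the point is that every change to the data can be retraced later.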

DELIMITED FILE

HUNTING DATABASE

OTHER TYPES OF DELIMITERS
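
Tabs and pipes are two delimiters that commonly appear in place of commas. As a minimal sketch, Python's standard csv module can read either one if told which character separates the fields. The helper name `read_delimited` and the sample rows are made up for this example.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimited text into a list of rows using the given separator."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same made-up records, once tab-delimited and once pipe-delimited.
tab_rows = read_delimited("name\tcounty\nSmith\tDeKalb\n", "\t")
pipe_rows = read_delimited("name|county\nSmith|DeKalb\n", "|")

print(tab_rows)
print(pipe_rows)
```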

FIXED WIDTH

SEARCHING FOR MICROSOFT

CSV = EXCEL FORMAT

INSTANT DATABASE

17,583 RECORDS

FDA.GOV

WEB SCRAPING – COPY / PASTE

WORKING WITH DATA

SORT * FILTER * CALCULATIONS

THE TIP

It’s hunting season, again, and your editor wants a daily story about this seasonal rite of passage.

What is the most common type of injury?

What day was the deadliest for hunters?

How often is impairment (drugs / alcohol) a factor?

And dozens of other questions

“Interview the data”
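
The three questions above can each be answered with a count or a filter. The sketch below shows one way to do that in Python, using a handful of invented incident records. The field names are hypothetical, and "deadliest day" is read here as the calendar date with the most deaths, which is only one possible interpretation.

```python
from collections import Counter

# Invented records for illustration; field names are hypothetical.
incidents = [
    {"date": "2014-11-15", "injury": "Fall from tree stand", "fatal": False, "impaired": False},
    {"date": "2014-11-15", "injury": "Gunshot", "fatal": True, "impaired": True},
    {"date": "2014-11-16", "injury": "Gunshot", "fatal": True, "impaired": False},
    {"date": "2014-11-22", "injury": "Gunshot", "fatal": False, "impaired": False},
    {"date": "2014-11-15", "injury": "Fall from tree stand", "fatal": True, "impaired": False},
]

# What is the most common type of injury?
injury_counts = Counter(r["injury"] for r in incidents)
most_common_injury, injury_count = injury_counts.most_common(1)[0]

# What day was the deadliest? Filter to fatal incidents, then count by date.
deaths_by_day = Counter(r["date"] for r in incidents if r["fatal"])
deadliest_day, deaths = deaths_by_day.most_common(1)[0]

# How often is impairment a factor? Share of all incidents flagged as impaired.
impaired_share = sum(r["impaired"] for r in incidents) / len(incidents)

print(most_common_injury, injury_count)
print(deadliest_day, deaths)
print(f"{impaired_share:.0%} of incidents involved impairment")
```

The same sort, filter and count steps can be done in a spreadsheet; the code only makes each step explicit and repeatable.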

HUNTING DATABASE

THE TIP

• Sources say that football coaches routinely earn higher salaries than the most experienced teachers.

• Anecdotally, it appears to be true, but other sources say it’s an isolated issue.

• How do you attack the story and QUANTIFY the issue conclusively, avoiding the he-said, she-said conundrum?
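
One way to quantify the tip is to compare, district by district, the highest-paid coach with the most experienced teacher, then count how many districts show a gap. The sketch below does this with invented payroll rows. The field names, the districts and the choice of comparison are assumptions made for illustration, not the method used in any actual story.

```python
# Invented payroll rows; field names and values are hypothetical.
payroll = [
    {"district": "A", "role": "coach", "years": 6, "salary": 92000},
    {"district": "A", "role": "teacher", "years": 28, "salary": 81000},
    {"district": "A", "role": "teacher", "years": 4, "salary": 47000},
    {"district": "B", "role": "coach", "years": 10, "salary": 70000},
    {"district": "B", "role": "teacher", "years": 30, "salary": 85000},
    {"district": "C", "role": "coach", "years": 3, "salary": 88000},
    {"district": "C", "role": "teacher", "years": 25, "salary": 79000},
]


def salary_gap(rows, district):
    """Top coach salary minus the salary of the most experienced teacher."""
    coaches = [r for r in rows if r["district"] == district and r["role"] == "coach"]
    teachers = [r for r in rows if r["district"] == district and r["role"] == "teacher"]
    top_coach = max(coaches, key=lambda r: r["salary"])
    senior_teacher = max(teachers, key=lambda r: r["years"])
    return top_coach["salary"] - senior_teacher["salary"]


districts = sorted({r["district"] for r in payroll})
gaps = {d: salary_gap(payroll, d) for d in districts}

# Districts where the coach out-earns the most experienced teacher.
coach_higher = [d for d in districts if gaps[d] > 0]
share = len(coach_higher) / len(districts)

for d in districts:
    print(d, gaps[d])
print(f"Coach earns more in {len(coach_higher)} of {len(districts)} districts ({share:.0%})")
```

A count across every district answers whether the pattern is routine or isolated, which anecdotes from competing sources cannot do.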

CALCULATIONS

Don’t be obsolete.

Journalism is a continuing-education job.
