Socio-economic status of the counties in the US By Jean Eric Rakotoarisoa GIS project / spring 2002

Background information

• GIS is a system for storing, retrieving, analyzing, and displaying spatial data

• GIS is often used to assist in conducting socio-economic studies

• In these studies, attributes come from geographic areas which are the units and levels of the study (e.g. county, state, or country)

Objective

• Identifying the most prosperous counties in the US

– Successful

– Flourishing

Understanding the question

Defining the key word “prosperous” by:

– Per capita status

• Income

• Education

– Social status of the county

• Crime

• Unemployment

• Health care facilities

Methods

Data

– Source: ArcUSA 1:2M, published by ESRI in 1997

– Characteristics

• 1:2,000,000-scale data

• Albers conic equal-area projection

• Lat / long

Criteria for choosing variables

1. Standardized variables (to avoid the effect of area and population size; e.g. income per capita)

2. Variables that show enough variation (descriptive statistics)

3. Variables that can be seen as surrogates of other related variables (i.e. cause-and-effect relationships and simple correlation; for extrapolation of the results)

4. Ideally, data from the same year (some variables may be time sensitive)

Variables

– Income: money income per capita in 1985

– Education: percentage of people > 25 years old with 12 years or more education in 1980

– Unemployment: unemployment rate of the civilian labor force in 1986

– Crime: serious crimes known to police per 100,000 population in 1985

– Health care facilities: number of hospital beds per 1,000 population in 1985

Understanding each variable

– Distribution (normal, skewed): information necessary for reclassification (i.e. equal interval, quantile, SD)

– Degree of variation

• Mean, min, max, variance, SD

• Scale to be used was chosen as a function of both the degree of variation of the variables and the desired resolution of the output theme (coarse or high resolution)
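The descriptive statistics listed above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the tool used in the project; the function names and the sample unemployment rates are hypothetical. The skewness measure is added here as one possible way to judge "normal vs. skewed" before picking a reclassification method.

```python
import statistics

def describe(values):
    """Return the summary statistics named on the slide for one variable."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "variance": statistics.pvariance(values),  # population variance
        "sd": statistics.pstdev(values),           # population standard deviation
    }

def skewness(values):
    """Population skewness; values far from 0 suggest a skewed distribution."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical unemployment rates (%) for five counties, with one outlier.
unemployment = [3.0, 4.0, 5.0, 6.0, 22.0]
summary = describe(unemployment)
skew = skewness(unemployment)
```

A clearly positive skew, as in this made-up sample, would argue against equal-interval classes, because most counties would then fall into the lowest classes.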

Relationship among variables

• To classify variables: “primary” (cause) and “secondary” (effects)

• Important when assigning weight (overlay)

– Primary: Income, Education

– Intermediate: Unemployment

– Secondary: Crime, Health care facilities

GIS operations

– Extract the data

– Convert to grid themes

– Construct the model (reclassify and weighted overlay)

Characteristics of the model

• Index model designed for unequal contribution of each variable

• Scale of 1 to 5 with 1 being the worst and 5 the best

• Assigning scale for each variable

– Income: highest is given 5

– Education: highest is given 5

– Unemployment: highest is given 1

– Crime: highest is given 1

– Health care facilities: highest is given 5
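The scale assignment above can be sketched as a small reclassification function. This is an illustration only: it assumes equal-interval classes (the slides also mention quantile and SD as options), and the function name, the dictionary keys, and the sample value ranges are hypothetical.

```python
# Direction of each variable, as listed on the slide:
# True means the highest raw value gets 5, False means it gets 1.
HIGHER_IS_BETTER = {
    "income": True,
    "education": True,
    "unemployment": False,
    "crime": False,
    "health_care": True,
}

def reclassify(value, vmin, vmax, higher_is_better=True, classes=5):
    """Map a raw value onto an integer scale 1..classes using equal intervals."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    width = (vmax - vmin) / classes
    # Class index 0..classes-1; the maximum value is clamped into the top class.
    index = min(int((value - vmin) / width), classes - 1)
    index = max(index, 0)
    score = index + 1
    # Invert the scale for variables where a high raw value is bad.
    return score if higher_is_better else classes + 1 - score

# Hypothetical ranges: income 5,000 to 20,000; unemployment 2% to 22%.
income_score = reclassify(18000, 5000, 20000, HIGHER_IS_BETTER["income"])
worst_unemployment = reclassify(22.0, 2.0, 22.0, HIGHER_IS_BETTER["unemployment"])
best_unemployment = reclassify(2.0, 2.0, 22.0, HIGHER_IS_BETTER["unemployment"])
```

Inverting the scale inside the reclassification step keeps the later overlay simple: every input grid then reads "5 is best".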

Weight

– Income: 30% (Primary variable)

– Education: 30% (Primary variable)

– *Crime: 20% (Secondary)

– *Unemployment: 15% (Intermediate)

– Health care facilities: 5% (secondary, not a very good variable)

* Strong relationship which implies additive effects of weight
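A weighted overlay with the weights listed above can be sketched cell by cell. This is a simplified stand-in for the GIS operation, not the actual tool: grids are plain lists of rows, `None` is assumed to mark a NoData cell, and the function name and the sample grids are hypothetical.

```python
# Weights from the slide, expressed as fractions of 1.
WEIGHTS = {
    "income": 0.30,
    "education": 0.30,
    "crime": 0.20,
    "unemployment": 0.15,
    "health_care": 0.05,
}

def weighted_overlay(grids, weights=WEIGHTS):
    """Combine reclassified grids (lists of rows) into one weighted-score grid.

    A cell that is None (assumed NoData marker) in any input grid
    is None in the output.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(weights)
    first = grids[names[0]]
    result = []
    for r in range(len(first)):
        row = []
        for c in range(len(first[0])):
            cell = [grids[name][r][c] for name in names]
            if any(v is None for v in cell):
                row.append(None)
            else:
                row.append(sum(weights[n] * v for n, v in zip(names, cell)))
        result.append(row)
    return result

# Hypothetical 1 x 2 grids of reclassified scores (1 = worst, 5 = best).
grids = {
    "income": [[5, 3]],
    "education": [[4, None]],
    "crime": [[3, 2]],
    "unemployment": [[4, 3]],
    "health_care": [[2, 1]],
}
overlay = weighted_overlay(grids)
```

In this made-up example the first cell scores about 4.0 and the second is NoData, because its education value is missing.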

Expected output: counties that have…

Higher income per capita, a higher percentage of people with at least 12 years of education, lower rates of crime and unemployment, and more health care facilities.

Flowchart of the model

[Flowchart: Income, Education, Crime, Health care facilities, and Unemployment are each reclassified (Rec income, Rec education, Rec crime, Rec health facilities, Rec unemployment) and then combined in a weighted overlay to produce the final map. Weight labels as extracted from the figure: 30%, 5%, 15%, 20%, 20%.]

Results

[Map: county prosperity in the US. Legend: level of prosperity (Restricted, 1, 2, 3, 4, 5, No Data). Scale bar: 0 to 500 miles.]

What is revealed by the map?

• Many counties meet our criteria

• Distribution of these counties follows a regional pattern

• The most prosperous regions are New England, the Upper Midwest, the Great Plains, and the western states (Arizona, Nevada, Colorado)

• Based on our criteria, prosperity does not differ greatly between counties across the US: extreme values are rare (very few 1s and no 5s), and most counties fall into classes 3 and 4

Verifying the model

• Study question: Randomly chosen counties should belong to the level of prosperity assigned by the model

• GIS aspect: State-based study and county-based study should show the same pattern
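The two checks above can be sketched in a few lines. This is only one possible way to set them up: it assumes the state-level value is the mean of county scores, which the slides do not state, and the function names and the sample records are hypothetical.

```python
import random
from collections import defaultdict

def state_means(county_scores):
    """Average county prosperity scores per state.

    county_scores: iterable of (state, county, score) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for state, _county, score in county_scores:
        totals[state][0] += score
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}

def sample_counties(county_scores, k, seed=0):
    """Draw k counties at random (fixed seed so the check is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(list(county_scores), k)

# Hypothetical model output: (state, county, prosperity level).
records = [
    ("A", "a1", 4),
    ("A", "a2", 3),
    ("B", "b1", 2),
    ("B", "b2", 3),
]
means = state_means(records)
to_check = sample_counties(records, 2)
```

The sampled counties would then be compared by hand against independent information, and the state means against the state-based map.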

Verifying the model (cont.)

[Map: county prosperity in the US, with state boundaries and state labels. Legend: level of prosperity (Restricted, 1, 2, 3, 4, 5, No Data). Scale: 1:2M]

[Map: state prosperity in the US, with state labels. Legend: level of prosperity (Restricted, 1, 2, 3, 4, 5, No Data). Scale: 1:2M]

Discussions

• Resolution

– County data were indeed appropriate, since these variables (e.g. income, rate of unemployment) are probably more uniform within counties than within states, as shown by the map

• Source of error

– GIS

• Defining the extent of the output (decreases accuracy)

• Label (misleading)

– Study question

• Data do not come from the same year

Discussions (cont.)

• Limitations of the results

– Despite the number of variables used, the output mainly refers to counties that have a higher income and a higher proportion of people who graduated from high school (weighted overlay)

– High income does not necessarily imply a better standard of living (e.g. need to look into cost of living)

Discussions (cont.)

• Did I have to use GIS?

– No!

– Simple equation: Y = aX1 + bX2 + cX3 + dX4 + eX5

– Y = prosperity score of a county

– Xi = variables (attributes)

– a, b, c, d, e = weights

– GIS was mostly used for visual purpose (e.g. distribution of the counties)

• What can be improved?

– Adding more variables to better characterize the feature of interest (e.g. number of doctors, nursing centers and hospitals)

– Investigating relationships among variables (using inferential statistics)

– Adding other parameters (e.g. cost of living)
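As a worked instance of the simple equation above, the weights from the model can be substituted for a, b, c, d, e. The ordering of the variables (X1 = income, X2 = education, X3 = crime, X4 = unemployment, X5 = health care facilities) and the county scores are assumptions made here for illustration only.

```latex
\begin{align*}
Y &= 0.30\,X_1 + 0.30\,X_2 + 0.20\,X_3 + 0.15\,X_4 + 0.05\,X_5 \\
% Hypothetical county with reclassified scores (X_1, \dots, X_5) = (4, 4, 3, 3, 2):
Y &= 0.30(4) + 0.30(4) + 0.20(3) + 0.15(3) + 0.05(2) \\
  &= 1.20 + 1.20 + 0.60 + 0.45 + 0.10 = 3.55
\end{align*}
```

Because the weights are positive and sum to 1, Y can reach 5 only when all five variables score 5, and 1 only when all five score 1. This may help explain why extreme classes are rare on the map.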

Discussions (cont.)

• Difficulties

– Theoretical background: choosing variables, understanding their behavior

– GIS operations: understanding the effects of different choices whenever options are presented (e.g. equal interval, quantile, or SD for reclassification)

• Advice

– At every step of the analysis, try to understand the assumptions behind each option (e.g. defining scales) and always relate them to the objective, i.e. how each option (such as the choice of a particular scale) will affect the objective

Conclusions

• Study of interest: based on our criteria, the most prosperous counties in the US are in New England, the Upper Midwest, the Great Plains, and the western states

• GIS is only a tool. A good understanding of the study phenomenon is crucial before any GIS operations can be undertaken

• A good understanding of the different options given through the GIS operations is important

• Poor knowledge of the study phenomenon or misuse of GIS only results in artifacts
