
Los Angeles R users group - Dec 14 2010 - Part 1

A SQL primer for R users
with examples from Pokemon
Neal Fultz
UCLA Statistics

Goal of talk
Make SQL look easy
And present R equivalents
Not another 'customer db'

Paradigms
R: fundamental unit is the vector
RDBMS: fundamental unit is the table

Pokemon
Best-selling video game of the 90s
Sold in multiple versions
(and major fad)
Turn-based JRPG
Featuring hundreds(!) of characters to collect
Gotta catch em all!


Pokemon (2)

from http://www.giantbomb.com/pokemon-yellow-special-pikachu-edition/61-18673/

Pokemon (3)


from Pokemon for dummies

Pokemon (4)


from http://guides.ign.com/guides/818481/page_2.html

Data Model for Pokemon

Pokemon
    ID Number
    Name
    Type(s)
    Version

Type Table
    Attack Type
    Defense Type
    Multiplier


In R, it's natural to represent the type table as a matrix (attack type by defense type).
In SQL, it's natural to pivot it to tuples: one row per (attack type, defense type, multiplier).
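The pivot from matrix to tuples can be sketched as follows. This is only an illustration, written in Python with the standard-library sqlite3 module rather than the database used in the talk. The table and column names (pokemonType, attackType, defendType, multiplier) are taken from the queries later in the deck; the three-type matrix is a small excerpt of the type chart chosen for the example.

```python
import sqlite3

# Excerpt of the type chart as a matrix: rows attack, columns defend.
# Only three types are shown here; the full chart is larger.
types = ["Fire", "Water", "Grass"]
matrix = [
    [0.5, 0.5, 2.0],  # Fire attacking Fire, Water, Grass
    [2.0, 0.5, 0.5],  # Water attacking Fire, Water, Grass
    [0.5, 2.0, 0.5],  # Grass attacking Fire, Water, Grass
]

# Pivot: one (attackType, defendType, multiplier) tuple per matrix cell.
tuples = [
    (attack, defend, matrix[i][j])
    for i, attack in enumerate(types)
    for j, defend in enumerate(types)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table pokemonType (attackType text, defendType text, multiplier real)"
)
conn.executemany("insert into pokemonType values (?, ?, ?)", tuples)

# A matrix lookup such as types["Water", "Fire"] becomes a row filter.
water_vs_fire = conn.execute(
    "select multiplier from pokemonType where attackType = ? and defendType = ?",
    ("Water", "Fire"),
).fetchone()[0]
print(water_vs_fire)
```

The matrix form is compact for lookups by position; the tuple form is what makes joins against the pokemon table possible.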

More concretely

id   Name        Type 1  Type 2  In Red  In Blue
001  Bulbasaur   Grass   Poison  T       T
002  Ivysaur     Grass   Poison  T       T
003  Venusaur    Grass   Poison  T       T
004  Charmander  Fire            T       T
005  Charmeleon  Fire            T       T
006  Charizard   Fire    Flying  T       T

What's in Red only?
    select id, name from pokemon where red and not blue;

What's in Red only? (2)
    23;"Ekans"
    24;"Arbok"
    43;"Oddish"
    44;"Gloom"
    45;"Vileplume"
    56;"Mankey"
    57;"Primeape"
    58;"Growlithe"
    59;"Arcanine"
    123;"Scyther"
    125;"Electabuzz"

What's in Red only? (R)
    # assumes pokemon is a data frame and red, blue are its columns as logical vectors
    pokemon[red & !blue, c("id", "name")]
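To try the Red-only query without a database server, here is a minimal sketch using Python's standard-library sqlite3 module. It is an assumption-laden stand-in for the talk's setup: SQLite has no boolean type, so the red and blue columns are stored as 0/1, and the five rows are a hand-picked subset rather than the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no boolean type, so red/blue are stored as 0/1 here.
conn.execute(
    "create table pokemon (id integer, name text, red integer, blue integer)"
)
conn.executemany(
    "insert into pokemon values (?, ?, ?, ?)",
    [
        (1, "Bulbasaur", 1, 1),   # in both versions
        (23, "Ekans", 1, 0),      # Red only
        (27, "Sandshrew", 0, 1),  # Blue only
        (54, "Psyduck", 1, 1),    # in both versions
        (123, "Scyther", 1, 0),   # Red only
    ],
)

# The WHERE clause is the row condition, like a logical index in R.
red_only = conn.execute(
    "select id, name from pokemon where red and not blue order by id"
).fetchall()
print(red_only)
```

The `order by id` is added here only to make the result order deterministic; the query on the slide leaves the order up to the database.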

Consider Psyduck
    select * from pokemon where name like 'Psyduck';

image from http://strategywiki.org/wiki/Pok%C3%A9mon_Gold_and_Silver/Ilex_Forest

Consider Psyduck (2)
    54;"Psyduck";"Water";"";t;t

Consider Psyduck (R)
    # grep() matches substrings, so this also returns any name containing 'Psyduck'
    pokemon[grep('Psyduck', names), ]


What types are least common?
    select type1, count(type1) as c from pokemon group by type1 order by c;

What types are least common? (2)
    "Ice";2
    "Ghost";3
    "Dragon";3
    ...

What types are least common? (R)
    sort(table(type1))
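The group-and-count pattern can be tried on a handful of rows. The sketch below uses Python's standard-library sqlite3 module as a stand-in database and `collections.Counter` as a rough analogue of R's `table()`; the seven rows are a small sample for illustration, not the full data set, and the extra `order by` key is added only to break ties predictably.

```python
import sqlite3
from collections import Counter

# Small sample of (name, type1) rows; not the full pokemon table.
rows = [
    ("Bulbasaur", "Grass"),
    ("Ivysaur", "Grass"),
    ("Venusaur", "Grass"),
    ("Charmander", "Fire"),
    ("Charmeleon", "Fire"),
    ("Charizard", "Fire"),
    ("Psyduck", "Water"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table pokemon (name text, type1 text)")
conn.executemany("insert into pokemon values (?, ?)", rows)

# GROUP BY collapses rows sharing a type1; count() tallies each group.
counts = conn.execute(
    "select type1, count(type1) as c from pokemon "
    "group by type1 order by c, type1"
).fetchall()

# The same tally without SQL, sorted the same way.
tally = sorted(
    Counter(type1 for _, type1 in rows).items(),
    key=lambda kv: (kv[1], kv[0]),
)
print(counts)
```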

Second Types
    select type1, type2, count(type2) as c
    from pokemon
    where type2 is not null
    group by type1, type2
    order by type2

Second Types (2)
    "Water";"Fighting";1
    "Normal";"Flying";8
    "Fire";"Flying";1
    "Water";"Flying";1
    "Rock";"Flying";1

Second Types (R)
    # table() drops NA by default; assumes a missing second type is stored as NA
    table(type1, type2)

Vs Gyarados?
    select attackType, multiplier
    from pokemon, pokemonType
    where name like 'Gyarados'
      and defendType in (type1, type2)

Vs Gyarados (2)
    "Fighting";0.5
    "Ground";0
    "Rock";2
    "Bug";0.5
    "Fire";0.5
    "Water";0.5
    "Grass";0.5
    "Grass";2
    "Electric";2
    "Electric";2
    "Ice";2
    "Ice";0.5

Vs Gyarados (R)
    i <- grep("Gyarados", names)
    # rows of types are attack types, columns are defense types
    multipliers <- types[, c(type1[i], type2[i])]
    multipliers[which(multipliers != 1)]

Vs Gyarados Cont
    select attackType,
           round(exp(sum(ln(multiplier + .00000000000001))), 3)
    from pokemon, pokemonType
    where name like 'Gyarados'
      and defendType in (type1, type2)
    group by attackType

SQL has no product aggregate, so the query multiplies by summing logs; the small constant keeps ln() defined when a multiplier is 0.

Vs Gyarados Cont (2)
    "Ground";0.000
    "Bug";0.500
    "Grass";1.000
    "Water";0.500
    "Ice";1.000
    "Rock";2.000
    "Fighting";0.500
    "Fire";0.500
    "Electric";4.000

Vs Gyarados Cont (R)
    i <- grep("Gyarados", names)
    multipliers <- types[, c(type1[i], type2[i])]
    # margin 1 = one product per row, i.e. per attack type
    apply(multipliers, 1, prod)
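The exp(sum(ln(x))) idiom can be checked against a direct product. The sketch below is an illustration only: it uses Python's standard-library sqlite3 module, registers `ln` and `exp` as user functions because not every SQLite build ships them, and loads the twelve (attackType, multiplier) rows from the earlier result slide into a hypothetical table named `hits` instead of repeating the join.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Not every SQLite build includes math functions, so register them here.
conn.create_function("ln", 1, math.log)
conn.create_function("exp", 1, math.exp)

# "hits" is a hypothetical table holding the rows of the previous result.
hits = [
    ("Fighting", 0.5), ("Ground", 0.0), ("Rock", 2.0), ("Bug", 0.5),
    ("Fire", 0.5), ("Water", 0.5), ("Grass", 0.5), ("Grass", 2.0),
    ("Electric", 2.0), ("Electric", 2.0), ("Ice", 2.0), ("Ice", 0.5),
]
conn.execute("create table hits (attackType text, multiplier real)")
conn.executemany("insert into hits values (?, ?)", hits)

# Product per group via logs; the tiny constant avoids ln(0) for Ground.
via_logs = dict(conn.execute(
    "select attackType, "
    "round(exp(sum(ln(multiplier + .00000000000001))), 3) "
    "from hits group by attackType"
).fetchall())

# Direct product per attack type, for comparison.
direct = {}
for attack, multiplier in hits:
    direct[attack] = direct.get(attack, 1.0) * multiplier

print(via_logs)
```

After rounding to three places the two agree, which is why the added constant is harmless for values on this scale.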

Vs Gyarados Final
    select o.name,
           round(exp(sum(ln(multiplier + .00000000000001))), 3) as m
    from pokemon p, pokemonType t, pokemon o
    where p.name like 'Gyarados'
      and defendType in (p.type1, p.type2)
      and attackType in (o.type1, o.type2)
    group by o.name
    order by m desc;

Vs Gyarados Final (2)
    "Raichu";4.000
    "Electabuzz";4.000
    "Jolteon";4.000
    "Electrode";4.000
    "Zapdos";4.000
    "Magneton";4.000
    "Pikachu";4.000
    "Magnemite";4.000
    "Voltorb";4.000
    "Aerodactyl";2.000
    "Bellsprout";1.000
    "Bulbasaur";1.000
    ...

Vs Gyarados Final (R)
    i <- grep("Gyarados", names)
    multipliers <- types[, c(type1[i], type2[i])]
    totals <- apply(multipliers, 1, prod)  # named by attack type
    # assumes type1, type2 are character vectors and a missing second type is NA
    second <- ifelse(is.na(type2), 1, totals[type2])
    data.frame(names, m = totals[type1] * second)
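The final query joins pokemon to itself through the type table: `p` is the defender, `o` is every possible opponent. A minimal runnable sketch follows, again in Python with the standard-library sqlite3 module as a stand-in database. The four pokemon rows and the partial type chart are assumptions made for the example; the chart stores only non-neutral matchups against Water and Flying, which is what the result slides suggest but the deck does not state.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register ln/exp in case the SQLite build lacks math functions.
conn.create_function("ln", 1, math.log)
conn.create_function("exp", 1, math.exp)

conn.execute("create table pokemon (name text, type1 text, type2 text)")
conn.executemany("insert into pokemon values (?, ?, ?)", [
    ("Gyarados", "Water", "Flying"),
    ("Pikachu", "Electric", None),
    ("Bellsprout", "Grass", "Poison"),
    ("Aerodactyl", "Rock", "Flying"),
])

# Partial chart (assumption): only non-neutral matchups vs Water and Flying.
conn.execute(
    "create table pokemonType (attackType text, defendType text, multiplier real)"
)
conn.executemany("insert into pokemonType values (?, ?, ?)", [
    ("Electric", "Water", 2.0), ("Grass", "Water", 2.0),
    ("Fire", "Water", 0.5), ("Water", "Water", 0.5), ("Ice", "Water", 0.5),
    ("Electric", "Flying", 2.0), ("Rock", "Flying", 2.0),
    ("Ice", "Flying", 2.0), ("Grass", "Flying", 0.5),
    ("Fighting", "Flying", 0.5), ("Bug", "Flying", 0.5),
    ("Ground", "Flying", 0.0),
])

# p = defender (Gyarados), o = opponent; t links attack and defense types.
ranking = conn.execute(
    "select o.name, "
    "round(exp(sum(ln(multiplier + .00000000000001))), 3) as m "
    "from pokemon p, pokemonType t, pokemon o "
    "where p.name like 'Gyarados' "
    "and defendType in (p.type1, p.type2) "
    "and attackType in (o.type1, o.type2) "
    "group by o.name order by m desc"
).fetchall()
print(ranking)
```

Opponents whose types have no stored matchup against the defender drop out of the join entirely, so a chart that omits neutral rows also omits neutral opponents from the ranking.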

Conclusions

See the pattern?

SQL:
    SELECT (cols) FROM (tables) WHERE (row condition)

R:
    Subsetting (Logical, index, multiple index)
    grep()
    table()
    apply()
    merge()

See also: sqldf library

Questions/Comments

Resources
PostgreSQL: An open source RDBMS
W3schools: SQL tutorial
Wikipedia: comparison page
Bulbapedia: Everything about pokemon
Pokemon for Dummies
Log Parser: A Win util for running SQL directly against files