SQL JOINs in R - merge() - Wayne Tai Lee

Question

A =
FieldGroup, SprFert
1, UAN28
2, UAN30

B =
FieldGroup, FieldID, NO3N
1, 2, 22
2, 3, 25
3, 4, 24

Want:
FieldGroup, FieldID, NO3N, SprFert
1, 2, 22, UAN28
2, 3, 25, UAN30
3, 4, 24, NA

In SQL

SELECT *
FROM B
LEFT JOIN A
ON B.FieldGroup = A.FieldGroup

R Equivalent

output = merge(x = A, y = B,
               all.y = TRUE)

Default merges by intersect(names(A), names(B)).
all.y ensures no records from y will be dropped.
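As a minimal sketch of the left join above, the example tables can be built as data frames and merged directly. The `by = "FieldGroup"` argument is written out here only for clarity; the default would pick the same column. Note that merge() places the key column first, then the remaining columns of x, then those of y, so the column order can differ from the "Want" table unless B is passed as x.

```r
# Example tables from the question (values as given on the slides).
A <- data.frame(FieldGroup = c(1, 2),
                SprFert = c("UAN28", "UAN30"),
                stringsAsFactors = FALSE)
B <- data.frame(FieldGroup = c(1, 2, 3),
                FieldID = c(2, 3, 4),
                NO3N = c(22, 25, 24))

# Keep every row of B (the y table); FieldGroup 3 has no match in A,
# so its SprFert is filled with NA.
output <- merge(x = A, y = B, by = "FieldGroup", all.y = TRUE)
print(output)

# The same join with B as x, mirroring "FROM B LEFT JOIN A".
# Column order is now FieldGroup, FieldID, NO3N, SprFert.
output2 <- merge(x = B, y = A, by = "FieldGroup", all.x = TRUE)
print(output2)
```

Both calls return the same three records; only the column order differs.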

Default Join

output = merge(A, B)

Will get:
FieldGroup, FieldID, NO3N, SprFert
1, 2, 22, UAN28
2, 3, 25, UAN30

This is known as an "inner join": a record is kept only when both tables have a matching record.
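The default behavior can be checked with a short sketch using the same example tables. With no `by` and no `all` arguments, merge() joins on the shared column names and keeps only the keys present in both tables.

```r
A <- data.frame(FieldGroup = c(1, 2),
                SprFert = c("UAN28", "UAN30"),
                stringsAsFactors = FALSE)
B <- data.frame(FieldGroup = c(1, 2, 3),
                FieldID = c(2, 3, 4),
                NO3N = c(22, 25, 24))

# The columns merge() will join on by default.
key <- intersect(names(A), names(B))
print(key)

# Inner join: FieldGroup 3 exists only in B, so it is dropped.
output <- merge(A, B)
print(output)
```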

Multiple records?

A =
FieldGroup, SprFert
1, UAN28
2, UAN30

B =
FieldGroup, FieldID, NO3N
1, 2, 22
1, 1, 20
2, 3, 25
3, 4, 24

Default Join

output = merge(A, B)

Will get:
FieldGroup, FieldID, NO3N, SprFert
1, 2, 22, UAN28
1, 1, 20, UAN28
2, 3, 25, UAN30
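A small sketch of the duplicate-key case, using the tables above: each record in B with FieldGroup 1 is paired with the single matching record in A, so the SprFert value is repeated.

```r
A <- data.frame(FieldGroup = c(1, 2),
                SprFert = c("UAN28", "UAN30"),
                stringsAsFactors = FALSE)
# B now has two records for FieldGroup 1.
B <- data.frame(FieldGroup = c(1, 1, 2, 3),
                FieldID = c(2, 1, 3, 4),
                NO3N = c(22, 20, 25, 24))

# Inner join: both FieldGroup 1 records survive, FieldGroup 3 is dropped.
output <- merge(A, B)
print(output)

# Number of output records per key.
print(table(output$FieldGroup))
```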

Good Practices

Always check the dimensionality of your data and output.
Your output can be larger than both A and B when there are duplicate keys.
Always check your output to make sure it's not empty.
Merges may drop records, so you should double-check.
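One way these checks might look in practice is sketched below. The tables `A_dup` and `B_dup` and their values are hypothetical, made up only so that both sides contain duplicate keys; they are not from the slides.

```r
# Hypothetical tables with duplicate keys on both sides.
A_dup <- data.frame(FieldGroup = c(1, 1, 2),
                    SprFert = c("UAN28", "UAN32", "UAN30"),
                    stringsAsFactors = FALSE)
B_dup <- data.frame(FieldGroup = c(1, 1, 1, 3),
                    FieldID = c(2, 1, 5, 4),
                    NO3N = c(22, 20, 21, 24))

output <- merge(A_dup, B_dup)

# Check dimensionality before and after the merge.
print(dim(A_dup))
print(dim(B_dup))
print(dim(output))

# Duplicate keys multiply: 2 records in A_dup x 3 in B_dup for FieldGroup 1.
has_dup_keys <- anyDuplicated(A_dup$FieldGroup) > 0 &&
  anyDuplicated(B_dup$FieldGroup) > 0

# Make sure the result is not empty.
is_empty <- nrow(output) == 0

# Find keys from B_dup that the inner join dropped.
dropped <- setdiff(B_dup$FieldGroup, output$FieldGroup)
print(dropped)
```

Here the output has more records than either input, and FieldGroup 3 is reported as dropped because it has no match in `A_dup`.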