DESCRIPTION
Utilizing array processing allows us to reduce the amount of coding in the DATA step. In addition to learning how to create one- and multi-dimensional arrays, this paper will review how to create an explicit loop in the DATA step - the prerequisite of constructing an array. You will also be exposed to what happens in the Program Data Vector (PDV) during array processing. A wide range of applications in using loop structures with array processing, such as recoding missing values for a list of variables, transforming datasets, etc., will be covered in this paper.
The Many Ways to Effectively Utilize Array Processing
Arthur Li

INTRODUCTION
Why do we need to use arrays?
   Arrays allow us to reduce the amount of coding in the DATA step.
What is essential for learning arrays?
   Compilation and execution of the DATA step
   How the Program Data Vector (PDV) works

REVIEW: COMPILATION AND EXECUTION PHASES
A DATA step is processed in two sequential phases:
   Compilation phase: each statement is scanned for syntax errors.
   Execution phase: if there are no syntax errors, the DATA step reads and processes the input data.
REVIEW IMPLICIT AND EXPLICIT LOOPS
REVIEW IMPLICIT LOOP
The DATA step works like a loop: an implicit loop.
It repetitively executes statements, reads data values, and creates observations in the PDV one at a time.
Each loop is called an iteration.
Suppose you have the following dataset that contains patient IDs for a clinical trial:
Patient:
   ID
1  M2390
2  F2390
3  F2340
4  M1240
You would like to assign each patient either a drug or a placebo, with a 50% chance of each.
REVIEW IMPLICIT LOOP
data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
run;

PDV variables: _N_ (D), _ERROR_ (D), ID (K), RANNUM (D), GROUP (K)

1st iteration, step 1: _N_ is set to 1 and _ERROR_ is set to 0. The rest of the variables are set to missing.
PDV: _N_=1  _ERROR_=0  ID=(blank)  RANNUM=.  GROUP=(blank)

1st iteration, step 2: The SET statement copies the 1st observation to the PDV.
PDV: _N_=1  _ERROR_=0  ID=M2390  RANNUM=.  GROUP=(blank)

1st iteration, step 3: RANNUM is generated.
PDV: _N_=1  _ERROR_=0  ID=M2390  RANNUM=0.36993  GROUP=(blank)

1st iteration, step 4: GROUP is set to 'P' since RANNUM is not > 0.5.
PDV: _N_=1  _ERROR_=0  ID=M2390  RANNUM=0.36993  GROUP=P

1st iteration, step 5: The implicit OUTPUT statement writes the variables marked with (K) to the final dataset. SAS returns to the beginning of the DATA step.
Trial1:
   ID     GROUP
1  M2390  P
REVIEW IMPLICIT LOOP
2nd iteration, step 1: _N_ is incremented to 2.
Variables that exist in the input dataset (ID):
   SAS sets each variable to missing in the PDV only before the 1st iteration of the execution.
   These variables retain their values in the PDV until they are replaced by new values.
Variables being created in the DATA step (RANNUM, GROUP):
   SAS sets each variable to missing in the PDV at the beginning of every iteration of the execution.
PDV: _N_=2  _ERROR_=0  ID=M2390  RANNUM=.  GROUP=(blank)
Trial1:
   ID     GROUP
1  M2390  P

2nd iteration, step 2: The SET statement copies the 2nd observation to the PDV.
PDV: _N_=2  _ERROR_=0  ID=F2390  RANNUM=.  GROUP=(blank)

The rest of the iterations are skipped here; they follow the same pattern.
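The PDV states traced above can also be inspected directly in the SAS log. The following is a minimal sketch, assuming a PATIENT dataset rebuilt from the four IDs on the slides (the DATALINES step is an illustration, not part of the original program); PUT _ALL_ writes every PDV variable, including _N_ and _ERROR_, to the log.

```sas
/* Assumption: recreate the Patient dataset shown on the slides */
data patient;
   input id $;
   datalines;
M2390
F2390
F2340
M1240
;
run;

data trial1 (drop=rannum);
   /* PDV at the top of the iteration, before SET executes */
   put 'Top of iteration:    ' _all_;
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   /* PDV just before the implicit OUTPUT */
   put 'Bottom of iteration: ' _all_;
run;
```

The "Top of iteration" line is written five times, because the DATA step starts a fifth iteration and stops only when the SET statement finds no more observations to read.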
REVIEW: OUTPUT STATEMENT
data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
run;
The explicit OUTPUT statement:
   Writes the current observation from the PDV to the SAS dataset immediately, not at the end of the DATA step
REVIEW: OUTPUT STATEMENT
data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
run;
The implicit OUTPUT statement:
   It tells SAS to write observations to the dataset at the end of the DATA step.
   Without explicit OUTPUT statements, every DATA step contains an implicit OUTPUT statement at the end of the DATA step.
Placing an explicit OUTPUT statement in a DATA step:
   It overrides the implicit OUTPUT.
   SAS adds an observation to a dataset only when an explicit OUTPUT is executed.
   We can use more than one OUTPUT statement in the DATA step.
REVIEW EXPLICIT LOOP
Suppose you don't have a dataset containing the patient IDs.
You are asked to assign four patients, 'M2390', 'F2390', 'F2340', and 'M1240', a 50% chance of receiving either the drug or the placebo.
You can create the IDs and assign each ID to a group in the same DATA step. For example:
data trial2 (drop=rannum);
   id = 'M2390';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'F2390';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'F2340';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'M1240';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
run;
In this program:
   The IDs are assigned in the DATA step.
   There are 4 explicit OUTPUT statements.
   There are 4 almost identical blocks.
To reduce the amount of coding:
   Put the identical code in a loop
   Loop along the IDs
ITERATIVE DO LOOP
DO INDEX-VARIABLE = VALUE1, VALUE2, ..., VALUEN;
   SAS STATEMENTS
END;
For the TRIAL2 program:
   INDEX-VARIABLE: ID
   VALUE1 - VALUEN: 'M2390', 'F2390', 'F2340', 'M1240'
   SAS STATEMENTS:
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
The four repeated blocks collapse into a single loop:
data trial2 (drop=rannum);
   do id = 'M2390', 'F2390', 'F2340', 'M1240';
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
THE ITERATIVE DO LOOP ALONG A SEQUENCE OF INTEGERS
Suppose you are using a sequence of numbers, say 1 to 4, as patient IDs.
DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
   SAS STATEMENTS
END;
INDEX-VARIABLE: ID
START: 1
STOP: 4
INCREMENT: 1 (the default when BY is omitted)
data trial3 (drop=rannum);
   do id = 1 to 4;
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
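The optional BY INCREMENT part of the syntax is not exercised on the slide. As a minimal sketch (the dataset name COUNT_DOWN and the index variable I are illustrative names, not from the original), a negative increment makes the loop count down:

```sas
data count_down;
   /* I takes the values 10, 8, 6, 4, 2 */
   do i = 10 to 2 by -2;
      output;              /* one observation per loop iteration */
   end;
   /* When the loop ends, I has been decremented once more, to 0.
      That value is not written, because the explicit OUTPUT inside
      the loop overrides the implicit OUTPUT. */
run;
```

The index variable always ends one increment past the STOP value; the same behavior explains why I equals 7 after a DO I = 1 TO 6 loop finishes.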
PURPOSE OF USING ARRAYS
Sbp:
   sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
1  141   142   137   117   116   124
2  999   141   138   119   119   122
3  142   999   139   119   120   999
4  136   140   142   118   121   123
There are 6 measurements of SBP for each patient.
The missing values are coded as 999.
Suppose you would like to recode 999 to periods (.):
data sbp1;
   set sbp;
   if sbp1 = 999 then sbp1 = .;
   if sbp2 = 999 then sbp2 = .;
   if sbp3 = 999 then sbp3 = .;
   if sbp4 = 999 then sbp4 = .;
   if sbp5 = 999 then sbp5 = .;
   if sbp6 = 999 then sbp6 = .;
run;
The IF statements are almost identical; only the variable names are different.
Can we use a DO loop?
PURPOSE OF USING ARRAYS
RECALL: DO LOOP
data trial2 (drop=rannum);
   id = 'M2390';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
   id = 'F2390'; ...
   id = 'F2340'; ...
   id = 'M1240'; ...
run;
Difference between the blocks: the values of the ID variable
data trial2 (drop=rannum);
   do id = 'M2390', 'F2390', 'F2340', 'M1240';
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
The loop iterates along a sequence of values.
The index variable holds these values.
PURPOSE OF USING ARRAYS
In the program that recodes 999 in sbp1 - sbp6, the difference between the IF statements is the variable names.
If we can group these variables into a single unit, we can loop along these variables.
ARRAY: a temporary grouping of SAS variables
SBP
   1  2  3  4  5  6
ARRAY DEFINITION AND SYNTAX
ARRAY ARRAYNAME [DIMENSION] <$> <ELEMENTS>;

ARRAYNAME
   Must be a SAS name
   Cannot be the name of a SAS variable in the same DATA step
   See handouts for other rules

DIMENSION
   The number of elements in the array
   Can be enclosed in parentheses, braces, or brackets:
      array sbparray (6) sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
      array sbparray {6} sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
      array sbparray [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
   You can use an asterisk (*) as DIMENSION, but then you must include ELEMENTS:
      array sbparray [*] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;

$
   Indicates that the elements in the array are character elements
   Not necessary if the elements have been previously defined as character elements

ELEMENTS
   The variables to be included in the array
   Must be either all numeric or all character:
      array sbparray [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
   If ELEMENTS are not specified, the element names are the array name followed by 1, 2, ..., DIMENSION. For example,
      array sbp [6];
   is equivalent to
      array sbp [6] sbp1 sbp2 sbp3 sbp4 sbp5 sbp6;
      Case 1: sbp1 - sbp6 were previously defined in the DATA step, so the array groups them
      Case 2: sbp1 - sbp6 were not previously defined in the DATA step, so they are created by the ARRAY statement
   Special name lists can be used as ELEMENTS:
      array num [*] _numeric_;
      array char [*] _character_;
      array allvar [*] _all_;
      _NUMERIC_: all numeric variables
      _CHARACTER_: all character variables
      _ALL_: all the variables; the variables must be either all numeric or all character
   A single dash can be used to specify a range of variables:
      array sbp [6] sbp1 - sbp6;

To reference an array element:
ARRAYNAME [INDEX]
   INDEX must be enclosed in ( ), [ ], or { }
   INDEX is specified as an integer, a numeric variable, or a SAS expression
   INDEX must be within the lower and upper bounds of the DIMENSION of the array
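The three ways of specifying INDEX can be sketched in one short DATA step. This is an illustration under assumptions: the dataset name REF_DEMO, the variables BY_INTEGER, BY_VARIABLE, and BY_EXPRESSION, and the initial values (taken from the first row of the Sbp table) are chosen only for the example.

```sas
data ref_demo (keep=by_integer by_variable by_expression);
   /* Creates sbp1-sbp6 and assigns initial values to them */
   array sbp [6] sbp1 - sbp6 (141 142 137 117 116 124);
   i = 2;
   by_integer    = sbp[1];       /* integer index: refers to sbp1 (141)       */
   by_variable   = sbp{i};       /* numeric variable index: sbp2 (142)        */
   by_expression = sbp(i + 4);   /* expression index: i + 4 = 6, sbp6 (124)   */
run;
```

The example also mixes brackets, braces, and parentheses to show that all three enclosures refer to elements in the same way. An index outside 1 to 6 would stop the step with an "array subscript out of range" error.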
ARRAY DEFINITION AND SYNTAX
Without an array:
data sbp1;
   set sbp;
   if sbp1 = 999 then sbp1 = .;
   if sbp2 = 999 then sbp2 = .;
   if sbp3 = 999 then sbp3 = .;
   if sbp4 = 999 then sbp4 = .;
   if sbp5 = 999 then sbp5 = .;
   if sbp6 = 999 then sbp6 = .;
run;
With an array:
data sbp2 (drop=i);
   set sbp;
   array sbparray [6] sbp1 - sbp6;
   do i = 1 to 6;
      if sbparray[i] = 999 then sbparray[i] = .;
   end;
run;
Since "array sbp [6];" is equivalent to "array sbp [6] sbp1 - sbp6;", the same program can also be written as:
data sbp2 (drop=i);
   set sbp;
   array sbp [6];
   do i = 1 to 6;
      if sbp[i] = 999 then sbp[i] = .;
   end;
run;
THE DIM FUNCTION
DIM(ARRAYNAME)
Use the DIM function to determine the number of elements in an array.
It is convenient when you use _NUMERIC_, _CHARACTER_, or _ALL_ as array ELEMENTS.
data sbp3 (drop=i);
   set sbp;
   array sbparray [*] sbp1 - sbp6;
   do i = 1 to dim(sbparray);
      if sbparray[i] = 999 then sbparray[i] = .;
   end;
run;
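To show why DIM is convenient with _NUMERIC_, here is a minimal sketch that recodes 999 in every numeric variable without naming any of them. The SCORES dataset and its variables X, Y, and Z are hypothetical and exist only for this illustration.

```sas
/* Hypothetical input: one character and three numeric variables */
data scores;
   input id $ x y z;
   datalines;
A 999 1 2
B 3 999 999
;
run;

data scores_clean (drop=i);
   set scores;
   /* _NUMERIC_ picks up the numeric variables defined so far: x, y, z.
      I is not in the array because it is first seen after this statement. */
   array num [*] _numeric_;
   do i = 1 to dim(num);          /* dim(num) is 3 here */
      if num[i] = 999 then num[i] = .;
   end;
run;
```

Because the loop bound comes from DIM, the same step keeps working if numeric variables are later added to or removed from the input dataset.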
ASSIGNING INITIAL VALUES TO AN ARRAY
When creating a group of variables by using the ARRAY statement, you can assign initial values to the array elements
array num[3] n1 n2 n3 (1 2 3);
array chr[3] $ ('A', 'B', 'C');
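A short sketch of the two statements in use follows; the dataset name INIT_DEMO and the variables TOTAL and LETTERS are illustrative. Note that the second ARRAY statement lists no ELEMENTS, so it creates chr1 - chr3.

```sas
data init_demo;
   array num [3] n1 n2 n3 (1 2 3);       /* n1 = 1, n2 = 2, n3 = 3         */
   array chr [3] $ ('A', 'B', 'C');      /* creates chr1-chr3 = A, B, C    */
   total   = sum(of num[*]);             /* 1 + 2 + 3 = 6                  */
   letters = cats(of chr[*]);            /* concatenation: 'ABC'           */
run;
```

Array elements that are given initial values are retained across DATA step iterations rather than being reset to missing.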
TEMPORARY ARRAYS
Temporary arrays contain temporary data elements.
Using temporary arrays is useful when you want to create an array only for calculation purposes.
When referring to a temporary data element, you refer to it by the ARRAYNAME and its DIMENSION.
You cannot use the asterisk (*) with temporary arrays.
They are not output to the output dataset.
They are always automatically retained.
To create a temporary array, you need to use the keyword _TEMPORARY_:
array num[3] _temporary_ (1 2 3);
COMPILATION AND EXECUTION PHASES
COMPILATION PHASE
data sbp2 (drop=i);
   set sbp;
   array sbparray[6] sbp1 - sbp6;
   do i = 1 to 6;
      if sbparray[i] = 999 then sbparray[i] = .;
   end;
run;
Sbp:
   sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
1  141   142   137   117   116   124
2  999   141   138   119   119   122
3  142   999   139   119   120   999
4  136   140   142   118   121   123
The PDV is created.
PDV variables: _N_ (D), SBP1 (K), SBP2 (K), SBP3 (K), SBP4 (K), SBP5 (K), SBP6 (K), I (D)
Array references: SBPARRAY[1] → SBP1, SBPARRAY[2] → SBP2, SBPARRAY[3] → SBP3, SBPARRAY[4] → SBP4, SBPARRAY[5] → SBP5, SBPARRAY[6] → SBP6
The array name SBPARRAY and the array references are not included in the PDV.
Each variable, SBP1 - SBP6, is referenced by an array reference.
Syntax errors in the ARRAY statement will be detected during the compilation phase.
EXECUTION PHASE
data sbp2 (drop=i);
   set sbp;
   array sbparray[6] sbp1 - sbp6;
   do i = 1 to 6;
      if sbparray[i] = 999 then sbparray[i] = .;
   end;
run;

1st iteration of the DATA step:
Step 1: _N_ is set to 1. The rest of the variables are set to missing.
   PDV: _N_=1  SBP1=.  SBP2=.  SBP3=.  SBP4=.  SBP5=.  SBP6=.  I=.
Step 2: The SET statement copies the 1st observation from Sbp to the PDV.
   PDV: _N_=1  SBP1=141  SBP2=142  SBP3=137  SBP4=117  SBP5=116  SBP6=124  I=.
Step 3: The ARRAY statement is a compile-time only statement, so the PDV does not change.
Step 4 (1st iteration of the DO loop): I is set to 1.
   PDV: _N_=1  SBP1=141  SBP2=142  SBP3=137  SBP4=117  SBP5=116  SBP6=124  I=1
Step 5 (1st iteration of the DO loop): SBPARRAY[i] → SBPARRAY[1] → SBP1. Since SBP1 ≠ 999, the assignment is not executed.
Step 6 (1st iteration of the DO loop): SAS reaches the end of the DO loop.
Step 7 (2nd iteration of the DO loop): I is incremented to 2. Since I ≤ 6, the loop continues.
   PDV: _N_=1  SBP1=141  SBP2=142  SBP3=137  SBP4=117  SBP5=116  SBP6=124  I=2
Step 8 (2nd iteration of the DO loop): SBPARRAY[i] → SBPARRAY[2] → SBP2. Since SBP2 ≠ 999, the assignment is not executed.
Step 9 (2nd iteration of the DO loop): SAS reaches the end of the DO loop. The rest of the DO loop iterations are skipped here.
Step 10: The DO loop has ended with I = 7. SAS reaches the end of the DATA step, and the implicit OUTPUT executes.
   PDV: _N_=1  SBP1=141  SBP2=142  SBP3=137  SBP4=117  SBP5=116  SBP6=124  I=7
   Sbp2:
      sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
   1  141   142   137   117   116   124
EXECUTION PHASE
2nd iteration of the DATA step:
Step 1: _N_ is incremented to 2. SBP1 - SBP6 are retained. I is set to missing.
   PDV: _N_=2  SBP1=141  SBP2=142  SBP3=137  SBP4=117  SBP5=116  SBP6=124  I=.
Step 2: The SET statement copies the 2nd observation to the PDV.
   PDV: _N_=2  SBP1=999  SBP2=141  SBP3=138  SBP4=119  SBP5=119  SBP6=122  I=.
Step 3 (1st iteration of the DO loop): I is set to 1.
   PDV: _N_=2  SBP1=999  SBP2=141  SBP3=138  SBP4=119  SBP5=119  SBP6=122  I=1
Step 4 (1st iteration of the DO loop): SBPARRAY[i] → SBPARRAY[1] → SBP1. Since SBP1 = 999, SBP1 is set to missing.
   PDV: _N_=2  SBP1=.  SBP2=141  SBP3=138  SBP4=119  SBP5=119  SBP6=122  I=1
Step 5 (1st iteration of the DO loop): SAS reaches the end of the DO loop. The rest of the DO loop iterations are skipped here.
Step 6: The DO loop has ended with I = 7. SAS reaches the end of the DATA step.
   PDV: _N_=2  SBP1=.  SBP2=141  SBP3=138  SBP4=119  SBP5=119  SBP6=122  I=7
Step 7: The implicit OUTPUT executes.
   Sbp2:
      sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
   1  141   142   137   117   116   124
   2  .     141   138   119   119   122
The rest of the DATA step iterations are skipped here.
SOME ARRAY APPLICATIONS
CREATING A GROUP OF VARIABLES BY USING ARRAYS
Sbp2:
   sbp1  sbp2  sbp3  sbp4  sbp5  sbp6
1  141   142   137   117   116   124
2  .     141   138   119   119   122
3  142   .     139   119   120   .
4  136   140   142   118   121   123
sbp1 - sbp3 are pre-treatment measurements (mean SBP: 140).
sbp4 - sbp6 are post-treatment measurements (mean SBP: 120).
The variables to be created indicate whether each measurement is above the corresponding mean:
   above1  above2  above3  above4  above5  above6
1  1       1       0       0       0       1
2  .       1       0       0       0       1
3  1       .       0       0       0       .
4  0       0       1       0       1       1
data sbp4 (drop=i);
   set sbp2;
   array sbp [6];
   array above [6];
   array threshold [6] _temporary_ (140 140 140 120 120 120);
   do i = 1 to 6;
      if not missing(sbp[i]) then above[i] = sbp[i] > threshold[i];
   end;
run;
array sbp [6]: used to group the existing variables sbp1 - sbp6
array above [6]: used to create the variables above1 - above6
array threshold [6]: the temporary array is for comparison purposes
THE IN OPERATOR
Sbp2, with the variable MISS to be created:
   sbp1  sbp2  sbp3  sbp4  sbp5  sbp6  miss
1  141   142   137   117   116   124   0
2  .     141   138   119   119   122   1
3  142   .     139   119   120   .     1
4  136   140   142   118   121   123   0
data sbp6;
   set sbp2;
   array sbp [6];
   if . in sbp then miss = 1;
   else miss = 0;
run;
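As a possible alternative to the IN operator, the NMISS function counts the missing numeric values in a list, which also tells you how many measurements are missing. This is a sketch, not the method from the slides; the DATALINES step rebuilds Sbp2 from the values shown above, and the names SBP7 and N_MISSING are illustrative.

```sas
/* Assumption: recreate Sbp2 from the table on the slide */
data sbp2;
   input sbp1-sbp6;
   datalines;
141 142 137 117 116 124
. 141 138 119 119 122
142 . 139 119 120 .
136 140 142 118 121 123
;
run;

data sbp7;
   set sbp2;
   array sbp [6];
   n_missing = nmiss(of sbp[*]);   /* 0, 1, 2, 0 for the four rows     */
   miss = (n_missing > 0);         /* logical expression yields 1 or 0 */
run;
```

MISS takes the same values as in the IN version (0, 1, 1, 0), and N_MISSING adds the count.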
CALCULATING PRODUCTS OF MULTIPLE VARIABLES
Test:
   num1  num2  num3  num4
1  4     .     2     3
2  .     2     3     1
Approach:
1. Create an array, num[4], to group the existing variables num1 - num4
2. Treat a missing value as 1
3. Set result = num[1]
   Loop: i = 2 to 4
      result = result * num[i]
   End Loop
data product (drop=i);
   set test;
   array num [4];
   if missing(num[1]) then result = 1;
   else result = num[1];
   do i = 2 to 4;
      if not missing(num[i]) then result = result * num[i];
   end;
run;
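An equivalent way to write the same calculation is to start RESULT at 1 and loop over every element, which removes the special case for num[1]. This is a sketch of a variation, not the program from the slides; the DATALINES step rebuilds Test from the values shown, and PRODUCT2 is an illustrative name.

```sas
/* Assumption: recreate Test from the table on the slide */
data test;
   input num1-num4;
   datalines;
4 . 2 3
. 2 3 1
;
run;

data product2 (drop=i);
   set test;
   array num [4];
   result = 1;                       /* neutral element for multiplication */
   do i = 1 to dim(num);
      /* skipping a missing value is the same as multiplying by 1 */
      if not missing(num[i]) then result = result * num[i];
   end;
run;
```

For the two observations in Test, RESULT is 24 (4 × 2 × 3) and 6 (2 × 3 × 1), the same values the original program produces.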
RESTRUCTURING DATASETS USING ARRAYS
Restructuring datasets means converting between:
   data with one observation per subject (the wide format)
   data with multiple observations per subject (the long format)
Wide:
   ID   S1  S2  S3
1  A01  3   4   5
2  A02  4   .   2
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
4  A02  1     4
5  A02  3     2
FROM WIDE FORMAT TO LONG FORMAT (WITHOUT USING ARRAYS)
To transform Wide into Long:
   There are 2 observations to read, so there are 2 DATA step iterations
   Use multiple OUTPUT statements
   Any missing values in S1 - S3 will not be output to Long
data long (drop=s1-s3);
   set wide;
   time = 1;
   score = s1;
   if not missing(score) then output;
   time = 2;
   score = s2;
   if not missing(score) then output;
   time = 3;
   score = s3;
   if not missing(score) then output;
run;
FROM WIDE FORMAT TO LONG FORMAT (USING ARRAYS)
In the program without arrays, the three blocks differ only in the value of TIME and in the variable (S1, S2, or S3) assigned to SCORE.
Group S1 - S3 into an array:
   array s[3];
so that S1, S2, and S3 can be referenced as S[1], S[2], and S[3].
Create a DO loop with TIME as the index variable:
data long (drop=s1-s3);
   set wide;
   array s[3];
   do time = 1 to 3;
      score = s[time];
      if not missing(score) then output;
   end;
run;
FROM LONG FORMAT TO WIDE FORMAT
You are reading 5 observations but creating only 2 observations.
You are not copying data from the PDV to the final dataset at each iteration.
You only need to generate one observation once all the observations for each subject have been processed.
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
4  A02  1     4
5  A02  3     2
Wide:
   ID   S1  S2  S3
1  A01  3   4   5
2  A02  4   .   2
REVIEW THE RETAIN STATEMENT
To prevent the VARIABLE from being reinitialized each time the DATA step executes, use the RETAIN statement:
RETAIN VARIABLE <VALUE>;
VARIABLE
   Name of the variable that we want to retain
VALUE
   An initial value (numeric in the examples here)
   Used to initialize the VARIABLE only at the first iteration of the DATA step execution
   If no initial value is specified, VARIABLE is initialized as missing
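A minimal sketch of RETAIN in action is a running maximum. The LONG dataset is rebuilt here from the Long table shown earlier; the dataset name RUNNING_MAX and the variable MAX_SCORE are illustrative.

```sas
/* Assumption: recreate Long from the table on the slides */
data long;
   input id $ time score;
   datalines;
A01 1 3
A01 2 4
A01 3 5
A02 1 4
A02 3 2
;
run;

data running_max;
   set long;
   /* Initialized to 0 once, then kept from one iteration to the next */
   retain max_score 0;
   if score > max_score then max_score = score;
run;
```

MAX_SCORE takes the values 3, 4, 5, 5, 5. Without the RETAIN statement it would be reset to missing at the top of every iteration and would simply equal SCORE whenever SCORE is not missing.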
REVIEW: THE SUM STATEMENT
The SUM statement has the following form:
VARIABLE + EXPRESSION;
VARIABLE: the numeric accumulator variable that is to be created
It is automatically set to 0 at the beginning of the first iteration of the DATA step execution
It is retained in following iterations
EXPRESSION: any SAS expression
If EXPRESSION evaluates to a missing value, it is treated as 0
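A short sketch of the SUM statement, assuming a hypothetical dataset SCORES whose second value is missing. TOTAL is an illustrative accumulator name.

```sas
/* Hypothetical input with one missing score */
data scores;
   input score;
   datalines;
3
.
5
;
run;

data running_total;
   set scores;
   /* SUM statement: TOTAL starts at 0, is retained, and a missing SCORE adds 0 */
   total + score;
run;
```

TOTAL is 3, 3, and 8 across the three observations. Writing total = total + score; instead would give missing values throughout, since TOTAL would neither start at 0 nor be retained.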
REVIEW: FIRST.VARIABLE AND LAST.VARIABLE
You only output the data after you finish reading the last observation of each subject
Thus, you need to identify the last observation
BY-group processing method
proc sort data=b;
   by by_variable;
run;
data a;
   set b;
   by by_variable;
   ... ...
run;
For each BY variable, SAS creates two temporary variables: FIRST.VARIABLE and LAST.VARIABLE
FIRST.VARIABLE and LAST.VARIABLE are set to 1 at the beginning of the execution phase
They are not written to the output dataset
Suppose ID is the "BY" variable:
ID SCORE FIRST.ID LAST.ID
1 A01 3 1 0 (SAS reads the 1st observation for ID = A01)
2 A01 3 0 0
3 A01 2 0 1 (SAS reads the last observation for ID = A01)
4 A02 4 1 0
5 A02 2 0 1
Grouping based on ID: observations 1 - 3 form group 1 and observations 4 - 5 form group 2
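Because FIRST.ID and LAST.ID are temporary, one way to inspect them is to copy them into ordinary variables. The sketch below does this for the five observations shown above; the names SCORES, FLAGS, FIRST_ID, and LAST_ID are assumptions made for illustration.

```sas
/* Assumed input, matching the ID and SCORE values shown above */
data scores;
   input id $ score;
   datalines;
A01 3
A01 3
A01 2
A02 4
A02 2
;
run;

proc sort data=scores;
   by id;
run;

data flags;
   set scores;
   by id;
   /* Copy the temporary variables so they are kept in the output */
   first_id = first.id;
   last_id  = last.id;
run;
```

Printing FLAGS should show FIRST_ID = 1 on observations 1 and 4 and LAST_ID = 1 on observations 3 and 5.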
REVIEW SUBSETTING IF STATEMENT
Use the IF statement to continue processing only the observations that meet the condition of the specified expression
IF EXPRESSION;
If the EXPRESSION is true for the observation:
SAS continues to execute statements in the DATA step and includes the current observation in the data set
If the EXPRESSION is false:
no further statements are processed for that observation
the current observation is not written to the data set
the remaining program statements in the DATA step are not executed
SAS immediately returns to the beginning of the DATA step
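The following sketch shows both branches of a subsetting IF on a small hypothetical dataset. SCORES, HIGH_SCORES, and DOUBLED are illustrative names, not part of the original example.

```sas
/* Hypothetical input */
data scores;
   input id $ score;
   datalines;
A01 3
A01 2
A02 4
;
run;

data high_scores;
   set scores;
   /* Subsetting IF: observations with SCORE below 3 stop here and are not output */
   if score >= 3;
   /* Reached only by observations that passed the IF */
   doubled = score * 2;
run;
```

HIGH_SCORES contains two observations (scores 3 and 4). The observation with SCORE = 2 never reaches the assignment statement or the implicit OUTPUT.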
FROM LONG FORMAT TO WIDE FORMAT(WITHOUT USING ARRAYS)
Use BY-group processing (BY ID) and output to the final dataset when LAST.ID = 1
Depending on TIME, SCORE is assigned to S1, S2, or S3, which must be retained across iterations (RETAIN):
if time = 1 then s1 = score;
else if time = 2 then s2 = score;
else s3 = score;
Long:
ID TIME SCORE
1 A01 1 3
2 A01 2 4
3 A01 3 5
4 A02 1 4
5 A02 3 2
Wide:
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
proc sort data=long;
   by id;
run;
data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;
EXECUTION PHASE
Long:
ID TIME SCORE
1 A01 1 3
2 A01 2 4
3 A01 3 5
4 A02 1 4
5 A02 3 2
1st iteration: _N_ is set to 1; FIRST.ID and LAST.ID are set to 1; the other variables are set to missing
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
1 1 1 . . . . .
1st iteration: The SET statement copies the 1st observation to the PDV
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
1 1 1 A01 1 3 . . .
1st iteration: The SET statement copies the 1st observation to the PDV
FIRST.ID is set to 1 since this is the 1st observation for A01
LAST.ID is set to 0 since this is not the last observation for A01
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
1 1 0 A01 1 3 . . .
1st iteration: Since TIME = 1, S1 is assigned the value of SCORE (3)
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
1 1 0 A01 1 3 3 . .
1st iteration: Since LAST.ID ≠ 1, SAS returns to the beginning of the DATA step to begin the 2nd iteration
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
1 1 0 A01 1 3 3 . .
2nd iteration: _N_ is incremented to 2
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
2 1 0 A01 1 3 3 . .
2nd iteration:
FIRST.ID and LAST.ID are retained; they are automatic variables
ID, TIME, and SCORE are retained; they are from the input dataset
S1, S2, and S3 are retained because of the RETAIN statement
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
2 1 0 A01 1 3 3 . .
2nd iteration: The SET statement copies the 2nd observation to the PDV
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
2 1 0 A01 2 4 3 . .
2nd iteration: The SET statement copies the 2nd observation to the PDV
FIRST.ID is set to 0; this is not the first observation for A01
LAST.ID is set to 0; this is not the last observation for A01 either
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
2 0 0 A01 2 4 3 . .
2nd iteration: Since TIME = 2, S2 is assigned the value of SCORE (4)
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
2 0 0 A01 2 4 3 4 .
2nd iteration: Since LAST.ID ≠ 1, SAS returns to the beginning of the DATA step to begin the 3rd iteration
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
2 0 0 A01 2 4 3 4 .
3rd iteration: _N_ is incremented to 3; the rest of the variables are retained
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
3 0 0 A01 2 4 3 4 .
3rd iteration: The SET statement copies the 3rd observation to the PDV
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
3 0 0 A01 3 5 3 4 .
3rd iteration: The SET statement copies the 3rd observation to the PDV
FIRST.ID is set to 0; this is not the first observation for A01
LAST.ID is set to 1; this is the last observation for A01
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
3 0 1 A01 3 5 3 4 .
3rd iteration: Since TIME = 3, S3 is assigned the value of SCORE (5)
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
3 0 1 A01 3 5 3 4 5
3rd iteration: Since LAST.ID = 1, SAS continues to execute statements in the DATA step
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
3 0 1 A01 3 5 3 4 5
3rd iteration: SAS reaches the end of the 3rd iteration
The implicit OUTPUT executes; variables marked with (K) are copied to the dataset wide
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
3 0 1 A01 3 5 3 4 5
Wide:
ID S1 S2 S3
1 A01 3 4 5
4th iteration: _N_ is incremented to 4; the rest of the variables are retained
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
4 0 1 A01 3 5 3 4 5
4th iteration: The SET statement copies the 4th observation to the PDV
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
4 0 1 A02 1 4 3 4 5
4th iteration: The SET statement copies the 4th observation to the PDV
FIRST.ID is set to 1; this is the first observation for A02
LAST.ID is set to 0; this is not the last observation for A02
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
4 1 0 A02 1 4 3 4 5
4th iteration: Since TIME = 1, S1 is assigned the value of SCORE (4)
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
4 1 0 A02 1 4 4 4 5
4th iteration: Since LAST.ID ≠ 1, SAS returns to the beginning of the DATA step to begin the 5th iteration
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
4 1 0 A02 1 4 4 4 5
5th iteration: _N_ is incremented to 5; the rest of the variables are retained
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
5 1 0 A02 1 4 4 4 5
5th iteration: The SET statement copies the 5th observation to the PDV
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
5 1 0 A02 3 2 4 4 5
5th iteration: The SET statement copies the 5th observation to the PDV
FIRST.ID is set to 0; this is not the first observation for A02
LAST.ID is set to 1; this is the last observation for A02
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
5 0 1 A02 3 2 4 4 5
5th iteration: Since TIME = 3, S3 is assigned the value of SCORE (2)
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
5 0 1 A02 3 2 4 4 2
5th iteration: Since LAST.ID = 1, SAS continues to execute the rest of the statements
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
5 0 1 A02 3 2 4 4 2
5th iteration: SAS reaches the end of the 5th iteration
The implicit OUTPUT executes; variables marked with (K) are copied to the dataset wide
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
5 0 1 A02 3 2 4 4 2
Wide:
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 4 2
S2 for A02 should be missing, but it still holds the value 4 retained from A01
How to fix this?
Reset S1 - S3 to missing at the first observation of each ID:
data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if first.id then do;
      s1 = .; s2 = .; s3 = .;
   end;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;
FROM LONG FORMAT TO WIDE FORMAT(USING ARRAYS)
Wide:
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Long:
ID TIME SCORE
1 A01 1 3
2 A01 2 4
3 A01 3 5
4 A02 1 4
5 A02 3 2
Without arrays:
data wide (drop=time score);
   set long;
   by id;
   retain s1 - s3;
   if first.id then do; s1 = .; s2 = .; s3 = .; end;
   if time = 1 then s1 = score;
   else if time = 2 then s2 = score;
   else s3 = score;
   if last.id;
run;
Group S1 - S3 into the array S (S[1], S[2], S[3]); the RETAIN statement and the reset at FIRST.ID can then be written with the array:
array s[3];
retain s;
if first.id then do;
   do i = 1 to 3; s[i] = .; end;
end;
In the PDV, S[1], S[2], and S[3] refer to S1, S2, and S3:
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
1st iteration: TIME = 1, so S[TIME] refers to S[1] (S1), which is assigned the value of SCORE (3)
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
1 1 0 A01 1 3 3 . .
2nd iteration: TIME = 2, so S[TIME] refers to S[2] (S2), which is assigned the value of SCORE (4)
_N_ D FIRST.ID D LAST.ID D ID K TIME D SCORE D S1 K S2 K S3 K
2 0 0 A01 2 4 3 4 .
The IF-THEN/ELSE statements can therefore be replaced with a single assignment statement:
s[time] = score;
The complete program using arrays:
data wide (drop = time score i);
   set long;
   by id;
   array s[3];
   retain s;
   if first.id then do;
      do i = 1 to 3;
         s[i] = .;
      end;
   end;
   s[time] = score;
   if last.id;
run;
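As an alternative sketch, the inner DO loop that resets the array can be replaced with the CALL MISSING routine, which sets every variable in its argument list to missing. This is a variation and not the approach shown in the slides; the DATALINES step is an assumption so that the example runs on its own.

```sas
/* Assumed input, matching the Long table shown above */
data long;
   input id $ time score;
   datalines;
A01 1 3
A01 2 4
A01 3 5
A02 1 4
A02 3 2
;
run;

proc sort data=long;
   by id;
run;

data wide (drop = time score);
   set long;
   by id;
   array s[3];
   retain s;
   /* Reset all elements at the start of each BY group, with no index variable */
   if first.id then call missing(of s[*]);
   s[time] = score;
   if last.id;
run;
```

The result should match the Wide table above, with S2 missing for A02. Because no index variable I is created, it does not need to be dropped.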
MULTIDIMENSIONAL ARRAYS
ARRAY ARRAYNAME[R, C, …] <$> <ELEMENTS>;
The difference between one- and multi-dimensional arrays is the DIMENSION
R: number of rows
C: number of columns
If there are 3 dimensions, the next number will refer to the number of pages
array a[2,3];
is equivalent to
array a[2,3] a1 - a6;
1 2 3
1 a1 a2 a3
2 a4 a5 a6
For example, a[2,2] refers to a5 and a[1,3] refers to a3
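To make the row and column layout concrete, the sketch below fills a 2 x 3 array with values that encode their own position. The dataset name GRID and the index variables R and C are illustrative assumptions.

```sas
data grid;
   /* A[2,3] creates the variables A1 - A6, filled row by row */
   array a[2,3];
   do r = 1 to 2;
      do c = 1 to 3;
         /* Store row*10 + column so the position is visible in the value */
         a[r,c] = r * 10 + c;
      end;
   end;
   drop r c;
run;
```

GRID holds one observation with A1 - A6 equal to 11, 12, 13, 21, 22, 23, which confirms that a[2,2] is A5 and a[1,3] is A3.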
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
Dat1:
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
Dat2:
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
2 2 B A D C B C
Create ONE observation after you finish reading ALL the observations for EACH person
Use BY-group processing (BY ID); for the four observations in Dat1:
FIRST.ID LAST.ID
1 0
0 1
1 0
0 1
The output will be generated when LAST.ID equals 1
G[3]: used to group existing variables
1 2 3
G1 G2 G3
ALL_G[2,3]: used to create new variables; its elements must be retained (RETAIN)
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
The row index I is incremented for each observation with a SUM statement:
i + 1;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
proc sort data=dat1;
   by id;
run;
data dat2 (drop = i j g1 - g3);
   set dat1;
   by id;
   array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
   array g[3];
   retain all_g;
   if first.id then i = 0;
   i + 1;
   do j = 1 to 3;
      all_g[i,j] = g[j];
   end;
   if last.id;
run;
Dat1:
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
At the beginning of the 1st iteration:
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 1 . . . . 0 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
. . . . . .
ARRAY TRACKING
G[J]: G[1], G[2], G[3] refer to G1, G2, G3
ALL_G[I,J]:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
1st iteration: The SET statement copies the 1st observation to the PDV; FIRST.ID is 1 and LAST.ID is 0
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 0 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
. . . . . .
1st iteration: The SUM statement (i + 1) increments I to 1
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
. . . . . .
1st iteration (1st DO loop): J is set to 1
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
. . . . . .
1st iteration (1st DO loop): ALL_G[1,1] (M_G1) is assigned the value of G[1] (A)
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A . . . . .
1st iteration (2nd DO loop): J is incremented to 2
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A . . . . .
1st iteration (2nd DO loop): ALL_G[1,2] (M_G2) is assigned the value of G[2] (B)
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B . . . .
1st iteration (3rd DO loop): J is incremented to 3
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 3
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B . . . .
1st iteration (3rd DO loop): ALL_G[1,3] (M_G3) is assigned the value of G[3] (F)
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 3
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F . . .
1st iteration (4th DO loop): J is incremented to 4; since J exceeds 3, the loop ends
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F . . .
1st iteration: Since LAST.ID ≠ 1, SAS returns to the beginning of the DATA step to begin the 2nd iteration
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
1 1 0 1 A B F 1 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F . . .
2nd iteration: _N_ is incremented to 2; J is reset to missing, while I and the elements of ALL_G are retained
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 1 0 1 A B F 1 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F . . .
2nd iteration: The SET statement copies the 2nd observation to the PDV; FIRST.ID is 0 and LAST.ID is 1
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 1 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F . . .
2nd iteration: The SUM statement (i + 1) increments I to 2
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F . . .
2nd iteration (1st DO loop): J is set to 1
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F . . .
2nd iteration (1st DO loop): ALL_G[2,1] (F_G1) is assigned the value of G[1] (B)
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B . .
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B . .
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
2nd iteration (2nd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A .
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
2nd iteration (2nd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 3
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A .
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
2nd iteration (3rd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 3
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
2nd iteration (3rd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
2nd iteration (4th DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
2nd iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
2 0 1 1 B A C 2 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
2nd iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
ALL_G [I,J]
G [J]
Dat2:
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 0 1 1 B A C 2 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
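The PDV above still holds A B F B A C in M_G1 through F_G3 at the start of the 3rd iteration because of the RETAIN statement. As a hedged illustration, the sketch below runs the same DATA step with RETAIN deliberately left out. The dataset name no_retain and the DATALINES test data (copied from the Dat1 table) are assumptions made for this example, not part of the original program.

```sas
/* Hypothetical test data matching the Dat1 table on the slides */
data dat1;
    input id g1 $ g2 $ g3 $;
    datalines;
1 A B F
1 B A C
2 B A D
2 C B C
;
run;

/* Same logic as the DAT2 step, but WITHOUT "retain all_g;" */
data no_retain (drop = i j g1 - g3);
    set dat1;
    by id;
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    array g[3];
    if first.id then i = 0;
    i + 1;                     /* sum statement: I is retained automatically */
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    if last.id;                /* output only the last observation per ID */
run;

proc print data = no_retain noobs;
run;
```

Because M_G1 through F_G3 are new variables created by the ARRAY statement, they are reset to missing at the start of every iteration unless they are retained. In this version the values copied into M_G1-M_G3 on the first observation of each ID are lost before the output is written, so only F_G1-F_G3 should be populated.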
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 2 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 0 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
A B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration (1st DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration (1st DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B B F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration (2nd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration (2nd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 3
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A F B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration (3rd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 3
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration (3rd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration (4th DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
3 1 0 2 B A D 1 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
3rd iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 1 0 2 B A D 1 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 1 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 .
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration:
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D B A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration (1st DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 1
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D C A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration (1st DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D C A C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration (2nd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 2
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D C B C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration (2nd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 3
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D C B C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration (3rd DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D C B C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration (4th DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
Dat2:
ALL_G [I,J]
G [J]
data dat2 (drop = i j g1 - g3); set dat1; by id; array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3; array g[3]; retain all_g; if first.id then i = 0; i + 1; do j = 1 to 3; all_g[i,j] = g[j]; end; if last.id;run;
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
ID G1 G2 G3
1 1 A B F
2 1 B A C
3 2 B A D
4 2 C B C
_N_ D FIRST.ID D LAST.ID D ID K G1 D G2 D G3 D I D J D
4 0 1 2 C B C 2 4
M_G1 K M_G2 K M_G3 K F_G1 K F_G2 K F_G3 K
B A D C B C
ALL_G[1,1]
ALL_G[1,2]
ALL_G[1,3]
ALL_G[2,1]
ALL_G[2,2]
ALL_G[2,3]
G[1] G[2] G[3]
Dat1:
4th iteration (4th DO loop):
1 2 3
1 M_G1 M_G2 M_G3
2 F_G1 F_G2 F_G3
G1 G2 G3
ID M_G1 M_G2 M_G3 F_G1 F_G2 F_G3
1 1 A B F B A C
2 2 B A D C B C
Dat2:
ALL_G [I,J]
G [J]
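For readers who want to reproduce the walk-through, the following is a minimal sketch that lays out the slide program one statement per line and adds a PUTLOG statement to dump the PDV at the end of each iteration of the implicit loop. The DATALINES step that builds Dat1, the PUTLOG statement and the PROC PRINT step are additions for illustration only. The restructuring logic itself is unchanged from the slides.

```sas
/* Hypothetical test data matching the Dat1 table on the slides */
data dat1;
    input id g1 $ g2 $ g3 $;
    datalines;
1 A B F
1 B A C
2 B A D
2 C B C
;
run;

data dat2 (drop = i j g1 - g3);
    set dat1;
    by id;
    /* ALL_G creates six new character variables arranged as 2 rows x 3 columns */
    array all_g [2,3] $ m_g1 - m_g3 f_g1 - f_g3;
    /* G groups the existing variables G1-G3 read from Dat1 */
    array g[3];
    retain all_g;
    /* I is the row index: 1 for the first observation of an ID, 2 for the second */
    if first.id then i = 0;
    i + 1;
    do j = 1 to 3;
        all_g[i,j] = g[j];
    end;
    /* Addition for illustration: write every PDV variable to the log */
    putlog _all_;
    /* Subsetting IF: only the last observation of each ID reaches the output */
    if last.id;
run;

proc print data = dat2 noobs;
run;
```

The log should show one line per iteration, including FIRST.ID, LAST.ID, I and J, which can be compared against the PDV snapshots above. The printed dataset should match the two-row Dat2 table.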
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
Dat1:
     ID   G1   G2   G3
1     1    A    B    F
2     1    B    A    C
3     2    B    A    D
4     2    C    B    C
Dat2:
     ID   M_G1   M_G2   M_G3   F_G1   F_G2   F_G3
1     1      A      B      F      B      A      C
2     2      B      A      D      C      B      C
Reversing the transformation (Dat2 back to Dat1) requires creating TWO observations after you finish reading ONE observation
G[3]: used to create new variables
      1     2     3
     G1    G2    G3
ALL_G[2,3]: used to group existing variables
        1      2      3
1    M_G1   M_G2   M_G3
2    F_G1   F_G2   F_G3
RESTRUCTURING DATASETS BY USING THE MULTIDIMENSIONAL ARRAY
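data dat1 (drop = i j m_g1 -- f_g3);
    set dat2;
    /* ALL_G groups the existing variables; G creates the new variables G1-G3 */
    array all_g [2,3] m_g1 -- f_g3;
    array g[3] $;
    do i = 1 to 2;
        do j = 1 to 3;
            g[j] = all_g[i,j];
        end;
        output;   /* one observation per row of ALL_G */
    end;
run;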
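As a hedged, self-contained check of the reverse step, the sketch below builds Dat2 from the values shown on the slides and writes the result to a separately named dataset so that nothing is overwritten. The DATALINES step, the output name dat1_back and the PROC PRINT step are assumptions made for this example.

```sas
/* Hypothetical input matching the Dat2 table on the slides */
data dat2;
    input id m_g1 $ m_g2 $ m_g3 $ f_g1 $ f_g2 $ f_g3 $;
    datalines;
1 A B F B A C
2 B A D C B C
;
run;

data dat1_back (drop = i j m_g1 -- f_g3);
    set dat2;
    /* The name range list m_g1 -- f_g3 relies on the variable order in Dat2 */
    array all_g [2,3] m_g1 -- f_g3;
    /* G1-G3 do not exist in Dat2, so the $ is needed to create them as character */
    array g[3] $;
    do i = 1 to 2;
        do j = 1 to 3;
            g[j] = all_g[i,j];
        end;
        /* Explicit OUTPUT inside the outer loop: two observations per input row */
        output;
    end;
run;

proc print data = dat1_back noobs;
run;
```

The explicit OUTPUT statement is what turns each observation read from Dat2 into two observations. Without it, the DATA step would write only once per iteration, at the end of the step, and only the second row of ALL_G would survive in G1-G3.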
CONCLUSION
Array processing lets you write more compact, less repetitive DATA step code
To use arrays correctly, you need to understand how the DATA step is processed, in addition to grasping the array syntax
In the end, you will often find that most errors trace back to a programming fundamental: understanding how the PDV works
ACKNOWLEDGEMENT
I would like to thank Helen Wang and Cindy Song for giving me the opportunity to present at PharmaSUG 2011.
CONTACT INFORMATION
Arthur Li
City of Hope
Division of Information Science
1500 East Duarte Road
Duarte, CA 91010-3000
Phone: (626) 256-4673 ext. 65121
E-mail: [email protected]