Sas Programming
Sas Programming
Sas Programming
SAS PROGRAMMING
Reading Assignment: SELECTED SAS DOCUMENTATION FOR BIOS111 Part 4: SAS Programming
6-1
Declarative Statements
v These statements supply information to SAS during the compilation phase of DATA step processing. They define and modify the actions to be taken during the execution phase and affect the composition and contents of the PDV and the new data set being created. v For example, the DROP and KEEP statements determine which of the variables in the PDV get output to the new data set. v These statements are "non-executable" and their placement in the DATA step usually is unimportant, although there are cases where they order of these statements does affect the outcome. Remember, the PDV is created during the compile phase by compiling the statements in the order in which the compiler comes to them.
6-2
END OF INPUT
Yes
NEXT STEP
NO
THE PDV
OUTPUT DATASET THE DESCRIPTOR PORTION OF THE OUTPUT DATASET VARIANCE ATTRIBUTES NAME TYPE LENGTH FORMAT LABEL
6-3
END OF INPUT
YES
NEXT STEP
NO
DROP, KEEP
LENGTH
OUTPUT DATASET VARIABLE ATTRIBUTES NAME TYPE LENGTH FORMAT LABEL 6-4
6-5
6-6
6-7
2 SET CLASSLIB.CLASS ; 3 4 RETAIN N1 10 C1 BIOS ; 5 6 KEEP NAME N1 C1 ; 7 RUN; NOTE: The data set WORK.ONE has 6 observations and 3 variables. NOTE: The DATA statement used 5.00 seconds. 8 PROC PRINT DATA=ONE ; 9 TITLE THE RETAIN STATEMENT ; 10 RUN; NOTE: The PROCEDURE PRINT used 2.00 seconds.
6-8
11 DATA ONE ; 12 SET CLASSLIB.CLASS ; 13 14 RETAIN COUNT 0 NMALES 0 ; 15 16 COUNT=COUNT + 1 ; 17 NMALES=NMALES + (SEX=M) ; 18 19 KEEP NAME SEX COUNT NMALES ; 20 RUN; NOTE: The data set WORK.ONE has 6 observations and 4 variables. NOTE: The DATA statement used 4.00 seconds. 21 22 PROC PRINT DATA=ONE ; 23 TITLE THE RETAIN STATEMENT ; 24 RUN; NOTE: The PROCEDURE PRINT used 2.00 seconds.
6-9
DATASET A
NAME
SEX
HT
DATA B; SET A; RENAME NAME = LASTNAME HT = HEIGHT; OUTPUT; RETURN; RUN; PDV NAME SEX HT
DATASET B
LASTNAME
SET
HEIGHT
6-10
6-11
EXAMPLE:
Contents of SAS DATA SET WORK.OLDCLASS ALPHABETIC LIST OF VARIABLES
VARIABLES
POSITION
LABEL
3 4 1 2 5
17 20 4 16 28
DATA NEWCLASS; SET OLDCLASS; LABEL AGE = HT = HEIGHT IN INCHES ; OUTPUT; RETURN; PROC CONTENTS DATA = NEWCLASS NOSOURCE; CONTENTS OF SAS DATA SET WORK.NEWCLASS # 3 4 1 2 5 VARIABLES AGE HT NAME SEX WT TYPE NUM NUM CHAR CHAR NUM LENGTH 3 8 12 1 8 POSITION 17 20 4 16 28 SEX WEIGHT IN POUNDS HEIGHT IN INCHES LABEL
6-12
6-13
FORMAT NAME -----w. w.d commaw.d ew.d hexw.d dollarw.d zw.d $w. $charw. bestw. date7. mmddyyw.
DESCRIPTION -----------------------numeric, no decimals numeric, with decimals embedded commas scientific notation numeric hexadecimal dollars and commas added insert leading zeros character, leading blanks trimmed character, leading blanks preserved SAS selects appropriate decimal places numeric, used for dates (01APR93) numeric, used for dates (04/01/93)
ALIGNMENT --------r r r r l r r l l r r r
6-14
Proc Print of Data Set Class2 With Formats in the descriptor section of the Data Set OBS 1 2 3 4 5 6 NAME CHRISTIANSEN HOSKING J HELMS R PIGGY M FROG K GONZO SEX 4D 4D 4D 46 4D 20 AGE XXXVII XXXI XLI . III XIV HT seventy-one seventy seventy-four forty-eight twelve twenty-five WT 2.0E+02 1.6E+02 2.0E+02 . 1.0E+00 4.5E+01
CONTENTS PROCEDURE Data Set Name: Member Type: Engine: Created: Last Modified: Protection: Data Set Type: Label: WORK.CLASS2 DATA V612 11:03 Tuesday, June 1, 1999 11:03 Tuesday, June 1, 1999 Observations: Variables: Indexes: Observation Length: Deleted Observations: Compressed: Sorted: 6 5 0 37 0 NO NO
-----Alphabetic List of Variables and Attributes----# Variable Type Len Pos Format ----------------------------------------------3 AGE Num 8 13 ROMAN6. 4 HT Num 8 21 WORDS15. 1 NAME Char 12 0 2 SEX Char 1 12 $HEX2. 5 WT Num 8 29 E8.
6-15
PROC PRINT OF DATA SET SALES WITH A FORMAT STATEMENT IN THE PRINT STEP
OBS 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 DEPT SHOES SHOES SHOES SHOES FURS SHOES SHOES SHOES SHOES SHOES SHOES FURS SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES CLERK CLEVER AGILE CLEVER CLEVER BURLEY AGILE AGILE BURLEY CLEVER CLEVER CLEVER BURLEY CLEVER CLEVER AGILE CLEVER CLEVER CLEVER BURLEY CLEVER PRICE 99+19/20 95 65 65 599+19/20 49+19/20 69+19/20 69+19/20 84+19/20 54+19/20 95 800 139+19/20 59+19/20 54+19/20 94+19/20 65 89+19/20 75 54+19/20 COST $41.21 $40.49 $33.44 $33.44 $180.01 $28.07 $34.93 $34.93 $38.65 $30.00 $40.49 $240.00 $42.96 $31.78 $30.00 $40.48 $33.44 $39.63 $36.31 $30.00 WEEKDAY TUE WED WED WED THR THR THR THR SAT SAT SAT MON MON MON TUE TUE TUE TUE WED WED DAY 003 004 004 004 005 005 005 005 007 007 007 009 009 009 010 010 010 010 011 011
6-16
FORMAT EXAMPLE
37 DATA ONE ; 38 SET CLASSLIB.CLASS ; 39 40 DATE1 = 30SEP93D ; 41 DATE2 = DATE1 ; 42 43 FORMAT DATE1 DATE2 DATE7. ; 44 KEEP DATE1 DATE2 HT ; 45 RUN; NOTE: The data set WORK.ONE has 6 observations and 3 variables. NOTE: The DATA statement used 1.00 seconds. 46 47 PROC PRINT DATA=ONE(OBS=1) SPLIT=* DOUBLE ; 48 VAR DATE1 DATE2 HT ; 49 FORMAT HT 1.0 DATE2 ; 50 LABEL DATE1="TODAYS*DATE*FORMATTED" 51 DATE2="TODAYS*DATE*UNFORMATTED" 52 ; 53 TITLE FORMAT EXAMPLE ; 54 RUN; NOTE: The PROCEDURE PRINT used 1.00 seconds.
OBS
HT
30SEP93
12326
6-17
FORMAT EXAMPLE
55 DATA ONE ; 56 SET CLASSLIB.CLASS ; 57 58 DATE1 = 30SEP93D ; 59 DATE2 = DATE1 ; 60 61 FORMAT DATE1 DATE2 DATE7. ; 62 KEEP DATE1 DATE2 HT ; 63 RUN; NOTE: The data set WORK.ONE has 6 observations and 3 variables. NOTE: The DATA statement used 5.00 seconds. 64 65 PROC PRINT DATA=ONE(OBS=1) SPLIT=* DOUBLE ; 66 VAR DATE1 DATE2 HT ; 67 FORMAT HT $3. DATE2 ; ERROR: You are trying to use the character format $ with the numeric variable HT. 68 LABEL DATE1="TODAYS*DATE*FORMATTED" 69 DATE2="TODAYS*DATE*UNFORMATTED" 70 ; 71 TITLE FORMAT EXAMPLE ; 72 RUN; NOTE: The SAS System stopped processing this step because of errors. NOTE: The PROCEDURE PRINT used 3.00 seconds.
6-18
v SYNTAX PROC FORMAT ; VALUE fmtname range1 = label1 range2 = label2 ; where format names: v are 8 characters or less v begin with a letter or underscore for numeric variables v begin with a dollar sign($) for character variables v do not end with a number v are unique
6-19
Ranges can be: v single value or OTHER VALUE AFT 1=agree 2=disagree OTHER=ERROR ; v ranges of values including LOW and HIGH VALUE AFT LOW-12=kids 13-19 =teens 20-HIGH=adults ; v lists of values and ranges VALUE SFT 1,3=male 2,4=female 0,5-9=miscoded .=missing ; v character values, ranges, or lists VALUE $GR A+=H A-C=P D =L ; v LABELS can be up to 40 characters in length and should be enclosed in single quotes
v NOTES v values not in any ranges are displayed as is, unformatted v missing values are not included in the LOW specification v missing values can be included in the OTHER specification v ranges should not overlap; a value can appear only once in all ranges v values in the range can be explicit or implied. Examples of implied ranges include 1-10 (values 1-10) 1<-10 (greater than 1 up through 10) 1-<10 (1 up to but not including 10)
v user-written formats may be used in data step FORMAT statements. If the data set is stored
permanently, you must have the format available whenever the data set is used, since the format is part of the permanent data set.
6-20
OBS 1 2 3 4 5 6
AGE 37 31 41 . 3 14
Cumulative Cumulative HT Frequency Percent Frequency Percent --------------------------------------------------SHORT 3 50.0 3 50.0 70 1 16.7 4 66.7 TALL 2 33.3 6 100.0 Cumulative Cumulative WT Frequency Percent Frequency Percent ----------------------------------------------------SKINNY 2 40.0 2 40.0 ROBUST 3 60.0 5 100.0 Frequency Missing = 1
6-21
6-22
/* CREATE AGE, HT and WT Categories*/ DATA CLASS; Set ClassLib.Class; If age LT 30 then AGECAT=YOUNG; Else AGECAT=OLD; If HT LT 30 then HTCAT=SHORT; Else If 30 LE HT LT 70 then HTCAT=AVE; Else If HT GE 70 then HTCAT=TALL; If 2 LT WT LT 100 then WTCAT=LIGHT; Else If WT GE 100 then WTCAT=HEAVY; OUTPUT; RETURN; RUN; DATA SET CLASS WITH AGE, HT, WT CATEGORIES
NAME Christiansen Hosking J Helms R Piggy M Frog K Gonzo SEX M M M F M AGE 37 31 41 . 3 14 HT 71 70 74 48 12 25 WT 195 160 195 . 1 45 AGECAT OLD OLD OLD YOUNG YOUNG YOUNG HTCAT TALL TALL TALL AVE SHORT SHORT WTCAT HEAVY HEAVY HEAVY LIGHT LIGHT
6-23
4 data one ; 5 set classlib.class ; 6 7 retain nmales nfemales prob 0 ; 8 9 if sex=M then nmales=nmales + 1; 10 else if sex=F then nfemales=nfemales + 1; 11 else prob=prob + 1 ; 12 run; NOTE: The data set WORK.ONE has 6 observations and 8 variables. NOTE: The DATA statement used 4.00 seconds. 13 14 PROC PRINT ; 15 TITLE IF/THEN/ELSE RETAIN EXAMPLE ; 16 RUN ; NOTE: The PROCEDURE PRINT used 1.00 seconds.
6-18
RETAIN NMALES NFEMALES PROB 0 ; IF SEX=M THEN NMALES=NMALES + 1; ELSE IF SEX=F THEN NFEMALES=NFEMALES + 1; ELSE PROB=PROB + 1 ; IF _N_=6 ; RUN;
NOTE: The data set WORK.ONE has 1 observations and 8 variables. NOTE: The DATA statement used 4.05 seconds. 17 18 19 20
6-19
DO/END STATEMENTS
v The DO and END statements define the beginning and end of a group of statements called a DO Group. The DO Group can be used within IF-THEN/ELSE statements to conditionally execute groups of statements. v Execution of a DO statement specifies that all statements between the DO and its matching END statement are to be executed. v SYNTAX: DO; DO GROUP STATEMENTS . . . END;
v EXAMPLE Consider a data set where some height and weight measurements were collected in English Units (inches and pounds) and some were collected in metric units (meters and kilograms). Convert all measurements to centimeters and grams.
SEX M M M F M
AGE 37 31 41 . 3 14
UNITS M E E E M M
6-20
DATA FIXED; SET MIXED; DROP UNITS; IF UNITS EQ M THEN DO; HT=HT*100; WT=WT*1000; END; ELSE DO; HT=HT*2.54; WT=WT*1000/2.2; END; OUTPUT; RETURN; RUN; CLASS DATA SET WITH METRIC UNITS NAME Christiansen Hosking J Helms R Piggy M Frog K Gonzo SEX M M M F M AGE 37 31 41 . 3 14 HT 180.34 177.8 187.96 121.92 30.48 63.5 WT 88636.36 72727.27 88636.36 . 454.5455 20454.55 /*ENGLISH UNITS */ /*INCHES TO CM */ /*POUNDS TO GRAMS */ /*METRIC UNITS */ /*METERS TO CM */ /*KILOS TO GRAMS */
6-21
X=0 ; DO I = 1 to 7 by 2; X = X + I; END; Is equivalent to: X = 0; I = 1; if (I GT 7) then "leave loop" X = X + I; I = I + 2; if (I GT 7) then "leave loop" X = X + I; I = I + 2; if (I GT 7) then "leave loop" X = X + I; I = I + 2; if (I GT 7) then "leave loop" X = X + I; I = I + 2; if (I GT 7) then "leave loop"
6-22
6-23
EXAMPLES:
1) Compute the final balance resulting from depositing a given amount (CAPITAL) for a given number of years (TERM) at a given rate of interest (RAIL). Assume interest is compounded yearly.
WORK.MONEY
CAPITAL 1000 100 3 5 TERM RATE .10 .15
DATA WORK.COMPOUND; SET WORK.MONEY; DO YEAR=1 TO TERM BY 1; INTEREST=CAPITAL*RATE; CAPITAL=CAPITAL+INTEREST, END; DROP YEAR INTEREST, RUN;
WORK.COMPOUND
TERM 3 5
6-24
2)
Count the number of leap years experienced by each member of the data set WORK.KIDS, shown below;*
WORK.KIDS
NAME JOE SUE ED BIRTHYR 1979 1981 1975
DATA WORK.LLAPS; SLI WORK.KIDS, NLEAP = 0; DO YEAR=BIRTHYR TO 1984 BY 1; IF MOD(YEAR,4) EQ 0 THEN NLEAP=NLEAP+1; END; DROP YEAR; RUN;
WORK.LEAPS
NAME JOE SUE ED BIRTHYR 1979 1981 1975 NLEAPS 2 1 3
*The solution shown does not deal appropriately with century years (e.g., 1900)
6-25
3)
X! = 1 * 2 * 3 ... X
WORK.MATH
X 1 4 -3 0 12 DATA WORK.FACTORL; SET WORK.MATH, IF X GE 0, FACTX=1, DO I=1 TO X; FACTX=FACTX*I; END; RUN;
WORK-FACTORL
X 1 4 0 12 FACTX 1 24 1 479,001,600 I 2 5 1 13
6-26
NOTES:
v Loops execute until the increment exceeds the end value. This means that the end value must be reachable from begin. You can not have a begin at 100 and an end at 50 unless you used a negative increment. v The index becomes part of the SAS dataset being created unless it is included in a drop statement. v The begin, end, increment, and value must by nonmissing. v You can combine the various forms of the indexed DO statements, using begin, end, and optionally, increment, with one or more "value" specification. v Each form of these DO-loops can be nested within each other or a do group.
6-27
6-28
A DO loop can also "count down." In that case "increment" is negative and "stop" must be less than "start." In such cases, the loop is repeated until the value of the index variable is less than "stop". EXAMPLE 1: Find the length of the value of a character variable, NAME. The length will be defined as the position of the right-most non-blank character in NAME. Assume that the length of the variable NAME is 20. LENGTH = 0; DO I=20 to 1 by -1; If (SUBSTR(NAME,I,1) NE ) and (LENGTH EQ 0) then LENGTH=I; END; If (LENGTH=.) then LENGTH=0;
DO LOOPS
OBS 1 2 3 4 5 6 7 VAR1 1 3 5 7 10 40 50 VAR2 1.00000 1.73205 2.23607 2.64575 3.16228 6.32456 7.07107
6-29
EXAMPLE 3: Creating a data set without input data 42 DATA ONE ; 43 44 LENGTH VAR1 3 VAR2 $ 6 ; 45 DO VAR1 = 1 TO 5 ; 46 47 DO VAR2 = FEMALE, MALE ; 48 49 OUTPUT ; 50 51 END ; 52 END ; 53 54 RUN; NOTE: The data set WORK.ONE has 10 observations and 2 variables. NOTE: The DATA statement used 4.37 seconds. 55 56 PROC PRINT ; 57 TITLE DO LOOPS ; 58 RUN; NOTE: The PROCEDURE PRINT used 1.03 seconds.
DO LOOPS OBS 1 2 3 4 5 6 7 8 9 10 VAR1 1 1 2 2 3 3 4 4 5 5 VAR2 FEMALE MALE FEMALE MALE FEMALE MALE FEMALE MALE FEMALE MALE
6-30
EXAMPLE 4: Creating a data set without input data 91 DATA ONE ; 92 93 SET CLASSLIB.CLASS ; 94 95 N3= 0; 96 97 IF SEX=F THEN DO; 98 99 N1 = 2 ; 100 101 DO N2 = 1 TO 3 ; 102 103 N3 = N3 + 2; 104 105 END ; 106 END ; 107 108 ELSE DO ; 109 110 N1 = 4 ; 111 112 END ; 113 RUN; NOTE: The data set WORK.ONE has 6 observations and 8 variables. NOTE: The DATA statement used 1.54 seconds. 115 PROC PRINT ; 116 TITLE DO LOOPS ; 117 RUN; NOTE: The PROCEDURE PRINT used 0.08 seconds.
6-31
DO LOOPS OBS 1 2 3 4 5 6 NAME CHRISTIANSEN HOSKING J HELMS R PIGGY M FROG K GONZO SEX M M M F M AGE 37 31 41 . 3 14 HT 71 70 74 48 12 25 WT 195 160 195 . 1 45 N3 0 0 0 6 0 0 N1 4 4 4 2 4 4 N2 . . . 4 . .
6-32
Examples:
1) Count the number of years needed to double an initial amount (CAPITAL) at a given rate of interest (RATE), compounding yearly.
WORK.COMPOUND
CAPITAL 1000 DATA DOUBLE; SET WORK.COMPOUND; TOTAL=CAPITAL; TERM=0; DO WHILE(TOTAL LT (CAPITAL*2)); TOTAL=TOTAL+(TOTAL*RATE); TERM=TERM+1; END; RUN; RATE .10
6-33
2)
Write a program to "make change"; that is to compute the smallest number of quarters, dimes, nickels, and pennies which add to an arbitrary amount between 0 and 99. CHANGE AMOUNT 97 32 11
DATA COINS; SET CHANGE; LEFT=AMOUNT; QUARTERS=0; DIMES=0; NICKELS=0; PENNIES=0; DO WHILE (LEFT GE 25); QUARTERS=QUARTERS+1; LEFT=LEFT-25; END; DO WHILE (LEFT GE 10); DIMES=DIMES+1; LEFT=LEFT-10; END; IF (LEFT GT 5) THEN DO; NICKELS=1; LEFT=LEFT-5; END; PENNIES=LEFT; DROP LEFT; RUN;
COINS
AMOUNT 97 32 11 QUARTERS 3 1 0 DIMES 2 0 1 NICKELS 0 1 0 PENNIES 2 2 1
6-34
DO UNTIL (Expression); In a DO UNTIL Statement, the expression is evaluated at the bottom of the loop, after the statements in the DO group are executed. If the expression is true, the DO group is not executed again. The DO group is always executed at least once. General form: DO UNTIL (Expression); Executable statements END; EXAMPLE: Find the position of the first occurrence of a character (CHAR) in NAME: I=1; CHAR=*; NAME=JOE*SMITH; DO UNTIL (SUBSTR(NAME,I,1) EQ CHAR)); I=I+1; END;
6-35
ARRAYS
ratio1 = verbal1/math1 ; ratio2 = verbal2/math2 ; ratio3 = verbal3/math3;
if date1=98 or date1=99 then date1=.; if date2=98 or date2=99 then date2=.; if date3=98 or date3=99 then date3=.;
6-36
ARRAYS(EXPLICIT) SYNTAX
ARRAY name{dim} [$] [len] [elements] [(starting_values)] ; NAME is the name of the array; cannot be a variable or an array already in the data set DIM is the number of elements in the array. An asterick(*) may also be entered. The DIM can an also be enclosed in brackets[ ] or parentheses. The $ indicates that the elements of the array are character variables that have not yet been assigned to SAS. The LEN indicates the length of any variables that have not yet been assigned to SAS. The ELEMENTS are the names of the variables in the array. Any combination of variable lists and variable names are permitted. All elements in the array must be of the same data type. STARTING_VALUES indicate initial values for array elements. These values are separated by a comma and/or one or more blanks. Starting values do not replace variables already known to SAS.
ARRAYS(EXPLICIT) EXAMPLES
THE FOLLOWING ARE VALID ARRAY STATEMENTS: ARRAY TEST{3} TEST1-TEST3 ; ARRAY TEST{*} TEST1-TEST3 ; ARRAY TEST{3} ; /* DEFINES VARIABLES TEST1-TEST3 */ ARRAY DAY{4} $2 DAY1-DAY4 (S,M,TU,W) ; ARRAY X{*} _NUMERIC_ ; ARRAY Y{*} TEST1-TEST3 SCORE4-SCORE6 ; ARRAY Z{*} X Y ; ARRAY {5,3} SCORE1-SCORE15 ;
6-37
EXAMPLES
1)
Convert the homework scores HW1-HW3 from a 10 point scale to a 100 point scale.
6-38
2)
The DATA step below searches through the homework scores in NEWHW, recording each students worst homework score.
DATA WORK.WORST; SET WORK.NEWHW; KEEP NAME SCORE; ARRAY HW (3) HW1 HW2 HW3; SCORE = 101; DO J = 1 TO 3; IF (HW(J) LT SCORE) THEN SCORE = HW(J); END; RUN;
WORST
NAME Christiansen Hosking J Helms R Piggy M Frog K Gonzo SCORE 90 40 . 100 30 .
6-39
3)
Convert HW1 - HW3 and exam from the NEWHW data set to letter grades L1-13 and LEXAM using the following grading scale:
Grade > 90 80 < Grade < 90 70 < Grade < 80 Grade < 70 or missing DATA LETTERHW; SET NEWHW;
H P L F
ARRAY HW (4) HW1 - HW3 EXAM; ARRAY L (4) $1 L1-L3 LEXAM; DO I = 1 TO 4; IF (HW(I) LT 70) THEN L(I) = F; ELSE IF (70 LE HW(I) LT 80) THEN L(I) = L; ELSE IF (80 LE HW(I) LT 90) THEN L(I) = P; ELSE L(I) = H; END; RUN;
6-40
4)
SAS data set ONE contains one record per subject. Each record contains three scores. Convert the data sets to 3 records per subject, where each record contains one score and The score number (1,2, or 3).
31 PROC PRINT DATA=ONE ; 32 TITLE DATA ONE ; 33 RUN; NOTE: THE PROCEDURE PRINT USED 0.00 SECONDS. 34 35 36 DATA TWO ; 37 SET ONE ; 38 39 DROP SCORE1-SCORE3 ; 40 41 ARRAY S{3} SCORE1-SCORE3 ; 42 43 DO I=1 TO 3 ; 44 SCORE = S{I} ; 45 OUTPUT ; 46 END; 47 48 RUN; NOTE: THE DATA SET WORK.TWO HAS 6 OBSERVATIONS AND 3 VARIABLES. NOTE: THE DATA STATEMENT USED 2.00 SECONDS. 49 50 PROC PRINT DATA=TWO ; 51 TITLE DATA TWO ; 52 RUN; NOTE: THE PROCEDURE PRINT USED 1.00 SECONDS.
DATA ONE
OBS 1 2 ID 101 102 SCORE1 20 50 SCORE2 30 60 SCORE3 40 70
DATA TWO
OBS 1 2 3 4 5 6 ID 101 101 101 102 102 102 I 1 2 3 1 2 3 SCORE 20 30 40 50 60 70
6-41
5)
Convert data set two (from previous example) from three records per subject back to one record per subject.
DATA THREE ; SET TWO ; RETAIN SCORE1-SCORE3 ; DROP I SCORE ; IF I=1 THEN SCORE1=SCORE ; IF I=2 THEN SCORE2=SCORE ; IF I=3 THEN SCORE3=SCORE ; OR ARRAY S{3} SCORE1-SCORE3 ; DO J=1 TO 3; IF I=J THEN S{J}=SCORE ; END ;
or
S{I} = SCORE ;
I 1 2 3 1 2 3
SCORE 20 30 40 50 60 70
SCORE1 20 20 20 50 50 50
SCORE2 . 30 30 30 60 60
SCORE3 . . 40 40 40 70
Note: The above is not a general solution. A more general solution will be discussed in the next section.
6-42
6)
Data set one contains three systolic blood pressure measures for each subject. Find the first non-missing measure.
34 DATA TWO ; 35 SET ONE ; 36 37 ARRAY S{3} SBP1-SBP3 ; 38 39 SBP=. ; /* SBP will contain the first non-missing measure */ 40 I=1 ; 41 DO UNTIL ((SBP> .Z) OR (I>3 )) ; 42 IF S{I} > .Z THEN SBP=S{I} ; 43 I=I+1 ; 44 END ; 45 RUN; NOTE: THE DATA SET WORK.TWO HAS 4 OBSERVATIONS AND 5 VARIABLES. NOTE: THE DATA STATEMENT USED 3.00 SECONDS. 46 47 PROC PRINT DATA=TWO ; 48 TITLE DATA TWO ; 49 RUN; NOTE: THE PROCEDURE PRINT USED 1.00 SECONDS.
DATA ONE
OBS 1 2 3 4 SBP1 . 101 . . SBP2 90 120 . . SBP3 104 130 95 .
DATA TWO OBS 1 2 3 4 SBP1 . 101 . . SBP2 90 120 . . SBP3 104 130 95 . SBP 90 101 95 . I 3 2 4 4
6-43
ARRAYS(EXPLICIT) EXAMPLES
ARRAY VAR{3} VAR1-VAR3 ; X1 = VAR{1} ; J=1 ; X1 = VAR{J} ; YOU CAN REFER TO AN ENTIRE ARRAY: X2 = SUM(OF VAR{*} ) ;
DIM FUNCTION
the DIM function returns the number of elements in a dimension of an array. ARRAY TEST{*} TEST1-TEST3 ; DO I=1 TO DIM(TEST) ; TEST{I} = TEST{I} + 10 ; END;
6-44
ARRAYS(IMPLICIT) SYNTAX
ARRAY name(index} [$] [len] [elements] [(starting_values)] ;
NAME is the name of the array; cannot be a variable or an array already in the data set INDEX gives the name of a variable whose value defines the current element of the array. The index must be enclosed in parentheses; brackets and braces are not allowed. The $ indicates that the elements of the array are character variables that have not yet been assigned to SAS. The LEN indicates the length of any variables that have not yet been assigned to SAS. The ELEMENTS are the names of the variables in the array. Any combination of variable lists and variable names are permitted. All elements in the array must be of the same data type STARTING_VALUES indicate initial values for array elements. These values are separated by a comma and/or one or more blanks. Starting values do not replace variables already known to SAS.
ARRAYS(IMPLICIT) EXAMPLES
THE FOLLOWING ARE VALID ARRAY STATEMENTS: ARRAY TEST(I) TEST1-TEST3 ; ARRAY TEST TEST1-TEST3 ; ARRAY DAY(J) $2 DAY1-DAY4 (S,M,TU,W) ; ARRAY X _NUMERIC_ ; ARRAY Y TEST1-TEST3 SCORE4-SCORE6 ; ARRAY Z(L) X Y ;
6-45
ARRAY S(I) SCORE1-SCORE3 ; DO I=1 TO 3 ; IF S>98 THEN S=. ; END; ________________________________ ARRAY S SCORE1-SCORE3 ; DO _I_=1 TO 3 ; IF S>98 THEN S=. ; END; _________________________________ ARRAY S SCORE1-SCORE3 ; DO OVER S ; IF S>98 THEN S=. ; END;
6-46
DATA SET A
X 4 . -3 Y 2 7 2
6-47
v Missing values are treated like minus infinity in comparison expressions, v Special missing values compare in the sort sequence.
DATASET C
X 0 -12 . A Y 4 . 9 B
DATA SET D1
X 0 -12 . A
Y 4 . 9 B
6-48
DATA SET D2
X 0 -12 . A Y 4 . 9 B T2 FALS TRUE FALS FALS
6-49
DATA SET D3
X 0 -12 . A Y 4 . 9 B T3 FALS TRUE FALS FALS
6-50
v Functions that compute sample statistics use only nonmissing values of the arguments.
DATA SET E
A 3 . B 2 4 C 7 9
DATA WORK.F; SET WORK.E; TOT = A + B + C; AVE = TOT/3; S = SUM(A,B,C); M = MEAN(OF A--C); RUN;
DATA SET F
A 3 . B 2 4 C 7 9 TOT 12 . AVE 4 . S 12 13 M 4.0 6.5
6-51
v The SUM function can be used to prevent cumulative totals of variables involving missing values from becoming missing.
DATA CUMLAT; SET CLASSLIB.CLASS; RETAIN CUMHT CUMWT CUMAGE; CUMHT = SUM(CUMHT,HT); CUMWT = SUM(CUMWT,WT); CUMAGE = SUM(CUMAGE,AGE); RUN;
6-52
DATA CUMTLAT2; SET CLASSLIB.CLASS; CUMHT + HT; CUMWT + WT; CUMAGE + AGE; RUN;
6-53