Splitting A Large SAS Data Set: The %split Macro
Coders' Corner
Paper 83-27
Abstract
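%macro split(ndsn=2);
   /* Name one output data set per partition: dsn1, dsn2, ... */
   data %do i = 1 %to &ndsn.; dsn&i. %end; ;
      retain x;
      set orig nobs=nobs;
      /* On the first observation, set x to the partition size, */
      /* that is, nobs divided by ndsn, rounded up.             */
      if _n_ eq 1
         then do;
            if mod(nobs,&ndsn.) eq 0
               then x=int(nobs/&ndsn.);
               else x=int(nobs/&ndsn.)+1;
         end;
      /* Route each observation to a partition by its observation number. */
      if _n_ le x then output dsn1;
      %do i = 2 %to &ndsn.;
         else if _n_ le (&i.*x)
            then output dsn&i.;
      %end;
   run;
%mend split;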
Introduction
Data analysis often involves amounts of data large
enough to raise processing and storage issues.
In such cases, it may be convenient to break up a large SAS data
set into smaller, more manageable data sets, thereby
partitioning the job or storing the data more conveniently.
Of course, writing a Data step would be a reasonable
method for accomplishing this task, as illustrated below.
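As a minimal sketch of such a hand-written Data step, the code below splits the input into two halves. It assumes the input data set is named orig, as in the macro; the output names half1 and half2 are hypothetical and chosen only for illustration.

```sas
/* Hard-coded two-way split of ORIG.                        */
/* half1 and half2 are hypothetical names for illustration. */
data half1 half2;
   set orig nobs=nobs;          /* nobs holds the total number of observations */
   if _n_ le ceil(nobs/2)       /* first half, rounded up for an odd count     */
      then output half1;
      else output half2;
run;
```

The drawback of this approach is that the number of output data sets is fixed in the code: a three-way split needs a third data set name and another ELSE IF branch. Generating those pieces automatically is the job a macro can take over.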
SUGI 27
   if _n_ eq 1
      then do;
         if mod(nobs,&ndsn.) eq 0
            then x=int(nobs/&ndsn.);
            else x=int(nobs/&ndsn.)+1;
      end;
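The fragment above sets x to the number of observations per partition: nobs divided by the number of data sets, rounded up. As a small check of that arithmetic, the sketch below repeats the same logic with ordinary Data step variables and compares it with the CEIL function. The variable y and the sample values (82 observations, 43 data sets) are assumptions for illustration; the CEIL form is an alternative spelling of the calculation, not the macro's own code.

```sas
/* Compare the MOD/INT logic with CEIL for one sample case.   */
/* nobs and ndsn here are plain variables, not macro values.  */
data _null_;
   nobs = 82;
   ndsn = 43;
   if mod(nobs, ndsn) eq 0
      then x = int(nobs/ndsn);
      else x = int(nobs/ndsn) + 1;
   y = ceil(nobs/ndsn);         /* same quantity computed in one call */
   put x= y=;                   /* writes both values to the SAS log  */
run;
```

For these values, int(82/43) is 1 and the remainder is nonzero, so x is 2, which agrees with ceil(82/43).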
Examples
To better understand how the %split macro
works, consider the following example, which processes a
contrived test data set. The MPRINT option shows in the
SAS log how the macro resolves.
Error or Caveat
The next example exposes a caveat of this solution.
What happens when you split a data set containing 82
observations into 43 separate data sets? The %split macro
generates a Data step that will produce 43 data sets. But
how many observations will be in each of the 43 data sets?
Will all data sets contain observations? Is this a
reasonable partitioning of the input data set?
options mprint;
data orig;
   do i = 1 to 82; output; end;
run;
%split(ndsn=43)
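One way to answer these questions is to inspect the observation counts once the macro has run. The sketch below assumes the output data sets were written to the WORK library under the macro's dsn1 through dsn43 names, and queries the DICTIONARY.TABLES view for their sizes.

```sas
/* List the observation count of every DSNn data set in WORK. */
/* Assumes %split(ndsn=43) has already been run this session. */
proc sql;
   select memname, nobs
      from dictionary.tables
      where libname = 'WORK'
        and memname like 'DSN%'
      order by memname;
quit;
```

Following the macro's logic, the partition size is x = 2 (82/43 rounded up), so observations 1 through 82 are used up by dsn1 through dsn41, two observations each. The conditions for dsn42 and dsn43 are never reached, so those two data sets are created but remain empty.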
With ndsn=2, the macro resolves to the following Data step in the SAS log:
MPRINT(SPLIT):   data dsn1 dsn2 ;
MPRINT(SPLIT):   retain x;
MPRINT(SPLIT):   set orig nobs=nobs;
MPRINT(SPLIT):   if _n_ eq 1 then do;
MPRINT(SPLIT):   if mod(nobs,2) eq 0 then x=int(nobs/2);
MPRINT(SPLIT):   else x=int(nobs/2)+1;
MPRINT(SPLIT):   end;
MPRINT(SPLIT):   if _n_ le x then output dsn1;
MPRINT(SPLIT):   else if _n_ le (2*x) then output dsn2;
MPRINT(SPLIT):   run;
Conclusion
The %split macro is a useful tool for partitioning
large data sets into more manageable ones. It also
offers a good lesson in the macro language and even in
number theory.
Author Information
John R. Gerlach
NDC Health
Yardley, PA
Simant Misra
NDC Health
Phoenix, AZ