Introduction to the SAS® Programming Language

Thomas J. Winn, Jr.


Texas State Auditor’s Office, Austin, Texas

[SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.]

Abstract

This short paper is the first in the “Introduction to SAS” sequence of papers. It includes some of a programmer's first steps toward learning SAS. The paper provides a brief overview of the SAS System, as well as an outline of the basic structure of the SAS programming language. In particular, it covers preliminary concepts regarding DATA and PROC steps, the different kinds of data, and SAS data files. This paper, and the associated presentation, are intended to provide a foundation for the next step of learning SAS, that of creating SAS data sets using the INPUT statement and INFILE options.

A Very Brief History of SAS

While a graduate student in statistics at NCSU, Jim Goodnight wrote a computer program for analyzing agricultural data. After a few years, Jim’s application had attracted a diverse and loyal following among its users, and the program’s data management and reporting capabilities had expanded beyond Jim’s original intentions. In 1976, Jim decided to work at developing and marketing his product on a full-time basis, and SAS Institute was founded. Since its beginning, a distinguishing feature of the company has been its attentiveness to users of the software. Today, SAS Institute is the world’s largest privately-held software company, and Dr. Jim Goodnight is its CEO. He continues to be actively involved as a developer of SAS System software.

The SAS System

The SAS System is an integrated suite of information delivery software products. It is a library of modular components that are tied together by a central supervisory program. Applications of the SAS System include executive information systems; data entry, retrieval, and management; report writing and graphics; statistical and mathematical analysis; business planning, forecasting, and decision support; operations research and project management; statistical quality improvement; computer performance evaluation; and applications development.

Originally, “SAS” stood for “Statistical Analysis System”; however, the applicability of SAS has grown far beyond that single purpose. Today, SAS is not an acronym for anything. Besides statistics, elements of the SAS System also include web enablement, data mining, data warehousing, and business intelligence solutions, a variety of industry-specific and business functional-area products (for managing organization, customers, and suppliers), as well as state-of-the-art applications development tools. Today, the SAS System is a comprehensive system for data management and analysis.

SAS software collects data from almost any platform and data format; it cleans and transforms the data into information which decision makers will understand; and it stores the information in an open and efficient data storage structure. To explore information, SAS software provides multidimensional data analysis, query and reporting, executive information systems, data mining, data visualization, and applications development capabilities. SAS solutions are client/server- and Web-enabled. SAS says about itself, “The Power to Know.™”

Components of the SAS System include (partial listing):
- Base SAS
- SAS/ACCESS
- SAS/AF
- SAS/ASSIST
- SAS/CONNECT
- SAS/EIS
- SAS/ETS
- SAS/FSP
- SAS/GIS
- SAS/GRAPH
- SAS/IML
- SAS/INSIGHT
- SAS/IntrNet
- SAS/LAB
- SAS/MDDB Server
- SAS/OR
- SAS/QC
- SAS/SHARE
- SAS/SPECTRAVIEW
- SAS/STAT
- SAS/TOOLKIT
- SAS/AppDev Studio
- SAS/Enterprise Guide
- SAS/Enterprise Miner
- SAS Universal ODBC Driver
- SAS/Warehouse Administrator

The “Introduction to SAS” section is designed to cover the fundamental elements of the SAS programming language. It includes elementary DATA step and Base SAS PROCedure programming.

Introduction to SAS Programming

All SAS jobs are a sequence of SAS steps, which are made up of instructions called SAS statements. There are only two kinds of SAS steps: DATA steps are used to read, edit, and transform data (raw data or SAS data files) in order to prepare SAS data sets; PROC steps are ready-to-use procedures which analyze or process SAS data sets. In general, data must be in a SAS data file before they can be processed by SAS procedures.

Without going into the details at this time, here is a skeletal example of a SAS job:

    DATA STUDENTS;
    INPUT NAME $ 1-14 SEX $ 15
          SECTION $ 17-19 GRADE;
    DATALINES;
    . . . data lines . . .
    ;
    PROC SORT DATA=STUDENTS;
    BY SECTION DESCENDING GRADE;
    PROC PRINT DATA=STUDENTS;
    BY SECTION;
    RUN;
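As a rough illustration, the skeletal SAS job above could be filled in as follows. The four data lines are invented purely for this sketch (they are not from the paper), and they are laid out so that NAME occupies columns 1-14, SEX column 15, and SECTION columns 17-19, with GRADE read as the next value on the line. The explicit RUN statements after each step are a stylistic choice, not a requirement.

```sas
/* DATA step: read the raw data lines into a SAS data set named STUDENTS. */
/* The data lines below are made-up sample values for illustration only.  */
DATA STUDENTS;
   INPUT NAME $ 1-14 SEX $ 15
         SECTION $ 17-19 GRADE;
   DATALINES;
ADAMS PAT     F A01 88
BAKER LEE     M A01 92
CHEN ROBIN    F B02 75
DIAZ SAM      M B02 81
;
RUN;

/* PROC step: order the observations by section, highest grade first. */
PROC SORT DATA=STUDENTS;
   BY SECTION DESCENDING GRADE;
RUN;

/* PROC step: print the sorted data, one block of output per section. */
PROC PRINT DATA=STUDENTS;
   BY SECTION;
RUN;
```

PROC PRINT can use BY SECTION here only because the preceding PROC SORT has already ordered the data set by SECTION.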
There are two kinds of SAS data sets: SAS data files (or tables), and SAS data views. A SAS data file contains two parts: the descriptor portion, which provides SAS procedures and some DATA step statements with descriptive information (data set attributes and variable attributes) about the data; and the data portion, a rectangular structure containing the data values, with rows (customarily called observations) and columns (customarily called variables), which is passed to most procedures observation by observation. A SAS catalog is a type of SAS file which stores many different types of information used by the SAS System. All SAS files reside in a SAS data library.

The SAS System processes the program in two steps: (1) it compiles the program, and (2) it executes the program. When the program is compiled, a program data vector (PDV) is constructed for each DATA step. It is an area of memory which includes all variables which are referenced either explicitly or implicitly in the DATA step.

At execution time, the PDV is the location where the current working values are stored as they are processed by the DATA step. Variables are added to the PDV sequentially as they are encountered during parsing and interpretation of SAS source statements. Each step (DATA or PROC) is compiled and executed separately, in sequence. At execution time within each DATA step, each observation is processed iteratively through all of the SAS programming statements of the DATA step.

SAS procedures (PROCs) are programs that are designed to perform specific data processing and analysis tasks on SAS data sets. Base SAS procedures fall into the following categories:
SAS Utilities -- APPEND, CATALOG, CIMPORT, COMPARE, CONTENTS, COPY, CPORT, DATASETS, DBCSTAB, DISPLAY, EXPLODE, EXPORT, FORMAT, FSLIST, IMPORT, OPTIONS, PMENU, PRINTTO, RANK, REGISTRY, SORT, SQL, STANDARD, TRANSPOSE, TRANTAB;
Descriptive Statistics -- CORR, FREQ, MEANS, SQL, SUMMARY, TABULATE, UNIVARIATE;
Reporting -- CALENDAR, CHART, FORMS, MEANS, PLOT, PRINT, REPORT, SQL, SUMMARY, TABULATE, TIMEPLOT.

Creating SAS Data Files

Since SAS procedures can operate only on SAS data sets, the first step in processing any raw data using SAS will be to transform them into a SAS data set. Whenever the SAS System creates a SAS data file, it does the following:
1. it reads the DATA statement, creates the structure of a SAS data set, and marks the statement as the place to begin the processing of each line of data;
2. it uses the description of the data in the INPUT statement to read the data line, and to produce an observation;
3. it uses the observation to execute any other SAS statements that are in the DATA step;
4. it adds the observation to the data set being created; and
5. it returns to the beginning of the DATA step for the processing of the next observation.

All SAS DATA step statements are executed once for each observation.

Most SAS statements begin with an identifying keyword (assignment statements are the main exception), and all SAS statements end with a semicolon. SAS statements are free-format. They can begin anywhere, and end anywhere. A single statement may continue over several lines. Several statements may be on a single line. Blanks (as many as desired) are used to separate fields. Other special characters also may be used to separate fields.

The data portion of a SAS data file is a collection of data values arranged in a rectangular table. The rows in the table are called observations. The columns in the table are called variables. There are two kinds of variables: character variables, and numeric variables. Each variable has a name.

There are rules for naming SAS data sets and variables: names are 1 to 32 characters in length (8 character maximum in Version 6 and earlier versions), start with a letter (A-Z) or an underscore (_), and continue with letters, numbers, or underscores. It is recommended that you choose meaningful names.

Character data can consist of up to 32,767 characters (max. of 200 characters in Version 6 and earlier versions). Character values may include letters, numbers, blanks and special characters, although generally you should not include any semicolons within the data. Numeric data values must be numbers, and they may be preceded by a + or -. Unless the data are being read using a special SAS informat, do not include commas or dollar signs in numeric data values. SAS represents missing numeric values with a period (“.”), and missing character values with a blank (“ ”). You can enter these values into your data to indicate missing values.

Every SAS data set has a name and is physically stored on some type of media (disk, tape, etc.). In simple jobs, the SAS data sets are stored on temporary space, but they can be stored “permanently”. A temporary SAS data set exists only for the duration of the current SAS job, or interactive SAS session. A permanent SAS data set exists after the end of the current SAS job or interactive SAS session. Both types of SAS data sets have two-level names, of the form libref.data-set-name, where libref is a reference to the name of a SAS data library (a collection of SAS files).

With temporary SAS data sets, the SAS System automatically assigns the libref WORK and you specify the data set name. When you create a permanent SAS data set, you must specify both the libref and the data set name. The SAS System does not assign the libref for you. Although there are other methods for certain operating environments, the LIBNAME statement is the most universal method of assigning a libref. The general form is

    LIBNAME libref 'SAS-data-library';

Here is an example of a LIBNAME statement in the Windows environment:

    LIBNAME mylib1 'C:\mySASlib';
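To make the point about the automatically assigned WORK libref concrete, here is a minimal sketch. The data set name SCORES and its two observations are hypothetical, chosen only for illustration. The sketch creates a temporary data set with a one-level name and then refers to it by its full two-level name.

```sas
/* One-level name: SAS stores the data set in the temporary WORK library. */
/* SCORES and its values are made-up examples, not from the paper.        */
DATA SCORES;
   INPUT NAME $ GRADE;
   DATALINES;
ALICE 91
BRIAN 78
;
RUN;

/* The same data set, referenced by its two-level name WORK.SCORES. */
PROC PRINT DATA=WORK.SCORES;
RUN;

/* PROC CONTENTS lists the descriptor portion of the data set: */
/* its data set attributes and variable attributes.            */
PROC CONTENTS DATA=WORK.SCORES;
RUN;
```

Because SCORES lives in WORK, it disappears at the end of the SAS job or session; keeping it would require a two-level name whose libref was assigned by a LIBNAME statement.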
Here is an example of a LIBNAME statement in certain mainframe environments:

    LIBNAME mylib2 'data.set.name';

Here is an example of reading data from a “permanent” SAS data set (in the SAS data library whose previously defined libref is ‘MYLIB’):

    DATA EXAMPLE;
    SET MYLIB.STUFF;

Here is an example of creating a “permanent” SAS data set (also in the SAS data library whose previously defined libref is ‘MYLIB’):

    DATA MYLIB.TESTDATA;
    SET SAMPLE1;

A SAS DATA statement instructs the SAS System to create and name a SAS data set. It has the general syntax:

    DATA data-set-name-1 (options-1)
         data-set-name-2 (options-2)
         ...
         data-set-name-k (options-k) ;

Many SAS data sets can be created in a single DATA step. Data set options include such things as: DROP= , IN= , FIRSTOBS= , KEEP= , OBS= , RENAME= , WHERE= , and others. The two major functions of the DATA statement are: to signal the beginning of the DATA step, and to name the data set(s) being created.

When creating temporary SAS data sets, the data set name can be supplied by the programmer:

    DATA STUDENTS;
    INPUT NAME $ 1-14 SEX $ 15
          SECTION $ 17-19 GRADE;
    DATALINES;
    . . . data lines . . .
    ;

or, if the name is omitted in the DATA statement, the SAS System will provide a name (DATA1, DATA2, etc.):

    DATA ;
    INPUT NAME $ 1-14 SEX $ 15
          SECTION $ 17-19 GRADE;
    DATALINES;
    . . . data lines . . .
    ;

A Few Words About Working With Dates and Times Using SAS

Whenever SAS reads date value inputs, it converts them into integers. SAS dates are positive or negative integers representing the number of elapsed days between January 1, 1960 and the specified date. Similarly, SAS converts time values into the number of seconds since midnight of the current day. SAS datetime values are the number of seconds since midnight on January 1, 1960. Since dates and times are numeric entities, one may use ordinary arithmetic to determine elapsed time, or future/past dates and times. For example,

    AGEDAYS = THISDATE - BIRTHDATE;
    AGEYRS = AGEDAYS / 365.25;

You may use date constants or time constants in a SAS expression by writing the date or time enclosed in quotes, and followed by a D (date), a T (time), or DT (datetime).

    THISDATE = '20Aug2001'D;
    SLEEPTIME = '23:59:59.9'T;
    FAISDODO = '21Aug2001 20:30'DT;

To read data that are date or time values, SAS has a variety of informats. To write date or time values in reports, SAS has numerous formats. SAS also has several special functions for working with date or time values. We’ll learn more about informats, formats, and functions in another presentation.

Summary

This short paper included some of a programmer's first steps toward learning about the SAS programming language. In particular, it covered the following items: an overview of the SAS System, a few fundamental ideas regarding SAS data sets, some preliminary concepts regarding DATA and PROC steps, and the different kinds of data in SAS.

Suggested References:

Ronald P. Cody & Raymond Pass, SAS Programming By Example (1995)
Lora D. Delwiche & Susan J. Slaughter, The Little SAS Book: A Primer, Second Edition (1998)
Frank DiIorio, SAS Applications Programming: A Gentle Introduction
SAS Institute Inc., SAS OnlineDoc, Version 8
SAS Institute Inc., Getting Started With the SAS System, Version 8
SAS Institute Inc., SAS Language Reference: Concepts, Version 8
SAS Institute Inc., SAS Language Reference: Dictionary, Version 8, Volumes 1 and 2
SAS Institute Inc., SAS Procedures Guide, Version 8, Volumes 1 and 2

Author Information.

Tom Winn
Texas State Auditor’s Office
P.O. Box 12067
Austin, TX 78711-2067
phone: 512 / 936-9735
e-mail: twinn@sao.state.tx.us
