SAS is statistical analysis software used to manage and analyze large datasets. It was created in the 1960s at NC State University to meet the needs of agricultural research projects. SAS uses libraries to organize data and a graphical IDE with windows for editing, viewing logs, outputs and more. It has built-in procedures and functions for data access, management, analysis and presentation.
What is SAS?
SAS: Statistical Analysis System/Software
Why the Need for SAS?
In the early 1960s, the Statistics Department at North Carolina State University was awarded an agricultural research project. The people working on the project needed computer software for IBM mainframes that could access and manipulate large volumes of data and perform statistical analysis on the data. There was no package available that met their needs, so they started designing a solution.

History of SAS
The Early 1960s
- Agricultural research at Land Grant Universities
- Business need: general-purpose statistical software to manage and manipulate large volumes of data and perform statistical analysis.

SAS IDE
INTEGRATED DEVELOPMENT ENVIRONMENT (IDE)
COMPONENTS/WINDOWS
INTERACTIVE (USED FOR PROGRAMMING & DEBUGGING):
1. EXPLORER WINDOW
2. LOG WINDOW
3. ENHANCED EDITOR
NON-INTERACTIVE (USED FOR OUTPUT GENERATION ONLY):
4. OUTPUT WINDOW
5. RESULTS WINDOW
6. HTML WINDOW

SAS IDE
EXPLORER WINDOW: HAS FOUR COMPONENTS
THE PURPOSE OF THE EXPLORER WINDOW IS TO NAVIGATE A SYSTEM/NETWORK
- IT BEHAVES LIKE AN ALTERNATIVE OS (FOR NON-WINDOWS ENVIRONMENTS)
- IT HELPS IN THE STORAGE OF DATA & FILES
SAS INTEGRATES WITH ANY OPERATING SYSTEM

SAS IDE
LIBRARIES: THESE ARE THE LOGICAL FOLDERS IN SAS USED FOR DATA MGMT AND FILE MGMT.
THERE ARE TWO TYPES:
- DEPENDENT: CONNECTED LIBRARY
- INDEPENDENT: SAS LIBRARY
DEPENDENT LIBRARY:
- A DEPENDENT LIBRARY IS ALWAYS CONNECTED TO AN EXTERNAL DATA SOURCE (DBMS/RDBMS/FILE)
- A DEPENDENT LIBRARY CAN REFER TO A SINGLE DATA SOURCE ONLY
- IT DOES NOT OCCUPY ANY MEMORY
- DATA MANIPULATIONS WILL BE REFLECTED IN BOTH SAS & THE EXTERNAL SOURCE
DISADVANTAGE: SECURITY OF THE DB MAY BE COMPROMISED

SAS IDE
Libraries refer to the physical location where SAS files are stored. By default, several libraries are already defined by SAS:
1. WORK - used by SAS for storage of temporary files.
2. MAPS - contains SAS maps for most countries in the world. These maps are used with the SAS GMAP procedure.
3. SASUSER - automatically generated by SAS to save SAS default settings.
4. SASHELP - contains the SAS help catalogs and views (a type of data set) that describe every active library, database, and catalog.
Data should not be stored in any of the default libraries; however, new libraries can be defined so that they, too, are automatically assigned each time SAS is started (by selecting "Enable at startup" when the library is first created).

SAS IDE
INDEPENDENT LIBRARY:
- THESE ARE NOT CONNECTED TO ANY EXTERNAL DATA SOURCES (ONLY FOLDER BASED)
- THE USER HAS TO MANAGE ALL DATA & SECURITY OF DATA
DISADVANTAGE:
- OCCUPIES MEMORY (MORE MEMORY USE MAY AFFECT PROCESSING SPEED)
ADVANTAGES:
- CAN BE USED TO STORE DATA FROM MULTIPLE DATA SOURCES
- SECURITY CAN BE APPLIED TO INDEPENDENT LIBRARIES
SAS IDE
LIBRARY MODES: THERE ARE TWO TYPES
- TEMPORARY LIBRARY: SINGLE SESSION ONLY
- PERMANENT LIBRARY: MULTIPLE SESSIONS.
PROPERTIES OF A LIBRARY
1. NAME: MAX 8 CHARS, MUST START WITH A LETTER OR UNDERSCORE
2. ENGINE: USED TO DEFINE THE DATA SOURCE OF THE LIBRARY (ORACLE, ACCESS)
3. ENABLE AT START UP: SELECTS PERMANENT OR TEMPORARY
4. PATH: DEFINES THE LOCATION & PARAMETERS TO CONNECT TO A DB OR DATA SOURCE (C:\PROG FILES\....)
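
The library properties and the two library modes can be sketched in code. This is only an illustration: the libref mylib, the folder c:\mydata and the data set name scores are hypothetical, and the folder is assumed to exist already.

```sas
/* NAME = mylib, ENGINE = base (the default SAS engine), PATH = the folder. */
/* The folder c:\mydata is a hypothetical location that must already exist. */
libname mylib base 'c:\mydata';

/* Temporary: stored in WORK and removed when the session ends. */
data work.scores;
    id = 1;
    score = 90;
run;

/* Permanent: stored in the folder behind mylib and kept across sessions. */
data mylib.scores;
    set work.scores;
run;
```

A data set written with a one-level name such as scores goes to WORK, so the two-level name mylib.scores is what makes the copy permanent.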
Assigning a Libref
You can use the LIBNAME statement to assign a libref to a SAS data library. General form of the LIBNAME statement:

    LIBNAME libref 'SAS-data-library' <options>;

Rules for naming a libref:
- must be 8 characters or less
- must begin with a letter or underscore
- remaining characters are letters, numbers or underscores

Assigning a Libref
Examples
Windows:          libname abc 'c:\mydocuments\prog1';
UNIX:             libname abc '/users/userid';
z/OS (OS/390):    libname abc 'userid.prog1.sasdata' disp=shr;
Note: DISP=OLD|SHR specifies the disposition of the file. The default is OLD, which enables both read and write access. SHR enables read-only access.

Assigning a Libref
Making the Connection
When you submit the LIBNAME statement, a connection is made between a libref in SAS and the physical location of files on your operating system.
When your session ends, the link between the libref and the physical location of your files is broken.

SAS IDE
LOG WINDOW:
- IT DISPLAYS THE COMPILATION RESULTS, COLOUR CODED
- SUPPORTS DEBUGGING (REMOVING LOGICAL ERRORS)
- ERRORS ARE OF FOUR TYPES: SYNTAX / RUN TIME / LOGICAL / DATA ERRORS
  - RUN TIME: EXTERNAL ERRORS
  - LOGICAL: OUTPUT IS NOT CORRECT
- OPTIMIZATION (TIME PERIOD BASED EXECUTION):
  - REAL TIME (TOTAL TIME TAKEN TO EXECUTE)
  - CPU TIME (HOW MUCH PROCESSOR TIME IS CONSUMED)
- THE LOG WINDOW CAN BE STORED AS AN EXTERNAL FILE (.LOG)

SAS IDE
ENHANCED EDITOR WINDOW:
- WRITING SAS SCRIPTS (COLOUR-CODED SCRIPTS)
- STORED AS A .SAS FILE
- COMBINATION OF COMPILER + INTERPRETER

INTERPRETER                      COMPILER
WHILE TYPING A PROGRAM           WHILE EXECUTING A PROGRAM
CHECKS EACH LINE FOR ERRORS      CHECKS THE WHOLE PROGRAM FOR ERRORS
PROGRAM EDITOR & ENHANCED EDITOR
- WRITING SAS SCRIPTS
- STORED AS A .SAS FILE
- PROGRAM EDITOR ONLY FOR DOS & UNIX
NOTE: AN ENHANCED EDITOR WHEN SAVED BECOMES A PROGRAM EDITOR WINDOW

SAS PROGRAMMING
CREATING DATA SETS (TABLES)
- MANUALLY
- USING EXISTING DATA SETS (DBMS/RDBMS)
- USING DATA FROM FILES (FLAT FILES)
DEFAULT LIBRARIES IN SAS
PERMANENT: SASUSER, SASHELP, GISMAP, MAPS
TEMPORARY: WORK (ALL DATA SETS WITHOUT ANY LIBRARY REFERENCE ARE STORED IN THE WORK LIBRARY)

SAS FORMULA
THE SAS FORMULA IS DIVIDED INTO TWO TYPES:
- SAS TECHNICAL: USED FOR PROGRAMMING IN ALL LAYERS
- SAS FUNCTIONAL: USED FOR PROGRAMMING & PROCESS

PROCESS
1) DATA ACCESS
2) DATA MGMT
3) DATA ANALYSIS
4) DATA PRESENTATION

SAS FORMULA
SAS TECHNICAL            FUNCTIONAL
1) DATA STEP             DATA ACCESS
2) DATA SET              DATA MGMT
3) DATA PROG & PROC      DATA ANALYSIS
4) DATA OUTPUTS          DATA PRESENTATION

SAS FORMULA
1. DATA STEP: DEFINES THE STRUCTURE OF DATA
DEFINITION: THERE ARE TWO DATA TYPES
- NUMBER: 8 BYTES (MAX & MIN)
- TEXT/CHAR: OCCUPIES 1 BYTE PER CHARACTER
EXAMPLE
X=01                 8 BYTES
X='01'               2 BYTES
NAME='ALLEN'         5 BYTES
N=9060984976789      8 BYTES
S='SAS SYSTEM'       10 BYTES

SAS STORES A DATE AS A NUMBER (CENTURY, YEAR, MONTH, DAY, HOURS, MIN & SEC), COUNTED FROM 01-JAN-1960 = 0; FOR EXAMPLE, 21-JAN-1960 = 20
LARGEST NUMBER: 9,9999999------99 (38 DIGITS)
SMALLEST NUMBER: 0.00000------0000 (29 DECIMALS)

SAS FORMULA
1. DATA STEP: STRUCTURE OF SAS
STRUCTURE: STORAGE PATTERN OF DATA. IT MAY BE A COMBINATION OF
VARIABLE + DATA TYPE + SIZE + CONSTRAINT

VARIABLE    DATA TYPE    SIZE    CONSTRAINT
ITEMNO      NUMBER       4       >999 AND <10,000
ITEMNAME    TEXT         200     NMISS (NON MISSING)
PRICE       NUMBER       7.2     >1000

SAS FORMULA
2. DATA SET: STORAGE OF DATA IN SAS
A TABLE IN SAS IS CALLED A DATA SET.
- A DATA SET CONSISTS OF VARIABLES AND OBSERVATIONS
- IT IS A COLLECTION OF DATA IN THE FORM OF OBSERVATIONS & VARIABLES
- IT MUST BE BASED ON A DATA STEP (DEFINITION & STRUCTURE)
- SOURCES: INTERNAL (SAS) & EXTERNAL (FILES/DB)

Basic Structure of SAS
There are two main components in SAS programs: the Data step(s) and the Procedure step(s), also called PROCs. The data step reads data from external or internal sources, manipulates it, combines it with other data sets, and prints reports. The data step is used to prepare your data for use by one of the procedures (often called "Procs").

SAS FORMULA
3. DATA PROGRAMS: USER-DEFINED PROGRAMS IN SAS (20%), USED FOR
- DATA PROCESSING
- DATA MANIPULATION
- LOGIC BUILDING IN SAS
- INTEGRATION
- CUSTOMIZATION
SYNTAX:
    DATA <DATA SET OPTIONS>;
        <PROG STATEMENTS>;
        <LOGICAL STATEMENTS>;
        <PROCESS STATEMENTS>;
    RUN;    (COMPILE & EXECUTE)

SAS FORMULA
3. DATA PROCEDURE: SAS BUILT-IN PROGRAMS/FUNCTIONS (80%), HAVING 7638 PROCEDURES
- SYNTAX BASED
- ALL PROCEDURES ARE PROCESS BASED
- DOMAIN BASED
- GENERATE OUTPUT
SYNTAX:
    PROC <PROC NAME> <OPTIONS>;
        <SYNTAX STATEMENTS ONLY>;
    RUN;     /*BASE SAS PROCEDURES*/
    QUIT;    /*OTHER THAN SAS PROCEDURES*/
SAS FORMULA
4. DATA OUTPUT: RESPONSIBLE FOR OUTPUT GENERATION FROM SAS.
THE ENTIRE PROCESS IS PROCEDURE BASED
- SIMPLE REPORTS & GRAPHS
- MULTIDIMENSIONAL REPORTS & GRAPHS
- DATA BASE
- DATA SET/TABLE
- GUI APPLICATION
- USER INTERFACES (WITHIN SAS)
SAS FORMULA
THE SAS FORMULA IS DIVIDED INTO TWO TYPES:
(1) SAS TECHNICAL: USED FOR PROGRAMMING IN ALL LAYERS
(2) SAS FUNCTIONAL: USED FOR PROGRAMMING & PROCESS

PROCESS                    TECHNICAL
(A) DATA ACCESS            DATA STEP
(B) DATA MGMT              DATA SET
(C) DATA ANALYSIS          DATA PROG & PROC
(D) DATA PRESENTATION      DATA OUTPUT

Terminology in SAS
In SAS, you call a
- File: Data Set
- Field: Variable
- Record(s): Observations / Rows
An Observation is a collection of data values that usually relate to a single object. A Variable is the set of data values that describe a given characteristic. The sample program that follows illustrates these terms.

SAS FORMULA
FILES / DATA BASE -> DATA STEP -> DATA SET -> DATA PROG / DATA PROC -> DATA OUTPUT

Sample SAS program
    Data MySample;
        A = 4;
        B = 2;
        C = A * B;
    Run;
    Proc Print;
    Run;

Why RUN statement?
- The RUN statement tells SAS that the Data step or Procedure has ended.
- It is good practice to end each Data step or Procedure with a RUN statement.
- You must still SUBMIT the SAS program for it to be processed.

Missing Values in SAS
- A character missing value is displayed as a blank.
- A numeric missing value is displayed as a period.
Example:
    Data Missing_Test;
        Length A B $ 10;
        A = 'Ramanathan';
    Run;
    Proc Print;
    Run;

Words in the SAS Language
A word or token in the SAS language is a collection of characters that communicates a meaning to SAS and is not divisible into smaller units capable of independent use. It can contain a maximum of 32,767 characters.
A word or token ends when SAS encounters one of the following:
- the beginning of a new token
- a blank after a name or a number token
- the ending quotation mark of a literal token

Words in the SAS Language (contd)
Each word or token in the SAS language belongs to one of four categories:
- names
- literals
- numbers
- special characters
SAS NAMING CONVENTIONS
Name
1. SAS variable names may be up to 32 characters in length.
2. The first character must be an alphabetic character or an underscore. Subsequent characters can be alphabetic characters, numeric digits, or underscores.
3. A variable name may not contain blanks.
4. A variable name may not contain any special characters other than the underscore.
5. A variable name may contain mixed case. The mixed case is remembered and used for presentation purposes only. When SAS processes variable names, however, it internally uppercases them. You cannot, therefore, use the same letters with different combinations of lower- and uppercase to represent different variables. For example, cat, Cat, and CAT all represent the same variable.
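
The case rule can be checked with a short program. This is a minimal sketch; the data set name pets and the variable _count1 are made up for illustration.

```sas
data pets;
    Cat = 1;            /* first spelling seen; used for display        */
    CAT = CAT + 1;      /* same variable as Cat, now 2                  */
    cat = cat * 10;     /* still the same variable, now 20              */
    _count1 = 3;        /* valid: starts with an underscore, no blanks  */
run;

proc print data=pets;
run;
```

The printed data set should contain one observation with two variables, Cat and _count1, because all three spellings refer to a single variable.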
Words in the SAS Language (contd)
You may not assign the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_) to variables.

Name: a series of characters that begins with a letter or an underscore. Later characters can include letters, underscores, and numeric digits. A name token can contain up to 32,767 characters. In most contexts, however, SAS names are limited to a shorter maximum length, such as 32 or 8 characters. Examples of name tokens include:
    data   _new   yearcutoff   year_99   descending   _n_

Words in the SAS Language (contd)
Literal: consists of 1 to 32,767 characters enclosed in single or double quotation marks. Examples of literals include:
    'Chicago'
    "1990-91"
    'SatyaKalyani Pala'
    'Suresh Bharatha Mani''s plane'
    "Report for the Third Quarter"

Words in the SAS Language (contd)
Number: in general, composed entirely of numeric digits, with an optional decimal point and a leading plus or minus sign. SAS also recognizes numeric values in the following forms as number tokens: scientific (E-) notation, hexadecimal notation, missing value symbols, and date and time literals. Examples of number tokens include:
    5683   2.35   0b0x   -5   5.4E-1   '24aug90'd
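
A short DATA step can show how some of these number tokens are stored. This is only a sketch: the variable names are arbitrary, and the value of the two-digit-year literal depends on the YEARCUTOFF system option, so no value is claimed for it here.

```sas
data _null_;
    n1 = 5683;            /* plain numeric digits                    */
    n2 = 5.4E-1;          /* scientific notation, equal to 0.54      */
    n3 = 0b0x;            /* hexadecimal B0, equal to 176            */
    d0 = '01jan1960'd;    /* date literal, stored as 0               */
    d1 = '21jan1960'd;    /* 20 days after 01-JAN-1960, stored as 20 */
    d2 = '24aug90'd;      /* century is resolved by YEARCUTOFF       */
    put n1= n2= n3= d0= d1=;
    put d2= date9.;       /* same stored number, shown as a date     */
run;
```

The PUT statements write to the log, so no data set is needed; the DATE9. format only changes how the stored day count is displayed.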
Words in the SAS Language (contd)
Special character: usually any single keyboard character other than letters, numbers, the underscore, and the blank. In general, each special character is a single token, although some two-character operators, such as ** and <=, form single tokens. The blank can end a name or a number token, but it is not a token. Examples of special-character tokens include:
    =   ;   '   +   @   /
Placement and Spacing of Words in SAS Statements
Examples
In this statement, blanks are not required because SAS can determine the boundary of every token by examining the beginning of the next token:
    total=x+y;
The first special-character token, the equal sign, marks the end of the name token total. The plus sign, another special-character token, marks the end of the name token x. The last special-character token, the semicolon, marks the end of the y token. Though blanks are not needed to end any tokens in this example, you may add them for readability, as shown here:
    total = x + y;

The Data Step
The data step provides a wide range of capabilities, among them reading data from external sources, reshaping and manipulating data, transforming data and producing printed reports.
The data step is actually an implied do loop whose statements will be executed for each observation either read from an external source, or accessed from a previously processed data set.
For each iteration, the data step starts with a vector of missing values for all the variables to be placed in the new observation. It then overwrites the missing value for any variables either input or defined by the data step statements. Finally, it outputs the observation to the newly created data set.
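
The implied loop can be observed with the automatic variable _N_, which counts iterations of the data step. This is a minimal sketch; the data set name demo and the variables a and b are illustrative only.

```sas
data demo;
    input a;              /* reads one record per iteration               */
    b = a * 2;            /* b starts as missing in every iteration       */
    put _n_= a= b=;       /* writes the iteration number and values       */
    datalines;
4
9
;
run;
```

The log should show one line per input record (_N_=1 a=4 b=8, then _N_=2 a=9 b=18), and demo ends up with two observations, one written at the end of each iteration.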
Data Step: Basics
Each data step begins with the word data and optionally one or more data set names (and associated options) followed by a semicolon. The name(s) given on the data statement are the names of the data sets that will be created within the data step. If you don't include any names on the data statement, SAS will create default data set names of the form DATAn, where n is an integer that starts at 1 and is incremented so that each data set created has a unique name within the current session. Since it becomes difficult to keep track of the default names, it is recommended that you always explicitly specify a data set name on the data statement. When you are running a data step simply to generate a report, and don't need to create a data set, you can use the special data set name _null_ to eliminate the output of observations.
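
Both naming cases can be sketched briefly. The variable names x and total are arbitrary, and the default name assumes no unnamed data set has been created earlier in the session.

```sas
/* No name given: SAS picks a default name such as WORK.DATA1. */
data;
    x = 1;
run;

/* _null_: the step runs, but no data set is created. */
data _null_;
    total = 4 + 2;
    put 'Total is ' total;    /* written to the log only */
run;
```

The log note after the first step reports the generated data set name, which is a quick way to see why explicit names are easier to track.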
Data Step: Inputting Data (contd)
Reading from inline data
    data one;
        input a b c;
        datalines;
    4 5 3
    9 10 12
    ;
    Run;
By default, each invocation of the input statement reads another record. This example uses free-form input, with at least one space between values.

How to Use the INFILE Statement
Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records. You can use the INFILE statement in conditional processing, such as an IF-THEN statement, because it is executable. This allows you to control the source of the input data records. Usually, you use an INFILE statement to read data from an external file. When data are read from the job stream, you must use a DATALINES statement. However, to take advantage of certain data-reading options that are available only in the INFILE statement, you can use an INFILE statement with the file-specification DATALINES and a DATALINES statement in the same DATA step. When you use more than one INFILE statement for the same file-specification and you use options in each INFILE statement, the effect is additive. To avoid confusion, use all the options in the first INFILE statement for a given external file.
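
Combining INFILE DATALINES with a DATALINES statement might look like the sketch below. The DLM= option is chosen here only as one example of an INFILE-only option; the comma-separated values are the same numbers as in the inline example.

```sas
data one;
    infile datalines dlm=',';   /* INFILE option applied to inline data */
    input a b c;
    datalines;
4,5,3
9,10,12
;
run;

proc print data=one;
run;
```

Without the INFILE statement, the INPUT statement would expect blanks between values and could not split these comma-separated records.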
Data Step: Inputting Data
The input statement of SAS is used to read data from an external source, or from lines contained in your SAS program. The infile statement names an external file or fileref from which to read the data; otherwise the cards; or datalines; statement is used to precede the data.
Reading data from an external file
    data one;
        infile 'c:\Radhika\Samp.dat';
        input a b c;
    run;
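
The same read can also go through a fileref instead of a quoted path. In this sketch the fileref name samp is hypothetical, and the file is assumed to hold three blank-separated numbers per line.

```sas
/* Assign a fileref (samp is an illustrative name) to the external file. */
filename samp 'c:\Radhika\Samp.dat';

data one;
    infile samp;        /* refer to the file by its fileref */
    input a b c;
run;
```

Using a fileref keeps the physical path in one place, so only the FILENAME statement changes if the file moves.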