This topic is a brief introduction to SAS and its basics. It is intended for beginners who are just getting started with SAS, and it outlines a learning path for both beginners and advanced users.
Introduction to SAS – What is SAS?
SAS originally stood for Statistical Analysis System and is a powerful statistical software package. It includes many modules for data management, data mining, and statistical data analysis. SAS is available for both Windows and UNIX platforms.
Introduction to SAS – What can we do with SAS?
SAS is an integrated system of software products provided by the SAS Institute that enables a user to perform:
- Data Management: Data entry, retrieval, data cleaning, and data mining
- Report Generation: We can generate different kinds of reports, including graphs
- Data Analysis: We can perform anything from simple descriptive analysis to advanced statistical analysis and operations research
- Predictive Modelling: SAS has powerful modules for forecasting and decision support
- Data Warehousing: SAS is a BI tool that can be used to perform ETL operations (Extract, Transform, Load)
Introduction to SAS – What is a SAS program?
SAS is driven by SAS programs, which we use to perform various operations on data stored as tables called data sets. SAS also provides menu-driven graphical user interfaces, such as SAS Enterprise Guide (EG) and SAS Enterprise Miner (Eminer), which help non-programmers.
However, most interaction with the SAS system for analytical work is done by writing SAS programs. SAS programs provide a higher level of flexibility compared to the menu-driven interfaces. Also, the menu-driven interface for SAS is not provided on platforms like UNIX and Mainframe.
SAS programs are composed of two fundamental components: the DATA step and the PROC step (procedures).
DATA steps are used to create or modify data sets. We can use DATA steps for:
- Defining the structure of the data: We can define the variables and assign values to them
- Creating the data: We can enter the data directly, read it from files, take subsets of existing data, merge two or more data sets, or update the data
- Modifying the data: We can modify existing data, create new data sets, and update existing data
- Checking for correctness: We can check whether there are errors in the data
An Example Data Step:
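A minimal sketch of a DATA step, assuming a hypothetical data set named work.students with made-up values. It defines the variables, reads three rows typed directly into the program, and adds a simple correctness flag:

```sas
/* Hypothetical example: create a small data set named work.students */
data work.students;
    /* Define three variables; the $ marks name as a character variable */
    input name $ age score;
    /* Simple correctness check: flag scores outside the range 0 to 100 */
    if score < 0 or score > 100 then valid = 0;
    else valid = 1;
    /* The data lines follow; a line with a single semicolon ends them */
    datalines;
Anna 14 88
Ben 15 92
Cara 13 105
;
run;
```

Here the third row would be flagged with valid = 0 because its score is above 100.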
PROC steps call pre-written procedures in SAS. Each procedure is designed for a particular form of data manipulation or statistical analysis, performed on data sets such as those created in a DATA step. We can use PROC steps for:
- Printing the contents of a data set and creating reports (Example: PROC PRINT)
- Producing frequency tables and cross tabulations (Example: PROC FREQ)
- Generating summaries and aggregates (Example: PROC MEANS, PROC SUMMARY)
- Applying statistical techniques and analyzing the data (Example: PROC TTEST, PROC REG)
- Generating charts (Example: PROC GPLOT)
- Sorting, listing, and exporting the results and creating data sets
An Example Proc Step:
PROC PRINT DATA=summarytables.categores_copy;
RUN;
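Other procedures follow the same pattern. The sketch below assumes the sample data set sashelp.class, which normally ships with SAS, is available in your installation; if it is not, substitute any data set of your own with similar variables.

```sas
/* Print the first five rows of the sample data set */
proc print data=sashelp.class(obs=5);
run;

/* Count the number of students by sex */
proc freq data=sashelp.class;
    tables sex;
run;

/* Report the mean, minimum, and maximum of height and weight */
proc means data=sashelp.class mean min max;
    var height weight;
run;
```

Each step names a procedure, points it at a data set with DATA=, and ends with RUN.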
SAS programs are written in the SAS language to manipulate, clean, describe, and analyze data. So, it is important to learn and understand the SAS language in order to use SAS.
A typical SAS program consists of one or more DATA steps, which read the data and put it into a format that SAS can work with, and one or more PROC steps, which analyze the data. All SAS statements must end with a semicolon.
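As an illustration of this structure, here is a minimal sketch of a complete program with one DATA step followed by two PROC steps. It assumes the sample data set sashelp.class is available; the output data set work.class_groups and the variable agegroup are hypothetical names chosen for this example.

```sas
/* DATA step: copy the sample data and derive a new variable */
data work.class_groups;
    set sashelp.class;
    /* Declare the length first so the longer value is not truncated */
    length agegroup $ 8;
    if age >= 14 then agegroup = "Older";
    else agegroup = "Younger";
    label agegroup = "Age group";
run;

/* PROC step: sort the new data set by the derived variable, then by name */
proc sort data=work.class_groups;
    by agegroup name;
run;

/* PROC step: print selected columns for one group, with a title */
title "Students aged 14 and over";
proc print data=work.class_groups label;
    where agegroup = "Older";
    var name sex age agegroup;
run;
/* Clear the title so it does not carry over to later output */
title;
```

Note that every statement, including RUN, ends with a semicolon.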
A beginner-level SAS user should at least know how to create simple data sets and perform basic operations to analyze the data. We will start with the following DATA step statements and procedures to do basic tasks using the SAS language.
DATA; INFILE; INPUT; SET; CARDS; DATALINES; TITLE; LABEL; FORMAT; IF / THEN; ELSE; WHERE; MERGE; PROC SORT; PROC PRINT; PROC FREQ; PROC MEANS; PROC GPLOT; PROC SQL;
We will discuss these statements in the next few topics, beginning with the statements used in the DATA step and the PROC step.