SAS is a programming language designed for statistical analysis, and it excels at database operations. Unlike whitespace-sensitive languages such as Python, SAS is whitespace-agnostic, meaning tabs, line breaks, and excess spaces are typically ignored when programs are executed.
*A statement comment begins with an asterisk and ends at the first semicolon, so it is usually kept to one line;
/* Block comments can span multiple lines. They begin with a forward slash
   followed by an asterisk and end with an asterisk followed by a forward slash */
Libraries are a common way to organize SAS datasets or to connect to databases. A library is defined with a libname (library name) statement, followed by the name of the library and a file path (if applicable).
libname example "/home/usr/data/";
*Note: SAS statements end with a semicolon;
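A library name (libref) must be 1 to 8 characters long, begin with a letter or underscore, and contain only letters, digits, and underscores.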
Accessing data in a library is straightforward:
data OutputName;
    set example.FileName;
run;
This creates a table titled “OutputName” from a SAS dataset titled “FileName” in the example library; in other words, we brought in the data located in “/home/usr/data/filename.sas7bdat” (SAS on UNIX stores dataset files under lowercase names) and stored it in a new table called OutputName.
SAS tends to be case-insensitive, meaning that it disregards the casing of keywords and of dataset and variable names (quoted strings and data values are the exception); therefore, running this would produce the exact same results:
DATA OUTPUTname;
    SET EXAMPLE.FILEname;
run;
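Case-insensitivity covers keywords and names, not the data itself: character values are compared exactly as stored. The sketch below illustrates this under an assumption that is not in the original example, namely that example.FileName has a character column named region.

```sas
/* Assumption: example.FileName has a character column named region (hypothetical) */
data west;
    set example.FileName;
    /* REGION, Region, and region all refer to the same column,     */
    /* but "West" and "WEST" are different values, so UPCASE() is   */
    /* applied before comparing to make the match case-insensitive. */
    where upcase(REGION) = "WEST";
run;
```

Normalizing with UPCASE() is only one option; it is shown here because it leaves the stored values unchanged.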
The SET statement can also read multiple datasets; for example, if there are datasets holding data for January and February 2023 (data_202301 and data_202302), they can be combined into a single table as follows:
data combined;
    set data_202301 data_202302;
run;
The SET statement also accepts dataset lists and name prefixes. This is useful when dataset names follow a pattern; in the previous example, both names share the prefix data_ followed by the year and then the month.
data combined;
    set data_:;
run;
The above reads every dataset whose name starts with “data_”. If only a specific year is desired, the prefix can be narrowed to “data_2023:”.
If the data had been structured with months before years (e.g. data_012023) this would not be as easy; therefore, consideration should be given to the naming structure when archiving data to be used later.
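Another kind of dataset list that benefits from year-then-month naming is a numbered range list, which SAS expands by counting up the numeric suffix. A minimal sketch, assuming data_202301, data_202302, and data_202303 all exist in the WORK library (data_202303 and the output name combined_q1 are hypothetical):

```sas
/* Assumes data_202301, data_202302, and data_202303 exist (the last is hypothetical) */
data combined_q1;
    /* Numbered range list: expands to the consecutive suffixes 202301 through 202303 */
    set data_202301-data_202303;
run;
```

Because the suffix is treated as a plain number, a range that crosses a year boundary (for example 202312 to 202401) would ask for datasets such as data_202313, so ranges of this kind are best kept within a single year.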
For more examples of the SET statement, see the SAS documentation.
Additionally, the data can be filtered so that only specific rows are kept using a where statement.
data filtered;
    set data_2023:;
    where date < mdy(6,30,2023);
run;
This returns all rows where the “date” field is earlier than 6/30/2023.
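The same cutoff can also be written as a SAS date literal, which some readers find easier to scan than a call to MDY(). A sketch that reuses the datasets from the example above (the output name filtered_literal is hypothetical):

```sas
data filtered_literal;
    set data_2023:;
    /* '30JUN2023'd is a date literal for June 30, 2023, the same value as mdy(6,30,2023) */
    where date < '30JUN2023'd;
run;
```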
Columns can likewise be kept or removed using data set options (DSO), which can be used whenever SAS datasets are processed on the SAS server (including in PROC SQL).
Data set options cannot be used in pass-through queries in PROC SQL because that code is run on a remote server rather than on the SAS server.
Consider the following code:
data filtered(DROP=DATE);
    set data_2023:(where=(date<mdy(6,30,2023)) KEEP=DATE POLICY VALUE);
run;
The above use of the “Where=()” data set option produces the same result as the WHERE statement in the previous example; however, a data set option applies only to the dataset it is attached to, rather than to every dataset read in the step.
The SET statement reads only the DATE, POLICY, and VALUE columns from the input datasets. It then keeps only the rows where the date is before 6/30/2023. Finally, when writing to the “filtered” dataset, it keeps every column except “DATE”.
The “DATE” field could not have been left out of the KEEP= option because it is needed to evaluate the “Where=()” option, but it can be dropped when writing to the output dataset.
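Since data set options are also available in PROC SQL when SAS itself processes the data, the same column and row filtering can be sketched there. This assumes example.FileName contains DATE, POLICY, and VALUE columns, which the earlier examples do not state; the output name filtered_sql is likewise hypothetical.

```sas
/* Assumption: example.FileName has DATE, POLICY, and VALUE columns (hypothetical) */
proc sql;
    /* DROP= on the output table removes DATE after it has been used for filtering */
    create table filtered_sql(drop=date) as
    select *
    from example.FileName(keep=date policy value
                          where=(date < mdy(6,30,2023)));
quit;
```

This works because the query runs on the SAS server; inside an explicit pass-through query the text is sent to the remote database, where these options are not understood.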