Sunday, September 20, 2015

SAS-Made Easy

          In this blog we will be dealing with the explanations of some sample SAS programs. I will be providing the questions and its solution with explanation  

Ques : Using the SAS data set Hosp, use PROC PRINT to list observations for Subject
values of 5, 100, 150, and 200. Do this twice, once using OR operators and once using the IN operator.
Note: Subject is a numeric variable.

Sol: 
       First,lets understand the question clearly!!!
  lets divide the question into two parts one is he asked us to use the Hosp data set and divide the observations for subject
and the next part is he asked us to repeat the same program using the In statement

lets write the program


Dividing into groups without using IN
libname a15020 "D:\PRAXIS\sas/a15020";
libname learn "D:/PRAXIS/sas/sas 9.2/data sets";
data a15020.hosp1;
set learn.hosp;
if subject eq 5 or subject eq 100 or subject eq 150 or subject eq 200;
run;
proc print data=a15020.prob2;
run;

This is program for the question asked let us understand the program step wise
 Divide the program into lines -- there are 8 lines in the program
 In the 1st line  libname a15020 "D:\PRAXIS\sas/a15020";

in this we created a library name using the command libname and we gave it a name,name which we gave is a15020 which is followed by the location where you want to save the dataset permanently which should be there in " " and should end with a semi-column (;)

In the 2nd line  libname learn "D:/PRAXIS/sas/sas 9.2/data sets";

I already stored Hosp data in the library in learn so to initialize it we are giving the location of the learn

3rd line data a15020.hosp1;
its the data step as we are storing data permanently in the a15020 library we are giving the dataset name along with the library name  a15020.hosp1 ,here hosp1 is the dataset  name

4th line set learn.hosp;
we are using set command to set the hosp dataset which we already stored in the learn library
SET is used to load data in to the new program by using the data which we already used

5th line if subject eq 5 or subject eq 100 or subject eq 150 or subject eq 200;


we are using if statement here
we are saying SAS give me the values of subject where values are 5  or 100 or 150 or 200


here eq is =(operator)


6th line run;
 this will say to SAS that our data step came to an end
(remember to give the ; at the end of every step)


7 th step Proc print data=a15020.hosp1;

this will print the values which are stored in the dataset a15020.hosp1


8th step run;
As the proc step came to an end we will end the proc step using the run statement in the same way

we ended the data step

output :


 but in the above program we wrote so many OR commands for printing 5,100,150,200
in order to avoid it we have a command called IN in SAS
program demonstrating IN function

data a15020.usingin;
set learn.hosp;
if subject in (5 100 150 200);
run;
proc print data=a15020.usingin;
run;



without using IN we wrote a long step
if subject eq 5 or subject eq 100 or subject eq 150 or subject eq 200
but using in we wrote in(5 100 150 200)
 if subject in (5 100 150 200)---- this step tells SAS if you find subject the select the values 5 100 150 200 from the subject.

Ques: Using the Sales data set, create a new, temporary SAS data set containing Region and TotalSales plus a new variable called Weight with values of 1.5 for the North Region, 1.7 for the South Region, and 2.0 for the West and East Regions. Use a SELECT statement to do this.

Sol:  Here the question is in the sales dataset we have region adn total sales for that we have to add a new variable called as weight and we have to provide respective values for weights and we have to do this by the use of select statement

data a15020.newsales;
set learn.sales(keep=Region TotalSales);
select ;
when (region eq 'North') weight =1.5;
when (region eq 'South') weight =1.7;
when (region eq 'East') weight =2.0;
when (region eq 'North') weight =2.0;
otherwise;
end;
run;
proc print data=a15020.newsales;
run;



lets understand the  the simple program .......
we created a permanent dataset newsales in the library a15020,set the sales data set and keep only region and total sales variables in the sales dataset,next  use the select statement
SELECT statement is an alternative to the if and ifelse statement , in the select statement we will use when ,here we are writing when region is north weight will be 1.5 and so on ... this will take values of weight for all the regions ,otherwise can be left empty or you can write if otherthan the above values do this.select statement will end with an end statement and we are ending data step and asking SAS to print our output

Output:


Ques:  Count the number of missing values for the variables A, B, and C in the Missing data
set. Add the cumulative number of missing values to each observation (use variable
names MissA, MissB, and MissC). Use the MISSING function to test for the missing
values.

  Sol:  Here the question is count the missing values in the dataset missing data for variables A B C and name the missing values as MissA MissB MissC

before writing this program lets understand the sum statement in SAS then this program is simple

variable + increment;
the above statement will do the following things
Variable is retained
Variable is initialized at 0
Missing values (of increment) are ignored

data a15020.missing;
infile "D:/PRAXIS/sas/SAS BY EXAMPLE/missing.txt" missover;
input A B C;
if missing(A) then MissA+1;
if missing(B) then MissB+1;
if missing(C) then MissC+1;
run;
proc print data=a15020.missing;
run;


here upto input statement is same as above discussed then we are writing
if the value in A is missing then create a newvariable called MissA and then add one to it and then repeat for all the A values and store the final value in MissA

here if missing(A) will tell SAS if there is any missing values in A then
 next   is MissA+1 means we are creating a new variable missA and initializing it to 0 and then if a value is missing it will be counted
same will be repeated for missB and missC

Output: 
 
Ques: You are testing three speed-reading methods (A, B, and C) by randomly assigning
10 subjects to each of the three methods. You are given the results as three lines of
reading speeds, each line representing the results from each of the three methods,respectively. Here are the results:
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
Create a temporary SAS data set from these three lines of data. Each observation should contain Method (A, B, or C), and Score. There should be 30 observations in this data set. Use a DO loop to create the Method variable and remember to use a single trailing @ in your INPUT statement. Provide a listing of this data set using PROC PRINT.

 Sol: Here we have to place all the 30 observations we have in the three methods A B C by using the do command

data a15020.speed;
do method= 'A','B','C';
do n=1 to 10;
input score @;
output;
end;
end;
datalines;
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
;
proc print data=a15020.speed;
run;


we are using do and saying SAS create methods with names A B C and then for every method do
for 10 times thats the reason we gave
do method ='A','B','C';
do n=1 to 10;
and we are asking SAS to print the output at each level so we gave output compand and do statement should end with a end command so we used two do statements to end them we used two end statements

here for the input score we are using score @
input score @;

we used this @ to say SAS that the input we are providing is in a single line and not this multiple lines
if we are not providing @ then SAS will take only three values for A they are 250 267 350
so in order to take all the 10 values which are there in a single line we are using @ sign

we are datalines statement to provide the data in the SAS environment directly
and then printing the data.

Output: 
 

Ques : Using the Hosp data set, compute the subject’s ages two ways: as of January 1, 2006
(call it AgeJan1), and as of today’s date (call it AgeToday). The variable DOB
represents the date of birth. Take the integer portion of both ages. List the first 10
observations.

Sol: the problem is using the hosp dataset create two ages one untill jan1 2006 and one more age is today and only take the integer portion of the age .calculate the ages based on the DOB variable in the hosp dataset

libname learn "D:/PRAXIS/sas/sas 9.2/data sets";
data a15020.hosp3;
set learn.hosp;
agejan1=int(yrdif(dob,'01JAN2006'd,'actual'));
agetoday=int(yrdif(dob,today(),'actual'));
proc print data=a15020.hosp1 (obs=10);
var agejan1 dob agetoday;
run;


we are creating a dataset hosp3 and we are setting the hosp data set using the set function as discussed
next we are creating a new variable
agejan1 in that we are calculating the age difference by using function yrdif
yrdif contains three arguments (variable on which you want to do analysis, what you want to calculate , 'actual values or not)
actual values means it will take exact number of days means for sep it will take 30 days and for oct it will take 31 days
if we are not providing actual then for every month it will take 30 days and calculate

if we want calculate yrdif for a particular date for example January 1 2009  to give this date in SAS there is particular format '01Jan2009'd

so in agejan1 variable we rounding the calculated values to the nearest integer

if we want to calculate  age for  today then use today() function .

and in the proc print we are using obs=10 which will print only the first 10 observations of the hosp dataset and var function which says SAS print only the variables specified in the VAR statement. 

Output:
 
Ques:You want to see each patient in the Medical data set on the same day of the week 5 weeks after they visited the clinic (the variable name is VisitDate). Provide a listing of the patient number (Patno), the visit date, and the date for the return visit.

Sol: The problem here is for a hospital which has the data of patients stored in the medical dataset
and the values of the patient visit has been recorded and the doctor wants to see the patient after 5 weeks so we have to calculate the next visit date which is after 5 weeks by the patient

data a15020.returnvisit;
set learn.medical;
return_visit=intnx('week',visitdate,5,'sameday');
proc print data=a15020.returnvisit;
format return_visit mmddyy10.;
run;




here we are using a function INTNX which is used to calculate the date after a certain interval so the first variable in the INTNX function is week as we want to calculate date after 5 weeks (if we want to calculate months instead of weeks use month instead of week) and second is our variable on which the operations should be performed ,third argument is after how many weeks we want the date in this problem we want after 5 weeks so third argument is 5 , and fourth argument is we are asking SAS to print the date on the same day not on the starting of the month .

Output: 










Ques: Using the SAS data set Bicycles, create two temporary SAS data sets as follows: Mountain_USA consists of all observations from Bicycles where Country is USA and Model is Mountain. Road_France consists of all observations from Bicycles where Country is France and Model is Road Bike. Print these two data sets.


 Sol: Till now we have seen creating only one dataset using a data statement but lets create two datasets by using a single datastep and print the output ,the problem here says us to separate the Country and model from bicycles dataset and print in their respective outputs Mountain_USA and Road_France
 
libname a15020 "D:/PRAXIS/sas/a15020";
data a15020.mountain_usa a15020.road_france;
set learn.bicycles;

if country eq 'USA' and model eq 'Mountain Bike' then output a15020.mountain_usa;
else if country eq 'France' and model eq 'Road Bike' then output a15020.road_france;
run;
proc print data=a15020.mountain_usa;
run;

proc pritn data=a15020.road_france;
run;
Here in the data step we are creating two permanant datasets mountain_usa and road_france in the a15020 library and then we are using the set function to set the data in the bicycles to this dataset

and then we are seperating the data by using the if condition
if country is usa and model is mountain bike then keep that value in the mountain_usa file is the if statement which we gave in the step there
and then to print the output specify the dataset which we the output to be printed

Output:  for Mountain_usa




for Road_france:


Ques:Using the data set Sales, create the report shown here:
Sales Figures from the SALES Data Set
                            Total
Region Quantity Sales
East        100       699.0
               100       899.0
               500       19995.0
                  2        20000.0
------ -------- -------
East        702       41593.0

West      50         299.5
             1000      1990.0
------ -------- -------
West     1050      2289.5
======== =======
             2022     77113.0  in the specified format for all the specified regions.


Sol: The problem here states that divide the sales data set into speific regions and give the sum of the quantity and total sales region wise

Here's the problem


data a15020.region;
set learn.sales;
run;
proc sort data=a15020.region;
by region;
run;
proc print data=a15020.region;
by region;
var region quantity totalsales;
sum quantity totalsales;
run;


here we are creating a permanant dataset region in the library a15020 and we are setting the values which are present in the dataset sales.
To divide the data based on the regions first we have to sort the data based on the regions
to sort data use proc sort step followed by a BY statement which will say sort the data based on the variable specified in the BY statement . so here we want data to be segregated a region so we gave region as a BY variable
then print the data  by region and we want only two variables quantity and totalsales
so to get only these two variables use var statement which contains quantity and totalsales;
and use sum to sum these variables column wise and then end the step using a run statement

Output:



 


 

Ques: Using the data set Blood, produce a report like the one here. The numbers in the table
are the average WBC and RBC counts for each combination of blood type and
gender.
Average Blood Counts by Gender
         ----Gender---  ---Gender----
Blood Female Male Female Male
Type WBC WBC RBC RBC
A       7,218 7,051 5.47 5.46
AB     7,421 6,893 5.43 5.69
B       6,716 6,991 5.52 5.42
O       7,050 6,930 5.53 5.48

Sol:
 Here the problem states that segregate and print the values in the desired format .

for this type of segregation we will use proc report


data a15020.blood;
set learn.blood;
run;
proc report data=a15020.blood;
column blood_grp gender,wbc gender,rbc;
define blood_grp / group;
define gender/across width=8;
define wbc/analysis mean ;
define rbc /analysis mean;
run;


In proc report to specify which values to print we haev to give column option which is similar to the var option in the proc print statement
According to our ques we have to segregate gender and in gender we want wbc and rbc seperatly so
column we will provide gender,wbc and gender,rbc which will be helpful in the further steps
next we will define the blood_grp and /group is we are saying group all the variables based on the blood_group
next define gender /across here across is used to split the data column wise
next define wbc/analysis here analysis is used as its a numeric variable and the analysis we want to perform is mean but the default analysis is sum so we are specifying mean
same with rbc and end the proc report step with a run statement

Output:
 

you can find some more programs by downloading from the below link
https://drive.google.com/file/d/0B1v-Qh27ObFVc3lxMHVoR2VJSlU/view?usp=sharing 

1 comment:

  1. Your blog will look much better, if you:
    1. Make it structured. You can have separate page for each chapter and have an index page, linking the pages (for the chapters).  You can also add links at end of each chapter to go back to index, previous (if any) and next (if any) pages.
    2. Introduce the chapter (i.e. the page) in your own words.
    3. For each problem, explain the problem you are attempting to solve. Your approach here is OK.
    4. Use screenshots for programs as well.
    5. Some of the line by line comment you have put in your blog, are better off as in-program comment.  
    6. For each problem solved, write what you learned by solving the problem.  Even one line is fine for this. 
    7. Write a conclusion or summary for each page.


    Remember to :
    1. Number the problems solved by giving Chapter # and Problem #  (just like you did)
    2. "Beautify" the programs before posting.  There a menu item in SAS studio, which does the beautification for you.  Also, you may like to post the programs, the way you have posted the result.

    It would help, if you can think of this exercise as making your class notes and homework copy, combined.

    I understand, it is lot of work now, given you are trying to do it at eleventh hour.  For the next submission, if time permits, you can give it a try.

    Cheers

    ReplyDelete