I'm trying to create a frequency table using the following sample dataset.
For subjects, var1-var3 are variables measured on different days (day variable). I would like to generate a frequency count for how many subjects with value=1 on any given day per study (startified by study).
Any help to achieve the output is greatly appreciated.
data test; input id var1 var2 var3 day stud; datalines; 100101 1 1 1 -4 1 100101 1 0 0 1 1 100101 1 0 0 3 1 100101 1 1 1 8 1 100101 1 0 0 15 1 100101 1 0 0 17 1 100101 0 1 1 27 1 100103 1 1 1 -5 1 100103 1 0 0 1 1 100103 1 0 0 3 1 100103 0 1 1 7 1 100106 1 1 1 -4 1 100106 1 0 0 1 1 100106 1 0 0 3 1 100106 1 1 1 8 1 100106 1 0 0 15 1 100106 1 0 0 17 1 100106 1 1 1 26 1 1000101 1 1 1 -13 2 1000101 1 0 0 -8 2 1000101 1 0 0 0 2 1000101 1 0 0 1 2 1000101 1 0 0 1 2 1000101 0 1 1 7 2 1000101 1 0 0 8 2 1000101 1 1 1 15 2 1000101 1 0 0 111 2 1000103 1 1 1 -18 2 1000103 1 0 0 -9 2 1000103 1 0 0 0 2 1000103 0 1 1 6 2 1000103 1 0 0 7 2 1000103 1 0 0 17 2 1000103 0 1 1 19 2 1000103 1 0 0 24 2 4900201 1 0 0 -9 2 4900201 0 1 1 -9 2 4900201 1 0 0 -8 2 4900201 1 0 0 0 2 4900201 1 0 0 1 2 4900201 1 0 0 2 2 4900201 0 1 1 6 2 4900201 1 0 0 7 2 4900201 1 0 0 15 2 ; run;
Any help to achieve the example output is greatly appreciated.
day | day | day | day | day | day | day |
-4 | 1 | 3 | 8 | 15 | 17 | 27 |
var1 | ||||||
var2 | ||||||
var3 |
Proc tabulate allows nesting and grouping in different manners.
data test; input id var1 var2 var3 day stud; datalines; 100101 1 1 1 -4 1 100101 1 0 0 1 1 100101 1 0 0 3 1 100101 1 1 1 8 1 100101 1 0 0 15 1 100101 1 0 0 17 1 100101 0 1 1 27 1 100103 1 1 1 -5 1 100103 1 0 0 1 1 100103 1 0 0 3 1 100103 0 1 1 7 1 100106 1 1 1 -4 1 100106 1 0 0 1 1 100106 1 0 0 3 1 100106 1 1 1 8 1 100106 1 0 0 15 1 100106 1 0 0 17 1 100106 1 1 1 26 1 1000101 1 1 1 -13 2 1000101 1 0 0 -8 2 1000101 1 0 0 0 2 1000101 1 0 0 1 2 1000101 1 0 0 1 2 1000101 0 1 1 7 2 1000101 1 0 0 8 2 1000101 1 1 1 15 2 1000101 1 0 0 111 2 1000103 1 1 1 -18 2 1000103 1 0 0 -9 2 1000103 1 0 0 0 2 1000103 0 1 1 6 2 1000103 1 0 0 7 2 1000103 1 0 0 17 2 1000103 0 1 1 19 2 1000103 1 0 0 24 2 4900201 1 0 0 -9 2 4900201 0 1 1 -9 2 4900201 1 0 0 -8 2 4900201 1 0 0 0 2 4900201 1 0 0 1 2 4900201 1 0 0 2 2 4900201 0 1 1 6 2 4900201 1 0 0 7 2 4900201 1 0 0 15 2 ; run; proc tabulate data=test; class stud id day; var var1 var2 var3; /* this does separate table for each level of study*/ table stud='Study:', id, day*(var1 var2 var3)*(sum=""*f=best5.) /misstext=' ' ; /* nests id within levels of stud*/ table stud='Study' * id, day*(var1 var2 var3)*(sum=""*f=best5.) /misstext=' ' ; run;
As @Reeza sum gets the number of 1 values, there are a number of statistics that could be requested as well. Place them inside the () around sum to group as in these tables.