Splitting Data Set Based on a Variable

SAS programmers sometimes need to split a data set into multiple data sets, depending on the unique values of a variable. And you can usually achieve what you want by applying a WHERE= option or IF statement. But these aren’t the most efficient or elegant method. Suppose that you need to break SASHELP.CLASS into different tables based on the value of SEX, here are three methods I know:

  1. CALL EXECUTE:
    proc sql;
        create table sex as
            select distinct SEX 
        	from sashelp.class
            ;
    quit;
    
    data _null_;
        set sex;
        call execute('data sex_'||cats(SEX)||'(where=(SEX='||quote(cats(SEX))||')); set sashelp.class; run;');
    run;
    
  2. FILENAME:
    proc sql;
        create table sex as
            select distinct SEX 
        	from sashelp.class
            ;
    quit;
    
    filename code temp;
    data _null_;
        file code;
        set sex;
        put ' sex_' SEX '(where=(SEX="' SEX '"))' @@;
    run;
    
    data %inc code;;
        set sashelp.class;
    run;
    
  3. HASH(SAS9.2+):
    proc sort data=sashelp.class out=class;
    	by SEX;
    run;
    
    data _null_;
        dcl hash h(multidata:'y');
        h.definekey('SEX');
        h.definedone();
        do until(last.SEX);
            set class;
            by SEX;
            h.add();
        end;
        h.output(dataset:cats('sex_', SEX));
    run;
    

Note that the second method is most efficient when splits a huge data set since it reads data set only one time.

Xianhua Zeng /
Published under (CC) BY-NC-SA in categories Code  tagged with CALL EXECUTE  FILENAME  Hash Object 

Previous SAS Display Manager Commands
Next Automagically Opening Dataset and Copying Variable Value