Summary datasets using hashes


Published in: SAS

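This snippet comes directly from Paul M. Dorfman's paper on programming with hash objects. The hash object is useful for summarising huge datasets that are neither sorted nor indexed by the grouping (class) variables: each record is added to a running total held in memory and keyed on those variables, so no sort is needed. It can often be quicker than PROC SUMMARY and less resource-intensive, although the hash table has to hold one entry per distinct key combination in memory.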


```sas
** Dummy data: keys 1e6 down to 1, each repeated 1 to 6 times;
data input ;
  do k1 = 1e6 to 1 by -1 ;
    k2 = put (k1, z7.) ;
    do num = 1 to ceil (ranuni(1) * 6) ;
      output ;
    end ;
  end ;
run ;

** Standard approach using Proc Summary;
proc summary data = input nway ;
  class k1 k2 ;
  var num ;
  output out = summ_sum (drop = _:) sum = sum ;
run ;

** Alternative using the hash object;
data _null_ ;
  ** Never executed, but at compile time it adds K1, K2 and NUM to the PDV;
  if 0 then set input ;
  ** 2**16 buckets, keyed on K1 and K2, data part holds the keys plus SUM;
  dcl hash hh (hashexp:16) ;
  hh.definekey ('k1', 'k2') ;
  hh.definedata ('k1', 'k2', 'sum') ;
  hh.definedone () ;
  do until (eof) ;
    set input end = eof ;
    ** New key starts at 0, otherwise FIND loads the stored running SUM;
    if hh.find () ne 0 then sum = 0 ;
    sum + num ;
    hh.replace () ;
  end ;
  ** Write one row per distinct key, then stop the implicit DATA step loop;
  rc = hh.output (dataset: 'hash_sum') ;
  stop ;
run ;
```

URL: http://www2.sas.com/proceedings/sugi30/236-30.pdf
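PROC SUMMARY with NWAY writes its output sorted by the class variables, whereas HASH_SUM comes out in whatever order the hash table stores its entries. The sketch below is an addition, not part of Dorfman's paper: it declares the table with the ORDERED:'a' argument so that the OUTPUT method writes the keys in ascending order, then uses PROC COMPARE to check that both approaches give the same sums. It assumes the INPUT and SUMM_SUM datasets from the snippet above already exist; HASH_SUM_ORD is a hypothetical dataset name.

```sas
** Same hash summary, but ORDERED:A makes OUTPUT write keys in ascending order;
data _null_ ;
  if 0 then set input ;
  dcl hash hh (hashexp:16, ordered:'a') ;
  hh.definekey ('k1', 'k2') ;
  hh.definedata ('k1', 'k2', 'sum') ;
  hh.definedone () ;
  do until (eof) ;
    set input end = eof ;
    if hh.find () ne 0 then sum = 0 ;
    sum + num ;
    hh.replace () ;
  end ;
  rc = hh.output (dataset: 'hash_sum_ord') ;
  stop ;
run ;

** Both datasets are now sorted by K1 and K2, so match rows on those keys;
proc compare base = summ_sum compare = hash_sum_ord ;
  id k1 k2 ;
run ;
```

Asking for ordered output may add some overhead; if the result is sorted later anyway, or row order does not matter, the unordered table used in the original snippet is sufficient.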
