Posted By

tkf on 09/26/08


Tagged

python Numpy pickle


pickle(cPickle) vs numpy tofile/fromfile


Published in: Python 


from __future__ import print_function

import os
from datetime import datetime

import numpy

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 uses the C implementation automatically


def dt2sec(dt):
    # Convert a timedelta to seconds (1 second = 1e6 microseconds).
    return dt.days * 86400 + dt.seconds + dt.microseconds / 1e6


# Ten records, each holding a 1-D and a 2-D float array.
data = [{"a": numpy.arange(100, dtype=float),
         "b": numpy.arange(100, dtype=float).reshape(10, 10)}
        for i in range(10)]

pickle_name = "data.pickle"
ff_dir = "fromfile"

os.mkdir(ff_dir)

#----- write data
print("write data")
t0 = datetime.now()
# Open in binary mode and close the file before stopping the clock.
with open(pickle_name, "wb") as f:
    pickle.dump(data, f)
t1 = datetime.now()
print("pickle.dump :", dt2sec(t1 - t0))

t0 = datetime.now()
for i, d in enumerate(data):
    idir = os.path.join(ff_dir, str(i))
    os.mkdir(idir)
    for name, array in d.items():
        # write to fromfile/{i}/{name}
        array.tofile(os.path.join(idir, name))
t1 = datetime.now()
print("tofile :", dt2sec(t1 - t0))

#----- read data
print("read data")
t0 = datetime.now()
with open(pickle_name, "rb") as f:
    pickle_data = pickle.load(f)
#print(pickle_data)
t1 = datetime.now()
print("pickle.load :", dt2sec(t1 - t0))

t0 = datetime.now()
ff_data = []
for i in os.listdir(ff_dir):
    idir = os.path.join(ff_dir, i)
    tmp = {}
    for name in os.listdir(idir):
        # fromfile returns a flat float64 array; the (10, 10) shape of "b" is not restored
        tmp[name] = numpy.fromfile(os.path.join(idir, name))
    ff_data.append(tmp)
#print(ff_data)
t1 = datetime.now()
print("fromfile :", dt2sec(t1 - t0))


## Output recorded with the original dt2sec, which divided microseconds by
## 1000, so sub-second timings like these are most likely milliseconds:
## write data
## pickle.dump : 9.328
## tofile : 2.946
## read data
## pickle.load : 15.423
## fromfile : 1.858
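The read loop in the snippet calls numpy.fromfile on files written by tofile, which store raw values only, with no shape or dtype. The following is a minimal sketch, not part of the original benchmark, of what that means for the 10x10 array "b": the caller has to supply the dtype and shape to get the original array back. The temporary directory and variable names are chosen for illustration.

```python
import os
import shutil
import tempfile

import numpy

original = numpy.arange(100, dtype=float).reshape(10, 10)

# Write into a temporary directory so the sketch leaves nothing behind.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "b")
original.tofile(path)

# Raw read: a flat array of 100 values, because the file stores no shape.
flat = numpy.fromfile(path, dtype=float)

# The shape has to come from the caller to recover the 2-D array.
restored = flat.reshape(10, 10)

shutil.rmtree(tmp_dir)

print(flat.shape)                             # (100,)
print(restored.shape)                         # (10, 10)
print(numpy.array_equal(original, restored))  # True
```

Pickle restores shape and dtype on its own, so the two timings in the snippet do not measure exactly the same amount of work.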


Comments

Posted By: i3d on September 29, 2008

When the data set grows to thousands or millions of records, tofile and fromfile would probably suffer from I/O overhead. Have you tested on a bigger data set, say 100,000 records? Also, creating a directory for every entry seems like too much overhead, and I think it will be limited by OS resources.
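One possible way to avoid a directory per entry, sketched here under the assumption that every record has the same fields with the same shapes: stack each field across all records and write one file per field. The helpers write_stacked and read_stacked and the record count are hypothetical, and this layout has not been timed against the original one.

```python
import os
import shutil
import tempfile

import numpy


def write_stacked(records, out_dir):
    # Hypothetical helper: one file per field, holding that field for all records.
    shapes = {}
    for name in records[0]:
        stacked = numpy.stack([r[name] for r in records])
        stacked.tofile(os.path.join(out_dir, name))
        shapes[name] = stacked.shape
    return shapes


def read_stacked(out_dir, shapes):
    # tofile stores no shape, so the caller passes the shapes back in.
    loaded = {}
    for name, shape in shapes.items():
        flat = numpy.fromfile(os.path.join(out_dir, name), dtype=float)
        loaded[name] = flat.reshape(shape)
    return loaded


n_records = 1000  # assumed size, chosen only for illustration
records = [{"a": numpy.arange(100, dtype=float),
            "b": numpy.arange(100, dtype=float).reshape(10, 10)}
           for _ in range(n_records)]

out_dir = tempfile.mkdtemp()
shapes = write_stacked(records, out_dir)
loaded = read_stacked(out_dir, shapes)
n_files = len(os.listdir(out_dir))
shutil.rmtree(out_dir)

print(n_files)            # 2
print(loaded["a"].shape)  # (1000, 100)
print(loaded["b"].shape)  # (1000, 10, 10)
```

The number of files then depends on the number of fields rather than the number of records, at the cost of having to keep the shapes somewhere alongside the data.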
