Posted By

tkf on 09/26/08


Tagged

python Numpy pickle



pickle(cPickle) vs numpy tofile/fromfile


 / Published in: Python
 

from __future__ import print_function  # lets the script run on Python 2 and 3

import os
from datetime import datetime

import numpy

try:
    import cPickle as pickle  # Python 2: C implementation of pickle
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically


def dt2msec(dt):
    # Convert a timedelta to milliseconds. The original helper added
    # milliseconds to whole seconds, which mixed units.
    return dt.microseconds / 1000.0 + dt.seconds * 1000.0


data = [{"a": numpy.arange(100, dtype=float),
         "b": numpy.arange(100, dtype=float).reshape(10, 10)}
        for i in range(10)]

pickle_name = "data.pickle"
ff_dir = "fromfile"

os.mkdir(ff_dir)  # raises an error if the directory already exists

#----- write data
print("write data")
t0 = datetime.now()
# Default protocol, binary mode; the file is closed when the block ends.
with open(pickle_name, "wb") as f:
    pickle.dump(data, f)
t1 = datetime.now()
print("pickle.dump :", dt2msec(t1 - t0))

t0 = datetime.now()
for i, d in enumerate(data):
    idir = os.path.join(ff_dir, str(i))
    os.mkdir(idir)
    for name, array in d.items():
        # write to fromfile/{i}/{name}
        array.tofile(os.path.join(idir, name))
t1 = datetime.now()
print("tofile :", dt2msec(t1 - t0))

#----- read data
print("read data")
t0 = datetime.now()
with open(pickle_name, "rb") as f:
    pickle_data = pickle.load(f)
#print(pickle_data)
t1 = datetime.now()
print("pickle.load :", dt2msec(t1 - t0))

t0 = datetime.now()
ff_data = []
# os.listdir returns entries in arbitrary order, so sort by index.
for i in sorted(os.listdir(ff_dir), key=int):
    idir = os.path.join(ff_dir, i)
    tmp = {}
    for name in os.listdir(idir):
        # fromfile returns a flat float array; the (10, 10) shape of "b" is lost
        tmp[name] = numpy.fromfile(os.path.join(idir, name))
    ff_data.append(tmp)
#print(ff_data)
t1 = datetime.now()
print("fromfile :", dt2msec(t1 - t0))


## Output recorded by the author (effectively milliseconds,
## assuming each step took under one second):
## write data
## pickle.dump : 9.328
## tofile : 2.946
## read data
## pickle.load : 15.423
## fromfile : 1.858
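
One caveat about the comparison: tofile writes only the raw bytes of an array, so the snippet's plain numpy.fromfile call gives back a flat 1-D float array, while pickle restores shape and dtype. A minimal sketch of what a round trip needs, assuming the caller already knows the dtype and shape (the helper names save_raw and load_raw are made up for illustration, and a temporary directory stands in for the snippet's "fromfile" directory):

```python
import os
import tempfile

import numpy


def save_raw(path, array):
    """Write raw bytes only; shape and dtype are not stored in the file."""
    array.tofile(path)


def load_raw(path, dtype, shape):
    """Read raw bytes back; the caller must supply dtype and shape."""
    return numpy.fromfile(path, dtype=dtype).reshape(shape)


# Temporary location so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "b")

original = numpy.arange(100, dtype=float).reshape(10, 10)
save_raw(path, original)

flat = numpy.fromfile(path)                  # default dtype is float, result is 1-D
restored = load_raw(path, float, (10, 10))   # shape supplied by hand

print(flat.shape)
print(restored.shape)
```

So the two approaches are not doing quite the same work: the faster one pushes the bookkeeping for shape and dtype onto the caller.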


Comments

Posted By: i3d on September 29, 2008

When the data set grows to thousands or millions of records, tofile and fromfile would probably suffer on I/O operations. Have you tested on a bigger data set, say 100,000 records? Also, creating a directory for every entry seems like too much overhead and will be limited by OS resources, I think.
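
One possible way around the directory-per-entry overhead is to put every array into a single archive with numpy.savez. This is only a sketch, not something the original snippet does or times; the "{index}_{name}" key scheme is an assumption made up for the example, and the archive is written to a temporary directory.

```python
import os
import tempfile

import numpy

data = [{"a": numpy.arange(100, dtype=float),
         "b": numpy.arange(100, dtype=float).reshape(10, 10)}
        for i in range(10)]

# Flatten the list of dicts into one dict keyed "{index}_{name}"
# (hypothetical naming scheme, chosen only for this example).
flat = {}
for i, d in enumerate(data):
    for name, array in d.items():
        flat["%d_%s" % (i, name)] = array

archive = os.path.join(tempfile.mkdtemp(), "data.npz")
numpy.savez(archive, **flat)  # one file instead of one directory per entry

# Rebuild the list of dicts from the archive keys.
loaded = [dict() for _ in data]
with numpy.load(archive) as npz:
    for key in npz.files:
        index, name = key.split("_", 1)
        loaded[int(index)][name] = npz[key]

print(len(loaded), sorted(loaded[0].keys()))
```

Unlike tofile, the .npz format stores shape and dtype with each array, so "b" comes back as a 10x10 array. Whether it is faster for 100,000 records would have to be measured.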

Posted By: scarfboy on July 9, 2009

I think you really want to use pickle.HIGHEST_PROTOCOL. It makes a factor of ten or twenty difference (on top of the cPickle/pickle difference) for both writing and reading.
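
A small sketch of that suggestion, written for Python 3, where the standard pickle module already uses the C implementation. The protocol is passed as the third argument to pickle.dump. The size comparison is only a rough illustration; no timings are claimed here, and the file path is a temporary one chosen for the example.

```python
import os
import pickle
import tempfile

import numpy

data = [{"a": numpy.arange(100, dtype=float),
         "b": numpy.arange(100, dtype=float).reshape(10, 10)}
        for i in range(10)]

# Protocol 0 is the old text protocol (the Python 2 default);
# HIGHEST_PROTOCOL selects the newest binary protocol available.
text_bytes = pickle.dumps(data, 0)
binary_bytes = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
print("protocol 0 size       :", len(text_bytes))
print("HIGHEST_PROTOCOL size :", len(binary_bytes))

# Binary protocols need the file opened in binary mode.
path = os.path.join(tempfile.mkdtemp(), "data.pickle")
with open(path, "wb") as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

# pickle.load detects the protocol on its own.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored[0]["b"].shape)
```

To check the claimed speedup, the same timing code from the snippet can be wrapped around the dump and load calls.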
