Posted By

chrisaiv on 10/23/10


Tagged

rails ruby csv ar-extensions


Versions (?)

Who likes this?

1 person have marked this snippet as a favorite

Priestd09


Ruby: Importing CSV data into DB with ActiveRecord Validation


 / Published in: Ruby
 

This script is a nice balance between validating data and speed.

  1. require 'rubygems'
  2. require 'activerecord'
  3. require 'faster_csv'
  4.  
  5. ###########################################
  6. #1. Create a DB table
  7. ###########################################
  8. class CreateLocations < ActiveRecord::Migration
  9. def self.up
  10. create_table :locations, :force => true do |t|
  11. t.string :sale_type,:home_type,:address,:city,:state,:zip,:list_price,:beds,:baths,:location,:sqft,:lot_size,:year_built,:parking_spots,:parking_type,:days_on_market,:next_open_house_date,:next_open_house_start_time,:next_open_house_end_time,:recent_reduction_date,:original_list_price,:last_sale_date,:last_sale_price,:url,:source,:listing_id,:original_source,:favorite,:interested,:latitude,:longitude,:is_short_sale
  12. end
  13. end
  14.  
  15. def self.down
  16. drop_table :locations
  17. end
  18. end
  19.  
  20. class Location < ActiveRecord::Base
  21. validates_presence_of #:sale_type,:home_type,:address,:city,:state,:zip,:list_price,:beds,:baths,:location,:sqft,:year_built,:parking_spots,:days_on_market,:original_list_price,:url,:source,:listing_id,:original_source,:favorite,:interested,:is_short_sale
  22. end
  23.  
  24. ###########################################
  25. #2. Connect to Database: ar-extensions is the best of both worlds between ActiveRecord and DB default import features
  26. ###########################################
  27. require 'ar-extensions'
  28. #Logger Objects are handy for finding out why imports crash
  29. logger = Logger.new('import.log')
  30. #Set the Logger level to Info to prevent boring debug messages
  31. logger.level = Logger::INFO
  32. #Make DB connection
  33. ActiveRecord::Base.establish_connection(
  34. :adapter => 'mysql',
  35. :database => 'test_test',
  36. :username => 'root',
  37. :password => ''
  38. )
  39. #Connect Logger to Active Record
  40. ActiveRecord::Base.logger = logger
  41.  
  42. #Kick off DB table creation
  43. CreateLocations.up
  44.  
  45. ###########################################
  46. #3. Parse the CSV File into a Location Object
  47. ###########################################
  48. def load(filename, chunk_size = 3_000, validate = true)
  49. options = {
  50. :headers => true,
  51. :header_converters => :symbol,
  52. :col_sep => ','
  53. }
  54.  
  55. locations = []
  56. #Instead of reading all the files at once, we instead break them into chunks
  57. FasterCSV.foreach(filename, options) do |row|
  58. locations << Location.new(row.to_hash)
  59. #Breaking everything up helps with memory and improves speed
  60. if locations.size % chunk_size == 0
  61. #ar-extensions can create multiple database objects with a single statement through .import()
  62. Location.import locations, :validate => validate
  63. locations = []
  64. end
  65. end
  66. Location.import locations, :validate => validate if locations.size > 0
  67. end
  68.  
  69. load(ARGV[0], 1_000, false)
  70. puts Location.count

Report this snippet  

You need to login to post a comment.