CSV and Rails - Freelancing Digest

Today I hacked up some scripts to update a Rails database based off of what is in a SQL Server database. The goal was to automate the export of some information from SQL, send it to a remote server, and then update the Rails database with the current information. Using Ruby’s CSV library made this really easy.

First I had to write an export script to get the data out of SQL Server and into csv. Using an old programming language this turned out to be rather easy, the hardest part was to get my joins working (multiple left joins into a single table). Just about any database library should be able to export to a basic CSV, so it only matters which one you know best and can get the job done in (sadly Ruby would not work, as I do not have permission to install it there)

Once that was sorted out, all that it takes is adding a method to the Rails model that uses the CSV library to parse the exported file. In fact I even used the example they gave to start it out:

require 'csv'

class Foo < ActiveRecord::Base

  def self.upload(file="default_file.csv")
    # Clear old entries
    Foo.find(:all).each {|item| item.destroy}
    first_row = true
    increment = 1   # This is so we are always using 1 to x for the table id's
    CSV::Reader.parse(File.open("#{RAILS_ROOT}/#{file}", 'rb')) do |row|
      if not first_row
        f = self.new
        f.id = increment
        f.title = row[0]
        f.name = row[1]
        f.status = row[2]
        if f.save
          if RAILS_ENV != 'test'
            puts "Row #{increment} entered..."
          end
        else
          puts "Row #{increment} not entered..."
        end
        increment += 1
      end
      first_row = nil
    end
  end
end

This code does many things for me:

It removes all the old entries. As this is from an export, there will be duplicated data in each import that we don’t want (Foo.find(:all…)
It skips the first row (rather hackish though) as that row is for the headings of each column (first_row = true)
It sets a default filename and path. This will allow me to automate it later, but also use a smaller or different data-set for testing.
It prints some debugging info if it is not run in a test unless the item is not entered, at which point it fails loudly.
It also re-uses the id columns. They are not really scare but if we run this enough, dumping and adding 12 sets of 5000+ rows a day will really chew though the numbers.

Now I can just call Foo.upload and have it grab the default file or Foo.upload("test/test.csv") to grab a CSV from the testing directory. This helps in that I really don’t want to run deletes and inserts of the full 5000 row data-set for each test run. The really good thing about setting up the default file is now I can just run one command RAILS_ROOT/script/runner "Foo.upload" from CRON and have it do it’s little thing.

So there is a quick example of the power of Ruby’s CSV class. There also is a FasterCSV class that I might take a look at if this one starts to run slowly.

Eric Davis