ruby! food! kids! … and other fun from terry heath
RSS icon Email icon Home icon
  • Guidification

    Posted on October 3rd, 2010 terry 4 comments

    Introduction

    About 4 months ago, I had a conversation with a coworker. It went something like this:

    J: “I noticed we’re using integers for our IDs. Why don’t we use GUIDs?”
    Me: “We don’t have multiple sources of data creation, and incrementing IDs are easier to keep up with. I’ll let you know if that turns out to be wrong though.”

    About 3 weeks ago, I had this conversation:

    Me: “Shit. We need GUIDs.”

    We’re working on being able to replicate datasets and clone them into existing environments, and incrementing IDs allow for collisions, which would be bad news bears.

    There’s a few hiccups I ran into, a few gems that needed to be patched, and a few unexpected benefits that have come from guidifying our app.

    Before I get too deep into it, here’s the plugin we’re using: BMorearty/usesguid, and this is on Rails 2.3.9, so I’m not sure all the other things that will come up when we get to Rails 3.

    The Migration

    The first step in all of this was to convert the IDs to GUIDs. I talked with JBogard if he’d done anything similar, and he said they had a script that would look at the MSSQL metadata and figure out what columns to update from that. The script wouldn’t really work for us since we use MySQL, but it’s a good approach.

    Using that idea, I used ActiveRecord’s reflections and column to figure out what columns to change. The process itself was something like:

    1. Remove any indexes that could cause problems as you’re killing FK columns.
    2. Find every table with a PK ID item, and add a GUID column (can be done very fast using MySQL’s select uuid())
    3. Use ActiveRecord’s reflections to find tables looking at other tables, and add a GUID column for the FK and populate it with the other table’s GUID
    4. Delete the ID columns
    5. Add indexes back to your keys (don’t forget this step!)

    Here’s how it looked:

    I took out most of the project-specific stuff, but kept in the parent_id stuff since that’s most likely useful to someone else.

    Hiccups

    Anywhere we were writing explicit SQL, IDs had to be quoted. Unit tests caught all of these issues, thankfully.

    Anywhere we were passing non-quoted IDs in Javascript also had to be quoted, because a1232-444ds isn’t a valid expression.

    Anywhere we were checking to see if an ID was valid, we were doing it by saying id.to_i != 0. That still works, 9/36ths of the time. If your guid starts with a non-zero integer, it’ll work, otherwise, no good. We now use a regex to tell if something is a guid.

    Because some plugins (below) will have to use usesguid, I had to change the plugin load order to put usesguid first.

    We had some routes that would look for an integer-only-regex match to find :id parameters, so we had to change that to a guid-finding regex.

    Library Changes

    acts_as_list

    Had to vendor the gem to quote explicit SQL relying on IDs.

    acts_as_archive

    Had to vendor the gem to quote explicit SQL relying on IDs.

    acts_as_solr_reloaded

    I had to change acts_as_solr_reloaded’s PK to be a string instead of an integer and add usesguid to dynamic_attributes and locals.

    acts_as_audited

    Needed usesguid on audit.rb

    delayed_job

    DJ does a very simple regex check to see if something is a valid job, and if not, bails. Unfortunately, it just looks for Model:integer. This fails with guids, obviously, so I had to change the regex in performable_method to look for a GUID instead. See it here.

    usesguid

    I really like the 36 character string GUIDs. I don’t have a good reason for it, but I modified the usesguid library to not use .to_s22, because the 22 character ones look odd. It might be just me who cares about it, but it’s worth noting that if you’re sending GUIDs to other systems, there’s a chance they’ll see the 22 character string and tell you the GUID is invalid. So, there’s that.

    Unexpected Benefits

    I recently wrote a migration that imported 131,000 new records into an existing table, and set up associations along the way and did a little bit of geocoding magic. The migration took about 45 minutes. After doing all of that hard work, I mysqldump‘d those tables and changed the migration to just load the generated data. The migration now takes <5sec.

  • Migrations with Mongo (and MongoMapper)

    Posted on February 26th, 2010 terry 7 comments

    I’m working on an application (which you’ll get to see soon) with another guy, and he introduced some timestamps to a MongoMapper model after some of those models had already been coded. As a result, the first time I tried to visit a page with my user, I got a sweet NoMethodError, because apparently nil doesn’t implement strftime.

    There are lots of ways to solve this problem, but it’s one that doesn’t arise nearly as frequently on SQL databases (I hypothesize…) because you have migrations where you can set default values for the schema.

    When we decided to write our new app in Mongo, one thing that worried me would be data transforms down the road. How do we keep up with those? Do we just create rake tasks and run each one, noting them somewhere in a textfile? That sounds kind of awful:

    8.3
    * run rake:transform_user_emails to downcase emails
    * run rake:default_user_nicknames

    8.4
    * run rake:user:canonicalize_phone_numbers

    This gets out of hand in a hurry.

    Anyways, in my MO of looking on the first page of Google, not finding my answer, and writing something that does mostly what I want, I give you Mongrations.

    Before I say anything else, I’ll tell you that I would love forks and fixes and bug findings and I know I need to write tests – but I’m pretty sure no matter what anyone does, Mongration is going to stick.

    If you install the plugin (it needs to be a MongoMapper backed application), and change your vocabulary to use “mongration” and “mongrate” instead of “migration” and “migrate,” respectively, you have pretty much what you had with ActiveRecord’s migrations, only instead of schema changes, you just manipulate data.

    I think this is actually a fairly pure solution. While there were problems and gotchas with integrating schema changes and data changes in the same set of migrations, I think MongoMapper skirts that with the schema being solely defined in the models.

    It’s entirely possible I’m overlooking something ridiculously easy, so feel free to tell me as much if you think so.