ruby! food! kids! … and other fun from terry heath
  • Dealing with Memory Leaks in DelayedJob

    Posted on April 23rd, 2011 terry 1 comment

    Update: David Genord pointed out a *huge* bug in my code, where jobs will never be retried; his fix has replaced the existing gist.

    Our workers (DelayedJob) leak memory. Not heinously fast, but enough that monit bounces them fairly often. I tried out perftools.rb on one of our longer running jobs, reindexing contacts. I cut the job down to indexing only 25 contacts, and profiled objects instantiated. To index 25 contacts, which took about 25 seconds, the app instantiated 3.1 million objects.

    Just in case the title is misinterpreted – DJ is not leaking memory. My code is.

    With a little bit of memoization and a few short-circuits, I got it down to 800 thousand. That was nice, and it sped things up by an order of magnitude, but the jobs were still leaking memory.

    So, I decided to steal a cool concept from the next async processor I really want to work with: Resque. I changed the worker to fork and wait for every job it performs. This means that there’s an overhead added to every worker of about 200MB, but that’s nothing compared to how bad things got if all the workers started sucking up huge chunks of memory at the same time.

    One caveat: I tried to look around for someone who’d already done this and just incorporated those changes, but it turns out googling “delayed job fork” doesn’t reveal much. If there’s an existing project that’s already trying to do this for delayed job, please let me know :)

    The code changes around this are pretty simple: just changing the run method and adding a hook for handling fork reconnection stuff (it looks like DJ has some hooks built for this, but I didn’t see them in my local copy).
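    A minimal sketch of the fork-and-wait idea, not the actual patch: `run_in_fork` and its `after_fork` hook are hypothetical names, and it assumes a platform where `fork` is available (so not Windows). The parent only learns whether the child succeeded, which is the piece a retry mechanism would need to hang off of.

```ruby
# Runs the given block in a forked child process and waits for it to finish.
# Whatever memory the block allocates is released when the child exits.
# `after_fork` is a hypothetical hook, e.g. for re-establishing a DB connection.
def run_in_fork(after_fork: nil)
  pid = fork do
    begin
      after_fork.call if after_fork
      yield
      exit!(0) # exit! skips at_exit handlers inherited from the parent
    rescue StandardError
      exit!(1)
    end
  end

  _, status = Process.wait2(pid)
  status.success?
end
```

    The return value tells the worker whether to mark the job as done or leave it to be retried.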

    Before this patch, if I ran 1 worker locally, after about half an hour my machine ran out of RAM (it has 8GB). If I ran 4, my computer was unusable. After the patch, I can run 4 workers locally with no perceived performance hit. Money shot:

    [Screenshot: Activity Monitor]
  • Guidification

    Posted on October 3rd, 2010 terry 4 comments

    Introduction

    About 4 months ago, I had a conversation with a coworker. It went something like this:

    J: “I noticed we’re using integers for our IDs. Why don’t we use GUIDs?”
    Me: “We don’t have multiple sources of data creation, and incrementing IDs are easier to keep up with. I’ll let you know if that turns out to be wrong though.”

    About 3 weeks ago, I had this conversation:

    Me: “Shit. We need GUIDs.”

    We’re working on being able to replicate datasets and clone them into existing environments, and incrementing IDs allow for collisions, which would be bad news bears.

    There are a few hiccups I ran into, a few gems that needed to be patched, and a few unexpected benefits that have come from guidifying our app.

    Before I get too deep into it, here’s the plugin we’re using: BMorearty/usesguid. This is on Rails 2.3.9, so I’m not sure what else will come up when we get to Rails 3.

    The Migration

    The first step in all of this was to convert the IDs to GUIDs. I asked JBogard whether he’d done anything similar, and he said they had a script that would look at the MSSQL metadata and figure out what columns to update from that. The script wouldn’t really work for us since we use MySQL, but it’s a good approach.

    Using that idea, I used ActiveRecord’s reflections and columns to figure out what columns to change. The process itself was something like:

    1. Remove any indexes that could cause problems as you’re killing FK columns.
    2. Find every table with a PK ID item, and add a GUID column (can be done very fast using MySQL’s select uuid())
    3. Use ActiveRecord’s reflections to find tables looking at other tables, and add a GUID column for the FK and populate it with the other table’s GUID
    4. Delete the ID columns
    5. Add indexes back to your keys (don’t forget this step!)

    Here’s how it looked:

    I took out most of the project-specific stuff, but kept in the parent_id stuff since that’s most likely useful to someone else.
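    For illustration only, steps 2 through 5 can be sketched as a generator of MySQL statements. This is not the migration itself: the schema hash, the `guid_migration_sql` name, the `_guid` column naming and the index names are all assumptions, and it skips step 1 as well as dropping the integer primary keys at the end.

```ruby
# Hypothetical schema description: table => { integer FK column => referenced table }.
def guid_migration_sql(schema)
  sql = []

  # Step 2: give every table a GUID column, filled in by MySQL's UUID().
  schema.each_key do |table|
    sql << "ALTER TABLE #{table} ADD COLUMN guid VARCHAR(36)"
    sql << "UPDATE #{table} SET guid = UUID()"
  end

  schema.each do |table, foreign_keys|
    foreign_keys.each do |fk, target|
      guid_fk = fk.sub(/_id\z/, "_guid")
      # Step 3: copy the referenced row's GUID into a new FK column.
      # Aliases keep self-references such as parent_id unambiguous.
      sql << "ALTER TABLE #{table} ADD COLUMN #{guid_fk} VARCHAR(36)"
      sql << "UPDATE #{table} child JOIN #{target} parent " \
             "ON child.#{fk} = parent.id SET child.#{guid_fk} = parent.guid"
      # Step 4: drop the integer FK; step 5: index its replacement.
      sql << "ALTER TABLE #{table} DROP COLUMN #{fk}"
      sql << "CREATE INDEX index_#{table}_on_#{guid_fk} ON #{table} (#{guid_fk})"
    end
  end

  sql
end
```

    In a real migration the schema hash would come from ActiveRecord’s reflections rather than being written by hand.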

    Hiccups

    Anywhere we were writing explicit SQL, IDs had to be quoted. Unit tests caught all of these issues, thankfully.

    Anywhere we were passing non-quoted IDs in Javascript also had to be quoted, because a1232-444ds isn’t a valid expression.

    Anywhere we were checking to see if an ID was valid, we were doing it by saying id.to_i != 0. That still works, but only about 9/16ths of the time: GUIDs are hexadecimal, so if yours starts with a non-zero digit, it’ll work, otherwise, no good. We now use a regex to tell if something is a guid.
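    A minimal sketch of that kind of check, assuming the 36-character hyphenated form; `GUID_FORMAT` and `guid?` are illustrative names, not what’s in our codebase.

```ruby
# Matches the 36-character form: 8-4-4-4-12 hexadecimal digits.
GUID_FORMAT = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

def guid?(value)
  GUID_FORMAT.match?(value.to_s)
end

# Why the old check breaks: String#to_i only reads leading digits.
starts_with_letter = "a1b2c3d4-4f89-11d3-9a0c-0305e82c3301"
starts_with_digit  = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"

old_check = ->(id) { id.to_i != 0 }
old_check.call(starts_with_letter) # false, even though the GUID is fine
old_check.call(starts_with_digit)  # true, but only by luck
```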

    Because some plugins (below) will have to use usesguid, I had to change the plugin load order to put usesguid first.

    We had some routes that would look for an integer-only-regex match to find :id parameters, so we had to change that to a guid-finding regex.

    Library Changes

    acts_as_list

    Had to vendor the gem to quote explicit SQL relying on IDs.

    acts_as_archive

    Had to vendor the gem to quote explicit SQL relying on IDs.

    acts_as_solr_reloaded

    I had to change acts_as_solr_reloaded’s PK to be a string instead of an integer and add usesguid to dynamic_attributes and locals.

    acts_as_audited

    Needed usesguid on audit.rb

    delayed_job

    DJ does a very simple regex check to see if something is a valid job, and if not, bails. Unfortunately, it just looks for Model:integer. This fails with guids, obviously, so I had to change the regex in performable_method to look for a GUID instead. See it here.
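    To show the shape of the change, here is a sketch with made-up patterns. The real constant in DJ’s performable_method may look different; `INTEGER_ID_FORMAT`, `GUID_ID_FORMAT` and `parse_record_reference` are names invented for this example.

```ruby
# Illustrative patterns only, for a "Model:id" style reference.
INTEGER_ID_FORMAT = /\A([A-Z][\w:]+):(\d+)\z/
GUID_ID_FORMAT    = /\A([A-Z][\w:]+):(\h{8}-\h{4}-\h{4}-\h{4}-\h{12})\z/

# Returns the model name and id, or nil when the string doesn't match.
def parse_record_reference(string, format = GUID_ID_FORMAT)
  match = format.match(string)
  match && { model: match[1], id: match[2] }
end
```

    With the integer-only pattern, a reference carrying a GUID simply fails to match, which is why the job was being rejected.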

    usesguid

    I really like the 36 character string GUIDs. I don’t have a good reason for it, but I modified the usesguid library to not use .to_s22, because the 22 character ones look odd. It might be just me who cares about it, but it’s worth noting that if you’re sending GUIDs to other systems, there’s a chance they’ll see the 22 character string and tell you the GUID is invalid. So, there’s that.

    Unexpected Benefits

    I recently wrote a migration that imported 131,000 new records into an existing table, and set up associations along the way and did a little bit of geocoding magic. The migration took about 45 minutes. After doing all of that hard work, I mysqldump’d those tables and changed the migration to just load the generated data. The migration now takes <5sec.

  • Parallel programming with environment variables

    Posted on October 1st, 2010 terry No comments

    There are about 5 hours in the day where I’m caffeinated enough to keep up with a few tasks at once, and can get to a blocking point at each task (e.g., rake test or rake db:migrate) and go to the next one. Doing this with a “; say ‘done with x’” works pretty well, but if you’re working in the same Rails project, there are some other issues that come up, namely, database names. Or solr ports, if you’re using something like acts_as_solr_reloaded.

    I have three project directories, and accordingly three solr directories. But in development mode, the setup I have dictates that solr runs on port 8982, and the app db be *_development. So, I added something like this to the different YAMLs:

    That looks for an environment variable named after the containing directory and will append a string to it, giving me multiple, parallel, development databases.
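    A standalone sketch of the mechanism, relying on the fact that Rails runs database.yml through ERB. The template, the `myapp` database name and the `database_config` helper are assumptions made for the example; in a real config the lookup would sit inline in the YAML.

```ruby
require "erb"
require "yaml"

# Hypothetical database.yml fragment with an ERB placeholder for the suffix.
DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: myapp_development<%= suffix %>
YAML

# Looks up an environment variable named after the project directory and
# appends its value (or nothing, if unset) to the database name.
def database_config(project_dir, env = ENV)
  suffix = env.fetch(File.basename(project_dir), "")
  YAML.safe_load(ERB.new(DATABASE_YML).result_with_hash(suffix: suffix))
end
```

    With a variable like app_two=_two exported, the checkout in the app_two directory gets myapp_development_two, while a directory with no matching variable keeps the plain name.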

    Something similar works for solr:

    Here’s my .bashrc to set things up for parallel directories:

    It works nicely for configuration files that need to be put into source control.

  • Cap Release Notes

    Posted on May 24th, 2010 terry No comments

    One of the really good things I learned from a guy we’ll call Donkey is to have everyone write down deployment dependencies in a file, so if someone goes on vacation or whatever during a deploy, you at least have an idea of what you should do.

    I put it in /README, and separate everything by sprints. It looks something like this:

    Even though everyone tries to remember to put things in there, there’s still a single point of failure, when the idiot who deploys it forgets to check the readme.

    So, to prevent future self-indictments, I put a task that runs at the end of our cap script that prints the current release notes.

    Ideally all of these are handled with a comprehensive deploy script, but that’s not always feasible/doable. Here’s the cap task, and if you make it the last thing that runs, you’ll get a friendly reminder at the end of every deploy for things you need to do.
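    A rough sketch of the lookup part, under assumptions of my own: sprint sections start with a line beginning “== ”, the newest sprint comes first, and `current_release_notes` is a made-up helper name. Your README layout will differ.

```ruby
# Returns the first "== ..." section of the README, heading included,
# or an empty string when there is no such section.
def current_release_notes(readme_text)
  lines = readme_text.lines
  start = lines.index { |line| line.start_with?("== ") }
  return "" unless start

  rest = lines[(start + 1)..-1]
  stop = rest.index { |line| line.start_with?("== ") } || rest.length
  ([lines[start]] + rest[0...stop]).join.strip
end

# In a Capistrano 2 recipe this could be wired up roughly as:
#   task :release_notes do
#     puts current_release_notes(File.read("README"))
#   end
#   after "deploy", "release_notes"
```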

  • Quick Tips

    Posted on April 20th, 2010 terry 2 comments

    Chrome view source
    I ran into an incredibly confusing bug last week where I had some fields named user[field_x], but they were being posted to my Rails server as user[field_y]. I started to blame all sorts of crazy things (becoming superstitious in not knowing wtf could be happening).

    I pulled my field-level identity map (another blog post). I wrote some Rack middleware to intercept the parameters before Rails’s ParamsParser could touch them and make sure they were good. Fail. Nothing worked. After 2 hours of fighting with it, unable to reproduce in tests or anything, I realized I wasn’t viewing the source of the content I was actually viewing.

    Chrome sends another, new request to the server when you say view source. This means that you’re not viewing the source from the page you’re viewing. If the pages are static or if there’s no way server state affects your page, then you’re good. Otherwise you might lose 3 hours.

    This bug was painful enough that I’ve dropped Chrome as my default browser and am back on Firefox. I hope that by reading this you’ll think of it when you do view source on Chrome and nothing makes sense.

    Git stash
    I learned about git-stash over the weekend. It allows you to stash your changes on a stack, make some other changes to the codebase, do what you want with them, and then pull those stashed changes back out.

    Example use case: I branch before everything I work on, but eventually I have to merge stuff back in. So I branch to make feature X, get it written, then merge it back into HEAD. Unfortunately, tests are failing. I’m not going to push my changes yet, but something needs to be changed for another developer who wants something to run locally. git stash --keep-index; do changes; git push; git stash pop. Done and done, and no broken builds!

    Timecop
    A developer at our office was working with some relative time methods. These are inherently tricky to test, because if “next week” means “next business week”, you can’t just willy-nilly add 7 days to a day and do tests. Same with “this week”, etc.

    Instead of doing a bunch of complicated date math that would arguably make the tests just as scary as the code, we found Timecop, which lets you freeze time for your tests. You can say, “I want Date.today to evaluate to 20Apr2010”, and it will. This allowed for significantly simpler tests and saved a ton of time.

    Avoid toggle actions
    Even on some of the more popular sites on the internets, there are a lot of AJAX actions that don’t handle network failures well. Backpack (at least on my iPhone) just keeps the in progress icon going forever. We’re not doing anything super-critical with AJAX on our app, but we’d like to at least handle timeouts or failures gracefully.

    The simple solution is to check for the HTTP response code and act appropriately (2xx? Yay! 4xx? BOO!). We decided to do this, but realized that it’s possible that your request makes it through to the server, then your network fails, and you get a timeout. What do we say happened on the client side?

    If they’re single-direction actions (e.g., marking something done), then you can just say “Something went wrong,” and not worry about changing it back or anything. If it’s a toggle, though, you can’t make any guarantees. “Something went wrong,” sure, but should the user try again? What if it only went through for the first 3 items?

    As such, we’re no longer writing toggle_x actions. Now we’ll handle that on the frontend, and have negative_x and positive_x methods.
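    To make the difference concrete, here is a small sketch with a made-up `Item` class. `send_with_retry` stands in for a client that resends a request after a timeout, even though the server had already applied the first one.

```ruby
class Item
  attr_reader :done

  def initialize
    @done = false
  end

  # Toggle: the outcome depends on how many times the request got through.
  def toggle_done
    @done = !@done
  end

  # Explicit actions: applying them twice is the same as applying them once.
  def mark_done
    @done = true
  end

  def mark_not_done
    @done = false
  end
end

# Simulates "request applied, response lost, user tries again".
def send_with_retry(item, action)
  2.times { item.public_send(action) }
  item.done
end
```

    Retrying toggle_done leaves the item not done, which is the opposite of what the user asked for; retrying mark_done is harmless.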

  • DelayedJob scheduling improvements

    Posted on April 20th, 2010 terry No comments

    Last week I talked with a guy who noted that the scheduling patch I’d blogged about for DJ was insufficient for scheduling something every day at 8AM. This is because running a job and then scheduling the job again after a success pushes the job out a little bit each time (maybe in a week or so it runs at 10AM, e.g.)

    I thought about trying to make a run_at class method that would let you specify it, like “run_at '8am'”, but that quickly fell through when I realized '8am' is ambiguous – 8AM every day? Every Wednesday? Every 4th week in July?

    I fall in line with Brandon Keepers when I say that I don’t want to re-implement cron, because (imo) cron is one of the best scheduling tools out there already, and it comes packed onto every Unix distribution that ends with x.

    Instead, I changed the run_every method to now accept a block, so you can pick the new best time to run if you’d like, or just keep using the same old (8.hours) syntax.

    Think something like this:
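    As a rough sketch of the idea rather than the actual patch: `next_run_at` and `NEXT_8AM` are hypothetical names, the schedule is either a number of seconds or a callable, and passing the current time into the callable is my assumption.

```ruby
require "date"

# A numeric schedule is added to "now" (which drifts a little every run);
# a callable schedule picks the next run time itself.
def next_run_at(schedule, now = Time.now)
  schedule.respond_to?(:call) ? schedule.call(now) : now + schedule
end

# Picks the next 8AM strictly after "now", so the job can't creep later.
NEXT_8AM = lambda do |now|
  today_8am = Time.new(now.year, now.month, now.day, 8)
  if now < today_8am
    today_8am
  else
    tomorrow = Date.new(now.year, now.month, now.day) + 1
    Time.new(tomorrow.year, tomorrow.month, tomorrow.day, 8)
  end
end
```

    A job that finishes at 8:07 and reschedules itself 24 hours out next runs at 8:07; with the callable it lands back on 8:00.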

    If you like it, here’s the GitHub Repo.

  • Machinist + RSpec matcher

    Posted on March 20th, 2010 terry 14 comments

    Most of the time when I’m putting the skeleton of my Rails app together, I end up with a test like this:

    After discovering Remarkable this week, I decided it was worth putting that into a macro matcher*. So now you can just say it { should work_with_machinist }. Here’s the code. Not tested. Enjoy:

    * Thanks @David for pointing out it’s a matcher, not a macro

    ** Clarification: From reading the comments, I realize that this doesn’t seem useful unless you’ve used machinist. Machinist’s make method works on an ActiveRecord model, and works something like this: calls .new() with the fake params, calls save!, and then calls reload. If any of those fail, it raises an exception, so this effectively tests your blueprints for validation and callback sanity.
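    A framework-free sketch of what such a matcher boils down to, not the original code. The `WorkWithMachinist` class is hypothetical, it assumes the model class responds to `make`, and the exact method names RSpec expects for failure messages vary by version.

```ruby
class WorkWithMachinist
  # Accepts either the model class or an instance of it (the implicit subject).
  def matches?(subject)
    @model_class = subject.is_a?(Class) ? subject : subject.class
    @model_class.make # raises if new, save! or reload fails
    true
  rescue StandardError => e
    @error = e
    false
  end

  def failure_message
    "expected #{@model_class}.make to succeed, " \
      "but it raised #{@error.class}: #{@error.message}"
  end
end

def work_with_machinist
  WorkWithMachinist.new
end
```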

  • Bling Bling gem management

    Posted on February 20th, 2010 terry 9 comments

    With Rails 3 and 2.3.5, I’ve been trying to use Bundler, but it’s been pretty painful so far. It’s not lightweight, and the only problem I really had with config.gem was that it had to load up the Rails environment to run, which meant if you had some code that was out in the open referencing a module or class in a gem that wasn’t yet installed, it borked (AASM did this to us). This made rake gems:install useless for lots of things.

    After jumping through the bajillion hoops and kludges to get mongomapper working, RSpec started giving me some problems with bundler, so I set out last night to make something really simple, just to manage gems: Bling.

    It’s not nearly as feature-rich as Bundler. It won’t unpack your gems or cache them, and it wants to install all gems to your system. So if those features are important to you (and they probably are to a lot of people), bling isn’t for you.

    Bling just uses a yaml file (in Rails.root, for Rails projects) to keep up with gems, and lets you specify gems, their versions, their sources, and what lib to require (if not easily guessed as the gem name).

    I bet I’ll have to add features to it as I go, but it’s solved the few problems I’ve had so far.

    To get started, just do a “(sudo) gem install bling” and then in your Rails directory do a “bling init”. Once you’ve set up your bling.yml how you want, if you have any new gems, just do a “bling install”.

    Gem info is here. GitHub is here.

  • Scheduling Jobs with DelayedJob

    Posted on February 19th, 2010 terry 3 comments

    For a new project at work, I was tasked with reevaluating our async worker system. Previously we’d been using beanstalkd with a jobs model that used AASM for persistence and kept up with when something should be scheduled (and was scheduled with a rake task). After looking at the different things available, I decided DelayedJob was a better way to go, because it involved fewer dependencies (no more cron-rake work, no more beanstalkd) and had the persistence built in, instead of our custom code.

    One shortcoming, though, seemed to be scheduling jobs. After thinking about different ways of keeping up with what jobs could be scheduled, I figured it’d make the most sense for jobs to just re-schedule themselves before execution.

    That said, I didn’t want to put that in every class that had a perform() method as a boilerplate, “reschedule myself for 24 hours,” because it’s harder to maintain and grok what jobs are scheduled, and because it’s not nearly as readable.

    Enter ScheduledJob, a mixin that gives you a run_every method on your class:

    With this, you can create a class and include ScheduledJob, and then just say “run_every 24.hours” or whatever. Like this:
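    A stripped-down sketch of the idea, not the actual mixin. The interval is in plain seconds instead of 24.hours, a plain array stands in for the job queue, and `perform_and_reschedule` is a name made up for the example; in a real app the rescheduling would enqueue a DJ job with a run_at time.

```ruby
module ScheduledJob
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :run_interval

    # Declares how often (in seconds) the job should repeat.
    def run_every(seconds)
      @run_interval = seconds
    end
  end

  # Queues the next run first, then does the work, so a failing perform
  # doesn't stop the job from being scheduled again.
  def perform_and_reschedule(queue, now = Time.now)
    queue << [self.class, now + self.class.run_interval]
    perform
  end
end

class NightlyReport
  include ScheduledJob
  run_every 24 * 60 * 60

  def perform
    :report_sent
  end
end
```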

    Hopefully this is helpful. I’ll need to write some tests, and see if the DJ guys want to use it on GitHub.

  • Secure Passwords sans SSL

    Posted on January 6th, 2010 terry 14 comments

    I use a lot of webapps that don’t use SSL. I don’t think this is necessarily bad – SSL costs money, is slower, introduces development and IT headaches – but I do worry that I’m sending my password in cleartext.

    I set out tonight to offer a solution for that. I haven’t tried to work it into any existing plugin, but just the links and the code should suffice for someone who doesn’t want to let passwords go readable across the wire.

    I decided that I would send an HMAC using the SHA-256 hash of the password as the key, and the username and a unix style timestamp concatenated as the data. To make this work I dropped password salts. I haven’t decided if it’s safer to send the salt to the user or just to drop it. Hints in this area would be appreciated.

    Prerequisites for this are jQuery and the ridiculously-easy jQuery sha256 plugin.

    To start things out, I created a Rails app (using 2.3.4, if you’re curious) with a User model that had a login and password field. I meant for the login to be, well, the login, and the password to be a SHA-256 hexdigest of the actual user’s password. If you adapt this code to work with RestfulAuthentication, which handles the password digests nicely, you’ll have to solve the salt problem mentioned above either by dropping it or sending it to the client.

    Here’s how I set up my user:

    Then, I needed a login form. Here’s the relevant HTML:

    As you can see, as soon as a user submits the form, I hash the password and create the digest, and then blank the password field out. Obviously that last part is important, because otherwise you’re sending the user’s password as cleartext.

    This code is more of a jumping off point than a landing place, but here’s what my (trivial) action to login looks like:

    The meat and potatoes of the authentication goes in the User model, and the code looks painfully similar to its Javascript counterpart:

    Essentially this goes, “ok, if you took a hash of your password locally and then used that to hash whatever timestamp you received and your username, and I get the same result on my end, it’s really likely that you’re the right user.”
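    A minimal server-side sketch of that comparison using Ruby’s OpenSSL bindings. The method names are illustrative rather than taken from the User model, and the constant-time comparison is an extra precaution of mine.

```ruby
require "openssl"
require "digest"

# What gets stored for the user: a SHA-256 hexdigest of the password.
def password_hash(password)
  Digest::SHA256.hexdigest(password)
end

# HMAC-SHA256 keyed with the password hash, over login + timestamp.
def login_digest(key, login, timestamp)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), key, "#{login}#{timestamp}")
end

# Compares two strings without bailing out at the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

def valid_login?(stored_hash, login, timestamp, client_digest)
  secure_equal?(login_digest(stored_hash, login, timestamp), client_digest)
end
```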

    Now, this is almost secure, but not quite. There’s still nothing preventing someone from re-posting the same data and logging in. This is just as bad as simply sending the digested password over to the server. I suggest (if you’re not using SSL), sending neither the password nor the digest of the password itself over the wire.

    The benefit of using the timestamp is that you can let the client send 1-time valid digests. So, I added a new model to my Rails app, LoginRequest, consisting of a timestamp attribute, and a User has many of them.

    Here’s what the more secure version of authenticate looks like:

    Now, even if a bad guy snoops everything you sent to the server, he (or she!) can’t reuse it, because that timestamp’s been used. Changing the timestamp should significantly change the hash, and without the secret key for the HMAC (the password hash), it’s not feasible to guess.

    Also, brute forcing against a single timestamp isn’t an option, because even *failed* login attempts are registered as a LoginRequest. You get one try per timestamp to login.

    If you wanted to, you could even put a check in the authenticate method that required that the timestamp be from the last day or some other arbitrary time limit.
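    Pulling those pieces together in a sketch, with an in-memory Set standing in for the LoginRequest records. `LoginVerifier` and the one-day `MAX_AGE` are assumptions for the example, not the model code.

```ruby
require "openssl"
require "set"

class LoginVerifier
  MAX_AGE = 24 * 60 * 60 # arbitrary limit: timestamps must be within a day

  def initialize(password_hash)
    @password_hash = password_hash
    @used_timestamps = Set.new # stand-in for the user's LoginRequest records
  end

  def authenticate(login, timestamp, digest, now = Time.now)
    return false if (now.to_i - timestamp.to_i).abs > MAX_AGE
    # Record the attempt before checking it, so failed attempts burn the
    # timestamp too. Set#add? returns nil if it was already recorded.
    return false unless @used_timestamps.add?(timestamp.to_i)

    expected = OpenSSL::HMAC.hexdigest(
      OpenSSL::Digest.new("SHA256"), @password_hash, "#{login}#{timestamp}"
    )
    expected == digest
  end
end
```

    A production version would persist the used timestamps and compare digests in constant time.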

    So, if you aren’t storing important data, and for some reason can’t use OpenID, this seems to be a candidate for at least keeping passwords from going out in the open. I’m pretty sure there’s some hole in it that I haven’t seen, so if you see something, please put it in the comments so I might fix it. Thanks!

    * Note: I used OpenSSL’s HMAC functionality over ruby-hmac because (1) the author of ruby-hmac suggests as much and (2) Nathaniel Bibler showed significantly better performance from the OpenSSL implementation here: http://blog.nathanielbibler.com/post/63031273/openssl-hmac-vs-ruby-hmac-benchmarks