Posted on November 7th, 2011
Finally got around to enabling posting via email. I think that’ll end up meaning two things:
* More frequent posting
* Shorter mean post length
There are still a few things to figure out. I just sent an email with a picture, and it looks like the default email processor drops those. The email client I use doesn’t allow HTML emails, so I’ll probably need to find a Markdown or Textile parser for email posts.
Posted on November 7th, 2011
Mostly because GoDaddy tries to upsell a bunch of stuff that nobody wants, and you have to click through 7 obnoxious dialogs to renew a domain for a year. Got the idea from bmenoza.
Posted on April 24th, 2011
Previously, I’d had some tough times with Bundler. I don’t think many of the pre-0.9 releases were stable, and I had a rough enough time with it that I was in no hurry to upgrade to Rails 3.
That was a year and a half ago, and now we’re to the point that lots of libraries we want to use only maintain versions for Rails 2.3.x – all the new hotness is for 3.0.
I wanted to get us on an upgrade path for Rails 3 and then Ruby 1.9, but first I had to attempt to install Bundler again. I was scared.
And I shouldn’t have been. Bundler is genius now. It gives me a place to store a canonical listing of all the gems we use for everything. I think it’s going to save a ridiculous amount of time in the future.
The only hiccup: because we stage things for a few weeks on a staging server, then on a QE server, and then on a production server, with older branches in the stages closer to production, we need to maintain a Capistrano deploy script that supports both Bundler and old-school rake gems:install.
Here’s what I put in my deploy.rb (beneath the fold):
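The original lines aren’t reproduced in this excerpt, but the idea can be sketched roughly like this (Capistrano 2 DSL; the :use_bundler flag and task names are my assumptions, not the actual deploy.rb):

```ruby
# Hypothetical sketch: choose Bundler or old-school rake gems:install
# per deploy, driven by a flag that older branches can flip off.
set :use_bundler, true # older branches closer to production set this to false

namespace :deploy do
  task :install_gems, :roles => :app do
    if fetch(:use_bundler)
      run "cd #{release_path} && bundle install --deployment --without development test"
    else
      run "cd #{release_path} && rake gems:install RAILS_ENV=production"
    end
  end
end

after "deploy:update_code", "deploy:install_gems"
```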
Posted on April 23rd, 2011
Update: David Genord pointed out a *huge* bug in my code, where jobs will never be retried; his fix has replaced the existing gist.
Our workers (DelayedJob) leak memory. Not heinously fast, but enough that monit bounces them fairly often. I tried out perftools.rb on one of our longer running jobs, reindexing contacts. I cut the job down to indexing only 25 contacts, and profiled objects instantiated. To index 25 contacts, which took about 25 seconds, the app instantiated 3.1 million objects.
Just in case the title is misinterpreted – DJ is not leaking memory. My code is.
With a little bit of memoization and a few short-circuits, I got it down to 800 thousand. That was nice, and it sped things up by an order of magnitude, but the jobs were still leaking memory.
So, I decided to steal a cool concept from the next async processor I really want to work with: Resque. I changed the worker to fork and wait for every job it performs. This means that there’s an overhead added to every worker of about 200MB, but that’s nothing compared to how bad things got if all the workers started sucking up huge chunks of memory at the same time.
One caveat: I looked around for someone who’d already done this so I could just incorporate their changes, but it turns out googling “delayed job fork” doesn’t reveal much. If there’s an existing project already doing this for Delayed Job, please let me know.
The code changes around this are pretty simple: just changing the run method and adding a hook for handling fork reconnection (it looks like DJ has some hooks built for this, but I didn’t see them in my local copy).
Before this patch, if I ran 1 worker locally, after about half an hour my machine ran out of RAM (it has 8GB). If I ran 4, my computer was unusable. After the patch, I can run 4 workers locally with no perceived performance hit. Money shot:
Posted on April 22nd, 2011
Last weekend, we were booked. I was looking for something simple to make for a friend, but couldn’t think of anything that was worth making for someone who was coming over.
I decided on pork burgers. In short:
I was trying to go for a crunchy/bite-y taste, and the snap pickles helped a ton.
Posted on February 23rd, 2011
There have been times when someone else would deploy, something would go wrong, and if they’d lost their deploy output, it was a lot harder to debug. Here are a few lines you can put in your deploy.rb to get your deploy log output written up to your server, so if you ever need to find out whether something quietly failed, you know where to look.
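The original lines aren’t shown in this excerpt; a hypothetical sketch of the approach looks something like this (Capistrano 2 DSL; the logger wiring, paths, and task names here are my assumptions, not necessarily the original):

```ruby
# Hypothetical sketch: point Capistrano's log output at an IO that writes
# both to the screen and to a local file, then upload that file to the
# server after every deploy.
class TeeIO
  def initialize(*targets)
    @targets = targets
  end

  def puts(*args)
    @targets.each { |t| t.puts(*args) }
  end

  def write(*args)
    @targets.each { |t| t.write(*args) }
  end

  def flush
    @targets.each(&:flush)
  end
end

log_path = "log/deploy-#{Time.now.strftime('%Y%m%d-%H%M%S')}.log"
logger.device = TeeIO.new($stderr, File.open(log_path, "w")) # assumed API

namespace :deploy do
  task :upload_log do
    upload log_path, "#{shared_path}/deploy_logs/#{File.basename(log_path)}"
  end
end

after "deploy", "deploy:upload_log"
```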
Posted on November 12th, 2010
I’ve been introduced to a process known as the “5 Whys.” When something goes terribly wrong, like your Solr index getting corrupted, you ask the question, “why did it go wrong?” And then after the answer, “why was that?” (Rinse, repeat, until you’ve asked the question 5 times).
The idea is that if you ask that question five times and figure out solutions to those five answers, you’ll have a significantly better process than you did before.
About 2 months ago we released 1.0 of our first module, and in the last month our users have really started to use our application. As such, the last two weeks have been an (incredibly painful) learning experience.
While there are a bunch of boring ones, I think this one’s pretty interesting, and the lessons that come from it can be applied universally.
I’ll start this story with a timeline. Last Tuesday morning, around 7:45AM, a customer emails our support, saying search is broken. One of our guys looks into it, and sure enough, search is returning erratic results. My phone starts blowing up with texts and emails. I sit down, look at it, think back a week before when we turned on Solr’s autoCommit functionality, blame that, and do a non-destructive reindex of everything. After 15 minutes (our dataset isn’t big yet), the problem’s resolved.
Talking through the issue, we thought it was a reasonable hypothesis that too many DJ workers had posted big commits to Solr at once (at the time we were reindexing nightly), and that resulted in Solr running out of memory, corrupting its index. It’s worth noting that Solr’s documentation says that the index won’t be corrupted when this happens, but it was all we had to go on at the time.
Wednesday afternoon, about 3:15, it happened again. We knew how to handle it because we’d talked about what processes we should follow should we see this again, but were a little confused. We’d reduced the workers, we’d turned off full reindexing and were only doing nightly optimization calls, we reduced the commit batch size to 50 (from 300) to reduce the likelihood of some huge dataset being posted, and we’d lowered Solr’s commit threshold to keep it from blowing up.
We decided to remove another worker and hope the black magic would help.
3:15 rolls around Thursday. It happens again.
After quickly setting the fix in motion, my interest is piqued: clearly something scheduled is doing this now. Looking at the crontabs for our different servers, I don’t see anything suspicious, but then I look through the Solr request logs and see a POST to /solr/update from our office right before our phones started blowing up. I look at the day before and see the same thing.
The easiest fix was to change the Solr password on production and not put that change into Git. It didn’t happen today, so it seems like that was the right fix. Then tonight, while watching my logs during a rake task, I saw “Error, couldn’t connect to Solr server at <production solr ip>”. Boom.
Someone had put ENV['RAILS_ENV'] ||= 'production' in one of the rake files, which it turns out are globally scoped. This, coupled with us putting production passwords in source control, let us corrupt the Solr index remotely. Epic fail.
There were lots of process changes that came from this, but here are a few code notes that might save you a stroke or two:
Never, ever, ever, ever store production passwords in source control
I’d previously argued against this, but now I’ve seen the error of my ways. If you have production passwords locally, and you don’t/can’t put some sort of IP security around your server, it’s just a matter of time before someone accidentally gets an environment variable screwed up and ruins *everything*.
Everything in lib/tasks is globally scoped
If you write something cool at the top of one rake file, keep in mind that it applies to all rake tasks in your app. If you define a method (even in a namespace) in a rake file, it’ll be available to all other rake tasks. If you have two identically named methods that do different things in different namespaces, only one of them is going to exist, and it’s quite possible that it’s not the one you want.
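A minimal illustration of the collision, in plain Ruby (rake loading two .rake files into its shared top-level scope behaves the same way; file paths and method names here are made up):

```ruby
# Rake evaluates every file in lib/tasks into one shared top-level scope,
# so a method defined in one .rake file exists in all of them -- and the
# last definition loaded silently wins.

# imagine this lives in lib/tasks/reports.rake
def default_batch_size
  300
end

# ...and this lives in lib/tasks/cleanup.rake, loaded later
def default_batch_size
  50
end

default_batch_size # the reports task now gets 50, not the 300 it expected
```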
Don’t set RAILS_ENV=production by default
While this one is technically covered by the previous note, it’s worth listing explicitly. There’s no reason to assume RAILS_ENV=production. If someone wants something run in production mode, let them set the environment variable themselves. Setting it by default only means you’ll break every other environment where the variable isn’t set. Plus, if someone fails on note #1, there’s a good chance you ruin someone’s day.
Posted on November 12th, 2010
Slicehost charges $38/month for a 512MB slice, and then $10/month for backups.
Rackspace Cloud will do all of that for $20.
Moving this weekend, because that’s ridonkulous.
Posted on October 6th, 2010
There are about four hours in the day where the caffeine-vs-fatigue level is just right, and I get a lot done. Unfortunately, a lot of the stuff I work on requires a lot of computer effort too, which at first led me to create multiple directories to work in.
After I got the environment variable thing working, everything was a lot more siloed, which was great, but I still had to either explicitly type ";say 'done'" at the end of long-running commands or just check on them spontaneously.
I don’t like having my headphones on all the time, so I wanted to hook Growl into this.
Also, I wanted to release a gem named caffeine, but someone already has a gem named caffeine, and since I’d already made it work locally, I didn’t care enough to re-package it and figure out a new name. And meth has all the wrong implications. Color me lazy.
Anyway, I ended up adding an at_exit block to my rake and spec executables, and put the following script in my path. It’s resulted in my computer telling me verbally and with sticky notes what’s going on. Who needs a secretary when you have this?
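The script itself isn’t reproduced in this excerpt; it was something along these lines (growlnotify and say are OS X commands, and the exact flags and message format here are my guesses):

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch of the notifier: build a message, speak it aloud
# with OS X's `say`, and pop a sticky Growl note with growlnotify.
status  = ARGV.fetch(0, "done")
message = "Task finished: #{status} at #{Time.now.strftime('%H:%M')}"

system("say", message)                                     # the verbal part
system("growlnotify", "--sticky", "-m", message, "Build")  # the sticky note
```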
Posted on October 3rd, 2010
About 4 months ago, I had a conversation with a coworker. It went something like this:
J: “I noticed we’re using integers for our IDs. Why don’t we use GUIDs?”
Me: “We don’t have multiple sources of data creation, and incrementing IDs are easier to keep up with. I’ll let you know if that turns out to be wrong though.”
About 3 weeks ago, I had this conversation:
Me: “Shit. We need GUIDs.”
We’re working on being able to replicate datasets and clone them into existing environments, and incrementing IDs allow for collisions, which would be bad news bears.
There were a few hiccups I ran into, a few gems that needed to be patched, and a few unexpected benefits that have come from GUIDifying our app.
Before I get too deep into it, here’s the plugin we’re using: BMorearty/usesguid. This is on Rails 2.3.9, so I’m not sure what else will come up when we get to Rails 3.
The first step in all of this was to convert the IDs to GUIDs. I asked JBogard if he’d done anything similar, and he said they had a script that would look at the MSSQL metadata and figure out what columns to update. The script wouldn’t really work for us since we use MySQL, but it’s a good approach.
Using that idea, I used ActiveRecord’s reflections and columns to figure out what to change. The process itself was something like:
- Remove any indexes that could cause problems as you’re killing FK columns.
- Find every table with a PK ID item, and add a GUID column (can be done very fast using MySQL’s select uuid())
- Use ActiveRecord’s reflections to find tables looking at other tables, and add a GUID column for the FK and populate it with the other table’s GUID
- Delete the ID columns
- Add indexes back to your keys (don’t forget this step!)
Here’s how it looked:
I took out most of the project-specific stuff, but kept in the parent_id stuff since that’s most likely useful to someone else.
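To make the per-table conversion concrete, here’s a stripped-down sketch of the SQL such a script generates (table and column names are assumptions; the real version discovered FK columns through ActiveRecord reflections):

```ruby
# Generates the MySQL statements that GUIDify one table: add a guid
# column, fill it (uuid() is evaluated per row), backfill each FK guid
# from the parent table, then drop the old integer id column.
def guidify_statements(table, fk_tables)
  stmts = []
  stmts << "ALTER TABLE #{table} ADD COLUMN guid CHAR(36)"
  stmts << "UPDATE #{table} SET guid = uuid()"
  fk_tables.each do |fk_table, fk_column|
    stmts << "ALTER TABLE #{table} ADD COLUMN #{fk_column}_guid CHAR(36)"
    stmts << "UPDATE #{table} t JOIN #{fk_table} p ON t.#{fk_column}_id = p.id " \
             "SET t.#{fk_column}_guid = p.guid"
  end
  stmts << "ALTER TABLE #{table} DROP COLUMN id" # plus each old FK id column
  stmts
end
```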
Anywhere we were writing explicit SQL, IDs had to be quoted. Unit tests caught all of these issues, thankfully.
Anywhere we were checking whether an ID was valid, we were doing it by saying id.to_i != 0. That still works, 9/36ths of the time: if your GUID starts with a nonzero digit it’ll pass; otherwise, no good. We now use a regex to tell whether something is a GUID.
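The check ends up looking roughly like this (assuming the 36-character hyphenated GUID format; this is my sketch, not our exact code):

```ruby
# Replaces the old id.to_i != 0 check, which only passed when the GUID
# happened to start with a nonzero digit. \h matches a hex digit.
GUID_PATTERN = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

def guid?(value)
  !(value.to_s =~ GUID_PATTERN).nil?
end
```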
Because some plugins (below) will have to use usesguid, I had to change the plugin load order to put usesguid first.
We had some routes that would look for an integer-only-regex match to find :id parameters, so we had to change that to a guid-finding regex.
Had to vendor the gem to quote explicit SQL relying on IDs.
I had to change acts_as_solr_reloaded’s PK to be a string instead of an integer and add usesguid to dynamic_attributes and locals.
Needed usesguid on audit.rb
DJ does a very simple regex check to see if something is a valid job, and if not, bails. Unfortunately, it just looks for Model:integer. This fails with guids, obviously, so I had to change the regex in performable_method to look for a GUID instead. See it here.
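The shape of that change, sketched with illustrative patterns rather than DJ’s exact source:

```ruby
# Illustrative only -- not Delayed Job's actual regex. The stored handler
# references records as "ClassName:id"; the stock check assumes an
# integer id, so GUID-keyed records never match and the job bails.
INTEGER_REF = /\A([A-Z][\w:]*):(\d+)\z/
GUID_REF    = /\A([A-Z][\w:]*):(\h{8}-\h{4}-\h{4}-\h{4}-\h{12})\z/

"Contact:550e8400-e29b-41d4-a716-446655440000".match(GUID_REF)    # matches
"Contact:550e8400-e29b-41d4-a716-446655440000".match(INTEGER_REF) # nil
```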
I really like the 36 character string GUIDs. I don’t have a good reason for it, but I modified the usesguid library to not use .to_s22, because the 22 character ones look odd. It might be just me who cares about it, but it’s worth noting that if you’re sending GUIDs to other systems, there’s a chance they’ll see the 22 character string and tell you the GUID is invalid. So, there’s that.
I recently wrote a migration that imported 131,000 new records into an existing table, set up associations along the way, and did a little bit of geocoding magic. The migration took about 45 minutes. After doing all of that hard work, I mysqldump‘d those tables and changed the migration to just load the generated data. The migration now takes less than 5 seconds.