It happens. This column has to go. But in a Rails app, there are a few problems. You can’t just drop it.
My first expectation is that we would create a migration and do something like this.
class RemoveMiddleNameFromUsers < ::ActiveRecord::Migration
  def change
    remove_column :users, :middle_name
    remove_column :users, :gender
  end
end
This will generally be fine but there are a few practical issues.
Is any of my code still using this column?
What happens to the production code in the time between the migration running and the new code being deployed?
Still in use?
The obvious thing to do is search the code for uses of this column name, but that can be tricky. The name may be fairly common, which makes searching difficult. You could also be accessing it indirectly, for example via send with string interpolation.
The best way we’ve found to discover that we are still using something is to freak out in test/development/staging if it is accessed. If we didn’t have the test coverage, we’d probably log its usage in production and check the logs.
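As a rough illustration of the "freak out" idea, here is a minimal plain-Ruby sketch. The DeprecatedColumns module, the deprecate_column helper, and the env: argument are all hypothetical names made up for this example, and the User class is a stand-in rather than a real ActiveRecord model (a real model defines attribute methods lazily and also allows read_attribute, so it would need more care).

```ruby
require "logger"

# Hypothetical helper (not part of Rails): wraps the reader and writer of a
# column that is about to be removed.
module DeprecatedColumns
  class AccessError < StandardError; end

  def deprecate_column(name, env:, logger: Logger.new($stderr))
    [name.to_s, "#{name}="].each do |meth|
      original = instance_method(meth)
      define_method(meth) do |*args|
        message = "deprecated column #{name} accessed via #{meth}"
        # Outside production: fail loudly so tests and staging catch it.
        raise AccessError, message unless env == "production"
        # In production: log the access and keep working.
        logger.warn(message)
        original.bind(self).call(*args)
      end
    end
  end
end

# Stand-in for a model; attr_accessor plays the role of the column.
class User
  extend DeprecatedColumns
  attr_accessor :middle_name
  deprecate_column :middle_name, env: "development"
end
```

Because the wrapper replaces the methods themselves, access through send with an interpolated name is caught as well.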
Summary: I extracted and improved on the Rails stats tool. The code is here.
Process
The other day, I was curious about how our codebase had evolved over the years and knew about the rake stats command in Rails. So I checked out a git sha from 2010 and ran the command. Of course the gems weren’t installed. So I ran bundle install. It said it couldn’t find gemcutter.org. That’s when I knew it wasn’t going to work out. Even if I switched it to use current standards, at least one of those gems was going to be completely gone, or there would be a million other issues.
I figured that an external tool to do the same thing should be easy and I was right. I copied the 3 files related to that rake job to a new project. The code just took directories in so the only dependence on being actually in the Rails project was a Rails.root call at the beginning. I switched it to be passed in from the command line and I was done!
… except things were missing. The tool that comes with Rails has a certain set of directories and it only looks for code there. It included the normal Rails stuff like models and controllers but was missing anything that strayed from the prescribed structure. These pieces were missing in the default tool:
Anything added to the app directory (jobs, observers)
It turned into a bit of a time sink, but was a fun project. By modifying it to use more introspection of the directory structure, I was able to cover all the pieces of the app.
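To give a feel for what "introspection of the directory structure" could look like, here is a small sketch. It is not the extracted tool's code; stats_by_directory is a hypothetical name, and the line counting (skip blank and comment-only lines) is a deliberate simplification.

```ruby
# Hypothetical sketch: discover every subdirectory of app/ instead of
# relying on a fixed list, then count lines of Ruby code in each.
def stats_by_directory(root)
  app = File.join(root, "app")
  Dir.children(app).sort.each_with_object({}) do |name, stats|
    dir = File.join(app, name)
    next unless File.directory?(dir)

    # Count lines that are neither blank nor comment-only.
    stats[name] = Dir.glob(File.join(dir, "**", "*.rb")).sum do |file|
      File.foreach(file).count { |line| line !~ /\A\s*(#|\z)/ }
    end
  end
end
```

Called as stats_by_directory(ARGV.fetch(0)), it takes the project root from the command line, so directories like jobs and observers show up without being listed anywhere.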
At TaskRabbit, we like MySQL. As with everything, it has its own set of issues, though. One of these issues is that it locks the table while a column is added. This is not the biggest deal in the early days of a table, but once you start getting millions of rows and consistent traffic around the clock, this prevents the site from working as expected.
The first way that we worked around this problem was with the pt-online-schema-change tool. It does the following:
Makes a new table with the old table’s schema
Copies data from the old table to the new table
Sets up a trigger for data to keep syncing
Adds the column to the new table
Renames the old table to something else and renames the new table to replace it
Deletes the old table
This worked quite well, but had a flaw within our process. It was outside of the development and deployment workflow. In order to develop, we would still make a Rails migration. Then just before deploy, we would run the tool and add the row to the schema_migrations table. When we deploy, it runs migrations, but in this case it would not run because we had added that row. I don’t think we ever messed it up, but the process had a few gaps.
We now use LHM, also known as Large Hadron Migrator, from the good folks at SoundCloud. It does basically the same thing, but allows you to do it right there in the migration.
require 'lhm'

class AddMiddleNameToUsers < ActiveRecord::Migration
  def change
    Lhm.change_table :users do |m|
      # same as: add_column :users, :middle_name
      m.add_column :middle_name, "VARCHAR(191) DEFAULT NULL"
    end
  end
end
So this is great. Now it’s in the process.
Schema
One design choice (and it seems like the right one) is that it makes the last step manual. That is, it leaves the old (renamed) table around. We go back and delete these from production when sure everything worked out. The above migration might leave around a table named something like this: lhma_2014_10_28_20_41_56_933_users.
So that’s fine, but you’ll notice that when you run rake db:migrate, your schema.rb file now has that table in it. This happens because ActiveRecord takes a snapshot of your database right after the migration. We try to take really good care of our schema file, so this made us sad.
I went poking around in the ActiveRecord code ready to monkey patch the adapter that reads the table list or the code that generates the schema file. When I got there, though, I found there was already a class setting that did what I wanted. It even took a regex!
So here’s what to add if you don’t want those tables showing up in your schema file:
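The original snippet is not reproduced in this excerpt. As a minimal sketch, assuming the class setting in question is ActiveRecord::SchemaDumper.ignore_tables and that the leftover tables all start with lhma_ (as in the example name above), the configuration line is shown as a comment and the rest demonstrates the pattern in plain Ruby. The constant name and the initializer placement are assumptions.

```ruby
# Assumed pattern for the archive tables that LHM leaves behind.
LHM_ARCHIVE_TABLE = /\Alhma_/

# In a Rails app this would likely go in an initializer (assumption):
#   ActiveRecord::SchemaDumper.ignore_tables << LHM_ARCHIVE_TABLE

# Plain-Ruby illustration of which table names the pattern filters out.
tables = %w[users lhma_2014_10_28_20_41_56_933_users tasks]
dumped = tables.reject { |table| LHM_ARCHIVE_TABLE.match?(table) }
puts dumped.inspect
```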
Over the years, we’ve struggled at TaskRabbit with a way to represent smaller geographic areas. Postal codes seem to be the most obvious choice, but these vary dramatically in size. For example, there are postal codes that are just one building and ones that are bigger than some US states. We’ve also tried “neighborhood” data from various sources, but these are somewhat unreliable.
I recently learned about and started using geohashes to break the world into boxes and have been very happy with the results. I gave a lightning talk last week at GoGaRuCo on the subject. Many in the audience had already heard about them, but the talk went over fairly well.
Use Case
The problem at hand is one of probabilities. Given a task, who are the best taskers for the job? Or conversely, given a tasker, what are the best few tasks for them? Taskers give us an idea of where they want to work, but actions often speak louder than user-drawn geographic polygons. So what I wanted to do was map their history into positive and negative areas of probability depending on what tasks they wanted to do and which they did not.
I had all those points and it was fairly easy to draw a heatmap using this Leaflet plugin, but that needed to be mapped to something more concrete. We would be using Elasticsearch for the storage, so while researching its geo functions, I stumbled across the concept of geohashes, which it evidently uses internally for such queries.
This article gives a really good explanation of how it works. The short version is that if you were to start with the whole world and keep cutting it in half depending on where your lat/lng was, you’d end up with a lot of left/right (or top/bottom) choices. Those choices can be encoded into binary, which can be encoded into a simple string. So this spot in London (51.507794, -0.127952) would become the “gcpvj0dyds” geohash. It has the interesting property that strings (locations) that share the same start are known to be close to each other. For example, the “gcpvj09swr” geohash has the first six characters in common, so we would know that it is within half a mile or so of the first one.
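The halving process described above can be sketched in a few lines of Ruby. This is a standalone illustration of the standard geohash encoding, not the pr_geohash gem's code; geohash_encode and BASE32 are names chosen for this example.

```ruby
# Geohash alphabet: digits plus lowercase letters without a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lng, precision = 6)
  lat_range = [-90.0, 90.0]
  lng_range = [-180.0, 180.0]
  bits = []
  use_lng = true # bits alternate, starting with longitude

  while bits.size < precision * 5
    range, value = use_lng ? [lng_range, lng] : [lat_range, lat]
    mid = (range[0] + range[1]) / 2
    if value >= mid
      bits << 1
      range[0] = mid # keep the upper half
    else
      bits << 0
      range[1] = mid # keep the lower half
    end
    use_lng = !use_lng
  end

  # Every 5 bits become one base-32 character.
  bits.each_slice(5).map { |chunk| BASE32[chunk.join.to_i(2)] }.join
end

puts geohash_encode(51.507794, -0.127952, 6) # prints "gcpvj0"
```

Since each extra character only subdivides the current box, a shorter geohash is always a prefix of the longer one for the same point, which is what makes the shared-prefix comparison work.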
So now I have all of these points. I can map them to geohashes of the necessary precision (I chose 6) and give positive and negative weights for each user action. The actual calculation turns out to be really fast. I used the code from the pr_geohash gem and had no issues. It also calculates neighbors, which turns out to be important.
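The weighting step might look something like the sketch below. The weights themselves (+1.0 for a task taken, -0.5 for one declined) and the weights_by_cell name are invented for illustration; the post does not say which values were actually used.

```ruby
# Hypothetical sketch: sum per-action weights into geohash cells of a
# given precision. Each action is a [geohash, weight] pair.
def weights_by_cell(actions, precision: 6)
  actions.each_with_object(Hash.new(0)) do |(geohash, weight), totals|
    totals[geohash[0, precision]] += weight
  end
end

actions = [
  ["gcpvj0dyds", 1.0],  # assumed weight for a task the tasker took
  ["gcpvj09swr", 1.0],  # same 6-character cell as the one above
  ["gcpvhc2r4w", -0.5]  # assumed weight for a task the tasker declined
]
puts weights_by_cell(actions).inspect
```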
Using the weight, I can combine and draw the geohashes (RGeo/Leaflet). As an example, here is drawing where a particular Tasker is likely to want to work:
So I’m at least two years behind blogging about various projects, but I have the list right here. I feel like being productive tonight but am not quite ready to start the next big feature so I thought I’d mention one on that list.
For a variety of reasons, I was trying to get everyone to migrate from Yammer to Hipchat a few years back. They aren’t exactly direct competitors, but I liked the more direct Hipchat model and didn’t want to be running two distracting things all the time. The main holdout was that Yammer tended to be used for communicating to the whole company, as its model is better suited to that.
I quickly made yam2hip to sync them over to my new client of choice. It’s made to deploy to Heroku. Every 90 seconds it will look for new Yammer messages and post them in a Hipchat room. I made a Hipchat room called “Yammer” to be the target room.
If people had continued to use both, I probably would have made it bi-directional; however, we are now quite happily all on Hipchat (boom).
In my recent project, Cakewalk, we give the user a fairly complete experience without having to sign up. This seems like the nice thing to do but it’s also in Apple’s guidelines for app approval. At some point, though, we need to create an account for them. In our case, it’s when they want to add an idea to their favorites or save their filter preferences. A “gate” such as this is something I’ve done on the web many times but it’s the first time I’ve done it in an iPhone app. I’ll describe how I solved it using blocks.
Web
On the web, I would usually send them to a URL. If that URL requires a logged in user, then it would redirect to the login/signup URL with a query param. For example, if the page was http://www.example.com/favorite, it would redirect to something like http://www.example.com/login?after_auth=%2Ffavorite. The login/signup form would add this as a hidden parameter that it POSTs. If the action is successful, it will use this URL instead of the default one to send the user to the next place. In Rails, it would look something like this:
def login
  user = User.find_by_email(params[:email])
  if user && user.authenticate(params[:password])
    # right!
    log_in!(user)
    redirect_to params[:after_auth] || dashboard_path
  else
    # not right! (unknown email or wrong password)
    render :form
  end
end
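The other half of the round trip, building the login URL and reading the parameter back, can be sketched with Ruby's standard uri library. The helper names login_url_for and path_after_auth and the "/dashboard" default are assumptions made for this example.

```ruby
require "uri"

# Hypothetical helper: where to send a visitor who hit a gated page.
def login_url_for(requested_path, base: "http://www.example.com/login")
  "#{base}?#{URI.encode_www_form(after_auth: requested_path)}"
end

# Hypothetical helper: where to go once login succeeds.
def path_after_auth(query_string, default: "/dashboard")
  URI.decode_www_form(query_string).to_h.fetch("after_auth", default)
end

puts login_url_for("/favorite")
# prints "http://www.example.com/login?after_auth=%2Ffavorite"
```

In a real app it would also be worth checking that the decoded value is a local path before redirecting to it.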
If the site allows the users to switch back and forth between login and signup pages or allows Facebook, Twitter or some other OAuth, it can be easy to lose this parameter. I’ve seen sites put the value in a cookie for this reason. It can simplify it, but is more likely to have unintended side effects.
Blocks
Most of the complexity of the web version is because of the stateless nature of the medium. The query parameter or cookie is an attempt to add that state globally. This is not particularly an issue in an iPhone app. The global state is fairly well known in the view controllers and there is shared memory. The interesting thing that I realized is that it’s not just data that we can store, but also blocks.
An Objective C block is an inline declaration of a function. It is similar to features available in Ruby and Javascript. Here we use one to change how a sort is done:
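The Objective-C snippet is not reproduced in this excerpt. Since the comparison to Ruby is made above, here is a Ruby sketch of the same two ideas: a block that changes how a sort is done, and a block stored in memory to run once the user has an account. The Gate class and its method names are hypothetical and are not taken from the Cakewalk code.

```ruby
# A block passed inline to change how the sort compares elements.
names = %w[pear Apple banana]
sorted = names.sort { |a, b| a.downcase <=> b.downcase }
puts sorted.inspect # prints ["Apple", "banana", "pear"]

# Hypothetical gate: holds on to the block until sign-up has happened.
class Gate
  def initialize
    @signed_in = false
    @pending = nil
  end

  # Run the action now if there is an account, otherwise remember it.
  def require_account(&action)
    @signed_in ? action.call : @pending = action
  end

  # Called when sign-up finishes; runs whatever was waiting.
  def signed_in!
    @signed_in = true
    pending, @pending = @pending, nil
    pending&.call
  end
end
```

The stored block plays the role of the after_auth parameter on the web: it remembers what the user was trying to do before the gate interrupted them.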
It’s nice to be useful. I was glad to be able to help my wife and her partner make a new iPhone app called Cakewalk.
I really like the counter-culture situation of it. Technology has become a way to distract the kids or something you can hand them when you are out of other things to do. Krista and Marijane had the idea to use technology to go the other way by giving parents an app that was for them, not the kids. And the goal of the app is to stop using the app and start real playtime with the kids.
Put simply, Cakewalk comes in handy when the kids really want to play with you, but you’re having a hard time coming up with something to do. It is full of quick ideas for simple fun. And every activity is something you can do right now—no research, no prep, and no trips to the store.
So get it now on the App Store here. See below for the tech notes on how it was built.
Last year at TaskRabbit, we decided to go headlong into this Service Oriented Architecture thing. We ended up with several Rails and other Ruby apps that loosely depended on each other to work. While this was mostly good from a separation of concerns and organizational perspective, we found effective automated testing to be quite difficult.
Specifically, we have one core platform application whose API is the most used. We also allow other apps to have read-only access to its database. Several of the apps are more or less a client to that server. So while the satellite apps have their own test suite, the integration tests can only be correct if they are exercising the core API.
To handle this use case, we created a gem called Offshore. A normal test suite has factories and/or fixture data and it uses database transactions to reset the data for every test. Offshore brings this to the SOA world by providing the ability to use another application’s factories from your test suite as well as handling the rollback between tests. Through the actual exercising of the other application as well as a simplified usage pattern, we found that we trusted our integration tests much more than alternative approaches.
System
What we have are several apps based on use case, all using a core platform. There are web apps for each of the participants in our marketplace: workers, consumers, and businesses. We also have iPhone and Android apps along those same lines.
The apps communicate in a few ways. All of them use the platform’s API synchronously. The web apps use Resque Bus to subscribe and publish asynchronously. We also allow the web apps to read the platform database, but not write.
Use Case
The APIs and data involved are basically about tasks and users in the system and their various state transitions. For example, our business app signs up a user and posts a task using the API. The user will then be on their dashboard seeing a list of tasks in various states, conversations with the workers, payment details, etc.
I really like Jekyll/Octopress and have aspirations to make my presentations in HTML instead of Powerpoint. For my last presentation to the Redis Meetup, I combined the two. I found it really easy to “blog” the talk and created a plugin to convert that to a reveal.js presentation from the same source. It’s my first Jekyll plugin, but it got the job done. Check it out here.
Usage
Make a new blog post and add slides: simple to the metadata at the top. The value simple refers to the reveal.js theme to use. The options are: beige, blood, moon, night, serif, simple, sky, and solarized.
---
layout: post
published: true
title: "Resque Bus Presentation"
date: 2014-02-16 08:48
comments: true
author: BL
slides: simple
categories: architecture resque redis bus
---
You then write your normal blog post with a few extra elements mixed in.
For example, you start a slide with the {% slide %} notation. You can also say {% notes %} at the “bottom” of a slide. This will not appear in the slideshow, but will appear in the blog post.
You say {% endslide %} at the end of the presentation. You could do this each time, but it’s inferred by the beginning of a new slide using {% slide %}, so the end notation is only needed at the end.
That’s about it. Here’s the whole presentation, if it were only a few of the slides:
{% slide %}
## Resque Bus
### Brian Leonard
##### TaskRabbit
##### 02/17/2014
{% notes %}
Hi, my name is Brian Leonard and I'm the Chief Architect and Technical Cofounder of TaskRabbit. TaskRabbit is a marketplace where neighbors help neighbors get things done.
At TaskRabbit we are using Redis and Resque to do all of our background processing. We've also gone one step further to use these tools to create an asynchronous message bus system that we call Resque Bus.
{% slide %}
### Redis
* Key/Value Store
* Atomic Operations
![Redis Logo](/images/posts/resque-bus-presentation/redis-logo.jpeg)
{% notes %}
I don't have to tell you guys about Redis, but for our purposes the main point is that we can store stuff in it and the operations are atomic.
{% slide %}
## Thanks!
Questions?
{% endslide %}
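To make the inference rule concrete, here is a plain-Ruby sketch of how markup like the above could be split into slides and notes. It is not the plugin's implementation (which works through Jekyll and Liquid); Slide and split_slides are hypothetical names, and the sketch simply treats the tags as text markers.

```ruby
# Hypothetical value object: what goes on the slide vs. the speaker notes.
Slide = Struct.new(:body, :notes)

def split_slides(source)
  # Everything after {% endslide %} is regular blog content.
  deck = source.split("{% endslide %}").first.to_s

  # Each {% slide %} implicitly closes the previous one; text before the
  # first marker is not part of any slide.
  deck.split("{% slide %}").drop(1).map do |chunk|
    body, notes = chunk.split("{% notes %}", 2)
    Slide.new(body.to_s.strip, notes.to_s.strip)
  end
end
```

Run against the sample above, this would yield three slides, the last of which has no notes.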