Geohashes

Over the years, we’ve struggled at TaskRabbit with a way to represent smaller geographic areas. Postal codes seems to be the most obvious but these vary dramatically in size. For example, there are postal codes that are just one building and ones that are bigger than some US states. We’ve also tried “neighborhood” data from various sources, but these are somewhat unreliable.

I recently learned about and started using geohashes to break the world into boxes and have been very happy with the results. I gave a lightning talk last week at GoGaRuCo on the subject. Many of them had heard about it, but the talk went over fairly well.

Use Case

The problem at hand is one of probabilities. Given a task, who are the best taskers for the job. Or conversely, given a tasker, what are the best few tasks for them. Taskers give us an idea of where they want to work but actions often speak louder than words user-drawn geographic polygons. So what I wanted to do was map their history into positive and negative areas of probability depending on what tasks they wanted to do and which they did not.

I had all those points and it was fairly easy to draw a heatmap using this Leaflet plugin, but that needed to be mapped to something more concrete. We would be using Elasticsearch for the storage, so while researching it’s geo functions, I stumbled across the concept of geohashes which it evidently uses internally for such queries.

This article gives a really good explanation of how it works. The short version is that if you were to start with the whole world and keep cutting it in half depending on where your lat/lng was, you’d end up with a lot of left/right (or top/bottom) choices. Those choices can be encoded into binary, which can be encoded into a simple string. So this spot in London (51.507794, -0.127952) would become the “gcpvj0dyds” geohash. It has the interesting property that strings (locations) that have the same start are known to be close to it. For example, the “gcpvj09swr” geohash has the first six characters in common, so we would know that is within half a mile or so of the first one.

So now I have all of these points. I can map them to geohashes of the necessary precision (I chose 6) and give positive and negative weights for each user action. The actual calculation turns out to be really fast. I used the code from the pr_geohash gem and had no issues. It also calculates neighbors which turns out the be important.

Using the weight, I can combine and draw the geohashes (RGeo/Leaflet). As an example, here is drawing where a particular Tasker is likely to want to work:

Likely to Accept

And where this Tasker is generally positive:

Positive

And also where they have negative feedback:

Negative

Using these tools and the fact that every Tasker’s map is different, we can make better recommendations. This reduces friction in the marketplace.

One of the gotchas is mentioned in the article I referenced earlier. If a point is on a border, especially at the beginning of the decision process, it can end up being massively off. I worked around this in my implementation by spreading out the weight to it’s neighbors. So if the location got 100 points, the neighbors would also get some, degrading as it spread out.

Spread out the weight

This has proven to minimize the boundary issue.

Conclusion

Overall, I just wanted to share a new tool and concept that I learned about. We’ve found it to be a great pattern and will likely use geohashes more for representing our world.

Comments

Copyright © 2017 Brian Leonard