
Using Amazon Mechanical Turk

So, after over a decade, I finally found a use case where I had the clout and the need to use Mechanical Turk. I wanted to write about my experiences.

What I used it for: We were looking for some data on businesses. We had the business name, city, and state, and wanted full contact information. We paid a dime for each listing, and asked for an email address and physical address. We asked about each listing twice so that we’d have some kind of double check.

How effective was it? This varied. Master Workers were very effective, but slower. If you open the task up to all workers, you have to review their work more closely. The few times I rejected someone’s task, they wrote back, asked why, and tried to make it right, which was a testament to the power of the system (it records rejections). Make sure you break the work into a couple of smaller groups so you can iterate on your instruction set (when workers asked questions on the first set, the answers went into the instructions for the second set). We still had to review all the listings and double-check any that didn’t match between both task answers, but that was a lot quicker than googling each business and doing the research ourselves.

How much did it cost? On the order of a couple hundred bucks to process around fifteen hundred listings.

What kind of time savings did we see? Assume we had 1500 business names, and it took us 90 seconds to google each one and find the information. That is 1500 listings * 1.5 minutes == 37.5 hours, and this is on the low end. Instead, it took about 2-3 hours of setup, then 36 hours of calendar time (during which I could sleep and work on other problems), and then about 7-10 hours of review. So you are trading a couple hundred bucks for at least 20 hours of saved time (37.5 hours of googling versus roughly 10-13 hours of hands-on setup and review).

Would I do it again? I think mturk is perfect if your problem has the following three attributes: more money than time, a task that is extremely simple, and time to review the finished product.

Other tips? You have to build in some kind of sampling for correctness. I have no idea what the quality is if you pay more than a dime per task. Make sure you think about edge cases. Provide tips to your workers (“check whois records as well as google”).
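
To make the double-entry check concrete, here’s a minimal Ruby sketch of the reconciliation step. It assumes you’ve exported the two result batches to CSVs with listing_id, email, and address columns; those file and column names are mine, not anything MTurk produces.

require "csv"

# Index each batch's rows by listing id (column names are assumptions)
def index_by_listing(path)
  CSV.read(path, headers: true).each_with_object({}) do |row, h|
    h[row["listing_id"]] = row
  end
end

first  = index_by_listing("batch_1.csv")
second = index_by_listing("batch_2.csv")

# A listing needs human review if the two workers disagreed, or one is missing
mismatches = first.keys.select do |id|
  a, b = first[id], second[id]
  b.nil? || a["email"] != b["email"] || a["address"] != b["address"]
end

puts "#{mismatches.size} of #{first.size} listings need manual review"
mismatches.each { |id| puts id }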

Bare minimum of ops tasks for Heroku

Awesome, you are a CTO or founding engineer of a newborn startup. You have a web app up on Heroku and someone is paying you money for it! Nice job.

Now, you need to think about supporting it. Heroku makes things way easier (no racking and stacking, no purchasing hardware, no configuring Apache) but you still need to set up some operations.

Here is the bare minimum you need to do to make sure you can sleep at night. (Based on a couple of years of Heroku projects, and being really, really cheap.)

  • Have a staging environment
    • You don’t want to push code direct to prod, do you?
    • This can be a free dyno, depending on the complexity of your app.
    • Pipelines are nice, as is preboot.
    • Cost: free
  • Have a one-line deploy.
    • Or, if you like CI/CD, an automatic deploy or a one-click deploy. But make it really easy to deploy (see the deploy script sketch after this list).
    • Have a deploy script that goes straight to production for emergencies.
    • Cost: free
  • Backups
    • User data.  If you aren’t using a shared object store like S3, make sure you are doing a backup.
    • Database. Both Heroku PostgreSQL and Amazon RDS have point-and-click solutions. All you have to do is set them up. (Test them, at least once.)
    • Cost: freeish, depending on the solution.  But, user data is worth spending money on.
  • Alerting
    • Heroku has options if you are running professional dynos.
    • UptimeRobot is a great free third-party service that will check ports every 5 minutes and has a variety of alert options. If you want SMS, you have to pay for it, but it’s not outrageous.
    • Cost: free
  • Logging
    • Use a logging framework (like slf4j or the Rails logger), and mark error conditions with a string that will be easy to search for (see the logging sketch after this list).
    • Yes, you can use heroku logs, but having a log management solution like Papertrail will make you much happier. Plus, it’s free for 2 days of logfiles.
    • Set up alerts with Papertrail as well. These can be more granular.
    • Cost: free
  • Create a list of third-party dependencies.
    • Sign up for status alerts from these. If you have a paid Slack plan, you can have them push an email to a channel. If you don’t, create an alias that receives them. You want to be the person who tells your clients about outages, not the other way around.
    • Cost: free
  • Communication
    • Internal
      • a devops_alert Slack channel is my preferred solution. All deploys and other alerts go there.
    • External
      • create a mailing list for your clients so you can inform them of issues easily. Google Groups is fine, but use whatever other folks are using. Don’t use an alias in your email client; you’ll forget to add new clients.
      • do not use this mailing list for marketing purposes, unless you want to offload the burden of keeping the list up to date to the marketing department.
      • do make sure you keep this list up to date as you gain and lose clients
    • Run through a disaster in your mind and make notes on how you would communicate the issue, both internally and externally. How often do you update your team? How often do you update your clients? What about an internal issue (some of your code screwed up) versus an external issue (a third-party outage)? This doesn’t need to be exhaustive, but thinking about it ahead of time and making some notes will help you in the crisis.
    • Cost: free
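
As promised above, here’s the kind of one-line deploy script I mean, as a Ruby sketch. The git remote names (“staging” and “production”) and the migration step are assumptions about your setup, not anything Heroku requires; adapt them to your app.

#!/usr/bin/env ruby
# bin/deploy -- one-command deploy sketch. Assumes git remotes named
# "staging" and "production" pointing at the corresponding Heroku apps.
target = ARGV.first || "staging"
abort("unknown target: #{target}") unless %w[staging production].include?(target)

puts "Deploying to #{target}..."
system("git push #{target} master") or abort("push failed")
# only needed if your app runs migrations
system("heroku run rake db:migrate --remote #{target}") or abort("migration failed")
puts "Deployed to #{target}."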
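
And here’s what I mean by marking error conditions with an easy-to-search string. The class, method, and marker below are hypothetical; the point is that one consistent token makes a Papertrail search (and a Papertrail alert) trivial.

# Hypothetical example; pick your own marker string and keep it consistent.
class PaymentProcessor
  ALERT_MARKER = "XXX_ALERT".freeze

  def charge(order)
    # ... the actual charging logic ...
  rescue StandardError => e
    # one greppable token plus enough context to start debugging
    Rails.logger.error("#{ALERT_MARKER} payment failed for order=#{order.id}: #{e.message}")
    raise
  end
end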

All of this is probably a four-hour project, max.

But once this is done, you’ll rest easier at night, knowing you have what you need to troubleshoot and recover from production issues.

Extending an existing Rails application that wasn’t meant to be extended

I am modifying an existing open source Rails 4.2 app and want to keep my changes (some to models, some to controllers, some to views) as separate as I can, so that when a new release of the app comes out, I won’t be in (too much) merge hell.

This app was not designed to be extended, which makes things more interesting.

For the views, I’m just doing partials with a prefix (_xxx_user_cta.haml).

For the models and controllers, I started out hacking the code directly, but after some digging around I discovered how to monkey patch (I believe that is what it is called) the classes.

In the config/application.rb file, I added the following code:

config.to_prepare do
  # Load every decorator file on each code reload (to_prepare runs before
  # each request in development, once at boot in production)
  Dir.glob(Rails.root + "app/decorators/**/*_decorator*.rb").each do |c|
    require_dependency(c)
  end
end

And then, if I want to make a change to app/models/person.rb, I add the file app/decorators/models/person_decorator.rb. In that file is something like this:

Person.class_eval do
  # ... changes
end

This lets me add additional relations, helper methods, and other behavior to extend existing functionality. I try to prefix things with a unique identifier (xxx_set_timezone rather than set_timezone) to lessen the chance of a collision, because if a method with the same name exists in both the Person class and the decorator, the decorator’s version will win (it is loaded last).
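
For a fuller picture, here is a hypothetical decorator showing both kinds of additions. The association and column names are invented for illustration; the prefixing convention is the one described above.

# app/decorators/models/person_decorator.rb
Person.class_eval do
  # an extra relation layered onto the upstream model (hypothetical)
  has_many :xxx_notes, class_name: "Note", foreign_key: :person_id

  # prefixed helper, so an upstream set_timezone won't silently collide
  def xxx_set_timezone(tz)
    update(time_zone: tz)
  end
end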

I write tests around this new functionality so that if anything changes upstream, I’m aware of it and can more easily troubleshoot.
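
A minimal test sketch, assuming the app uses RSpec (adapt if it’s minitest); it pins down the decorator’s behavior so upstream changes surface in CI rather than in production:

# spec/models/person_decorator_spec.rb
require "rails_helper"

RSpec.describe Person do
  it "responds to the decorator's prefixed helper" do
    expect(described_class.new).to respond_to(:xxx_set_timezone)
  end
end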

The downside of this approach is that logic is harder to track, because instead of everything being in one file, it is now in two. (I don’t know if there are memory or performance implications.) However, that is a tradeoff I’m willing to make to keep up with upstream development and pull it in as often as possible.

I’m still fairly new to Rails and don’t know if this is the only way, or the best way, but I thought I’d share.

Joining The Food Corridor

After I left 8z, I contracted for about a year and a half. It was great fun: moving between projects, meeting a lot of new developers, and learning a lot of new things. I worked on evaluating software products and processes, supporting machine learning systems, large workflow engines, and, most recently, backend systems to stop distracted driving (they’re hiring, btw).

But early this year I saw an email from AngelList about a company looking to build a marketplace for kitchen space. (Aside: if you are interested in the labor market for startup jobs, AngelList emails are great; they not only give you the company name and job description, but also typically include equity and salary, which is very useful information.) I replied, the conversation started, and I did some research on the company. They were pre-revenue, but the founder had been grinding it out for months and had an extensive background in the industry. It was clear they weren’t a fly-by-night, “we just had an idea for an app and need someone to build it” operation.

After discussions, interviews and reference checks, it became clear that this was a fantastic opportunity to join an early stage startup as a technical co-founder. So, I’m thrilled to announce that I have joined The Food Corridor as CTO/Co-Founder.

Why does this opportunity excite me so?

  • By increasing the visibility and availability of shared kitchen space, it can grow the local food system, especially for value-add producers, across the country
  • There are real problems here that innovative software and process solutions can solve
  • The founding team has the diverse set of skills needed to run a great company
  • I am looking forward to learning about the business side of a software company
  • It’s right at the intersection of two of my passions: food and technology

If you are interested in following along with the TFC journey, there’s a monthly newsletter that will be focused on shared kitchen topics.

As far as the blog, I expect to be heads down and building product, but will occasionally pop up and post.

Here’s to new adventures!