Amazon Machine Learning Video and Book

I’m working on a video series and an ebook about Amazon Machine Learning, or AML.

AML  is a great way to get started with machine learning, since you can focus on the key concepts of building and using a model and not worry about any infrastructure.  AWS takes care of provisioning all the underlying IT infrastructure–you just worry about getting your data to S3, choosing how to build the model, and then using the model.  Which, trust me, is quite enough to tackle if you are a machine learning newbie.

You can use the model to get predictions either in real time (with a default soft limit of 200 requests per second) or via batch processing, where you can upload up to 1TB of predictions to S3.  Like everything in AWS, you can control the entire process via a well documented API or from various SDKs.

AML isn’t a fit for all machine learning needs–it processes text that is in CSV format and supports only supervised learning.  There are other options on AWS (and other places as well).

The book is currently in progress, and I’ll be starting on the video soon.If you’d like to follow along as the book gets written, you can at leanpub: Amazon Machine Learning: An Introduction.

AWS Questions: Cloudformation

So, more questions from students.  This time about Cloudformation, the very cool way to built AWS infrastructure declaratively.  I would hate to have to pick a favorite AWS service, but if I had to, Cloudformation would be it.

  • By default a stack rolls back on failure.  You can also keep any successful stack elements by setting disable rollback to true.  Can you have some elements of a stack that must have successful creation, and others that may fail without rollback?
    • Nope.  I’d break this up into two stacks and chain them.
  • Why is YAML now supported for Cloudformation templates?
  • Can you a dry run of a cloudformation template?

The Four Types of Slacks

I have been using slack for a few years now, but have really noticed an uptick in the last year or so.  (If you aren’t familiar with slack, here’s an intro to slack usage, and if you are, here’s a great code of conduct for public slacks.)
It seems to me that there are four main types of slack groups.

The first is the company/department slack.  This slack is long lived, contains many channels, and is multi purpose.  There are channels for ops, marketing, etc.  This slack is typically limited to the employees of a company, though contractors are also given access.  The main purposes of this slack are an ad-hoc knowledge base and to reduce email.  Depending on IT, this slack may be under the radar and compete with other solutions like hipchat, wikis or internal mailing lists.

The next type is a project slack. This is related to the company slack, but is less long lived, and has fewer members and channels.  It is used for coordination amongst disparate people, often a set of contractors.  May be maintained by the client or prime contractor, also serves as an ad-hoc knowledge base, but is primarily a means for coordination of effort.

Both of the above slacks may have other integrations with systems (CI/CD, monitoring, etc).  These integrations with external systems can make the slack a one stop shop for corporate knowledge and memory, especially if the members are on paid accounts.

The above types are obviously limited in membership.  The next two types of slacks are more public.

Another type of slack is the event slack.  This slack replaces or augments Twitter as a way for people at a conference to communicate.  May exist between events, but is quiescent while events are not happening.  Here channels may be related to aspects of the event or tracks, and the slack is typically owned by the event coordinator and provided as a service to the conference attendees.

Slack can also be an email list replacement.  I have been a member of several email lists for user groups/meetups in the front range, and it serms much of the activity on some of them have been driven to slack (the BDNT meetup is a good example). In addition, I see a lot of new slacks being created that would, a decade ago been email groups.  (Facebook groups are also a replacement for email groups, depending on your audience, but I have found slack to be far superior in searchability.). The number of channels is typically related to member list size and length of existence.  I have found these slacks be on the free slack plan, with its limits. I have also heard of slacks of this type charging for membership.

What has been your slack experience?  What did I miss?

AWS Questions: DynamoDB

Here are some questions and answers about DynamoDB, Amazon’s managed NoSQL database offering.

  • What are options for dynamically scaling DynamoDB provisioned throughput?
    • Hard to beat the options outlined in this StackOverflow post.  You can do it via scripting, the DynamicDynamoDB open source library, a lambda function, cloudwatch–lots of different ways.
  • Do DynamoDB streams support multiple readers?
  • How does optimistic concurrency control work?
    • Nicely outlined here but the long and the short of it is you need to make sure you associate a version with your items, read that version when you prepare to update, and then update if and only if the version is the same as the one you read.
  • Do you have any insight into the internals of DynamoDB?
  • How do you connect to DynamoDB?  Is there an IP address?
    • You use the SDK or CLI which connect to an endpoint in a region that you know no further details of.
  • What is the difference between eventual and strong consistency with respect to DynamoDB reads?
  • Does DynamoDB have any automatic encrypt at rest options?


AWS Questions: Elastic Load Balancer

More question answered from an AWS course.

  • Does the AWS ELB have the ability to throttle requests, to stop invalid/illegal traffic – if someone refreshes a page 10 times in 5 seconds and I want to block the unnecessary requests from the refreshes?
  • What is the availability of the ELB component?
    • I couldn’t find firm numbers, but here’s an interesting article about ELB best practices.
  • In a DDOS attack, since there is a lot of traffic to your environment, do you get charged for the additional traffic?
    • Depending on the attack type, not if you are fronted by an ELB or set up your security groups/NACLs to discard the traffic.  From the DDOS whitepaper: “When [an ELB detects certain types of attacks], it will automatically scale to absorb the additional traffic but you will not incur any additional charges.”
  • When an instance is decommissioned from an ASG, does the ELB know not to send new sessions to that ASG because the instance is getting ready to shut down?

Testing Dossier Reports in Rails

One of the things I love about developing with rails is the vast array of open source, free components that you can drop in and extend your application.  Want an invoicing system?  Way to run your javascript testsSimple admin portal?  Just drop in a gem, run bundle install and you are good to go.

One of the gems I’ve used recently was dossier, which lets you write reports in SQL or active record, and then generate them in HTML or CSV (or JSON, but I didn’t use that). One tip–if you want your CSV results to have the same formatting as your HTML results, you’ll want to follow the steps on this issue.

I wrote up a couple of SQL reports, linked them into the appropriate admin pages, and called it good. Then, the app moved on, and at one point, the schema changed. (Some of you are shaking your head, knowing what is going to happen next.) Then, the reports failed.

I had forgotten the cardinal rule–write the tests first. I confess, I wasn’t sure how to, but a bit of research revealed that it wasn’t that hard. Here’s one of my spec files.

require 'spec_helper'

describe MonthlyHoursClientReport do

  # all this does is test that the SQL is valid
  it "sql valid" do
    report =
    sql = report.sql
    sql = replace_placeholders(sql)
    expect{ActiveRecord::Base.connection.execute(sql)}.to_not raise_error

  def replace_placeholders(sql)
    sql = sql.gsub(":kitchen_id",1.to_s)

This just gets the SQL from the dossier report and tries to execute the SQL in the test database. Super simple, but enough to catch the error I encountered. If/when I get more time, I could definitely add some more tests with some data in the test db to make sure the SQL is giving correct results, but I tend to be pretty confident in my SQL queries, especially when they don’t have any group by or having clauses.

Anyway, happy testing.

Updating Stripe bank accounts: “A bank account with that routing number and account number already exists for this customer.”

So, if you want to handle ACH transactions with Stripe, you can. Some limits to include the length of time for the transaction (5 business days on top of stripes 2 business day transfer duration) and support only for US accounts, but the API is nice and the price is pretty nice too (0.8% up to $5).

But if you are trying to do recurring billing with Stripe and ACH and you want to let your customer change their default charge source between credit card and bank accounts as a payment source (or even two different bank accounts), you’re going to run into a roadblock. While Stripe will happily accept new credit information with the exact same card number, expiration date and CVC code, and just create a new source for each entry, it is not so forgiving with bank accounts. Instead, you’ll get this error message: "A bank account with that routing number and account number already exists for this customer." if you try to change the default source to an existing bank account record in Stripe.

I found some code with this error message, but it actually isn’t complete. It’s not best to examine the error message and switch on that, but I didn’t see a specific exception class for this type of exception.

For a complete solution, you need to check the stripe tokens routing number and last 4 digits of the account number. If a user has two different bank accounts that match both in the last 4 of the account number and the routing number, well then, I think you are out of luck.

Here’s the complete ruby code, making sure to match the current request’s routing id number just in case your user wants to switch between multiple bank accounts for their default charge.

    def update_customer_from_token(customer,stripe_token)
      # takes the stripe customer object and the new token 
      # from the stripe indicating the changed payment method

      success = false
      Stripe.api_key = ENV["stripe_secret_key"]
        new_pmt_obj = customer.sources.create({:source => stripe_token})

        customer.default_source =
        success = true
      rescue Stripe::InvalidRequestError => e
        # special case where the bank account already exists, let's use that.
        if e.message == 'A bank account with that routing number and account number already exists for this customer.'
          tokobj = Stripe::Token.retrieve(stripe_token)
          customer.sources.each do | src |
              if src.object == 'bank_account' && src.routing_number == tokobj.bank_account.routing_number && src.last4 == tokobj.bank_account.last
                customer.default_source =
                success = true
            rescue => e
              Rails.logger.error(STRIPE_ERROR_PREFIX+" 4 unable to update customer for "+customer.inspect+", "+e.inspect)
          Rails.logger.error(STRIPE_ERROR_PREFIX+" 3 unable to update customer for "+customer.inspect+", "+e.inspect)
      rescue Stripe::CardError => e
        Rails.logger.error(STRIPE_ERROR_PREFIX+" 1 unable to update customer for "+customer.inspect+", "+e.inspect)
      rescue => e
        Rails.logger.error(STRIPE_ERROR_PREFIX+" 2 unable to update customer for "+customer.inspect+", "+e.inspect)

Or, you could just let the user choose from a drop down list of their existing sources which one they want to be the default. That might be a cleaner solution.

AWS Questions: Cloudwatch

Cloudwatch is Amazon’s monitoring and alerting service.

Some questions and answers re: this awesome service.

  • How can you create custom metrics?
    • Cloudwatch doesn’t limit you to the metrics it collects by default (at the hypervisor level).  You can push any metric that makes sense up the the statistics repository, using custom metrics.
  • What protocol does cloudwatch use?  ICMP or SNMP?
  • What are cloudwatch logs?
    • A way for you to push logfiles from ec2 instances up to Cloudwatch, parse them, and create metrics out of them (“how many 404 errors has this application had in the last 30 minutes?”).  More here.
  • Does cloudwatch allow you to setup different metric thresholds at different times? For example, set an alarm at 70% CPU on Wed night but 90% on Sat night?
    • No, but you could do this with custom metrics.  You could read the cloudwatch default metrics and have an ‘cpualarm’ metric which would be 1 or 0 depending on if certain parameters were set.  Then you could vary the parameters over time.  Then you could set an alarm on the ‘cpualarm’ metric.


Presenting on Stripe tomorrow

Excited to say that I’ll be presenting on my company’s use of Stripe at Boulder.rb tomorrow.  Working title of my talk: “Do you like Money?”.

What I’ll cover:  an overview of the service, real code, testing, operational practices, and gotchas.  This will be based on my experience at The Food Corridor, where we’ve processed over $250k of transactions through Stripe.

Hope to see you there!

© Moore Consulting, 2003-2017 +