AWS Questions: DynamoDB

Here are some questions and answers about DynamoDB, Amazon’s managed NoSQL database offering.

  • What are options for dynamically scaling DynamoDB provisioned throughput?
    • Hard to beat the options outlined in this StackOverflow post.  You can do it via scripting, the DynamicDynamoDB open source library, a lambda function, cloudwatch–lots of different ways.
  • Do DynamoDB streams support multiple readers?
  • How does optimistic concurrency control work?
    • Nicely outlined here but the long and the short of it is you need to make sure you associate a version with your items, read that version when you prepare to update, and then update if and only if the version is the same as the one you read.
  • Do you have any insight into the internals of DynamoDB?
  • How do you connect to DynamoDB?  Is there an IP address?
    • You use the SDK or CLI, which connect to a regional endpoint.  You aren’t given any further details about what sits behind that endpoint.
  • What is the difference between eventual and strong consistency with respect to DynamoDB reads?
  • Does DynamoDB have any automatic encryption at rest options?
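
To make the optimistic concurrency answer above a bit more concrete, here is a minimal sketch in plain Ruby. It is not the DynamoDB API: `VersionedStore` and `StaleVersionError` are made-up names, and the store is just an in-memory hash. With a real table, the version check would be done with a conditional write rather than in application code.

```ruby
# In-memory stand-in for a table that supports conditional writes.
# This is NOT the DynamoDB API; it only illustrates the version check.
class VersionedStore
  class StaleVersionError < StandardError; end

  def initialize
    @items = {}
  end

  # Returns a copy of the stored item (including its :version), or nil.
  def read(key)
    item = @items[key]
    item && item.dup
  end

  # Writes only if the stored version still equals expected_version.
  # Pass expected_version = nil when creating a brand new item.
  def write(key, attributes, expected_version)
    current = @items[key]
    current_version = current && current[:version]
    unless current_version == expected_version
      raise StaleVersionError,
            "expected version #{expected_version.inspect}, found #{current_version.inspect}"
    end
    @items[key] = attributes.merge(version: (expected_version || 0) + 1)
  end
end

store = VersionedStore.new
store.write("kitchen-1", { name: "Main St" }, nil) # creates version 1

# Two writers read the same version of the item.
first  = store.read("kitchen-1")
second = store.read("kitchen-1")

# The first writer wins and bumps the item to version 2.
store.write("kitchen-1", { name: "Main Street" }, first[:version])

# The second writer is holding a stale version, so its write is refused.
begin
  store.write("kitchen-1", { name: "Main St." }, second[:version])
  conflict = false
rescue VersionedStore::StaleVersionError
  conflict = true # a real caller would re-read the item and retry here
end
```

The important part is that the read, the comparison and the write are treated as one unit: the second writer never silently overwrites the first writer’s change.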

     


AWS Questions: Elastic Load Balancer

More questions answered from an AWS course.

  • Does the AWS ELB have the ability to throttle requests, to stop invalid/illegal traffic – if someone refreshes a page 10 times in 5 seconds and I want to block the unnecessary requests from the refreshes?
  • What is the availability of the ELB component?
    • I couldn’t find firm numbers, but here’s an interesting article about ELB best practices.
  • In a DDOS attack, since there is a lot of traffic to your environment, do you get charged for the additional traffic?
    • Depending on the attack type, not if you are fronted by an ELB or set up your security groups/NACLs to discard the traffic.  From the DDOS whitepaper: “When [an ELB detects certain types of attacks], it will automatically scale to absorb the additional traffic but you will not incur any additional charges.”
  • When an instance is decommissioned from an ASG, does the ELB know not to send new sessions to that instance because it is getting ready to shut down?

Testing Dossier Reports in Rails

One of the things I love about developing with Rails is the vast array of open source, free components that you can drop in to extend your application.  Want an invoicing system?  A way to run your javascript tests?  A simple admin portal?  Just drop in a gem, run bundle install and you are good to go.

One of the gems I’ve used recently was dossier, which lets you write reports in SQL or active record, and then generate them in HTML or CSV (or JSON, but I didn’t use that). One tip–if you want your CSV results to have the same formatting as your HTML results, you’ll want to follow the steps on this issue.

I wrote up a couple of SQL reports, linked them into the appropriate admin pages, and called it good. Then, the app moved on, and at one point, the schema changed. (Some of you are shaking your head, knowing what is going to happen next.) Then, the reports failed.

I had forgotten the cardinal rule–write the tests first. I confess, I wasn’t sure how to, but a bit of research revealed that it wasn’t that hard. Here’s one of my spec files.

require 'spec_helper'

describe MonthlyHoursClientReport do

  # all this does is test that the SQL is valid
  it "sql valid" do
    report = MonthlyHoursClientReport.new
    sql = report.sql
    sql = replace_placeholders(sql)
    expect{ActiveRecord::Base.connection.execute(sql)}.to_not raise_error
  end

  # substitute a concrete value for each named placeholder in the report SQL
  def replace_placeholders(sql)
    sql.gsub(":kitchen_id", 1.to_s)
  end
end

This just gets the SQL from the dossier report and tries to execute the SQL in the test database. Super simple, but enough to catch the error I encountered. If/when I get more time, I could definitely add some more tests with some data in the test db to make sure the SQL is giving correct results, but I tend to be pretty confident in my SQL queries, especially when they don’t have any group by or having clauses.

Anyway, happy testing.



Updating Stripe bank accounts: “A bank account with that routing number and account number already exists for this customer.”

So, if you want to handle ACH transactions with Stripe, you can. Some limits include the length of time for the transaction (5 business days on top of Stripe’s 2 business day transfer duration) and support only for US accounts, but the API is nice and the price is pretty nice too (0.8%, capped at $5).

But if you are trying to do recurring billing with Stripe and ACH, and you want to let your customer change their default payment source between a credit card and a bank account (or between two different bank accounts), you’re going to run into a roadblock. Stripe will happily accept new credit card information with the exact same card number, expiration date and CVC code, and just create a new source for each entry, but it is not so forgiving with bank accounts. If you try to change the default source to a bank account that Stripe already has on record for that customer, you’ll get this error message: "A bank account with that routing number and account number already exists for this customer."

I found some code that handles this error message, but it isn’t actually complete. Switching on the text of the error message isn’t ideal, but I didn’t see a specific exception class for this type of error.

For a complete solution, you need to check the Stripe token’s routing number and the last 4 digits of the account number against the customer’s existing sources. If a user has two different bank accounts that match both in the last 4 of the account number and the routing number, well then, I think you are out of luck.

Here’s the complete Ruby code, making sure to match the current request’s routing number just in case your user wants to switch between multiple bank accounts for their default charge.


    def update_customer_from_token(customer,stripe_token)
      # takes the stripe customer object and the new token 
      # from the stripe indicating the changed payment method

      success = false
      Stripe.api_key = ENV["stripe_secret_key"]
      begin
        new_pmt_obj = customer.sources.create({:source => stripe_token})

        customer.default_source = new_pmt_obj.id
        customer.save
        success = true
      rescue Stripe::InvalidRequestError => e
        # special case where the bank account already exists, let's use that.
        if e.message == 'A bank account with that routing number and account number already exists for this customer.'
          tokobj = Stripe::Token.retrieve(stripe_token)
          customer.sources.each do | src |
            begin
              if src.object == 'bank_account' && src.routing_number == tokobj.bank_account.routing_number && src.last4 == tokobj.bank_account.last4
                customer.default_source = src.id
                customer.save
                success = true
                break
              end
            rescue => e
              Rails.logger.error(STRIPE_ERROR_PREFIX+" 4 unable to update customer for "+customer.inspect+", "+e.inspect)
            end
          end
        else
          Rails.logger.error(STRIPE_ERROR_PREFIX+" 3 unable to update customer for "+customer.inspect+", "+e.inspect)
        end
      rescue Stripe::CardError => e
        Rails.logger.error(STRIPE_ERROR_PREFIX+" 1 unable to update customer for "+customer.inspect+", "+e.inspect)
      rescue => e
        Rails.logger.error(STRIPE_ERROR_PREFIX+" 2 unable to update customer for "+customer.inspect+", "+e.inspect)
      end
      success
    end

Or, you could just let the user choose from a drop down list of their existing sources which one they want to be the default. That might be a cleaner solution.
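
If you do stick with the token approach, the matching rule (bank account type, same routing number, same last 4 digits) can be pulled out into its own small method, which makes it easy to test without talking to Stripe. This is only a sketch: `Source`, `TokenBankAccount` and `find_matching_bank_account` are names I made up, and the structs are stand-ins for the objects the stripe gem returns.

```ruby
# Plain structs standing in for Stripe source and token objects.
# These are hypothetical stand-ins so the sketch runs without the stripe gem.
Source = Struct.new(:id, :object, :routing_number, :last4)
TokenBankAccount = Struct.new(:routing_number, :last4)

# Returns the first bank account source whose routing number and last 4
# digits match the token's bank account, or nil if there is none.
def find_matching_bank_account(sources, token_bank_account)
  sources.find do |src|
    src.object == 'bank_account' &&
      src.routing_number == token_bank_account.routing_number &&
      src.last4 == token_bank_account.last4
  end
end

sources = [
  Source.new('card_1', 'card', nil, '4242'),
  Source.new('ba_1', 'bank_account', '110000000', '6789'),
  Source.new('ba_2', 'bank_account', '110000000', '1234')
]

# Same routing number as both bank accounts, but only ba_2 matches the last 4.
match = find_matching_bank_account(sources, TokenBankAccount.new('110000000', '1234'))

# No existing bank account ends in 0000, so nothing is found.
no_match = find_matching_bank_account(sources, TokenBankAccount.new('110000000', '0000'))
```

Note that the card in the list is skipped even though it has a last4 value, because only bank account sources are considered.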


AWS Questions: Cloudwatch

Cloudwatch is Amazon’s monitoring and alerting service.

Some questions and answers re: this awesome service.

  • How can you create custom metrics?
    • Cloudwatch doesn’t limit you to the metrics it collects by default (at the hypervisor level).  You can push any metric that makes sense up to the statistics repository, using custom metrics.
  • What protocol does cloudwatch use?  ICMP or SNMP?
  • What are cloudwatch logs?
    • A way for you to push logfiles from ec2 instances up to Cloudwatch, parse them, and create metrics out of them (“how many 404 errors has this application had in the last 30 minutes?”).  More here.
  • Does cloudwatch allow you to setup different metric thresholds at different times? For example, set an alarm at 70% CPU on Wed night but 90% on Sat night?
    • No, but you could do this with custom metrics.  You could read the cloudwatch default metrics and publish a ‘cpualarm’ metric which would be 1 or 0 depending on whether the CPU reading is over the threshold you want at that time.  Then you could vary the threshold over time.  Then you could set an alarm on the ‘cpualarm’ metric.
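
The time-varying alarm idea in the last answer boils down to a small function that turns a CPU reading and a timestamp into a 1 or a 0. Here is a rough sketch of that piece only; reading the default metric and publishing the custom one are left out. The names (`THRESHOLDS`, `DEFAULT_THRESHOLD`, `cpualarm_value`) and the default of 80% are my assumptions, and I’ve simplified “Wed night” and “Sat night” to the whole day.

```ruby
# Hypothetical schedule: CPU threshold (percent) by day of week.
# Only the Wednesday and Saturday values come from the question above;
# the default is an assumption.
THRESHOLDS = { wednesday: 70, saturday: 90 }.freeze
DEFAULT_THRESHOLD = 80

DAY_NAMES = %i[sunday monday tuesday wednesday thursday friday saturday].freeze

# Returns 1 if the CPU reading meets or exceeds the threshold in force at
# `time`, otherwise 0. This is the value you would publish as 'cpualarm'.
def cpualarm_value(cpu_percent, time)
  threshold = THRESHOLDS.fetch(DAY_NAMES[time.wday], DEFAULT_THRESHOLD)
  cpu_percent >= threshold ? 1 : 0
end

wednesday = Time.utc(2017, 3, 1, 22) # a Wednesday
saturday  = Time.utc(2017, 3, 4, 22) # a Saturday

wednesday_alarm = cpualarm_value(75, wednesday) # 75% is over the 70% limit
saturday_alarm  = cpualarm_value(75, saturday)  # 75% is under the 90% limit
```

The Cloudwatch alarm itself then stays simple: it fires whenever ‘cpualarm’ is 1, and all of the scheduling logic lives in the code that publishes the metric.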

 


Presenting on Stripe tomorrow

Excited to say that I’ll be presenting on my company’s use of Stripe at Boulder.rb tomorrow.  Working title of my talk: “Do you like Money?”.

What I’ll cover:  an overview of the service, real code, testing, operational practices, and gotchas.  This will be based on my experience at The Food Corridor, where we’ve processed over $250k of transactions through Stripe.

Hope to see you there!


AWS Questions: VPC

Amazon VPC lets you create a virtual network in the cloud that you control–subnets, ip ranges, internet access, routing, etc.  At recent classes, I was asked some questions about VPC that I dug into to find answers.

  • Does AWS VPC support multicast or broadcast?
    • No, per the FAQs.  But there are some projects to overlay multicast functionality on top of the unicast network within a VPC.
  • Do VPC flow logs cost extra?
    • There is no additional charge, but they go into Cloudwatch Logs and you are charged at the normal rate for that usage.
  • Is the NAT Gateway (used to provide internet access to ipv4 private subnets) highly available?
    • It is redundant within an availability zone.  But, from the docs: “If you have resources in multiple Availability Zones and they share one NAT gateway, in the event that the NAT gateway’s Availability Zone is down, resources in the other Availability Zones lose Internet access. To create an Availability Zone-independent architecture, create a NAT gateway in each Availability Zone and configure your routing to ensure that resources use the NAT gateway in the same Availability Zone.”  See also the Egress-only Internet Gateway, if you are using ipv6.

Using Amazon Mechanical Turk

So, after over a decade, I finally found a use case where I had the clout and the need to use Mechanical Turk. I wanted to write about my experiences.

What I used it for: We were looking for some data on businesses.  We had business name, city and state, and wanted full contact information.  We paid a dime for each listing, and asked for email address and physical address.  We asked about each listing twice so that we’d have some kind of double check.

How effective was it? This varied.  If you were using the master workers, it was very effective, but slower.  If you open it up to all workers, you have to review their work more closely.  The few times I rejected someone’s task, they wrote back and asked why and tried to make it right, which was a testament to the power of the system (it records rejections).  Make sure you break the work into a couple of smaller groups so you can iterate on your instruction set (when workers asked questions on the first set, the answers went into the instructions for the second set).  We still had to review all the listings and double check any that didn’t match between both task answers, but that was a lot quicker than googling for each business and doing the research ourselves.
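
The double check between the two answers for each listing is easy to automate. Here is a small sketch of how the mismatches could be flagged for review; the listing names, the data layout and the `normalize` helper are all made up for illustration and are not what mturk actually hands back.

```ruby
# Hypothetical answer records: each listing was answered by two workers.
answers = {
  "Acme Bakery"  => [{ email: "info@acme.example",  address: "1 Main St" },
                     { email: "INFO@acme.example ", address: "1 Main St" }],
  "Best Kitchen" => [{ email: "hi@best.example",    address: "2 Oak Ave" },
                     { email: "hello@best.example", address: "2 Oak Ave" }]
}

# Ignore differences in case and surrounding whitespace when comparing.
def normalize(answer)
  answer.transform_values { |value| value.to_s.strip.downcase }
end

# Listings where the two workers disagree need a closer look by a human.
needs_review = answers.select { |_name, (a, b)| normalize(a) != normalize(b) }.keys
```

Here only the second listing is flagged, since the first pair differs only in capitalization and a trailing space.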

How much did it cost? On the order of a couple hundred bucks to process around fifteen hundred listings.

What kind of time savings did we see? Assume we had 1500 business names, and it took us 90 seconds to google the business name and find the information.  That is 1500 listings * 1.5 minutes == 37.5 hours, and this is on the low end.  Instead, it took about 2-3 hours of setup, and then 36 hours of calendar time (when I was able to do other things like sleep and work on other problems), and we were done.  Then I would say it was about 7-10 hours of review. So you are trading a couple hundred bucks for at least 20 hours of saved time.
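
For anyone who wants to plug in their own numbers, here is the same back-of-the-envelope math as a few lines of Ruby. The variable names are mine; the figures are the ones above, with the setup and review times kept as low and high estimates.

```ruby
listings = 1500
minutes_per_listing = 1.5

# Doing all the research by hand.
manual_hours = listings * minutes_per_listing / 60.0

# Hands-on time with mturk: setup plus review, as [low, high] estimates.
setup_hours  = [2, 3]
review_hours = [7, 10]
mturk_hours_low  = setup_hours.min + review_hours.min
mturk_hours_high = setup_hours.max + review_hours.max

# Time saved, from the pessimistic to the optimistic case.
saved_low  = manual_hours - mturk_hours_high
saved_high = manual_hours - mturk_hours_low

puts "manual: #{manual_hours} hours, saved: #{saved_low} to #{saved_high} hours"
```

The 36 hours of calendar time isn’t counted because I could spend it on other things.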

Would I do it again? I think mturk is perfect if your problem has the following three attributes: more money than time, a task that is extremely simple, and time to review the finished product.

Other tips? You have to build in some kind of sampling for correctness. I have no idea what the quality is if you pay more than a dime per task.  Make sure you think about edge cases.  Provide tips to your workers (“check whois records as well as google”).


Letting Go

When pursuing a possible contracting opportunity, you need to be persistent, but you also need to know when to let go.

A while ago I was pursuing a possible contract (the startup is still ongoing but I was extending runway) and had been emailing with the decision maker a fair bit.  We wanted to do a meeting to get things going. I’d be taking care of some of the “behind the scenes” tasks that would allow their development to accelerate.  There seemed to be enthusiasm on both sides, but the meeting kept getting rescheduled.  Eventually, emails I sent about the meeting were not returned.

Now, everyone gets busy, and I understand that.  But if someone has a hard time returning emails when they are excited about the new work you are going to help them with, how are they going to be when you are asking them about an unpaid invoice, or for crucial guidance on a technical decision?  Perhaps they’d be responsive, but I wouldn’t bet on it.

So, I sent a note along these lines:

It seems like you aren’t really in a place to meet with me and discuss this work. No worries–I imagine you have many tasks pulling you in different directions.

While I’d love to work with you, I’ve learned clients who don’t have bandwidth are not good working arrangements for me nor for the clients–while I am self directed, there are times when I’ll need some level of feedback, if only to make sure I’m spending my time and your money correctly.

Please feel free to reach out to me if/when you have time and want to re-focus on this work.

Salient points to note:

  • no blame–we’re all busy and the ability to juggle work priorities is one reason why folks use contractors.
  • closure of this conversation frees me up to pursue other opportunities and them to focus on what they are working on (or perhaps to find another contractor, if that’s a better fit).
  • but, leave the door open, so that if there’s an opportunity to work together in the future, no bridges are burned.

It can be hard to let go of a prospective client after you’ve put significant time into learning their problems, but it’s better to let go than to engage with a client who is not committed or is committed but doesn’t have the bandwidth to help you help them.

PS yes, that is Elsa of Frozen fame.



© Moore Consulting, 2003-2017