

AWS Questions: Cloudwatch

Cloudwatch is Amazon’s monitoring and alerting service.

Some questions and answers re: this awesome service.

  • How can you create custom metrics?
    • Cloudwatch doesn’t limit you to the metrics it collects by default (at the hypervisor level).  You can push any metric that makes sense up to the statistics repository, using custom metrics.
  • What protocol does cloudwatch use?  ICMP or SNMP?
  • What are cloudwatch logs?
    • A way for you to push logfiles from ec2 instances up to Cloudwatch, parse them, and create metrics out of them (“how many 404 errors has this application had in the last 30 minutes?”).  More here.
  • Does cloudwatch allow you to setup different metric thresholds at different times? For example, set an alarm at 70% CPU on Wed night but 90% on Sat night?
    • No, but you could do this with custom metrics.  You could read the Cloudwatch default metrics and publish a ‘cpualarm’ metric which would be 1 or 0 depending on whether the current value exceeded a threshold.  You could vary that threshold over time, and then set an alarm on the ‘cpualarm’ metric.  (See the sketch below.)
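
Here’s a minimal sketch of that idea, assuming boto3 and the standard Cloudwatch APIs; the Custom/Monitoring namespace, the metric name, and the Saturday-night threshold schedule are made up for illustration:

```python
# Sketch: push a 0/1 'cpualarm' custom metric whose threshold varies by day of week.
# Assumes boto3 credentials/region are configured; namespace and metric names are
# placeholders for illustration.
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")


def publish_cpualarm(instance_id):
    now = datetime.datetime.now(datetime.timezone.utc)
    # Allow a looser threshold on Saturday nights, a tighter one the rest of the week.
    threshold = 90.0 if now.weekday() == 5 else 70.0

    # Read the most recent CPUUtilization datapoints for the instance.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=now - datetime.timedelta(minutes=10),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    datapoints = stats["Datapoints"]
    cpu = max(dp["Average"] for dp in datapoints) if datapoints else 0.0

    # Publish 1 if we're over the current threshold, 0 otherwise; a normal Cloudwatch
    # alarm can then watch 'cpualarm' with a fixed threshold (>= 1).
    cloudwatch.put_metric_data(
        Namespace="Custom/Monitoring",
        MetricData=[
            {
                "MetricName": "cpualarm",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Value": 1.0 if cpu > threshold else 0.0,
            }
        ],
    )
```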

 

Presenting on Stripe tomorrow

Excited to say that I’ll be presenting on my company’s use of Stripe at Boulder.rb tomorrow.  Working title of my talk: “Do you like Money?”.

What I’ll cover:  an overview of the service, real code, testing, operational practices, and gotchas.  This will be based on my experience at The Food Corridor, where we’ve processed over $250k of transactions through Stripe.

Hope to see you there!

AWS Questions: VPC

Amazon VPC lets you create a virtual network in the cloud that you control–subnets, ip ranges, internet access, routing, etc.  At recent classes, I was asked some questions about VPC that I dug into to find answers.

  • Does AWS VPC support multicast or broadcast?
    • No, per the FAQs.  But there are some projects to overlay multicast functionality on top of the unicast network within a VPC.
  • Do VPC flow logs have add-on pricing?
    • There is no additional charge, but they go into Cloudwatch Logs and you are charged at the normal rate for that usage.
  • Is the NAT Gateway (used to provide internet access to ipv4 private subnets) highly available?
    • It is redundant within an availability zone.  But, from the docs: “If you have resources in multiple Availability Zones and they share one NAT gateway, in the event that the NAT gateway’s Availability Zone is down, resources in the other Availability Zones lose Internet access. To create an Availability Zone-independent architecture, create a NAT gateway in each Availability Zone and configure your routing to ensure that resources use the NAT gateway in the same Availability Zone.”  See also the Egress-only Internet Gateway, if you are using ipv6.
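
Here’s a rough boto3 sketch of the one-NAT-gateway-per-AZ pattern described in that quote; the AZ names, subnet IDs, and route table IDs are placeholders:

```python
# Sketch: one NAT gateway per Availability Zone, with each private route table
# pointing at the NAT gateway in its own AZ. IDs below are placeholders.
import boto3

ec2 = boto3.client("ec2")

# Map each AZ's public subnet to the private route table used in that same AZ.
az_layout = {
    "us-east-1a": {"public_subnet": "subnet-aaaa1111", "private_route_table": "rtb-aaaa1111"},
    "us-east-1b": {"public_subnet": "subnet-bbbb2222", "private_route_table": "rtb-bbbb2222"},
}

for az, ids in az_layout.items():
    # Each NAT gateway needs its own Elastic IP.
    allocation = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(
        SubnetId=ids["public_subnet"],
        AllocationId=allocation["AllocationId"],
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]

    # Wait until the NAT gateway is available before routing traffic to it.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # Send the private subnet's default route to the NAT gateway in the same AZ.
    ec2.create_route(
        RouteTableId=ids["private_route_table"],
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
```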

Using Amazon Mechanical Turk

So, after over a decade, I finally found a use case where I had the clout and the need to use mechanical turk. I wanted to write about my experiences.

What I used it for: We were looking for some data on businesses.  We had business name, city and state, and wanted full contact information.  We paid a dime for each listing, and asked for email address and physical address.  We asked about each listing twice so that we’d have some kind of double check.
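
For the curious, here’s a rough boto3 sketch of what creating HITs like this could look like programmatically; the HIT layout ID, field names, and sample business are placeholders, and the real setup details will depend on how you build your task template:

```python
# Sketch: create one MTurk HIT per business, paying $0.10 and asking two workers,
# roughly matching the setup described above. The layout ID comes from a project
# template created in the requester UI; it and the sample data are placeholders.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester.us-east-1.amazonaws.com",
)

businesses = [
    {"name": "Acme Widgets", "city": "Boulder", "state": "CO"},
]

for biz in businesses:
    mturk.create_hit(
        Title="Find contact info for a business",
        Description="Given a business name, city, and state, find its email and physical address.",
        Reward="0.10",                       # a dime per listing
        MaxAssignments=2,                    # ask twice so answers can be cross-checked
        AssignmentDurationInSeconds=600,
        LifetimeInSeconds=60 * 60 * 24 * 2,  # keep the HIT open for two days
        HITLayoutId="3XXXXXXXXXXXXXXXXXXXXXXXXXXXXX",  # placeholder layout ID
        HITLayoutParameters=[
            {"Name": "business_name", "Value": biz["name"]},
            {"Name": "city", "Value": biz["city"]},
            {"Name": "state", "Value": biz["state"]},
        ],
    )
```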

How effective was it? This varied.  If you were using the master workers, it was very effective, but slower.  If you open it up to all workers, you have to review their work more closely.  The few times I rejected someone’s task, they wrote back and asked why and tried to make it right, which was a testament to the power of the system (it records rejections).  Make sure you break the work into a couple of smaller groups so you can iterate on your instruction set (when workers asked questions on the first set, the answers went into the instructions for the second set).  We still had to review all the listings and double check any that didn’t match between both task answers, but that was a lot quicker than googling for each business and doing the research ourselves.

How much did it cost? On the order of a couple hundred bucks to process around fifteen hundred listings.

What kind of time savings did we see? Assume we had 1500 business names, and it took us 90 seconds to google the business name and find the information.  That is 1500 listings * 1.5 minutes == 37.5 hours, and this is on the low end.  Instead, it took about 2-3 hours of setup, and then 36 hours of calendar time (when I was able to do other things like sleep and work on other problems), and we were done.  Then I would say it was about 7-10 hours of review. So you are trading a couple hundred bucks for at least 20 hours of saved time.

Would I do it again? I think mturk is perfect if your problem has the following three attributes: more money than time, a task that is extremely simple, and time to review the finished product.

Other tips? You have to build in some kind of sampling for correctness. I have no idea what the quality is if you pay more than a dime per task.  Make sure you think about edge cases.  Provide tips to your workers (“check whois records as well as google”).

Letting Go

When pursuing a possible contracting opportunity, you need to be persistent, but you also need to know when to let go.

A while ago I was pursuing a possible contract (the startup is still ongoing but I was extending runway) and had been emailing with the decision maker a fair bit.  We wanted to do a meeting to get things going. I’d be taking care of some of the “behind the scenes” tasks that would allow their development to accelerate.  There seemed to be enthusiasm on both sides, but the meeting kept getting rescheduled.  Eventually, emails I sent about the meeting were not returned.

Now, everyone gets busy, and I understand that.  But if someone has a hard time returning emails when they are excited about the new work you are going to help them with, how are they going to be when you are asking them about an unpaid invoice, or for crucial guidance on a technical decision?  Perhaps they’d be responsive, but I wouldn’t bet on it.

So, I sent a note along these lines:

It seems like you aren’t really in a place to meet with me and discuss this work. No worries–I imagine you have many tasks pulling you in different directions.

While I’d love to work with you, I’ve learned that clients who don’t have bandwidth don’t make for good working arrangements–for me or for them.  While I am self directed, there are times when I’ll need some level of feedback, if only to make sure I’m spending my time and your money correctly.

Please feel free to reach out to me if/when you have time and want to re-focus on this work.

Salient points to note:

  • no blame–we’re all busy and the ability to juggle work priorities is one reason why folks use contractors.
  • closure of this conversation frees me up to pursue other opportunities and them to focus on what they are working on (or perhaps to find another contractor, if that’s a better fit).
  • but, leave the door open, so that if there’s an opportunity to work together in the future, no bridges are burned.

It can be hard to let go of a prospective client after you’ve put significant time into learning their problems, but it’s better to let go than to engage with a client who is not committed or is committed but doesn’t have the bandwidth to help you help them.

PS yes, that is Elsa of Frozen fame.

AWS Questions: Kinesis and IAM

  • What happens if you push AWS Kinesis (a high volume managed streaming solution from AWS) past the provisioned shard limits (as specified here)?
    • You start getting exceptions when you try to write to or read from the stream.  You can back off and retry, or you can increase the number of shards, which increases your throughput.  (See the sketch after this list.)
  • Any planned support for .NET with the Kinesis libraries (Kinesis Producer Library, Kinesis Client Library) which have some nice features?
    • I’m not aware of any future plans.  However both are available on github (KPL, KCL) and are open source(ish) under the Amazon Software License.  I say “ish” because of some concerns about section 3.3, limits of use.  So you could port the code to .NET.  In addition, there is support for running the KCL with other languages (Ruby, .NET, etc) but you still need to run a Java daemon.
  • Can someone create an IAM group with more permissions than the group they are in?
    • Yes, if the IAM system is misconfigured.  If a user is in group A, which has the attach group policy permission and no other limits, they can attach an arbitrary policy to group B.  Per the AWS shared responsibility model, you are responsible for your IAM setup.
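
Here’s a minimal sketch of the back-off approach for Kinesis with boto3; the stream name, payload, and retry parameters are made up for illustration:

```python
# Sketch: write to a Kinesis stream and back off when shard limits are hit.
import time

import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis")


def put_with_backoff(stream_name, data, partition_key, max_attempts=5):
    delay = 0.1
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName=stream_name,
                Data=data,
                PartitionKey=partition_key,
            )
        except ClientError as err:
            # This is the exception you see when a shard's provisioned limits are exceeded.
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff; alternatively, add shards to raise throughput
    raise RuntimeError("gave up after %d attempts" % max_attempts)


put_with_backoff("my-stream", b"hello", "user-123")
```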

AWS Questions: ASGs and Amazon Inspector

More questions from AWS course students.

  • EC2 instances in auto scaling groups have a warmup period that you can specify (so that the EC2 instance can be fully ready to take traffic directed to it).  I retold a story from another consultant about the warmup period for an ASG increasing over time (due to increasing numbers of security patches against the base AMI) and one student asked: “Can you set an alarm on instances overrunning the warmup period?”
    • Since you can create custom metrics in cloudwatch and create alarms on those, you can definitely capture the warmup period.  All you’d need to do is, as the last step before an EC2 instance is fully configured, subtract the launch time (obtained via the API) from the current time.  Store that number as your ‘warmup’ metric and set an alert if it ever gets close to your ASG health check value, and you’ll avoid ASG thrashing.  (See the sketch after this list.)
    • Update 4/6/2017: Another instructor pointed out a flaw in the above statements.  Upon further research, warmup time settings  only apply if you are using step scaling, and cooldown periods only apply if you are using simple scaling.  They are both trying to solve similar problems–making sure that you don’t start up or shut down instances before the instances have a chance to affect the situation that triggered the Auto Scaling Group action.  More on policy types.
  • “Can the minimum and maximum number of instances of an ASG be changed after initial configuration?”
  • “Can you point Amazon inspector at non aws resources?  In your own data center, for example?”
    • Amazon Inspector is a security tool that looks for vulnerabilities in your EC2 instances.  It requires installing an agent on the instances that it will be monitoring, and thus doesn’t work outside of AWS.
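
Here’s a sketch of that warmup metric, assuming boto3 and the instance metadata service; the Custom/ASG namespace and the metric name are made up for illustration:

```python
# Sketch of the 'warmup' metric idea above: run this as the last step of instance
# configuration. Namespace and metric names are placeholders.
import datetime
import urllib.request

import boto3

# The instance can look up its own ID via the instance metadata service.
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

# Launch time comes from the EC2 API; warmup is "now minus launch time".
reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
launch_time = reservations[0]["Instances"][0]["LaunchTime"]
warmup_seconds = (
    datetime.datetime.now(datetime.timezone.utc) - launch_time
).total_seconds()

# Publish it; an alarm on the 'warmup' metric can then warn you before the value
# creeps up toward your ASG health check grace period.
cloudwatch.put_metric_data(
    Namespace="Custom/ASG",
    MetricData=[
        {
            "MetricName": "warmup",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": warmup_seconds,
            "Unit": "Seconds",
        }
    ],
)
```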

AWS Questions: Cloudfront and SQS

I have recently started a contract teaching AWS courses. (None of the following posts speak for my client.) AWS stands for Amazon Web Services.

During every course I teach, I get questions that aren’t directly covered in the course material and that I don’t know the answers to.  I’m going to try to capture some of the questions asked by my students and post the answers.

  • Does SQS have transactional messages akin to JMS?
    • No.  JMS has the idea of transactions over messages, so you can be sure that all or none of the messages were processed.  SQS has no such construct–each message is independent.  If I were going to have multiple units of work done, I’d use one message, perhaps pointing to different datastores if the message was too big for SQS.  (See the sketch after this list.)
  • Can you push content to the AWS CDN, Cloudfront, ahead of user requests?
    • No, the content always has to be pulled by a requester.  You can of course configure a crawler to pull the data from the origins through Cloudfront (which will then store it).
  • Can you configure Cloudfront to pull from origins over SSL/TLS?
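
Going back to the SQS transaction question above, here’s a sketch of bundling multiple units of work into a single message with boto3; the queue URL and payload shape are made up for illustration:

```python
# Sketch: put every step of a unit of work into one SQS message, since SQS has no
# cross-message transactions. Queue URL and payload are placeholders.
import json

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"

# One message carries the whole unit of work, so a consumer either processes all
# of it or leaves the message on the queue to be retried.
message = {
    "order_id": "order-42",
    "steps": [
        {"action": "charge_card", "amount_cents": 1099},
        {"action": "send_receipt", "email": "customer@example.com"},
    ],
    # If the payload exceeds the 256 KB message size limit, store it elsewhere
    # (e.g. S3) and put a pointer here instead.
}

sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(message))
```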

Restoring a single table from an Amazon RDS backup

When you use SQL, how do you write delete statements at the database prompt?

A delete statement typically looks like this: delete from table_name where column_name = 'foo';. I usually write it in this order:

  1. delete
  2. delete where column_name = 'foo';
  3. delete from table_name where column_name = 'foo';

Even though this is a pain because you have to move back and forth (I really need to look into vi keybindings for mysql), it prevents you from sending this command by accident: delete from table_name; which deletes all the data in your table.  (Another alternative is to never use the interactive client and always write out your delete statements in a file and run that file to delete data.)

But, recently, I did exactly that, because I forgot.  I deleted all the data from one table in our production database.  It was billing data, so rather important.  Luckily, I am using Amazon RDS and had set up backup retention.

I wanted to outline what I did to recover from this.

  • I took a deep breath.
  • I wrote a message on the slack channel documenting what had happened and the possible customer impact.
  • Depending on which data is removed, it’s possible you will want to put the application in maintenance mode and/or inform your customers of the issues.  What I deleted was used rarely enough that I didn’t have to take these steps.
  • I looked at how to restore an Amazon RDS backup.
  • I restored the missing data.
  • I communicated that things were back to normal to internal stakeholders.

Unfortunately, it wasn’t clear how to restore a single table.  I’m used to being able to download a .sql file and hand edit it, but that’s not an option.  Stackoverflow wasn’t super helpful.  But if there’s any time you want clarity, it’s when you are restoring production data.  You don’t want to compound the problem by screwing up something else.

So, here’s how to restore a single table from an Amazon RDS backup:

  • Note the time just before you deleted the data.  (Another reason the slack message is nice.  chatops ftw.)
  • Start up another instance from that moment.  I named it something obvious like ‘has-data-from-tablename’.  (See the restore sketch after this list.)
  • Twiddle your thumbs anxiously while the new instance starts up.
  • The instance is put into your default security group (as of this writing) which probably doesn’t allow mysql access.  Make sure you modify this security group to allow access.
  • When the instance is up, do a dump of the table you need: mysqldump -t --ssl-ca=./amazon-rds-ca-cert.pem -u user -ppassword -h has-data-from-tablename.c1m7x25w24qor.us-east-1.rds.amazonaws.com -P3306 database_name tablename > restore-table_name.sql; (-t omits the create database/table statements.)
  • If your table has had writes since you deleted everything, you may need to manually pull down the current data from the production system and merge it into restore-table_name.sql; I was able to avoid this step.
  • Load the data using mysql: mysql --ssl-ca=./amazon-rds-ca-cert.pem -u user -ppassword -h production.c1m7x25w24qor.us-east-1.rds.amazonaws.com -P3306 database_name < restore-table_name.sql;
  • Review to make sure the data is correct.
  • Test the application.
  • Update the slack channel, and do any other notifications you need to (customers, internal contacts, etc).
  • Revoke the default security group access you allowed above.
  • Delete the ‘has-data-from-tablename’ instance.
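
Here’s a sketch of the “start up another instance from that moment” step using the RDS point-in-time restore API via boto3; the instance identifiers and the restore time are placeholders:

```python
# Sketch: restore a new RDS instance to the moment just before the bad delete,
# then wait for it to become available so you can dump the table from it.
import datetime

import boto3

rds = boto3.client("rds")

# The moment just before the delete (UTC); this is why the slack timestamp helps.
restore_time = datetime.datetime(2017, 3, 1, 14, 55, 0, tzinfo=datetime.timezone.utc)

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="production",
    TargetDBInstanceIdentifier="has-data-from-tablename",
    RestoreTime=restore_time,
)

# Block until the new instance is available, then dump the table as shown above.
rds.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="has-data-from-tablename"
)
```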

Note this only works if you caught your mistake within the backup retention window. (Make sure you set that up.)  We aren’t multi AZ or clustered, so I’m not sure how that would affect things.

Happy deep breathing!