AWS Advent Calendar

There’s an AWS advent calendar, where new articles will be posted about various aspects of AWS starting Dec 1. If you’re interested in writing or reviewing the articles, feel free to sign up.  There are also some great posts from 2016, covering topics such as how to analyze VPC flow logs, cost control, Lambda and building AMIs with Packer.

There are also articles from years past.  I haven’t examined them closely, but I’d be wary of them, simply because the AWS landscape is changing so rapidly, and an article from three or four years ago may or may not be applicable.

The wonders of outsourcing devops

I have maintained a Jenkins server (actually, it was Hudson, the precursor). I’ve run my own database server.  I’ve installed a bug tracking system, and even extended it. I’ve set up web servers (apache and nginx).

And I’ll tell you what, if I never have to do any of these again, I’ll be happy as a clam. There are so many tools out there that let you outsource your infrastructure.  Often they start out free and end up charging when you reach a certain point.

By outsourcing the infrastructure to a service provider, you let specialists focus on maintaining that infrastructure. They achieve scale that you’d be hard pressed to. They hire experts that you won’t be able to hire. They respond to vulnerabilities like it is their job (which it is).

Using one of these services also lets you punch above your weight. If you want, with AWS or GCP you can run your application in multiple data centers around the globe. With Heroku, you can scale out during busy times, and you can scale in during slow times. With CircleCI or GitHub or many of the other devtool offerings, you can have your CI/CD/source repository environment continually improved, without any effort on your part (besides paying the credit card bill).  Specialization wins.

What is the downside? You lose control: the ability to fine tune your infrastructure in ways that the service provider may not have thought of.  You have to conform to their view of the world.  Depending on the service provider, performance may also suffer.

At a certain scale, you may need that control and that performance.  But, you may or may not reach that scale.

It can be frustrating to have to work around issues that, if you just had the appropriate level of access, you would be able to fix quickly.  It’s also frustrating having to come up to speed on the docs and environment that the service provider makes available.

That said, I try to remember all the other tasks that these services are taking off my plate, and the focus they allow on the unique business differentiators.

Re:Invent Videos

AWS Re:Invent is supposed to be a great conference.  I have thus far been unable to attend, but the videos of the presentations are posted online with about a day’s lag.  So, like most conferences, you really should be networking and meeting people face to face rather than attending the presentations.

Here’s the AWS Youtube channel where you can watch all the videos, or just sample them.

I’ve found the talks to be of varying quality.  Some just rehash the docs, but others, especially the deep dives, discuss interesting aspects of the AWS infrastructure that I haven’t found to be documented anywhere (here’s a great talk about Elastic Block Store from 2016).  The talks by real customers also give a great viewpoint into how AWS’s offerings are actually implemented to provide business value (here’s a great talk from 2016 about using Amazon Machine Learning to predict real estate transactions).

It’s a sprawling conference, well suited to AWS’s sprawling offering, and I bet no matter what your interest, you will be able to find a video worth watching.

AWS Questions: Certification

Lots of times folks in my class are interested in pursuing AWS certifications.  The classes I teach are good at preparing you to be certified, but are definitely not certification classes.  Here are the answers I give to students interested in being certified:

To get certified, you should review the page for the cert you want.  Here’s the page for the AWS Solutions Architect – Associate certification.

When I got certified, I re-read the student guide from the class and made sure I understood everything covered in it.  I didn’t just look at what the student guide had; I went to the AWS documentation as well.  I also read some whitepapers as outlined in the exam guide (found on the certificate page linked above).  I then took the sample questions (available, again, on the certificate page; answers are not provided, but you can find them via googling) and then the practice exam (which costs $20, I believe, and gives you familiarity with the test format, but can only be taken once).  Those gave me feedback that I was on track to pass the exam.

Note that the course is more hands on than the exam and doesn’t map strictly to the exam.  However, AWS does a good job of explaining what they are looking for in the exam guide (on, you guessed it, the certificate page).

Some of my students and colleagues have also had good luck with acloudguru, but I have no personal experience with that service.  The company for which I work (but for which I do not speak) also offers a course that is designed to help folks pass certain certs, but I have no experience with the course.

Finally, it’s worth noting that all the certs I have taken have been proctored.  Depending on where you live, you may have a number of test centers available, or one (or none).  Find that out beforehand!  I also found that the exams I wanted were never available next day–I had to schedule them out a few weeks in advance.  YMMV.

AWS machine learning talk

I enjoyed giving my “Intro to Amazon Machine Learning” talk at the AWS Denver Boulder meetup.   (Shout out to an old friend and colleague who came out to see it.) I didn’t get through the whole pipeline demonstration (I didn’t get a chance to do the batch prediction), but the demo gods were kind and the demo went well.

We also had a good discussion.  A few folks present had used machine learning before, so we talked about where AML made sense (hint, it’s not a fit for every problem).  Also had some good questions about AML, about performance and pricing.  One of the members shared a reinvent anecdote: the AML team looked at all the machine learning used in Amazon and graphed the use cases and solved for the most common ones.

As usual, I also learned something. OpenRefine is a tool to help you prepare data for machine learning.  And when you change the score cut-off, you need to restart your real-time end point.

The “Intro to Amazon Machine Learning” slides are up on SlideShare, and big thanks to the Meetup organizers.

DynamoDB: What’s left to manage

AWS recently announced that DynamoDB will now scale read and write capacity automatically.  While there was already a lot of database administration that DynamoDB took care of (backups, underlying infrastructure provisioning), setting the proper capacities initially, and updating them as your application changed, was a key task that fell to the user. No more.

I posted a link to the news to a discussion channel I participate in, and someone asked: “what’s left to manage?”. Drawing from that discussion, here are a few items remaining:

  • Appropriate partition keys.  Make sure they are spread uniformly.
  • Choosing the right primary key. Since you typically want to avoid table scans and can only query by primary key, making sure you pick the right one is important.  (I would also call this “data model design”.)
  • Enforcing data integrity, initially and through time.  This is a challenge with every nosql solution.
  • Creating the appropriate global secondary indexes for your application.
  • Securing and controlling access to your data.

These are all still important tasks, but DynamoDB is getting easier and easier to use for high performance applications for which nosql is a good fit.  (And for which you don’t mind being tied to AWS.)
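
To make the first bullet concrete, here is a minimal sketch in Python of why key spread matters. DynamoDB hashes the partition key to decide where an item lives; the MD5 hash, the four-partition count, and the `user-N` and status key values below are all assumptions for illustration, not how DynamoDB is actually implemented.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a bucket with a stable hash (a stand-in for DynamoDB's internal hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def load_per_partition(keys, partitions=4):
    """Count how many requests land on each hypothetical partition."""
    counts = Counter(partition_for(k, partitions) for k in keys)
    return [counts.get(p, 0) for p in range(partitions)]


# High-cardinality key: one distinct (made-up) user id per request.
by_user = load_per_partition(f"user-{i}" for i in range(10_000))

# Low-cardinality key: every request uses one of only two status values.
by_status = load_per_partition(
    ("active" if i % 10 else "inactive") for i in range(10_000)
)

print("by user id:", by_user)
print("by status: ", by_status)
```

With the user id key the requests should land fairly evenly across the buckets, while the status key can touch at most two of them no matter how many partitions exist. That is the kind of hot spot that auto scaling capacity won't fix for you.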

AWS Questions: SQS

More questions, this time about SQS, the simple queue service that AWS provides.

  • What was the first AWS service?
  • Are there upper limits on SQS in terms of messages/second?
    • FIFO Queues have a limit (300/s), but I wasn’t able to find any hard limits for standard SQS.  In the developer guide they have some examples that reach 2500 messages/second.  I found some benchmarks from 2014, which were able to get to 108k messages/second.
  • Can you create alarms based on the number of messages in a queue?
    • Yes, that is a metric that CloudWatch tracks: “ApproximateNumberOfMessagesVisible”.  You can use this in combination with an auto scaling group to handle batch processes in a dynamic manner (scale out when you have more work in the queue, scale in when you have less).
  • What is the maximum visibility timeout for SQS?
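
For the alarm question above, here is a sketch of what the alarm definition might look like in Python. It only builds the parameters for CloudWatch's `put_metric_alarm` call; the queue name, threshold, period, and scaling policy ARN are placeholders I made up, and I'm assuming `ApproximateNumberOfMessagesVisible` (messages available for retrieval) is the metric you'd want to scale on.

```python
def queue_depth_alarm(queue_name: str, threshold: int, scale_out_policy_arn: str) -> dict:
    """Build keyword arguments for a CloudWatch alarm on SQS queue depth."""
    return {
        "AlarmName": f"{queue_name}-backlog-high",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 300,  # seconds; an assumed evaluation window
        "EvaluationPeriods": 2,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # The action fired when the alarm triggers, e.g. a scale-out policy.
        "AlarmActions": [scale_out_policy_arn],
    }


# Placeholder values; substitute your own queue and policy.
params = queue_depth_alarm(
    "work-queue",
    100,
    "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:example",
)
print(params["AlarmName"], params["MetricName"], params["Threshold"])

# With boto3 installed and credentials configured, the alarm could be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

A matching alarm with a lower threshold and a scale-in policy would cover the other half of the "scale out when busy, scale in when quiet" pattern.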

AWS Questions: Windows Servers

Windows servers are supported on AWS, but recently I had students ask a bevy of questions about them.  Here are some answers.  As a reminder, I speak solely for myself with these blog posts, not for AWS or any employer.

  • What versions of Windows are supported?
  • Can I create an AMI from an EBS snapshot of a Windows root volume?
    • Unlike with a Linux EBS snapshot, you cannot create an AMI from a snapshot of a Windows root volume.  You can create an AMI from a running instance, however.  The reason for the limitation is that sysprep must be run on the Windows server, and you can’t run sysprep on an EBS volume that isn’t attached to a running instance.
  • In order to take an accurate snapshot, I need to quiesce the disk.  How can I do so?
    • This is a thorny problem and I don’t think there’s a great answer. You want to shut down as many apps as you can. You also may find the Volume Shadow Copy Service useful. You may want to review the answers here on this reddit thread.
  • I have a Windows bastion host, and I want to allow more than two users to access this host at one time.  How can I do this?
    • You need to purchase additional Remote Desktop Services licenses.  From the FAQ: “Amazon EC2 instances come with two Remote Desktop Services (aka Terminal Services) licenses for administration purposes. If additional Remote Desktop Services licenses are needed, they should be purchased from Microsoft or a Microsoft license reseller. Remote Desktop Services licenses purchased with Software Assurance have license mobility benefits and can be brought to AWS multi-tenant environments.”
  • Is powershell a first class citizen with the same functionality as the CLI or the supported SDKs?
    • Nope.  From the Powershell page: “The AWS Tools for Windows PowerShell lets you perform many of the same actions available in the AWS SDK for .NET. You can use it from the command line for quick tasks, like controlling your Amazon EC2 instances.”  (Emphasis added.)
  • Do you have any example userdata scripts for Windows AMIs?

Amazon Machine Learning: An Introduction

From my book, Amazon Machine Learning: An Introduction:

Amazon Machine Learning, or AML, provides you access to widely applicable machine learning algorithms without having to run any servers.  AML supports supervised learning with the stochastic gradient descent algorithm.  This type of learning is useful for making predictions based on a set of data for which answers are known.  The end goal of AML is to create a model, which is what will allow you to make further predictions based on past data.

AML supports three different kinds of predictions.  For binary outcomes, where observations lead to a yes/no result, AML supports binary classification.  An example would be whether or not a prospect is likely to sign up for a new account, given their past interactions with your company.  For multi-valued results, where observations lead to one of N results, AML supports multiclass classification.  A good example of this would be which product to show a customer, given what they’ve looked at and bought in the past.  And, for numeric values, AML supports regression.  An example of that would be predicting house prices based on sales data and house attributes.
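
To give a feel for what binary classification with stochastic gradient descent involves, here is a toy sketch in plain Python. It is not AML's implementation; the training data (number of past interactions versus whether the prospect signed up), the learning rate, and the epoch count are all made up for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, mapping any score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd(rows, labels, epochs=200, learning_rate=0.1, seed=0):
    """Fit logistic regression weights, updating after every single example."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each epoch
        for i in order:
            score = sigmoid(sum(w * x for w, x in zip(weights, rows[i])) + bias)
            error = score - labels[i]  # gradient of the log loss w.r.t. the raw score
            weights = [w - learning_rate * error * x for w, x in zip(weights, rows[i])]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, row, cutoff=0.5):
    """Return True when the predicted probability reaches the score cut-off."""
    score = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
    return score >= cutoff


# Made-up history: [number of past interactions] -> signed up (1) or not (0).
rows = [[0], [1], [2], [3], [7], [8], [9], [10]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_sgd(rows, labels)
print("1 interaction  ->", predict(weights, bias, [1]))
print("9 interactions ->", predict(weights, bias, [9]))
```

The `cutoff` argument plays the role of a score cut-off: raising it makes the model more conservative about predicting a sign up, lowering it makes it more eager.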

If you are not trying to use existing data and create predictions out of it using supervised learning, but are trying to instead recognize images or tease out patterns in text, you may want to consider alternatives to AML.