Thoughts on Amazon CloudFormation

I recently set up Amazon CloudFormation for a fairly complicated application in AWS.  For those unfamiliar with this service, it allows you to specify a number of AWS resources in a declarative way in a JSON document, create them all at once (the result is called a ‘stack’), manage them as one entity, and destroy them.  You are billed just as you would be if you created the resources by hand, but it’s a versionable, replicable way to create resources.
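
To give a flavor of the template format, here is a minimal sketch with one parameter and one queue; the parameter, resource name, and values are made up for illustration, not taken from the actual project:

    {
      "AWSTemplateFormatVersion": "2010-09-09",
      "Parameters": {
        "DeployEnv": {
          "Type": "String",
          "Default": "dan-dev",
          "Description": "Deployment environment prefix applied to resource names"
        }
      },
      "Resources": {
        "WorkQueue": {
          "Type": "AWS::SQS::Queue",
          "Properties": {
            "QueueName": { "Fn::Join": ["-", [{ "Ref": "DeployEnv" }, "work-queue"]] },
            "VisibilityTimeout": 120
          }
        }
      }
    }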

The distributed application for which I was creating the stack had the following components:

  • queues (SQS)
  • databases (dynamodb, including secondary indices)
  • compute (EC2)
  • alarms (Cloudwatch)
  • storage (S3)
  • a VPC and Subnets
  • event logging (kinesis)
  • hadoop (Elastic Map Reduce)

The last four items were not configured by the CloudFormation template I wrote: S3 and the VPC and subnets because I leveraged existing resources, and Kinesis and EMR because they are not supported by CloudFormation.  (Kinesis has some support, but CloudFormation doesn’t allow you to specify the name of a stream, which makes it pretty useless when you want to post to or read from a specific stream.)  However, while it would be preferable to have everything specified in CloudFormation, partial stack creation was still useful–I just documented the other requirements in the CloudFormation template–because:

  • resource configuration like queue timeouts, names, read throughput, etc. can be applied uniformly–consistency is enforced.
  • the infrastructure is defined and documented in one place, allowing a new developer to get up to speed quickly.
  • tags can be applied uniformly.
  • CloudFormation supports parameters, so that you can preface every resource with a deployment-environment-specific variable (‘stage’, ‘dan-dev’, etc.), or have different DynamoDB throughput for different deployments.
  • if different configuration needs to be tested, you can stand up a new stack in minutes and test it.
  • the template can be stored in your version control system, allowing someone to see how things changed over time.  Yay, commit logs!

There were some other possible benefits I just didn’t have time to explore fully before the project wound down.

  • autoscaling groups seemed like they’d be extremely useful.  These aren’t a CloudFormation-only tool, but CloudFormation seems like an ideal way to define and use them.
  • the ability to create and delete stacks opened up the possibility of creating developer specific environments for debugging issues.

If you are going to start with CloudFormation, I highly recommend setting up an initial environment by hand, and then running CloudFormer, a small application written by Amazon which reads from your existing AWS infrastructure and generates a CloudFormation template.  I used CloudFormer to create a template for everything in our AWS account, and then picked and chose what was pulled over to the new template.  There were a few issues with this though:

  • There was a bug in the CloudFormation documentation for DynamoDB schemas.  You want to use this syntax: "KeySchema": { "HashKeyElement": { "AttributeName": "attrname", "AttributeType": "S" }, ... }.  CloudFormer generated them correctly, however.
  • CloudFormer coerces the names of some resources, including VPCs and subnets, to strings, and I had to back those out when I wanted to use existing resources.

Other than not being able to fully define an application (because of dependencies on unsupported AWS tools like Kinesis and EMR), what other downsides does CloudFormation have?

  • it locks you into AWS.  OpenStack Heat is an alternative that works across clouds, or so I read.  And, really, once you decide on AWS, is an infrastructure creation script going to be the one thing that keeps you from moving?
  • it is tied to infrastructure creation (though there is resource-by-resource support for in-place updates).  If you want to modify one queue setting, you have to tear down the entire stack and create it anew.  I found this to be relatively quick (15 minutes or so).
  • you are still writing scripts in the UserData section of the EC2 definition to set up your server environment.
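
For instance, a server definition with a bootstrap script looks roughly like the sketch below–the logical name, AMI id, instance type, and install steps are placeholders, not the actual project configuration:

    "AppServer": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": "ami-12345678",
        "InstanceType": "t2.micro",
        "UserData": { "Fn::Base64": { "Fn::Join": ["", [
          "#!/bin/bash\n",
          "yum install -y java-1.8.0-openjdk\n",
          "# ...fetch and start the application here...\n"
        ]]}}
      }
    }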

After this experience, and reviewing my thoughts above, I believe the sweet spot of CloudFormation is setting up dev and QA environments quickly, and documenting infrastructure choices when you are committed to AWS.

My Experiences with a Digital Sabbath

I’ve tried a digital sabbath a few times in the past year.  If you aren’t familiar with the concept, it means taking one day a week and putting away all your digital devices.  No smartphones, tablets, laptops or desktop computers (do people still use those?  I do!).  For one day, I even skipped making phone calls.  Focus on the here and now.  Read a book.  Play with your kids.  Go outside.  Do that home improvement project you’ve been meaning to get to.  Look, there’s even a website about the digital sabbath!

I’ve done this a few times and it is tough.  Why?  If I have any questions about anything, I reach for my phone or tablet–when does Home Depot open?  How do I cook sunchokes?  That is relatively easy to counter–just prepare ahead of time, or accept not knowing.  I’ve even been known to pull out a copy of the white pages (yes, they still distribute that).

I also feel I am ‘maximizing’ my time–when I can read about Clojure or respond to tweets while brushing my teeth, I feel like I’m doubling my time.  It’s the same feeling I have when I run the washing machine and the dishwasher–I can sit on the couch and read because I’m ‘doing’ two jobs already!  So, a sabbath removes a major source of attention fragmentation.

The harder part of a digital sabbath is the non-informational uses of my phone.  Frankly, I use my phone to escape boredom and frustration.  Of course, it is still entirely possible to ‘check out’ with a book or even daydreaming, but using a phone makes it so dang easy.  I think it is because it feels like you are accomplishing something worthwhile easily–gaining new knowledge, interacting with someone across the world.  Maybe that is because those used to be hard tasks–you had to check a book out of the library, or write someone a letter, or make an expensive phone call.  Now the effort/reward ratio has a radically decreased numerator, but my brain is still in the 1980s and doesn’t recognize it.

But.

While I can learn plenty and make plenty of friends through my phone/tablet/internet-connected whatzit, a digital sabbath forces me to live in the here and now.  Escapism is fine in small doses, but a digital sabbath forced me to confront how often I use my phone for that purpose.

Annotation driven Spring configuration vs XML driven Spring configuration

First off, I have to admit that I am not an XML hater.  Sure, it’s not the prettiest structured data language, but it seems to do the job just fine, especially in environments where you aren’t concerned about byte size (like, say, when you are configuring a software system).  Plus, it allows comments, which some alternatives ([cough] JSON) don’t.

But I come to contrast two types of Spring configuration, not to bury XML or @Annotations.  I’ve been working on a project that uses Spring annotations.  I evaluated Guice, but it didn’t have lifecycle management, and we needed that.  I also evaluated PicoContainer (I know, blast from the past, right?), but it didn’t seem functional and the web presence was a mess.  In the past, I’ve used XML based configuration of Spring.  I thought it would be useful for me to capture a few of the differences.

First of all, though, they are fairly similar.  Both use the standard Spring workflow of “create a number of components, pass them to other components, pass those to other components, occasionally use some of the extensive Spring library in a way that appears magical at times, create a few more components, and string them all together to build a system that does what you want”, whether “what you want” is responding to web requests, modifying databases, pulling data off of a queue for processing, or something entirely different.

For annotation based configuration, you are writing the configuration in classes that are specially tagged with a @Configuration annotation. These configuration classes look for components in a couple of places, either under them in the classpath or in certain packages, as specified by the @ComponentScan annotation. This means that you get all the benefits of your IDE:

  • You get code completion.
  • The compiler checks your types so that you don’t have runtime exceptions if you passed a Foo bean to a class that was expecting a Foobar bean.
  • You can do logic in the configuration class to load different components based on anything (a build-time constant, a file, a database, an external service, etc.)
  • You can refactor bean definitions and know that they’re changed everywhere.

All this power comes with the risk of complexity. If you read from a database table to know what kind of component implements a certain interface, that can be, shall we say, less than obvious.
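
As a concrete sketch, an annotation based configuration might look like the following; the class names (AppConfig, MessageParser, QueueReader) are hypothetical application classes, not anything from this project:

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.ComponentScan;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    @ComponentScan("com.example.app")
    public class AppConfig {

        @Bean
        public MessageParser messageParser() {
            return new MessageParser();
        }

        // Spring injects the MessageParser bean by type; passing the wrong
        // type here fails at compile time rather than at runtime.
        @Bean
        public QueueReader queueReader(MessageParser parser) {
            return new QueueReader(parser);
        }
    }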

XML based configuration, in my experience, is simpler. You can still have multiple layers of XML files, but you can’t have any logic, at least without using XSLT to generate your Spring configuration files (if you are doing that, annotation based configuration is going to be far simpler!). Components can be anywhere, because they are specified by fully qualified class name.  XML configuration removes the temptation to invoke any Java code as part of your component configuration. On the flip side, you do run the risk of runtime errors from typos, and you must deal with XML.
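
For comparison, a rough sketch of the same wiring in XML (again, the class names are hypothetical):

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans.xsd">

      <!-- Beans are referenced by fully qualified class name; a typo here
           shows up as a runtime error rather than a compile error. -->
      <bean id="messageParser" class="com.example.app.MessageParser"/>

      <bean id="queueReader" class="com.example.app.QueueReader">
        <constructor-arg ref="messageParser"/>
      </bean>

    </beans>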

Both of these configuration options are well supported. I find that there’s more documentation online about the XML based configuration, but they are isomorphic, so I’d recommend picking the option that suits your needs best. If you need complex configuration or want the blanket of type safety, then annotation based configuration is the best option. If you have a simpler project, especially if everything can be contained in one XML file, XML based configuration is the better option.

Why Use an ETL Tool?

I’m a big fan of ETL tools.  The one with which I am most familiar is Kettle, aka Pentaho Data Integration.  When I was working for 8z, we used it heavily to pull data from other systems, process it, and update our databases.  While ETL systems are not without their flaws, I think their strengths are such that everyone who is moving data around should consider them.  This is more true now than in the past because there is a lot more data flowing everywhere, and there are several viable open source ETL tools, so you don’t have to spend thousands or tens of thousands of dollars to get started.

What are the benefits of ETL tools?

  • There are pre-built components for common data tasks (connecting to a database, parsing a flat file) that have been tested and debugged by many, many people.  It’s hard to overemphasize how much time this can save, allowing you to focus on business logic.
  • You operate at a higher level of abstraction.
  • There is support for other performance features like parallel jobs that you can configure.
  • The GUI makes data flow obvious.
  • You can write your own components that leverage existing libraries.

What are the detriments?

  • Jobs can be version controlled, but they are impossible to merge.
  • Limits of components mean you sometimes have to contort your data flows, or drop down to write your own component.
  • Some components (at least for Kettle) are not open source.
  • You have to roll your own testing framework.  I did.
  • You have to learn another tool.

Don’t re-invent the wheel!  Your data movement problem may very well be a super special snowflake, but chances are it isn’t.  Every line of code you write is another you have to maintain.  When you are confronted with a data movement problem, take a look at an ETL tool like Kettle and see if you can stand on the shoulders of giants.  Here’s a list of open source ETL tools to evaluate.

Denver Bootstrappers Lunch

My friend and former colleague Corey Snipes has been working to get a Bootstrapper’s meetup off the ground in Denver. This is a small group (limited to 12, I believe) of people who are building products (typically software) and self funding. I believe most of the members are in the solopreneur mode (I know Corey is).

I imagine this kind of support group would be fantastic–certainly I had a similar group when I was a consultant in the past, and bouncing ideas off of others in similar situations made the struggle much easier.

I’ve not made this meetup yet because, a) I’m not sure I’m bootstrapping (and you know what, if you aren’t sure you’re bootstrapping, you aren’t bootstrapping!), b) I live in Boulder and Boulderites have a hard time leaving the Boulder Bubble, and c) Wednesdays in general are tough days for me to do anything outside of the house.

If you are a bootstrapper in the Denver area, take a look.

Lessons from curating a link blog

I maintain a link blog about Colorado food and local food in general.  I use Tumblr, but I’m only incidentally interested in Tumblr traffic.  Tumblr hooks up to Facebook and Twitter, and pushes links there.  (I realize that I am missing interaction on Twitter and Facebook by using these networks as broadcast only, but I don’t have time to fully engage, so I thought a limited presence was better than nothing.)

Having maintained this link blog for over two years, I have learned a few things.

  • It is easy to start a project like this, but hard to finish.  There’s always more to do.  I think I’ll stop when it stops being interesting.
  • Deciding to do this is a great way to gain a broad understanding of a field while providing some value (via curating).  As you find more and more sources of links, videos, articles and audio content, you’ll gain a sense of what is happening.  Even if you don’t painstakingly read every article, you’ll still get a sense.
  • Speaking of sources, Google Alerts is your friend.  I get emailed alerts on a variety of searches, and about 25% of the results are worth posting.  Facebook and Twitter are additional great sources of links.
  • An RSS reader can help you if you are really diving in.
  • Giving someone notice that you’ve referenced their article via an ‘@’ mention will get their attention.
  • Queuing up posts on Tumblr is a life saver.  This lets you stack up posts and portion them out one per day.  I typically have between 15 and 30 posts in my queue.  This makes timely posts more difficult, but frees me up to forget about the link blog for weeks at a time.
  • A link blog like this is a great use of your in between time, especially if you have a smartphone.  In five minutes I can scan and post two or three links, where five minutes is barely enough time to think of a regular blog post.  The Tumblr app is very good.
  • A linkblog is a great resource for other content generation.  I have a newsletter about local food as well, and a key section of that is interesting links.  Those are almost entirely drawn from the Tumblr.

The linkblog approach is very similar to Twitter, but differs in a few crucial ways:

These attributes make a linkblog a fine complement to Twitter.

There are some problems with this model, however.

  • Limited interaction with followers, either on Tumblr, Facebook or Twitter.
  • I’ve found that engaging on Twitter and Facebook directly is far more effective if you want content to be viewed or links to be clicked.
  • A linkblog like this is not truly building my tribe.

So, if you have limited time, want to gain insight into a particular area of interest, and are OK with the drawbacks, create a linkblog.

A Useful, Tested Git Work Flow

I’m working on a project with a number of developers (about 9 checking in code) that is moving rather fast and we’re using git and github. It’s actually really interesting to me, because most of my experience has been with smaller teams (and centralized VCS) where having everything on HEAD is perfectly fine. I was even able to branch using CVS because the chance of merge conflicts with no one else doing development was small.

I remember having lunch with a friend who worked at Rally and we talked about git. He said that they were heavy users, and “once you use it, dude, you’ll never go back”. At the time I thought–how great can git be? I’d been using it for a small project I was coding by myself, and it seemed nice enough, but not revolutionary.

But, now that I’m using it in a fast moving team with a large number of developers touching lots of parts of the system, the branching and merging capabilities of git are starting to shine. The project lead, who has used git before, recommended the Driessen git flow (from 2010), which is more complex than the GitHub flow.

We’ve been using this for a few weeks and I’ve found it to be clear, fairly easy to understand, and still flexible enough to let development move forward at a breakneck pace. The supporting branches, along with master (always what is in production) and develop (always works, what is coming down the pike in terms of features), seem to be a nice compromise between the strictures of traditional, centralized VCS and the free-for-all that is possible with git.
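
To make the mechanics concrete, a feature in this flow looks roughly like the following (the branch name is made up):

    # start a feature branch off of develop
    git checkout -b feature/queue-reader develop

    # ...commit work on the feature branch...

    # merge it back into develop, keeping the branch visible in history
    git checkout develop
    git merge --no-ff feature/queue-reader
    git branch -d feature/queue-reader
    git push origin develop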