I’ve had some experience with CloudFormation in the past, and recently gained some puppet expertise. I thought it’d be great to combine the two, working on a new project to set up the ELK stack for a client.
Basically, we are creating an EC2 instance (or a number of them) from a vanilla image using a CloudFormation template, doing a small amount of initialization via the UserData section, and then using puppet to configure them further. Puppet is used in a masterless context: the intelligence (knowing which machine should be configured which way) isn’t in the manifest file, but rather in the code that checks out the modules and manifests. Here’s a great example of a project set up to use masterless puppet.
Before I dive into more details, other solutions I looked at included:
- doing all the machine setup in UserData
  - This is a bad idea because it forces you to set up and tear down machines each time you want to make a configuration change, which leads to a longer development cycle, especially at first. Plus, bash is great for small configurations, but when you have dependencies and other complexities, the scripts can get hairy.
- pulling a bash script from s3/github in UserData
  - puppet is made for configuration management and handles more complexity than a bash script. I’ll admit, I used puppet with an eye to the future, when we would have more machines and more types of machines. I suppose you could do the same with bash, but puppet handles more of the typical CM tasks, including setting up cron jobs, making sure services run, and deriving dependencies between services, files and artifacts.
- using a different CM tool, like ansible or chef
  - I was familiar with puppet. I imagine the same solution would work with other CM tools.
- using a puppet master
  - This presentation convinced me to avoid setting up a puppet master. Cattle, not pets.
- using cloud-init instead of UserData for initial setup
  - I tried. I couldn’t figure out cloud-init, even with this great post. It’s been a few months, so I’m afraid I don’t even remember what the issue was, but I remember this solution not working for me.
- creating an instance/AMI with all software installed
  - puppet allows for more flexibility, is quicker to set up, and allows you to manage your configuration in a VCS rather than in a pile of different AMIs.
- using a container instead of AMIs
  - Isn’t docker the answer to everything? I didn’t choose this because I was entirely new to containerization and didn’t want to take the risk.
Since I’ve already outlined how the solution works, let’s dive into details.
Here’s the UserData section of the CloudFormation template:
"Fn::Base64": {
"Fn::Join": [
"",
[
"#!/bin/bash \n",
"exec > /tmp/part-001.log 2>&1 \n",
"date >> /etc/provisioned.date \n",
"yum install puppet -y \n",
"yum install git -y \n",
"aws --region us-west-2 s3 cp s3://s3bucket/auth-files/id_rsa /root/.ssh/id_rsa && chmod 600 /root/.ssh/id_rsa \n",
"# connect once to github, so we know the host \n",
"ssh -T -oStrictHostKeyChecking=no git@github.com \n",
"git clone git@github.com:client/repo.git \n",
"puppet apply --modulepath repo/infra/puppet/modules repo/infra/puppet/manifests/",
{ "Ref" : "Environment" },
"/logstash.pp \n",
"date >> /etc/provisioned.date\n"
]
]
So, we are using a bash script, but only for a little bit. The second line (starting with exec) stores output into a logfile for debugging purposes. We then store off the date and install puppet and git. The aws command pulls down a private key stored in s3. This instance has access to s3 because of an IAM setup elsewhere in the CloudFormation template; the access we have is read-only, and the private key has already been added to our github repository. Then we connect to github via ssh to ‘get to know the host’. Then we clone the repository containing the infrastructure code. Finally, we apply the manifest, which is partially determined by a parameter to the CloudFormation template.
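To make that last step concrete, here is a minimal sketch of how the joined strings around the Environment parameter resolve into a single manifest path. The function name manifest_path and the value production are hypothetical, chosen only for illustration, and the path assumes the manifests live under the directory created by the git clone.

```shell
#!/bin/bash
# Sketch: how the Fn::Join pieces around { "Ref" : "Environment" } combine.
# manifest_path is a hypothetical helper, not part of the original template.
manifest_path() {
  local environment="$1"
  # Assumes the manifests live under the directory created by `git clone`.
  printf '%s\n' "repo/infra/puppet/manifests/${environment}/logstash.pp"
}

# With Environment=production (an assumed parameter value), this prints
# repo/infra/puppet/manifests/production/logstash.pp
manifest_path production
```

So a stack created with a different Environment value picks up a different logstash.pp without any change to the script itself.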
This bash script will run on creation of the EC2 instance. Once this script is solid, if you are testing additional puppet modules, you only have to do a git pull and puppet apply to pick up the changes. (Of course, at the end you should stand up and tear down via CloudFormation just to test end to end.) You can also see how it’d be easy to have the manifest name (logstash.pp here) be a parameter to the CloudFormation template, which would let you store your configuration for web servers, database servers, etc., in puppet as well.
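The pull-and-apply loop, with the manifest name as a parameter, could be wrapped in a small helper along these lines. This is a sketch, not the script used on the project: the run and reapply functions, the DRY_RUN switch, and the role argument (for example a webserver.pp manifest) are all hypothetical, and the paths assume the layout from the UserData script, with the manifests under the cloned repo directory.

```shell
#!/bin/bash
# Sketch of the edit/test loop: refresh the checkout, then re-apply a manifest.
# run, reapply, DRY_RUN and the role argument are hypothetical names.

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

reapply() {
  local environment="$1"
  local role="${2:-logstash}"   # defaults to the manifest used in the template
  # Update the checkout created by the UserData script; stop if the pull fails.
  run git -C repo pull || return 1
  # Apply the manifest for this environment and role.
  run puppet apply --modulepath repo/infra/puppet/modules \
    "repo/infra/puppet/manifests/${environment}/${role}.pp"
}
```

For example, running reapply production webserver with DRY_RUN=1 set prints the two commands without executing them; without DRY_RUN, it runs them for real from the directory that contains the repo checkout.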
I’m happy with how flexible this solution is. CloudFormation manages the machine creation as well as any other resources, puppet manages the software installed in those machines, and git allows you to maintain all that configuration in one place.