Terraform with multiple workspaces and environments

I was recently setting up a couple of AWS environments for a client. This client had a typical web application which talked to an RDS database. There was DNS, a CDN and other components involved. We wanted to use Terraform to maintain traceability and replicability, and have the same configuration for production and staging, with perhaps small differences like EC2 instance size. We also wanted to separate out the components into their own Terraform workspaces to limit the blast radius (so if one component had changes that caused issues or Terraform corruption, it wouldn’t affect others). Finally, we wanted each environment to have its own Terraform backend, again to separate the environments.

I wasn’t able to complete this project due to external factors (I left the position before testing could be completed), but wanted to share the concepts. Obviously I can’t share the working code, but I set up a simpler example project. That’s the project I’ll be examining in this post. I also want to be clear that while I’ve tested this as much as I could and have validated the ideas with others who have more Terraform experience, this hasn’t been run in production. You have been warned. (Here are the Terraform docs about setting up modules, workspaces and repositories.)

Using a tool like Terraform is great for a number of reasons, but my favorite is that it lets you track changes to cloud infrastructure. More than once I’ve wandered into an AWS account and wondered why certain resources were set up in the way they were, and what might break if I changed them. There are occasionally comments, but it is far better to examine a commit. Even better to review the set of commits and see the customer request or bug tied to it. (Bonus link: learn more about Terraform and other cloudy tools in this podcast episode with the creator of Terraform.)

So this simpler example project has a lambda that writes to an SQS queue. For now, it just writes the date of invocation, but obviously you could have it reach out to an external API, read from a database, or do some kind of calculation. The SQS queue could then be read from by an EC2 instance, which processes the message and perhaps updates a database. You have three components of the system:

  • The lambda function
  • The SQS queue
  • The EC2 instance (implementation of which is left as an exercise for the reader)

The SQS queue is shared infrastructure and needs to be accessed by both of the other systems. However, the SQS system doesn’t need to know about either the lambda or the EC2 instance. Using Terraform, we can create each of these components as their own workspace. Each of the subsidiary systems can evolve or change (for instance, the EC2 instance could be replaced with an autoscaling group) with minimal impact on other systems. They could be managed by different teams as well if that made sense.

To enforce this separation, set up each component as a separate Terraform workspace. (All code is on GitHub here.) I use remote state so that more than one person can manage the Terraform state, and use the S3/DynamoDB backend because we are targeting AWS and want a free, scalable solution. This post assumes you know how to set up Terraform using S3/DynamoDB as remote state storage.

Here are the outputs of the SQS system:

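# URL of the queue; for aws_sqs_queue, the id attribute is the queue URL.
output "queue_url" {
  value = aws_sqs_queue.myqueue.id
}

# ARN of the queue, useful for scoping IAM policies.
output "queue_arn" {
  value = aws_sqs_queue.myqueue.arn
}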

I explicitly define the output variables so I can pull them in from the lambda and EC2 workspaces. This is how you can do that.

...
# Read the outputs of the sqs workspace from its remote state.
data "terraform_remote_state" "sqs" {
  backend = "s3"
  config = {
    bucket         = var.terraform_bucket
    key            = "sqs/terraform.tfstate"
    encrypt        = true
    dynamodb_table = "terraform-remote-state-locks"
    profile        = var.aws_profile
    region         = "us-east-2"
  }
}
...
resource "aws_lambda_function" "mylambda" {
...
  environment {
    variables = {
      # Exposed to the function code as the sqs_url environment variable.
      sqs_url = data.terraform_remote_state.sqs.outputs.queue_url
    }
  }
}

The terraform_remote_state block defines the location of the previously defined sqs workspace, and the data.terraform_remote_state.sqs.outputs.queue_url expression references that URL. That is then injected as an environment variable into the lambda, which reads it and uses the URL to create an SQS client. It can then post whatever message it wants.
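The queue_arn output can be consumed the same way. Here is a minimal sketch, assuming the lambda workspace defines its own IAM role and already contains the terraform_remote_state block shown above. The role and policy names are hypothetical and not taken from the example repository. The ARN is used to limit the lambda to sending messages to this one queue:

```hcl
# Hypothetical execution role for the lambda; names are illustrative only.
resource "aws_iam_role" "lambda_role" {
  name = "mylambda-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# Allow the role to send messages only to the queue exported by the sqs workspace.
resource "aws_iam_role_policy" "send_to_queue" {
  name = "send-to-queue"
  role = aws_iam_role.lambda_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sqs:SendMessage"
      Resource = data.terraform_remote_state.sqs.outputs.queue_arn
    }]
  })
}
```

Because the ARN comes from remote state, the policy follows the queue if the sqs workspace ever recreates it.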

You can see how this would work with any number of configuration parameters. If you have a typical three-tier, database-driven application with a separate caching layer, you can create each of these major components as its own workspace and inject the values into either the environment (for lambda) or the userdata (for EC2). I’m not sure I’d use this with a microservices architecture because using a service registry might be more appropriate.

Note that the lambda component has a rudimentary lambda function (you have to define something). It also uses Terraform to deploy the lambda code. That’s fine for the toy example, but for production you will want to use a real CI/CD system to deploy your lambdas.

Now, suppose you want to run production and staging environments, because you are ready to launch. Here are the constraints you’d want:

  • Production and staging run the same config (except when staging is changing, of course)
  • Production and staging may differ in a few details (the size of the EC2 instance, for example)
  • Production and staging execute in different AWS accounts to limit access and issues (charity link). You don’t want an error in staging to affect production. This is handled by creating different profiles which have access to different accounts.
  • Production and staging execute in different Terraform backends for the same reason as the separate AWS accounts.

Staging and production can use the same git repository, but when pulled down they are kept in two places on the filesystem. This is because you need to specify the profile and the bucket when using terraform init. So you end up running something like these two commands:

git clone git@github.com:mooreds/terraform-remote-state-example.git # staging
git clone git@github.com:mooreds/terraform-remote-state-example.git production-terraform-remote-state-example # production

I set up the project so that staging can be managed by normal terraform commands (since that will happen more often), and production uses either special incantations or a script. For the initialization of the production Terraform environment, this looks like: terraform init -backend-config="profile=trsproduction" -backend-config="bucket=". For staging, it’s just terraform init. I didn’t have a lot of luck switching between these two Terraform backends in the same filesystem location, so having two trees was a straightforward workaround.
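Backend blocks cannot reference variables, which is why the production bucket and profile have to be supplied at init time. Here is a minimal sketch of what the sqs workspace's backend block might look like, assuming the staging values are hardcoded so that a bare terraform init works. The staging bucket name is a placeholder, not the real one:

```hcl
terraform {
  backend "s3" {
    # Staging defaults; -backend-config options on the command line override these.
    bucket         = "example-terraform-state-staging" # placeholder name
    profile        = "trsstaging"
    key            = "sqs/terraform.tfstate"
    region         = "us-east-2"
    encrypt        = true
    dynamodb_table = "terraform-remote-state-locks"
  }
}
```

Only the bucket and profile differ between environments in this sketch, so those are the only two settings the production init command needs to pass.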

Any differences between production and staging are each pulled out into a variable, with the staging value as the default. Then each workspace has a script which applies the Terraform configuration to the production environment. The script sets the variables to the correct values for production. Here’s an example for the lambda workspace:

terraform apply -var aws_profile=trsproduction -var terraform_bucket="mooreds-terraform-remote-state-example-production" -var env_indicator="production" -var lambda_memory_size=256

We pass in the production terraform_bucket in case any references need to be made to the remote state (to pull in the SQS queue URL, for example). We also pass in an increased lambda memory size because, hey, it’s production. Other things that might vary between environments include VPC or subnet ids, API endpoints, and S3 bucket names.
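The variables behind that command could be declared roughly as follows. This is a sketch under assumptions: the variable names come from the command above, but the staging bucket name and the staging memory size are placeholders I picked for illustration.

```hcl
# Staging values are the defaults, so a plain `terraform apply` targets staging.
variable "aws_profile" {
  default = "trsstaging"
}

variable "terraform_bucket" {
  default = "example-terraform-state-staging" # placeholder name
}

variable "env_indicator" {
  default = "staging"
}

variable "lambda_memory_size" {
  default = 128 # assumed staging size; the production script passes 256
}
```

The lambda resource would then set memory_size = var.lambda_memory_size, so the production script only has to override the values that differ.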

For simplicity, we just use two profiles for staging and production (in ~/.aws/credentials), but any way of getting credentials that works with Terraform will work:

[trsstaging]
aws_access_key_id = ...
aws_secret_access_key = ...

[trsproduction]
aws_access_key_id = ...
aws_secret_access_key = ...

This lets us separate out who has production access. Some users can have both staging and production profiles (perhaps operations), and others can have only staging profiles (perhaps developers). You can pass region values in via variables as well.
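As a sketch of how the profile and region might be wired into the AWS provider: aws_profile is the variable passed on the command line earlier in this post, while aws_region is a hypothetical variable name introduced here for illustration.

```hcl
variable "aws_region" {
  default = "us-east-2" # hypothetical variable; region used elsewhere in this post
}

provider "aws" {
  profile = var.aws_profile # staging profile by default, trsproduction for production
  region  = var.aws_region
}
```

A user without a trsproduction entry in ~/.aws/credentials cannot apply to production, because the provider has no credentials to use.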

Using this system, the workflow for a change would be:

  • Check out the terraform git repository
  • Create a feature branch (including an issue identifier)
  • Pull request and approval
  • Run terraform apply to apply to staging
  • Run any additional tests
  • Merge to master
  • Run prodapply.sh

Again, I want to be clear that I’ve implemented this partially, but I didn’t get a chance to run this fully in production. I tested all these concepts with the simple system mentioned above (and you can stand up your own using the code on github). There will be issues that I haven’t experienced. But I hope that this post helps illuminate the complexity of managing multiple workspaces and environments within a single Terraform github repository.


Develop Denver

I’m excited to let y’all know I’ll be speaking at Develop Denver this year. Last year I talked about Amazon Machine Learning (RIP). This year I’ll be covering three things that surprised me as a new developer (based on my Letters to a New Developer project).

If you want to learn more about how much fun Develop Denver is, I wrote a recap about my experience last year. As of writing, tickets are still available. It’s a great two day wide ranging community oriented conference. Hope to see you there.


What senior engineers do: fix knowledge holes

I had a bad week a few weeks back, and got together with a friend and former colleague. We started trading war stories. He mentioned one time when he was working for a company which had both a desktop client and a server component. The communication between the two pieces of the system was completely, utterly undocumented. The original engineers who had written each piece of the app had departed. The system was key to the company’s business (it was what they sold!).

This “knowledge hole” was identified by my friend. He was a software engineer. This was a key piece of software, but was unknown. The problem was not getting solved.

So, he solved it.

He installed the client and a TCP sniffer, and clicked around the client. He recorded all the traffic. He literally reverse engineered the client/server communication protocol. He then documented what the communication protocol was, so that the next group of engineers could benefit. He was a software developer who was responsible for the back end systems. I don’t think he was directly responsible for the client/server protocol and the client was, I believe, outside his purview. (Updated 7/20) Note, he was a software engineer, not QA or a network engineer. But he didn’t let the role definition stop him.

I thought to myself, this is the textbook definition of a senior engineer. You see a problem, you solve it (thoroughly), you document it and you level up your team.

It was pretty awesome to hear about.


Exercism.io: Level up your coding skills

I’ve really enjoyed Exercism.io. This is an online learning platform for coding. It’s similar to HackerRank or Codewars.

The main difference is that for a subset of the problems, you actually have a mentor give you feedback. You download a problem (with tests) and code up the solution however you like. Then you push the solution up to a central repository where a volunteer mentor reviews it. I’ve had a different mentor for every problem. The same mentor stays with you for the duration of one single problem, and gives you feedback on style and API usage. They also provide accountability in a way that a computer grading my code just doesn’t.

Each language has a track (as well as some specialized frameworks like React). I’m working on two tracks right now. The first is a language I’m intermediate in (ruby). I’ve learned a lot about the standard library as well as ruby idiom. The second is a language I’m at best a beginner at: javascript (especially modern javascript, which has changed a lot).

But there are a lot of other languages I’m looking forward to looking at when I finish these tracks (functional languages like OCaml, for one).

A final benefit of Exercism is that each problem is a challenge but not an impossible one. I find coding up a naive solution to the challenge takes about 15 minutes, and I’ll often do that first before I try to refine it to make it a bit better. It’s kinda like TDD but someone has written all the tests for you, so you just get to do the fun part.

Check it out.


Things I wish I knew as a new developer

I’m participating as a speaker or panelist in my third annual Boulder Startup Week. This year I get to talk about my current passion project, Letters to a New Developer. I’m presenting on “10 things I wish I knew as a new developer” including tips like “learn version control” and “remember, it’s about outcomes, not output”. It’s a free presentation on Monday May 13. I hope you can join me.

If that time or subject doesn’t work or interest you, check out all of the other awesome presentations happening in Boulder during the 2019 Boulder Startup Week and see if any of them tickle your fancy.


A video of My Amazon Machine Learning Talk

I gave a talk at Develop Denver last year about Amazon Machine Learning. They recorded it and you can now view the video. I feel a bit like a superhero in the shadows, because the lighting situation was such that you couldn’t see both my face and my slides at the same time, but if you want to see what AML is all about and how it can help you experiment with supervised machine learning in a lightweight, cheap, fast manner, please check it out.

The full video is about 35 minutes long.


What is an MVP and why do you care?

I was recently published over on the Go Code Colorado blog. I wrote about what a minimum viable product (aka MVP) is. When I talked to teams at last year’s competition, one recurring theme was how much people wanted to spend time building rather than talking to users.

I have been there! I know it is much more fun to build something than it is to try to find people of whom to ask questions.

But it’s much better for the long term viability of your project to build a poor version of something people want than a perfect version of something no one wants.

More over at the Go Code Colorado blog.


Announcing: Letters to a New Developer

I’ve been working on a new project that I’m excited to share here. It’s called Letters to a New Developer. It’s a blog with over forty posts full of advice for people who are just starting out in the field of development.

So much of development is not about code. It’s about teams and processes and organizations and questions. I feel like the traditional education system (whether four year college or bootcamp) does a good job of teaching folks coding fundamentals (what you need to be a programmer) but there is so much more to being a successful developer than coding. In my experience, delivering code is a necessary but not sufficient activity for success in a software company.

That’s why I’ve spent the last five months writing up some of my experiences and learnings.

The posts are mostly mine, but some are from guest posters who bring a different perspective. I also excerpt posts I find on the Internet that I find helpful, so that I can stand on the shoulders of some of the giants who have come before.

Here are some of my favorite Letters To a New Developer posts: Always Be Journaling from Brooke Kuhlmann, Get Used to Failure, and Learning to Read Code is More Important Than Learning to Write It.

I hope you find this project as fun to read as I’ve found it to write.


Ideas for marketing a dev centric product

I have a friend who has a company that produces a developer tool that provides a one stop shop for application authentication. It’s a good one and they’ve found some traction.

They are trying to follow the redhat/mongodb playbook–create a tool so good at solving developer problems that developers adopt it. Eventually some portion of enterprises will wake up one morning and realize that their developers are using this. The CIO will freak out and be happy to pay money to my friend’s company for support.

The question is, how do you get developers to find out about the solution and how it will make their lives better?

I am not a marketing guru, but I am a developer and have been involved in selling software in the past, so I had some suggestions.

  • Read The New Kingmakers, by Stephen O’Grady, in particular “Courting the Developer Population” and “What to Do?”. You can get a PDF here. O’Grady has thought a lot about this problem and his blog is worth reading as well.
  • Find or create a community, and contribute to it. Depending on your audience, this may be a reddit subforum, an old school phpBB forum, a facebook group or even an IETF working group. An existing group is more likely to give you results, whereas a new group will give you more control. Aim for results.
  • Present case studies or testimonials of success using the software. How did a real person solve a real problem with your software?
  • If you can find analysts and/or bloggers focused in your problem space, contact them and offer to walk them through the solution. (To me, this seems more of a long shot, but could be a big hit if you find the right person.) Developer focused podcasts could be an option too, as long as you are talking about technology and the problem space, not your product.
  • Write interesting content on your blog. Share it widely. Again, this doesn’t have to be about your product, in fact it shouldn’t. Instead, take topics you’ve learned by building your product, or in the communities of which you are part, or the problems developers have solved with your product, and highlight those topics.
  • Write talks for conferences. For developers, I’d look at conferences like gluecon and oscon. Meetups are good too–easier to get into but much less scalable. You can attend virtual meetups as well as physical ones.
  • Set up a google alert for competitors’ names and keywords and see if you can add value to any discussions happening there.

Above all, don’t be salesy. Focus on making the developer’s life easy.


Ever felt like your codebase was out of control?

I certainly have. A couple of times in my career the combination of technical debt, business model shift and lack of time for a proper fix have left me feeling out of control.

But reading this post on Hacker News made me realize that it all could have been so so much worse. A couple of “best ofs”:

To give you some examples, I originally came on as a contractor because they had some refactoring they wanted done. The entire system was home built (including the programming language) and there was a file size limit of 32,767 lines. They had many functions that were approaching this limit and they didn’t know what to do, so they hired me.

and:

Once upon a time, there was a search product and one of the data sources that it could search was a Solr/Lucene database. This should be no problem, since search is what Solr does. It should be as simple as passing the user’s query through to Solr and then reading the response. The problem was, it was important to know exactly which parts of any matched records were relevant to the search.

The Guy Before Me™ decided that the best way to implement this would be to split the user’s search into individual words, perform a separate search query through Solr’s HTTP API for each individual word, and then do a bunch of very clever and complex post-processing on the result sets to combine them into a single set of results.

and (last one, I promise):

At my first gig I teamed up with a guy responsible for a gigantic monolith written in Lua. Originally, the project started as a little script running in Nginx. Over the course of several years, it organically grew to epic proportions, by consuming and replacing every piece of software that it interfaced with – including Nginx.

There were two ingredients in the recipe for disaster. The first is that Lua comes “batteries excluded”: the standard library is minimalist and the community and set of available packages out there is small. That’s typically not an issue, as long as one uses Lua in the intended way: small scripts that extend existing programs with custom user logic (e.g. Nginx, Vim, World of Warcraft). The second is that Lua is a dynamic language: it’s dynamically typed, and practically everything can be overridden, monkey patched and hacked, down to the fundamental iterators that allow you to traverse data structures.

shivers. There, but for the grace of God.



© Moore Consulting, 2003-2019