Skip to content

Protecting a CDN source using basic auth

I have a website that is behind a content delivery network (CDN). I want to protect it from being crawled by any robots. I want all access to go through the CDN for reasons. There may be errant links to the source; I don’t care if they continue to work.

htaccess and basic auth is perfect for this situation.

I added an .htaccess file that looks like this:

AuthType Basic
AuthName "Secure Content"
AuthUserFile /path/to/.htpasswd
require valid-user

I needed to make sure the path and the file are readable by the web server user.

Then, I added a .htpasswd entry that looks like this:

user:passwdvalue

If you don’t have access to htpasswd, the typical program used to generate the password value, this site will generate one for you.

Then, I had to configure my CDN to give it the appropriate header.

Use the Authorization header, and make sure to pass the username and the password. This site will generate the appropriately base64 encoded values.

Voila. Only the CDN has access.

Now, the flaws:

  • Depending on how the CDN accesses the site, it may be possible to snoop out the username and password
  • If you ever want to get the origin site over HTTP, you’ll need the username/password

GitHub Actions Are Amazingly Easy

GitHub Workflows are automated jobs that can be triggered by various events against a GitHub repository. They are pretty awesome.

GitHub Actions are a way to encapsulate configuration and functionality in a way that can be easily reused in GitHub Workflows.

I was thinking it’d be fun to create some GitHub Actions (yes, I’m the life of the party), so I sat down a few mornings ago to do this. I was shocked at how easy it was.

I followed a few lines of this tutorial to create a workflow. Then I created an action by following this tutorial. Finally, I edited my workflow to use the new action. That was it.

It was amazingly simple and took me about 30 minutes. I ran into one unrelated issue (to set the executable bit on a shell script in windows, I had to modify the shell script contents in order to ensure the change was sent to the remote repo).

If you take a look, you’ll see these are both toy repositories, to be sure. However, the ability to write jobs which will be executed on a git push, pull request or other events is great and removes toil. Being able to extract common functionality to an action is even better. Finally, the ability to share the action publicly by adding it to the GitHub marketplace is fantastic.

I’ve liked CircleCI for a long time, but if I were them I’d be worried.

One issue I found is that the testing/release cycle is pretty tedious (I’ve mentioned that action debugging to be an issue for a while).

While I was troubleshooting my executable bit error, I had to do the following every time I wanted to test a change:

  • make a change in the action repository
  • create a new tag
  • push it to the remote
  • switch to the workflow repository
  • bump the action version
  • push to the remote
  • wait for the workflow to complete

Not horrific, but pretty tedious. I don’t know if there are other options such as local deployment which would reduce that cycle, but that would be swell.

Other than that, 10 out of 10, would write more actions.

GitHub actions and workflows

I recently wrote my first real GitHub action workflow at work. It was to publish our website after a merge or push to our main branch.

After this experience, I think these workflows are perfect for simple automation tasks. Things like:

  • Running a linter like rubocop on your code
  • Deploying a simple application (one or a few artifacts).
  • Running unit and integration tests.

I didn’t use self hosted actions, though that seems like a nice escape valve if you want to run things within your own network or run over limit. GitHub publishes the action and workflow limits (storage, runtime) and that’s definitely worth reviewing.

You also can easily stand up a couple of different service containers (right now only postgresql and resdis) for easy integration testing. You can also abstract out your commonly used workflow segments to versioned actions.

It was really a pain to write the workflow, however. I had to push repeatedly to our mainline branch, and there were times I screwed up the YAML or didn’t have my script correct. The feedback loop was slooow. Ouch. There are solutions to run them locally, but I didn’t try it yet.

Other than that, it was a positive experience. If you are using GitHub and have automation needs, take a look at GitHub actions. I am a big fan of CircleCI and have been for years. GitHub actions covers a lot of the same ground. GitHub actions are less sophisticated, but it seems like a definite “innovators dilemma” play. So I expect to see actions to get more and more sophisticated.

A quick look at xkit

I was prototyping a small app in xkit and wanted to document this useful tool. When I first saw this launch on HackerNews, I couldn’t quite understand what the purpose was. But now that I’ve spent a bit of time playing with it, I understand it a bit more.

Suppose you are writing a recipe management SaaS and realize that you want to integrate with some other services. Perhaps you want to be able to export the steps of a recipe to a Trello board, or to a Google doc, or to a PDF.

These are all services available on the internet with an API which will allow end users to give your application access to their accounts. This lets you publish to each user’s Google docs account or Trello board.

(I’m not as familiar with services offering PDF generation functionality, but a quick search turns up some options, including some that you can self host.)

There’s a fair bit of hoop jumping in terms of setting up API keys and OAuth consent screens, however.

And this is the problem that xkit solves. If they’ve already written the connection (here’s a list), it is quite simple to add the ability for a user to connect to the service. With no previous experience, I was able to connect to Trello in about an hour. The user experience of connecting the external SaaS application is really smooth and far better than something I could whip up quickly.

If they haven’t written a connector, I don’t believe you can write one yourself. For example, for that PDF service, you’d need to contact the xkit folks and ask them to add one.

This is different than, say, Zapier, because it’s operating at a different level. Zapier is excellent (and has been for years) at letting users connect their apps. But xkit lets you let your users connect apps, basically letting you build a mini Zapier (in terms of connectivity, not functionality).

You can also host your own app catalog if you want to. I didn’t get into this too much, though, so it’s unclear what the benefits of that are.

They provide a user data store out of the box, but also integrate with a number of other providers (including FusionAuth). This means you can leverage your existing auth solution and still get the easy integration with other third party APIs.

Their pricing seems reasonable, given what they take off your plate.

Nothing’s perfect, however. I found a few documentation bugs, which I let them know about (they host their docs on readme.com and I found the suggestion process delightful). When I tried to sign up, the service was down, but a quick Tweet exchange resolved the issue within 30 minutes.

It is bizarre to me as an authentication focused company that they don’t have a “forgot password” link on their login pages. The documentation is javascript heavy, with nary a mention of other languages, but that’s understandable as they’re just starting out. It’s also strangely video heavy, which I found a bit distracting; that, however, could just be my learning style.

All in all, if you are looking to integrate third party APIs which require OAuth interactions on the part of your users, you’d be well served to take a look at xkit.

Thoughts on static site generators and WordPress

Blog in scrabble lettersIn my last few jobs, I’ve done a lot more writing. I’ve learned to work with static site generators (SSGs), such as jekyll and 11ty. I even moved a database driven side project to Netlify. Here’s an interesting survey about SSGs from Redmonk, if you want to learn a bit more. I am also a longtime WordPress user, and think it has some tremendous strengths. I wanted to capture my thoughts on these two options for building your website.

Here’s why I’d pick WordPress for a content heavy site:

  • Avoid any kind of compilation pipeline or step
  • The authors don’t have technical chops and want WYSIWYG
  • You want more than a blog, with additional functionality pulled from the wide world of plugins. Also, themes!
  • You want to allow people to just write, without software getting in the way

Here’s why I’d pick a static site generator for the same:

  • Need performance and scalability at a low low price
  • Wanted ‘set it and forget it’ security (you can’t hack a static HTML page)
  • Authors are technical enough to write markdown and to leverage the data driven possibilities of an SSG
  • No need for interactive functionality, beyond what JavaScript/JAMStack can provide

Like any other choice in software engineering, these two solutions offer tradeoffs.

I think that SSGs are far better for simpler websites, and in some ways they are a more sophisticated version of very early websites. I remember writing a perl program to take a usenet file and turn it into a set of web pages in the last 1990s. (It was jokes, if you must know the content.) When you build and deploy a static site, you know exactly what you’re getting, but you can also extract common functionality to shared files. You will have to compile it, however, and typically uses a version control system, which can be a lot for some non technical folks.

WordPress is fantastic for just getting going. I can get a blog started in 15 minutes that can be used by anyone who knows the basics of Microsoft Word. The flip side of the speed to first page is operational complexity, slower performance (all things being equal), as well as a more complicated application to secure.

I will say that as a developer, SSGs are growing on me.

PS I know some people have combined them both, and using WordPress as the backend and publishing a set of static pages is really appealing. I’ve done some preliminary work on this, but haven’t found a great solution out there.

Joining FusionAuth

I wanted to let y’all know that I’ve joined FusionAuth as a developer advocate. I’ll be working to help our customers succeed and promote the virtues of standards based user management systems. I get to write a lot of content and example applications against a full featured API.

I’ve built enough systems to know two things:

  • Users and their behavior are almost always a key part of any software application.
  • User management is difficult to get right, especially if you want to use secure best practices and standards such as OAuth.

FusionAuth wants to elevate everyone’s user identity management system. The community edition is free and will always be. (It’s important to note that it is free as in beer, not free as in speech, but almost all of the development happens in the open.) If you want to run FusionAuth on your own forever, that’s great! You get a secure user store that supports OAuth, SAML and two factor authentication, free forever. We’ll happily provide you “best effort” support in our forums and we’ve seen the community help each other out too (most notably in the creation of helm charts to run FusionAuth on Kubernetes).

If, on the other hand, you find value in FusionAuth and want guaranteed support, custom development, or hosting, we’re happy to sell that to you. The price is often a fraction of the other solutions out there. Another differentiator for FusionAuth is that you can host it wherever you want: in your data center, in your cloud, or on our cloud servers. Not every client needs that level of control, but many do.

I really love the business model of providing a ton of value to your end users and monetizing only a small percentage of them with unique needs. (I’ve been involved in this type of business before.) The business thrives and there’s a ton of consumer surplus generated.

I’m really excited about this opportunity. It’s a nimble company with a passionate team based in Denver. If you need a user identity management system built from the ground up for developer happiness, please check us out.

Terraform with multiple workspaces and environments

I recently was setting up a couple of AWS environments for a client. This client had a typical web application which talked to an RDS database. There was DNS, a CDN and other components involved. We wanted to use Terraform to maintain traceability and replicability, and have the same configuration for production and staging, with perhaps small differences like ec2 instance size. We also wanted to separate out the components into their own Terraform workspaces to limit the blast radius (so if one component had changes that caused issues or Terraform corruption, it wouldn’t affect others). Finally, we wanted each environment to have its own Terraform backend, again to separate the environments.

I wasn’t able to complete this project due to external factors (I left the position before testing could be completed), but wanted to share the concepts. Obviously I can’t share the working code, but I set up an example project which is simpler. That’s the project I’ll be examining in this post. I also want to be clear that while I’ve tested this as much as I could and have validated the ideas with others who have more Terraform experience, this hasn’t been run in production. You have been warned. (Here’s the Terraform docs about setting up modules, workspaces and repositories.)

Using a tool like Terraform is great for a number of reasons, but my favorite is that it lets you track changes to cloud infrastructure. More than once I’ve wandered into an AWS account and wondered why certain resources were set up in the way they were, and what might break if I changed them. There are occasionally comments, but it is far better to examine a commit. Even better to review the set of commits and see the customer request or bug tied to it. (Bonus link: learn more about Terraform and other cloudy tools in this podcast episode with the creator of Terraform.)

So this simpler example project has a lambda that writes to an SQS queue. For now, it just writes the date of invocation, but obviously you could have it reach out to an external API, read from a database, or do some kind of calculation. The SQS queue could then be read from by an EC2 instance, which processes the message and perhaps updates a database. You have three components of the system:

  • The lambda function
  • The SQS queue
  • The EC2 instance (implementation of which is left as an exercise for the reader)

The SQS queue is shared infrastructure and needs to be accessed by both of the other systems. However, the SQS system doesn’t need to know about either the lambda or the EC2 instance. Using Terraform, we can create each of these components as their own workspace. Each of the subsidiary systems can evolve or change (for instance, the EC2 instance could be replaced with an autoscaling group) with minimal impact on other systems. They could be managed by different teams as well if that made sense.

To enforce this separation, set up each component as a separate Terraform workspace. (All code is on github here.) I use remote state so that more than one person can manage the terraform state, and use the S3/dynamodb backend because we are targetting AWS and want a free scalable solution. This post assumes you know how to set up Terraform using s3/dynamodb as a remote state storage.

Here’s the outputs of the SQS system:

output "queue_url" {
  value = "${aws_sqs_queue.myqueue.id}"
}

output "queue_arn" {
  value = "${aws_sqs_queue.myqueue.arn}"
}

I explicitly define the output variables so I can pull them in from the lambda and EC2 workspaces. This is how you can do that.

...
data "terraform_remote_state" "sqs" {
  backend = "s3"
  config = {
    bucket = "${var.terraform_bucket}"
    key = "sqs/terraform.tfstate"
    encrypt = true
    dynamodb_table = "terraform-remote-state-locks"
    profile = "${var.aws_profile}"
    region = "us-east-2"
  }
}
...
resource "aws_lambda_function" "mylambda" {
...
  environment {
    variables = {
      sqs_url = "${data.terraform_remote_state.sqs.outputs.queue_url}"
    }
  }
}

The terraform_remote_state block defines the location of the previously defined sqs workspace, and the ${data.terraform_remote_state.sqs.outputs.queue_url} references that url. That is then injected as an environment variable into the lambda, which reads it and uses the url to create an SQS client. It can then post whatever message it wants.

You can see how this would work with any number of configuration parameters. If you have typical three tier database driven application with a separate caching layer you can create each of these major components and inject the values into either the environment (for lambda) or the userdata (for EC2). I’m not sure I’d use this with a microservices architecture because using a services registry might be more appropriate.

Note that the lambda component has a rudimentary lambda function (you have to define something). It also uses Terraform to deploy the lambda code. That’s fine for the toy example, but for production you will want to use a real CI/CD system to deploy your lambdas.

Now, suppose you want to run production and staging environments, because you are ready to launch. Here are the constraints you’d want:

  • Production and staging run the same config (except when staging is changing, of course)
  • Production and staging may differ in a few details (the size of the EC2 instance, for example)
  • Production and staging execute in different AWS accounts to limit access and issues. You don’t want an error in staging to affect production. This is handled by creating different profiles which have access to different accounts.
  • Production and staging execute in different Terraform backends for the same reason as the separate AWS accounts.

Staging and production can use the same git repository, but when pulled down they are kept in two places on the filesystem. This is because you need to specify the profile and the bucket when using terraform init. So you end up running something like these two commands:

git clone git@github.com:mooreds/terraform-remote-state-example.git # staging
git clone git@github.com:mooreds/terraform-remote-state-example.git production-terraform-remote-state-example # production

I set up the project so that staging can be managed by normal terraform commands (since that will happen more often), and that production uses either special incantations or a script. For the initialization of the production Terraform environment, this looks like: terraform init -backend-config="profile=trsproduction" -backend-config="bucket=bucketname". For staging, it’s just terraform init. I didn’t have a lot of luck switching between these two Terraform backends in the same filesystem location, so that having two trees was a straightforward workaround.

Any changes between production and staging are each pulled out to a variable, with the staging value as the default. Then each workspace has a script which applies the Terraform configuration to the production environment. The script sets variables to be the correct value for production. Here’s an example for the lambda workspace:

terraform apply -var aws_profile=trsproduction -var terraform_bucket="mooreds-terraform-remote-state-example-production" -var env_indicator="production" -var lambda_memory_size=256

We pass in the production terraform_bucket in case any references need to be made to the remote state (to pull in the SQS queue url, for example). We also pass in an increased lambda memory size because, hey, it’s production. Other things that might vary between environments: for example, VPC or subnet ids, API endpoints, and S3 bucket names.

For simplicity, we just use two profiles for staging and production (in ~/.aws/credentials), but any way of getting credentials that works with Terraform will work:

[trsstaging]
aws_access_key_id = ...
aws_secret_access_key = ...

[trsproduction]
aws_access_key_id = ...
aws_secret_access_key = ...

This lets us separate out who has production access. Some users can have both staging and production profiles (perhaps operations), and others can have only staging profiles (perhaps developers). You can pass region values in via variables as well.

Using this system, the workflow for a change would be:

  • Check out the terraform git repository
  • Create a feature branch (including an issue identifier)
  • Pull request and approval
  • Run terraform apply to apply to staging
  • Run any additional tests
  • Merge to master
  • Run prodapply.sh

Again, I want to be clear that I’ve implemented this partially, but I didn’t get a chance to run this fully in production. I tested all these concepts with the simple system mentioned above (and you can stand up your own using the code on github). There will be issues that I haven’t experienced. But I hope that this post helps illuminate the complexity of managing multiple workspaces and environments within a single Terraform github repository.

Ideas for marketing a dev centric product

I have a friend who has a company that produces a developer tool that provides a one stop shop for application authentication. It’s a good one and they’ve found some traction.

They are trying to follow the redhat/mongodb playbook–create a tool so good at solving developer problems that developers adopt it. Eventually some portion of enterprises will wake up one morning and realize that their developers are using this. The CIO will freak out and be happy to pay money to my friend’s company for support.

The question is, how do you get developers to find out about the solution and how it will make their lives better?

I am not a marketing guru, but I am a developer and have been involved in selling of software in the past, so I had some suggestions.

  • Read The New Kingmakers, by Stephen O’Grady, in particular “Courting the Developer Population” and “What to Do?”. You can get a PDF here. O’Grady has thought a lot about this problem and his blog is worth reading as well.
  • Find or create a community, and contribute to it. Depending on your audinence, this may be a reddit subforum, an old school php bbedit forum, a facebook group or even an IETF working group. An existing group is more likely to give you results, where a new group will give you more control. Aim for results.
  • Present case studies or testimonials of success using the software. How did a real person solve a real problem with your software?
  • If you can find analysts and/or bloggers focused in your problem space, contact them and offer to walk them through the solution. (To me, this seems more of a long shot, but could be a big hit if you find the right person.) Developer focused podcasts could be an option too, as long as you are talking about technology and the problem space, not your product.
  • Writing interesting content on your blog. Share it widely. Again, this doesn’t have to be about your product, in fact it shouldn’t. Instead, take topics you’ve learned by building your product, or in the communities of which you are part, or the problems developers have solved with your product, and highlight those topics.
  • Write talks for conferences. For developers, I’d look at conferences like gluecon and oscon. Meetups are good too–easier to get into but much less scalable. You attend virtual meetups as well as physical ones.
  • Set up a google alert for competitors’ names and keywords and see if you can add value to any discussions happening there.

Above all, don’t be salesy. Focus on making the developer’s life easy.

Moving a database driven side project to Netlify

FishermanI recently moved a side project to Netlify. Netlify, if you aren’t familiar with it, is a static file hosting service with a great workflow, free SSL certificates, and a built in CDN. I made the move because the side project was a database driven site, but didn’t really use other server side interactivity (no user generated content, etc). I haven’t invested much in the side project over the past couple of years, but it still gets tens of hits a day and is useful to a users. It’s a top hit on google for certain keywords. I didn’t want to spend time updating the underlying application, but wanted it to be more secure. The site is only updated over a month or so every year. It is then read only for the next eleven months of the year, as it provides information on a very seasonal service.

Moving to Netlify (or, frankly, any other static site provider) provides the following benefits:

  • extremely low cost (Netlify is free for the level of traffic I have).
  • faster browsing experience for the end user (due to being behind a CDN).
  • free SSL certificates, with no hassle of setting up letsencrypt myself.
  • indirection–the source site can now be anywhere. I could put it on an ec2 instance that I only boot up when I need to push a new build, or could even host it locally. This means that I don’t need to worry about the updating the source application.

I could have chosen to do this with S3/Cloudfront/AWS Certificate manager/Lambda (or using technologies from some other cloud provider). But even though I’m familiar with all the steps to do this, it was lot of work. And Netlify was free and had done all the grunt work of bringing all these technologies (or similar ones, I’m not familiar with the tech behind the site, but they do write about it a bit).

What steps did I take to move a database driven directory site to Netlify? Enough that I wanted to document this for my future self.

  • Change the DNS for the site to have a lower TTL. During the transition from your site to Netlify, users may see versions served from either the old site or the new site, and you want to minimize that.
  • Remove all forms and other server side interactivity that your site has. You can replace them with javascript driven interactivity (like Disqus for comments). Netlify has some support for functions, but I didn’t explore that.
  • Set up Netlify using a default URL (they provide it, it is something like ‘foo-bar-123.netlify.com’). I deploy from Bitbucket. (I know some people hate on Bitbucket, and I don’t like all of their UI, but they are free for private repos, which is a win for me.) This let me understand how to deploy a simple one page site.
  • Download the site. I used wget: wget -mk http://mysite.com/ . The switches set up mirroring and convert all links to be relative.
  • Move the site to a subdirectory (I used web) and configure Netlify to deploy the HTML from that subdirectory. This lets you have other scripts outside of the webroot that can help you build the site.
  • Check in the site with the downloaded content and push it up to Bitbucket.
  • Watch the site be deployed to the Netlify default URL.
  • Access the site via SSL: ‘https://foo-bar-123.netlify.com’ and look in the console for “mixed active content” errors. Fix those as applicable. (If this was for a client, rather than a side project, I’d solve all of these.) Note that if the source site is http only, you may need to hardcode https URLs.
  • See that “pretty” html pages like ‘https://foo-bar-123.netlify.com/about’ are rendered as text.
  • Ask on Stackoverflow about the problem.
  • End up solving it by generating a _headers file after downloading the site. Update Stackoverflow question with the answer.
  • Test again that the site at the default URL looks good.
  • Update the webserver config and DNS for the database driven site. I wanted it to answer to both the old addresses: mysite.com/www.mysite.com and a new address: generator.mysite.com
  • Update DNS to point mysite.com and www.mysite.com to the Netlify site. Instructions here.
  • Wait for DNS to propagate. When it does, check to make sure that SSL works.
  • Add a password for generator.mysite.com
  • Test the download process with the password (the wget command changes to wget --user=USER --password=PASS -mk http://mysite.com/
  • Write a script to do the download process.
#!/bin/sh
wget -mk  --user=USER --password=PASS http://generator.mysite.com/
mv generator.mysite.com mysitecom
cd mysitecom
find `pwd` -type f |grep  -v \\. |sed "s#`pwd`/##" > list 
for i in `cat list`; do echo "/$i" >> _headers; echo "  Content-Type: text/html" >> _headers; done
rm list
cd ..
rm -rf web
mv mysitecom web
git add web
git commit -m "pull in latest from generator.mysite.com"
# could do a git push here as well if you wanted

Now, any time that I make changes to the site, I need to run this script and push to Netlify. I could put it in cron or a scheduled lambda if I thought I was going to make changes often. For now, I’m content to run it manually. One downside is that I self-host my analytics code and that site is not SSL enabled. So I’ll need to make a change if I want to continue to get statistics.

So, in the end I have a fast, secure site that is hosted outside my infrastructure and that is easy to update. This process, while not trivial, was easy enough that it has me thinking about where else I could use it (this blog, for instance). Highly recommend.

Easily extracting conversations from a slack group

Man slack liningSlack is an amazing productivity tool when used correctly. One of the primary uses I’ve seen is for open source projects to provide support (Craft CMS, OG-AWS) or for communities to be built (Techfriends, Denver Devs). If you don’t have the luxury of the owner of your slack being Slack’s VP of engineering, the costs of $x/month/user can cause these types of slacks to remain on the free plan.

Which means that you are limited to the last 10k of messages.

And that’s fine for the vast majority of messages. Sometimes, however a discussion is so good that it deserves to be indexed and shared, which means it needs to be pulled out of the Slack walled garden and onto the web (I also wrote about how to do this with the Facebook Group walled garden last year). Sometimes you might just want to save it beyond the 10k message limit for your own selfish reasons.

You can of course do this extraction manually (I did so here and here). But that’s a lot of work.

Another option is to use Zapier. The slack integration is trivial to set up, and has a number of options. From there you can push to a google spreadsheet (if you want to do further reification) or directly to WordPress (or any of the other integrations).

The nice part about this is that the Zapier slack integration is that you have a variety of options that can trigger the publishing of a message to a spreadsheet:

  • a post of a public message in a specific channel
  • a post of a public message in any channel
  • starring of a message by you
  • attachment of a certain reaction emoji (I picked a floppy disk) to a message, no matter who adds the emoji

I’ve just started doing this but am excited to have a low friction way to pull high value conversations out of slack. Slack is great for synchronous communication and easy discussion. When real knowledge drops, it should be shared with the future and anyone who can type into a search box. Do make sure to let folks know because there may be some expectation of privacy that you’ll want to respect.