Who’s Afraid of Continuous Deployment?


Leaping to larger pool

So, who’s afraid of continuous deployment? I am, for one. And I’m not alone. I taught hundreds of people in AWS courses over the past two years. We often discussed continuous delivery and deployment, and I asked whether they practiced it at their workplaces. I’d estimate that about 5-10% of folks said yes. I also conducted a very informal survey across two technical Slack workspaces. Unfortunately, I had my terms wrong and asked about continuous delivery:

Wanted to do a quick poll. Can you please give a thumbs up to this message if you or your team does continuous delivery of your software product, and a thumbs down if you don’t. And a :penguin: if it doesn’t apply?

The results were:

  • Did CD: 27
  • Did not do CD: 25
  • Does not apply: 3

In the poll, I defined continuous delivery as “if a change is merged to the mainline branch and passes all the tests, it is deployed to production (or whatever environment your customers see) without human involvement”. This was actually a source of discussion, as some folks were very close to this (they deployed to beta environments where only a few customers saw it, or required one human to push a button to actually release, but everything up to that point was automated). Also, someone shared this link about the difference between continuous delivery and continuous deployment. Turns out I was using the term continuous delivery incorrectly. What I defined as continuous delivery was actually continuous deployment. Whoops!

That said, it was interesting that almost half of respondents did not deploy code automatically. (I believe the poll was biased, because in one Slack I asked in the #devops channel; in the other Slack, fewer than half of respondents did continuous deployment.) I’ve worked at a number of small startups, some without paying customers, and I’ve never worked anywhere that did continuous deployment. I’ve been in jobs with continuous integration and continuous delivery (which provide a lot of value), but not continuous deployment. I want to talk about some reasons why.

The first reason is that continuous deployment simply doesn’t apply. If you are building software that is deployed to customer sites (on-prem) or is tied to hardware, then it doesn’t make sense to work toward CD, because there will always be a manual delivery component. Another reason it might not apply is legal compliance. Folks in the Slacks pointed out that some regulatory regimes legally require a human to ‘push a button’ to deploy, because more than one person must be involved in a code deploy to satisfy the law and the auditors. These are totally legitimate reasons for not doing continuous deployment.

Next, let’s discuss the reasons based on fear or a lack of software hygiene (automated tests or a robust type system). Before I step into this, I want to acknowledge that there may be times in the life of your business when such software hygiene is detrimental to your chances of survival: you need to get an MVP out and test your value in the market, for example. However, in my experience, proper software hygiene is far easier to follow if you adhere to it from the beginning. If you don’t, the difficulty of changing the system will eventually grow along with its complexity. You can bolt on testing later, but it is difficult.

I also want to emphasize that I’ve been in all these situations myself. In some ways this blog post is a warning for future me when I try to shirk these practices.

  • If you don’t have automated test coverage, continuous deployment is reckless. This often happens in systems where the testing was bolted on after the system had been developed for a while. The solution is to work towards having enough test coverage to give yourself confidence (it swaddles your code).
  • A system may have configuration deeply tied to a database. Many content management systems are in this boat, which makes it very difficult to roll new configuration forward automatically.
  • Not having an automated rollback strategy. If you are going to continuously deploy, you need a way to roll back with confidence, using one script. If you are on Heroku, Heroku rollbacks help here. If you are running Rails code, you can use db:rollback, but you’ll need to know how many steps to roll back (I couldn’t find anything that rolled all migrations back to a given timestamp), and you’ll want to be careful about losing data. It may make more sense to run migrations in a separate release and always keep the code backward compatible. There’s lots of interesting reading about that strategy in strong_migrations’ docs. This solution will vary from application to application.
  • Not having enough users to safely canary. One way to know if your new release has problems is to do a blue/green deployment and send just a fraction of your traffic to the new release (you could use a weighted DNS round robin solution). But if you only have a small number of users, the canary user base won’t adequately exercise all the code paths.
  • Fear of breaking key user flows. At a recent company we did basic manual regression tests just before deployment. These could easily have been automated via Selenium and would have made sure that at least basic functionality was available. Also see this post from 2013 on smoke testing.
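The rollback item mentions needing to know how many steps to pass to db:rollback. Below is a minimal sketch under some assumptions. It assumes you can list the migration versions already applied (Rails records them in the schema_migrations table). It also assumes you know the timestamp you want to return to. rollback_steps is a hypothetical helper, not part of Rails. Rails' own db:migrate VERSION=<timestamp> can also migrate down to a target version, so it may be worth checking whether that covers your case first.

```ruby
# Hypothetical helper (not part of Rails): count how many applied
# migrations are newer than the version you want to return to.
# The result is the N you would pass as `rails db:rollback STEP=N`.
def rollback_steps(applied_versions, target_version)
  target = Integer(target_version, 10)
  applied_versions.count { |version| Integer(version, 10) > target }
end

# Example: three applied migrations; return to the first one.
applied = %w[20170101120000 20170215090000 20170301100000]
steps = rollback_steps(applied, "20170101120000")
puts "rails db:rollback STEP=#{steps}"
```

The target migration itself stays applied; only migrations newer than it are counted. Rolling back still runs each migration's down step, so the warning about losing data applies here too.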

None of these are really technical issues; they’re prioritization issues. At this point, most web applications can be continuously deployed. The tooling and the knowledge are out there, given commitment from the business and technology teams.

However, this in some ways sidesteps the real question. Why is continuous deployment a goal worth prioritizing, especially when the team has to spend time supporting it instead of giving customers more features? CD is extra work to set up, but once it is running, you can deliver features at a very rapid pace, and you never have a feature sitting around waiting for other, unrelated features. So, in a way, it actually leads to more features and better development. There are also the long-term benefits of software hygiene for the system’s ability to evolve.


What can you cut out?


Perhaps we could have made the site map a bit simpler?

“I have made this [letter] longer than usual because I have not had time to make it shorter.” – Blaise Pascal

I was a mentor for Go Code Colorado over the weekend (mentioned previously). It was a good experience: about 10 teams, 30 mentors, and a couple of hours. I had a lot of fun chatting with the teams, which were all using open data provided on the Colorado Information Marketplace to build an app that will serve a need. They divided the mentors up into functional areas (data science, marketing, developer, startup vet, etc.) and let us wander amongst the teams. Sometimes I felt a bit useless (one team was trying to debug a Meteor app that would run locally but failed when deployed to a web server), and other times I felt like I was a bit of a bother (since the teams were also trying to get stuff done while being “mentored”). But for the most part I had interesting conversations about what the teams were trying to accomplish and the best means of doing so from a technical perspective.

One thing that came up again and again was “what can you cut out”. The teams have a fixed timeline (they are only allowed to work until the final competition in early June) and some of the ideas were pretty big. My continuing refrain was:

  • capture all the big ideas on a roadmap (you can always implement them later)
  • cut what you can
  • build a basic “something” and extend it as you have time
  • choose boring technology

For example, one project was going to capture some data and use the blockchain for data storage. I totally get wanting to explore new technology but for their initial MVP they could just as easily use a plain old boring database. Or frankly a spreadsheet.

Lots of developers don’t want to hear it, but when you are in the early stages of a startup, technology, while an important enabler, can get in the way of what is really important: finding customers, talking to them, and giving them something to pay for.

PS This is a great read on all the hardships of building a startup and how it is so so so important to minimize any unnecessary difficulties.


Workers in the Gig Economy Have Tremendous Autonomy

This essay by Bill Gurley, “The Thing I Love Most About Uber”, is well worth reading. In it, he discusses the insane level of flexibility that working for Uber (or, though he doesn’t state it, Lyft) gives drivers. He also goes into some great detail about the typical driver and their earnings.

I have been a contractor for much of my career, and when I was, I placed a large value on freedom: freedom to choose clients, freedom to take time off, freedom to work when I needed to. As a software developer, if you are willing to accept the associated risks, you’ve been able to choose autonomy for decades. (Some contractors are even more autonomous.)

But that level of autonomy still requires large blocks of contiguous time, some level of marketing capability, and specialized knowledge.

In the USA today, the ability to drive and car ownership are nearly ubiquitous (88%). Uber and Lyft take care of the marketing. And the demand is such that you don’t necessarily need large blocks of time. So drivers get a much higher level of autonomy than any previous type of contractor had.

This is amazing. I can’t think of another market where the demand and supply pools are so large and the time and skill commitment are so small.


“Someone Else’s Problem”


The bad kind of SEP

I remember when I first heard the acronym SEP. It was way back in the dawn of my career, and a grizzled old contractor was talking about some aspect of a project.

“The flurbuz is janky. Welp, SEP.” – him

“What does that mean?” – me

“It’s ‘Someone Else’s Problem’ now.” – him

“Oh.” – me

It has stuck with me all my career, but as I’ve thought about it more deeply, I’ve realized that there are two different kinds of SEP declarations: one good and one bad.

The bad type of SEP is when a problem doesn’t have an owner and everyone is ducking it. You know, that architectural original sin no one wants to face, or the client who is late but no one wants to cut off. If everyone is saying SEP, then the problem will never be solved. No good. The remedy here (in small companies, at least) is often to have everything “fall up” to the CEO. She can be the utility player who owns everything not otherwise assigned. Of course, tackling all of these orphan problems can prevent her from focusing on her main job, but at least she has the clout to make the problem someone else’s if need be.

The good version of SEP is also known as delegation. This means you’ve handed off the problem to someone else, whether on your team, an external contractor, or a client who wants to take some part of a project in-house. And when you say SEP, you are trusting them to take care of the issue. This allows you to focus on other tasks. It does require you to trust that they can do what was agreed to. (They did agree to it before it became their problem, right? If not, this may be an example of the bad type of SEP, or at a minimum a chance for a ball to be dropped.)

For me personally, delegation is a skill I struggle with, even though it allows me to be more effective. So saying “SEP” and letting the other party own it is a great way for me to practice that skill.


The Original Sin Of A Software System

I had coffee with an acquaintance a while back who works at a software company. We were talking about their system and he referred to the “original sin”.

It immediately struck a nerve.

Whenever I (the “I” being a team, as it’s extremely rare to build anything by yourself) build a system, I make assumptions based on a number of factors. What I need to provide functionality immediately. Where I expect the system to go in the short term. Where I expect the system to go in the long term. What I need to do to make the system maintainable. What I need to do to make sure I can run the system.

All of these assumptions mix together into an architecture and a design. Similar to the way I imagine one would lay out a subdivision, the early choices (where to put roads and lots) affect later decisions (where trees, parks, and homes go, which homes get the best view).

Often, one or more of these assumptions is incorrect, and it is often one of the core decisions, rather than one of the ancillary ones. (Or, perhaps they both have an equal chance to be incorrect, but if it is a core assumption, I notice it more.) This is the “original sin”.

I can think back to all of the systems that I ran for any length of time in production (that is, with real customers) and name an original sin (at least one, oftentimes more).

What do you do with this mistake?

You can live with it

The mistake is embedded so deep, or the business is evolving so quickly, or there are so many other features, that it never makes sense to rectify the issue. It just sits in your system, bugging you when you see it, making new features more complicated to build.

You can rip it out

If you have the time and now know what you didn’t know when you first built the system, you can modify the system to remove the original sin. This can take time and effort, because it is embedded and probably touches many pieces of your system. It also may slow down or stop new development. When you are done, you’ll have a system that is better for the world your business is living in now. However, you may add a new original sin.

You can work around it

Instead of living with it (no change to it) or ripping it out (pulling it out at the roots), you can work around it. I’ve done this before by swapping out pieces of an original sin component. This is an iterative approach that can be intermingled with feature development. This is the most pragmatic approach for long-lived software.
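One common way to swap out pieces of a component is sometimes called branch by abstraction. You put an interface in front of the component, then move operations to a new implementation one at a time. The sketch below is minimal. CustomerDirectory and its find/search operations are hypothetical names, and it assumes the legacy and replacement backends respond to the same methods.

```ruby
# Branch by abstraction (sketch): callers only talk to CustomerDirectory,
# which routes each operation to either the legacy or the replacement
# backend. Adding an operation to `migrated` moves just that piece over.
class CustomerDirectory
  def initialize(legacy:, replacement:, migrated: [])
    @legacy = legacy
    @replacement = replacement
    @migrated = migrated # operations already moved to the new backend
  end

  def find(id)
    backend_for(:find).find(id)
  end

  def search(term)
    backend_for(:search).search(term)
  end

  private

  # Pick the backend that currently owns this operation.
  def backend_for(operation)
    @migrated.include?(operation) ? @replacement : @legacy
  end
end
```

Each release can move one more operation into migrated, in between feature work. Once everything has moved, the legacy backend and the routing can be deleted.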

You can abandon the system

If the system has a really fundamental original sin, and the technology world has moved on, you may consider abandoning the system, either for a new build or a new buy. Don’t think of the money you spent on the first system as wasted, think of it as tuition.

It is tough to make predictions about the future usage of your system. Building a system is a balance between getting it done and making it flexible enough to meet future needs. However, some decisions are so rooted in your system that they will have ramifications for as long as those systems are used. And some of the decisions you make will be wrong, and you’ll be confronted with that fact as long as you work on that system. It’s OK to make these mistakes, but when you see one, think long and hard about how best to rectify it, balancing development and business pain. Discuss it with your team, including non-technical members, so that everyone understands the pain.


Navigating new systems

Here are some tips and tricks I have for navigating new software systems, which can sometimes be like navigating a maze. If you’re truly unlucky, it’s a maze, but you’re blindfolded and the walls are covered in randomly placed razors.

The first is to get a clear set of expectations. Will I own the system? Who owns it now? How long have they owned it? How often is it modified? When will it need to be modified again? Is it shaky or stable? Getting these questions answered helps me understand and refine further steps.

The next step is to gain access. There are a lot of different pieces of most modern systems, so access can mean different things. Here are some kinds of access which it may be worth seeking:

  • shell
  • version control
  • database
  • ftp
  • http
  • app level (admin, user)
  • documentation
  • project planning
  • different CI environments (prod, UA, staging)
  • build system
  • admin users
  • end users

After I have access, I like to look at the front end and the back end. By the front end, I mean the user interface. And by the back end I mean the data store. Just looking around and seeing what tables and pages an application has can help.

If the system in question is not entirely custom built, googling for the user guide for the default version of the application can be helpful. Finding that user guide and skimming through it can give me more high level understanding, as well as teaching me key nomenclature. Of course if there is any local documentation, that’s helpful too, but I read that with a skeptical eye, as it doesn’t always keep pace with the system.

I also like to look at logfiles. This can help me determine something as simple as whether I’m on the correct server (if I reload the page and the access log file doesn’t change, I am looking at the wrong log file or am on the wrong server). Even better if the system aggregates logs into something like an ELK stack or Papertrail.

Setting up a local development environment can help. Again, this lets me gain an understanding of the big picture components, and also lets me poke at various parts of a system, possibly breaking them, without affecting other developers or, worse, customers.

Asking questions is really important, but this can be hard because often the folks with the most knowhow are the busiest.

I also like to see what files or database tables change as I move through the system. With a modest-sized database, I do this by taking a database dump before taking some action and another one after. Then I diff the files, sometimes using sed to break the dump file apart even further (replacing all commas with commas and newlines, for example). If using mysqldump, you can target individual tables and pass --skip-extended-insert so each row gets its own INSERT statement; extended inserts pack many rows onto one line, which makes diffing harder.
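The dump-diffing step can also be scripted. This minimal sketch assumes you already have the two dumps as strings. Like the sed trick, it splits extended INSERT statements on the "),(" boundary so each row lands on its own line. It then reports the rows that appear only after the action or only before it. It compares rows as sets, so it ignores ordering and duplicates. A "),(" inside a string value would also split a row incorrectly. dump_rows and diff_dumps are hypothetical names.

```ruby
# Break extended INSERTs apart so each row sits on its own line,
# similar to post-processing a dump with sed before diffing.
def dump_rows(dump)
  dump.gsub("),(", "),\n(").lines.map(&:chomp).reject(&:empty?)
end

# Set-style comparison of the two dumps: rows that exist only after
# the action (added) and rows that exist only before it (removed).
def diff_dumps(before, after)
  before_rows = dump_rows(before)
  after_rows = dump_rows(after)
  { added: after_rows - before_rows, removed: before_rows - after_rows }
end

before = "INSERT INTO `users` VALUES (1,'ann'),(2,'bob');"
after = "INSERT INTO `users` VALUES (1,'ann'),(2,'bobby');"
p diff_dumps(before, after)
```

In this example, (2,'bobby'); is reported as added and (2,'bob'); as removed. That tells you which table and row the action touched.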

For the filesystem, it’s even easier. I touch a file (ts), take the action, and then run find . -newer ts -print. This command shows me all the files the system has modified more recently than ts.
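For comparison, here is a minimal sketch of the same check in Ruby, using the standard library's Find module and file modification times. files_newer_than is a hypothetical name. Unlike find, it lists only regular files (not directories) and skips the marker file itself.

```ruby
require "find"

# Sketch of `find . -newer ts -print`: list regular files under root
# whose modification time is later than the marker file's.
def files_newer_than(marker, root = ".")
  cutoff = File.mtime(marker)
  newer = []
  Find.find(root) do |path|
    next unless File.file?(path)
    next if File.expand_path(path) == File.expand_path(marker)
    newer << path if File.mtime(path) > cutoff
  end
  newer
end
```

After touching ts and taking the action, calling files_newer_than("ts") from the same directory returns the changed file paths.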

Hopefully some of these tips will be helpful to you as you navigate your next new system.


My first day at Culture Foundry

I am excited to pen that title. I’ve joined Culture Foundry, a digital agency that connects the world through beautiful technology. There are a number of reasons I accepted a position with this firm, but a few that jump to mind are:

  • they are good people. This is important, as a great job with a bad manager is no fun. A great manager can help make a bad job better.
  • they are working on interesting technology problems, including API integrations and high traffic websites.
  • they are 100% remote. This flexibility is really important to me.
  • they work on stuff their clients value.
  • the team is big enough to take on larger projects, but small enough to be agile.

For now, I’m going to be drinking from the fire hose, trying to get up to speed on their systems, so the blogging may slow down a bit. But I’ll definitely be sharing things I learn from this new opportunity. Here’s to new adventures!


Useful gem: stripe_event

If you are going to use Stripe for payments, you need to set up your webhooks. If you are using Rails, the easiest solution I’ve found is stripe_event. This gem mounts a configurable endpoint and takes care of all the authentication you need to receive the webhooks. You then set up configuration in an initializer to handle the various webhooks you want to receive. The types of hooks you want depend on your application, but all the available events are listed here, and the Stripe support folks are happy to point you toward interesting ones if you approach them with a problem.

You can (and should) test the Stripe events by using fixtures and request tests. I found the most difficult part of that testing process to be getting sample data for the JSON payload. The documentation has some, but you may need to run a sample event through your test dashboard and capture the JSON via a generic webhook capture. I ended up using this type of puts debugging to help get the JSON for events:

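StripeEvent.configure do |events|
  # Debugging only: print every event so its JSON payload can be
  # copied into a test fixture. Remove before going to production.
  events.all do |event|
    puts "xxxdebugging all events"
    puts event.to_s
  end
end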
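To turn that printed JSON into a fixture, a small helper can save each payload under its event type. This is a minimal sketch. save_stripe_fixture and the spec/fixtures/stripe directory are assumptions. It relies only on the type field that Stripe event payloads carry (for example, invoice.payment_failed).

```ruby
require "json"
require "fileutils"

# Hypothetical helper: save a captured Stripe event payload as a
# fixture file named after its event type, for use in request tests.
def save_stripe_fixture(raw_json, dir = "spec/fixtures/stripe")
  payload = JSON.parse(raw_json)
  type = payload.fetch("type") # e.g. "invoice.payment_failed"
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{type}.json")
  File.write(path, JSON.pretty_generate(payload))
  path
end
```

A request test can then read the file back and post it to the webhook endpoint, so each event type you care about has a realistic payload.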

In my experience, we never received enough load to really stress out this gem (I’ve seen maybe 30 requests a minute), but if you plan to have a high webhook load, you may want to do some load testing.

Definitely a gem worth having if you are using Stripe.


Zen of Code Review Series

I stumbled (thanks John!) on this post about the right way to review code. This is a key skill for working in a team, and one that is underappreciated. A solid code review makes sure the code changes can be understood.

If code can’t be understood, it can’t be changed. If code can’t be changed, business processes can’t be changed (code is, after all, just business process made digital). If business processes can’t be changed, the organization can’t adapt to new opportunities or threats. Have you ever seen a tree grow around a fence? The fence constrains the tree, but the tree keeps trying its best to work around the fence. Neither ends up serving its true purpose. That’s what I think of when I see code that can’t be changed.

Here is a brief excerpt to give you a flavor of the post.

Your goal, then, is clear: question, probe, analyze, poke, and prod to make sure that you, the reviewer, could support the code presented to you for review. From an overall perspective, there are several questions to keep in mind as you begin your task…

It seems like this is a series, where the author discusses code reviews from a variety of different perspectives. Recommended.


Big hammer or small hammer?

I was talking to a friend the other day about startup vs. big company life, and he used an analogy so good I’m going to steal it (and expand upon it).

If you think of the problem you are trying to solve as a rock, and of your business as the hammer with which to chisel or otherwise transform that rock, you can choose a brick hammer (a small hammer), a sledgehammer (a large, heavy hammer), or anything in between.

The smaller the hammer, the more effort you have to put into the swing. However, it’s fairly easy to pick up, to manipulate and to re-orient if you decide you need to approach the rock from a different viewpoint.

If you, on the other hand, choose the sledgehammer, then when you swing you are wielding a lot of force. It becomes easier to make progress on your initial approach, but if you need to switch up your emphasis, it’s going to take some time, because of the weight of the hammer.

The larger the business, the more leverage and power you have to attack a single problem. I’ve worked at large companies in the past, and I can tell you the size and scope of the problems they were able to work on, often in parallel, was amazing. However, a lot of time and effort went into coordinating those efforts, and there was a lot of bureaucracy and red tape whenever a process improvement was needed. (There was also dead weight at some of these companies.)

At the smaller companies and startups where I’ve worked, I didn’t have the bandwidth to take on multiple large projects. Doing more than one or two major projects was a recipe for distraction and impotence. However, when focusing on one effort, it was easy to try different approaches, work really hard, and be super flexible when incorporating feedback from customers and iterating.

There are strengths in both the small and big hammer approaches. The important thing is to choose what is a good fit for both the problem you are trying to solve and your working style (which may change over time).



© Moore Consulting, 2003-2017