
What senior engineers do: fix knowledge holes

I had a bad week a few weeks back, and got together with a friend and former colleague. We started trading war stories. He mentioned one time when he was working for a company which had both a desktop client and a server component. The communication between the two pieces of the system was completely, utterly undocumented. The original engineers who had written each piece of the app had departed. The system was key to the company’s business (it was what they sold!).

This “knowledge hole” was identified by my friend. He was a software engineer. This was a key piece of software, but how it worked was unknown. The problem was not getting solved.

So, he solved it.

He installed the client and a TCP sniffer, and clicked around the client. He recorded all the traffic. He literally reverse engineered the client/server communication protocol. He then documented what the communication protocol was, so that the next group of engineers could benefit. He was a software developer who was responsible for the back end systems. I don’t think he was directly responsible for the client/server protocol and the client was, I believe, outside his purview. (Updated 7/20) Note, he was a software engineer, not QA or a network engineer. But he didn’t let the role definition stop him.

I thought to myself, this is the textbook definition of a senior engineer. You see a problem, you solve it (thoroughly), you document it and you level up your team.

It was pretty awesome to hear about.

Exercism.io: Level up your coding skills

I’ve really enjoyed Exercism.io. This is an online learning platform for coding. It’s similar to HackerRank or Codewars.

The main difference is that for a subset of the problems, you actually have a mentor give you feedback. You download a problem (with tests) and code up the solution however you like. Then you push the solution up to a central repository where a volunteer mentor reviews it. I’ve had a different mentor for every problem. The same mentor stays with you for the duration of one single problem, and gives you feedback on style and API usage. They also provide accountability in a way that a computer grading my code just doesn’t.

Each language has a track (as well as some specialized frameworks like React). I’m working on two tracks right now. The first is a language I’m intermediate in (ruby). I’ve learned a lot about the standard library as well as ruby idiom. The second is a language I’m at best a beginner at: javascript (especially modern javascript, which has changed a lot).

But there are a lot of other languages I’m looking forward to looking at when I finish these tracks (functional languages like OCaml, for one).

A final benefit of Exercism is that each problem is a challenge but not an impossible one. I find coding up a naive solution to the challenge takes about 15 minutes, and I’ll often do that first before I try to refine it to make it a bit better. It’s kinda like TDD but someone has written all the tests for you, so you just get to do the fun part.

Check it out.

Ever felt like your codebase was out of control?

I certainly have. A couple of times in my career the combination of technical debt, business model shift and lack of time for a proper fix have left me feeling out of control.

But reading this post on Hacker News made me realize that it all could have been so so much worse. A couple of “best ofs”:

To give you some examples, I originally came on as a contractor because they had some refactoring they wanted done. The entire system was home built (including the programming language) and there was a file size limit of 32,767 lines. They had many functions that were approaching this limit and they didn’t know what to do, so they hired me.

and:

Once upon a time, there was a search product and one of the data sources that it could search was a Solr/Lucene database. This should be no problem, since search is what Solr does. It should be as simple as passing the user’s query through to Solr and then reading the response. The problem was, it was important to know exactly which parts of any matched records were relevant to the search.

 

The Guy Before Me™ decided that the best way to implement this would be to split the user’s search into individual words, perform a separate search query through Solr’s HTTP API for each individual word, and then do a bunch of very clever and complex post-processing on the result sets to combine them into a single set of results.

and (last one, I promise):

At my first gig I teamed up with a guy responsible for a gigantic monolith written in Lua. Originally, the project started as a little script running in Nginx. Over the course of several years, it organically grew to epic proportions, by consuming and replacing every piece of software that it interfaced with – including Nginx.

 

There were two ingredients in the recipe for disaster. The first is that Lua comes “batteries excluded”: the standard library is minimalist and the community and set of available packages out there is small. That’s typically not an issue, as long as one uses Lua in the intended way: small scripts that extend existing programs with custom user logic (e.g. Nginx, Vim, World of Warcraft). The second is that Lua is a dynamic language: it’s dynamically typed, and practically everything can be overridden, monkey patched and hacked, down to the fundamental iterators that allow you to traverse data structures.

shivers. There, but for the grace of God.

Trust the compiler

I loved this post not because I love reading assembler but because it just illustrates so perfectly how often, when writing software, we can easily stand on the shoulders of giants.

My point is not that we should take what we’ve learnt from the LLVM-generated code and write a new version of our hand-rolled assembly. The point is that optimising compilers are really good. There are very smart people working on them and computers are really good at this kind of optimisation problem (in the mathematic sense) in a way that humans find quite difficult. It’s the job of language designers to give us the tools we need to inform the optimiser as best we can as to what our true intent is, and larger integer sizes are another step towards that.

Who’s Afraid of Continuous Deployment?

Leaping to larger pool

So, who’s afraid of continuous deployment? I am, for one. And I’m not alone. I taught hundreds of people in AWS courses over the past two years. We often discussed continuous delivery and deployment and I asked if this was practiced at their places of work. I’d say about 5-10% of folks said yes. I conducted a very informal survey across two technical slacks as well. Unfortunately I had my terms wrong and asked about continuous delivery:

Wanted to do a quick poll. Can you please give a thumbs up to this message if you or your team does continuous delivery of your software product, and a thumbs down if you don’t. And a :penguin: if it doesn’t apply?

The results were:

  • Did CD: 27
  • Did not do CD: 25
  • Does not apply: 3

In the poll, I defined continuous delivery as “if a change is merged to the mainline branch and passes all the tests, it is deployed to production (or whatever environment your customers see) without human involvement”. This was actually a source of discussion, as some folks were very close to this (they deployed to beta environments where only a few customers saw it, or required one human to push a button to actually release, but everything up to that point was automated). Also, someone shared this link about the difference between continuous delivery and continuous deployment. Turns out I was using the term continuous delivery incorrectly. What I defined as continuous delivery was actually continuous deployment. Whoops!
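To make the distinction concrete, here is a minimal Python sketch. The `Change` class and both function names are hypothetical, made up for illustration; they are not taken from any CI tool. Under that assumption, the only difference between the two practices is whether a human has to push the button.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A hypothetical record of one change heading toward production."""
    merged_to_mainline: bool
    tests_passed: bool
    human_approved: bool = False


def releasable(change: Change) -> bool:
    """Both practices keep every merged, passing change ready to ship."""
    return change.merged_to_mainline and change.tests_passed


def continuous_delivery_ships(change: Change) -> bool:
    """Continuous delivery: releasable, but a human still pushes the button."""
    return releasable(change) and change.human_approved


def continuous_deployment_ships(change: Change) -> bool:
    """Continuous deployment: releasable changes ship with no human involved."""
    return releasable(change)
```

A change that is merged and green but not yet approved ships under continuous deployment and waits under continuous delivery, which is roughly where the "one human pushes a button" teams in the poll sat.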

That said, it was interesting that a large number of folks, almost half, did not deploy code automatically. (Note that I believe the poll had a bias because I asked in one slack on the #devops channel. In the other slack, fewer than half were doing continuous deployment.) I’ve worked at a number of small startups, some without paying customers, and I’ve never worked in a place with continuous deployment. I’ve been in jobs with continuous integration and continuous delivery (and this provides a lot of value) but not continuous deployment. I wanted to talk about some reasons why.

The first reason is that continuous deployment simply doesn’t apply. If you are building software that is deployed to customer sites (on-prem), or is tied to hardware, then it doesn’t make sense to work toward CD because there will always be a manual delivery component. Another reason why it might not apply is legal compliance. Folks in the slacks pointed out that in some regulatory regimes you legally are required to have a human ‘push a button’ to deploy because more than one person needed to be involved in a code deploy to satisfy the law and the auditors. These are totally legitimate reasons for not doing continuous deployment.

Next, let’s discuss the reasons based on fear or lack of software hygiene (automated tests or a robust type system). Before I step into this, I want to acknowledge that there may be times in the life of your business where such software hygiene is detrimental to your chances of survival–you need to get an MVP out and test your value in the market, for example. However, in my experience, proper software hygiene is far easier to maintain if you adhere to it from the beginning. If you don’t, eventually the difficulty of changing the system will grow along with its complexity. You can bolt on testing later, but it is difficult.

I also want to emphasize that I’ve been in all these situations myself. In some ways this blog post is a warning for future me when I try to shirk these practices.

  • If you don’t have automated test coverage, continuous deployment is reckless. This often happens in systems where the testing was bolted on after the system had been developed for a while. The solution is to work towards having enough test coverage to give yourself confidence (it swaddles your code).
  • A system may have configuration deeply tied to a database. Many content management systems are in this boat, which makes it very difficult to roll new configuration forward automatically.
  • Not having an automated rollback strategy. If you are going to continuously deploy, you need to have a way to roll back with confidence, with one script. If you are on heroku, heroku rollbacks help here. If you are running rails code, you can use db:rollback but you’ll need to know how many steps to roll back (I couldn’t find anything that rolled all migrations back to a given timestamp) and you’ll want to be careful about losing data. It may make more sense to run migrations in a different release, and always have the code be backward compatible. Lots of interesting reading about that strategy in the strong_migrations docs. This solution will vary from application to application.
  • Not having enough users to safely canary. One way to know if your new release has problems is to do a blue/green deployment and send just a fraction of your traffic there (you could use a weighted DNS round robin solution). But if you only have a small number of users, the canary userbase won’t adequately run through all the code paths.
  • Fear of breaking key user flows. At a recent company we did basic manual regression tests just before deployment. These could have been easily automated via selenium and would have made sure that at least basic functionality was available. Also see this post from 2013 on smoke testing.
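The canary idea in the list above can be sketched in a few lines of Python. The post suggests a weighted DNS round robin; this sketch swaps in hash-based bucketing inside the application instead, which is my assumption, and the function names are hypothetical. It only shows how a fixed fraction of users could be sent to the new release in a stable way.

```python
import hashlib


def canary_bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user id to a stable bucket number in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the canary release."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # The same user always lands in the same bucket, so a user never
    # flips between releases while the percentage stays unchanged.
    return "canary" if canary_bucket(user_id) < canary_percent else "stable"


def canary_users(user_ids, canary_percent: int):
    """Return the subset of users who would see the canary release."""
    return [u for u in user_ids if route(u, canary_percent) == "canary"]
```

Calling `canary_users` on a list of a few dozen user ids at a low percentage will often return only a handful of users, or none, which is the small-userbase problem: too few people to exercise all the code paths.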

None of these are really technical issues; they’re prioritization issues. At this point in time most web applications can be continuously deployed. The tooling and the knowledge are out there, given the business and technology teams’ commitment.

However, this in some ways sidesteps the real question. Why is continuous deployment a goal worth prioritizing, especially when the team has to spend time supporting that instead of giving customers more features? CD is extra work to set up, but once it is running you can deliver features at a very rapid pace, and you never have a feature sitting around waiting for other orthogonal features. So, in a way, it will actually lead to more features and better development. There are also the long term benefits of software hygiene for the ability of the system to evolve.

The Original Sin Of A Software System

I had coffee with an acquaintance a while back who works in a software company. We were talking about their system and he referred to its “original sin” (it was a database choice, for what that’s worth).

It immediately struck a nerve.

Whenever I (the “I” being a team, as it’s extremely rare to build anything by yourself) build a system, I make assumptions based on a number of factors. What I need to provide functionality immediately. Where I expect the system to go in the short term. Where I expect the system to go in the long term. What I need to do to make the system maintainable. What I need to do to make sure I can run the system.

All of these assumptions mix together into an architecture and a design. Similar to the way I imagine one would lay out a subdivision, the early choices (where to put roads and lots) affect later decisions (where trees, parks, and homes go, which homes get the best view).

Often, one or more of these assumptions is incorrect, and it is often one of the core decisions, rather than one of the ancillary ones. (Or, perhaps they both have an equal chance to be incorrect, but if it is a core assumption, I notice it more.) This is the “original sin”.

I can think back to all of the systems that I ran for any length of time in production (that is, with real customers) and name an original sin (at least one, oftentimes more).

What do you do with this mistake?

You can live with it

The mistake is embedded so deep, or the business is evolving so quickly, or there are so many other features that it never makes sense to rectify the issue. It just sits in your system, bugging you when you see it, making new features more complicated to build.

You can rip it out

If you have the time and now know what you didn’t know when you first built the system, you can modify the system to remove the original sin. This can take time and effort, because it is embedded and probably touches many pieces of your system. It also may slow down or stop new development. When you are done, you’ll have a system that is better for the world your business is living in now. However, you may add a new original sin.

You can work around it

Instead of living with it (no change to it) or ripping it out (pulling it out at the roots), you can work around it. I’ve done this before by swapping out pieces of an original sin component. This is an iterative approach that can be intermingled with feature development. This is the most pragmatic approach for long lived software.

You can abandon the system

If the system has a really fundamental original sin, and the technology world has moved on, you may consider abandoning the system, either for a new build or a new buy. Don’t think of the money you spent on the first system as wasted, think of it as tuition.

It is tough to make predictions about the future usage of your system. Building a system is a balance between getting it done and making it flexible enough to meet future needs. However, some decisions are so rooted in your system that they will have ramifications for as long as those systems are used. And some of the decisions you make will be wrong, and you’ll be confronted with that fact as long as you work on that system. It’s OK to make these mistakes, but when you see one, think long and hard about how best to rectify it, balancing development and business pain. Discuss it with your team, including non technical members, so that everyone understands the pain.

Zen of Code Review Series

I stumbled (thanks John!) on this post about the right way to review code. This is a key skill for working in a team and one that is underappreciated. A solid code review makes sure the code changes can be understood.

If code can’t be understood, it can’t be changed. If code can’t be changed, business processes can’t be changed (code is, after all, just business process made digital). If business processes can’t be changed, the organization can’t adapt to new opportunities or threats. Have you ever seen a tree grow around a fence? The fence constrains the tree, but the tree keeps trying its best to work around the fence. Neither ends up serving its inner purpose. That’s what I think of when I see code that can’t be changed.

Here is a brief excerpt to give you a flavor of the post.

Your goal, then, is clear: question, probe, analyze, poke, and prod to make sure that you, the reviewer, could support the code presented to you for review. From an overall perspective, there are several questions to keep in mind as you begin your task…

It seems like this is a series, where the author discusses code reviews from a variety of different perspectives. Recommended.

“Choose boring technology”

This article on choosing boring technology is so good. It’s from 2015 so some of the tech it references is more mature, but the thesis is that the technology underlying a business should be well known and boring to implement. It also discusses how important cohesive technology can be (I love the final footnote).

I think this is an interesting contrast to microservices, which is essentially solving repeatedly for local optimizations and then relying on the API or message boundary to allow scaling. Microservices, in my experience, often get a bit hand wavey when it comes to operationalization (“just put it on Kubernetes, it’ll be fine”).

I have definitely dealt with this in previous companies where I had to put the kibosh on new technologies being introduced (“nope, we’re going to do it in java”). This meant that as a small team we had a standard deployment stack and process and could leverage our logging knowledge and debugging tools.

Here are some arguments I hear for using the latest and greatest technologies:

  • “Our developers will get bored and we’ll have a hard time recruiting”
    • This is a great reason to have regular hackfests and/or support developers working on open source or side projects.
  • “New tech will help us do things faster”
    • It’s definitely worth evaluating new technology periodically and adding it to your stack (especially if it is outsourced, a la RDS, or boring, a la memcached). As the original post mentions, if you get substantial benefits from the new technology, then use it full bore and plan for a migration. Don’t end up in a state where you are half in/half out. Also consider tooling and processes to get things done faster–these may have quicker iteration times with lower operational risk.
  • “Choose the right tool for the job”
    • Most Turing-complete languages can be used for any purpose. And you shouldn’t be optimizing for the right tool for the job, but rather the right tool for the business for the job.

Really, you should just go read the post. It’s good.

 

“Run less software”

This post from the folks at Intercom makes some really good points about the benefits you can get from leveraging other software solutions. It’s an interesting article, but the source works at a company that offers a SaaS solution to help you help your customers (I’m a fan). You can read some interesting discussion at HN. An interesting quote from the post:

Choosing standard technologies is very similar to this, only in software. You need to constrain yourself to solving problems mostly, but not exclusively, with a small, specific set of standard tools. By doing this over time we become experts in them. Then, we are able to build better and faster solutions from them.

From my experience with small teams, having a standard software stack is a big win. It lets you reuse business logic and skills across your business. (The spread of microservices may mean this matters less in the future, though with a small team and product microservices may be overkill.)

However, I wish Rich had titled it “write less software” because that is really what he is advocating for. When I use AWS or Intercom or Gmail, I’m not really running less software. In fact, I may be running more. Since I don’t have insight into those applications, I really don’t know. But I’m not concerned about writing, reading and maintaining as much software, and that is the true win.

Book Review: Working With Coders

Software is so integral to business processes and relatively inexpensive compared to labor that I believe every company is going to be a custom software company, in the same way that every company is an accounting company or every company uses paper. I happened on an interesting blog post and saw the author had written a book, “Working With Coders”. How non technical folks interact with coders is a topic of perennial interest to me, so I picked it up after reading the first few pages on Amazon. The book is written for clients, CEOs or project managers who are going to be working with developers to deliver applications that will provide business value.

Frankly, I couldn’t put it down.

The author, Patrick, is an engaging, opinionated writer. He breaks down complicated concepts into easily digestible pieces. Where there’s more to the story, there’s a footnote with a snarky comment or a link to more information. Patrick also provides nuts and bolts examples to show why something that seems simple to change is not (scaling text in a browser, for example). He also covers how big decisions like language, frameworks and library choices at the beginning of a project constrain freedom and choices further down.

Patrick covers what developers do, how they think, and why projects often fail. I thought his explanation of the benefits of agile development was darn good, and his explanation that even agile projects fail more often than they succeed was pretty depressing. He also discusses how the house construction metaphor for building software is just a big fat untruth.

I also enjoyed the section about testing in general, the various types of testing, and where they make sense. There’s also a section on finding coders, including a good explanation of why not to hire them as employees (you might be better off just hiring a development shop, depending on your needs). The chapter on how to deal with common issues (“the team hates each other”, “we’re behind schedule”) was worthwhile. His solutions won’t work for everyone. Maybe you’ll want to deal with these issues differently, but considering them before they happen will only help you prepare.

Of course, I also enjoyed the chapter on how to keep coders happy (continuous learning, quiet, a fast computer). In general the author is careful to avoid stereotypes, but does do a good job of covering common themes. I haven’t met too many developers who love working in bullpen environments.

I am definitely not the target audience. Neither is someone who is an experienced manager of developers. However, I am a subject of the book, so it resonated with me and I definitely found myself nodding along. There aren’t too many books I have wanted to distribute copies of (the two others are “The Hard Thing About Hard Things” and “Climate Wars”), but this is one.

If you work in a consulting practice with inexperienced clients or if you work in a product company with an owner or higher up that isn’t technical, reading this book will give you insights into their questions and thought processes. And if you can find a way to give them this book without being condescending (“hey, I found this book fascinating for helping facilitate conversation, maybe you will too”), both they and you will benefit.