Book Review: Algorithms to live by

I finished Algorithms to Live By, by Brian Christian and Tom Griffiths. I enjoyed it immensely.

The premise of the book is that computer programs make decisions with algorithms all the time. There’s math behind how they do so, including tradeoffs and time considerations. Human beings face some of the same decisions and there’s no reason we can’t use this knowledge to live better lives.

Yes, this is kind of a self help book—“you can live a better life by thinking like a computer”. But it’s math, folks.

I actually recommended it to my SO because I feel like she’d understand me better after reading it. I’m always talking about how much of the stress in our life is due to resource contention.

The book covers a wide swathe of decision making. Here are some examples of the broad categories and some specific references:

  • when to decide to stop looking for a house or a partner
  • when to explore new knowledge vs exploit current knowledge in the context of clinical trials
  • how randomness can lead to better outcomes
  • how caches can help you determine what of your wardrobe to keep
  • how overfitting warps sports like fencing

As illustrated above, this is not a book about theory, but is actually hands on. I don’t recall seeing a single equation, though there are graphs. And the authors do mention plenty of researcher names, provide footnotes and have a twenty page bibliography. So if you want to learn more about the formal proofs, the info is there.

It’s hard to choose just one takeaway from this book, but if I had to pick one, it’d be the fact that game theory shows that you need external forces to avoid the tragedy of the commons, and that emotions may play a role in providing that.

Here’s an excerpt if you want a deeper look.

If you spend any time thinking about how you can make decisions better, Algorithms to Live By is worth reading.


What I wish I had known when I was starting out as a developer

As I get older, I wish I could reach back and give myself advice. Some of it is life advice (take that leap, kiss that girl, take that trip) but a lot of it is career advice.

There’s so much about being a software developer that you learn along the way. This includes technical knowledge like how to tune a database query and how to use a text editor. It also includes non technical knowledge, what some people term “soft skills”. I tend to think they are actually pretty hard, to be honest. Things like:

  • How to help your manager help you
  • Why writing code isn’t always the best solution
  • Mistakes and how to handle them
  • How to view other parts of the business

These skills took me a long time to learn (and I am still learning them to this day) but have had a material impact on my happiness as a developer.

I am so passionate about this topic that I’ve written over 150 blog posts. Where are these? They’re over at another site I’ve been updating for a few years.

And then I went and wrote a book. I learned a bunch of lessons during that process, including the fact that it’s an intense effort. I wrote it with three audiences in mind:

  • The new developer, with less than five years of experience. This book will help them level up.
  • The person considering software development. This book will give them a glimpse of what being a software developer is like.
  • The mentor. This book will serve as a discussion guide for many interesting aspects of development with a mentee.

The book arrives in August. I hope you’ll check it out.

Full details, including ordering information, over at the Letters To A New Developer website.


What senior engineers do: fix knowledge holes

I had a bad week a few weeks back, and got together with a friend and former colleague. We started trading war stories. He mentioned one time when he was working for a company which had both a desktop client and a server component. The communication between the two pieces of the system was completely, utterly undocumented. The original engineers who had written each piece of the app had departed. The system was key to the companies business (it was what they sold!).

This “knowledge hole” was identified by my friend. He was a software engineer. This was a key piece of software, but was unknown. The problem was not getting solved.

So, he solved it.

He installed the client and a TCP sniffer, and clicked around the client. He recorded all the traffic. He literally reverse engineered the client/server communication protocol. He then documented what the communication protocol was, so the the next group of engineers could benefit. He was was a software developer who was responsible for the back end systems. I don’t think he was directly responsible for the client/server protocol and the client was, I believe, outside his purview. (Updated 7/20) Note, he was a software engineer, not QA or a network engineer. But he didn’t let the role definition stop him.

I thought to myself, this is the textbook definition of a senior engineer. You see a problem, you solve it (thoroughly), you document it and you level up your team.

It was pretty awesome to hear about.


Exercism.io: Level up your coding skills

Children at playI’ve really enjoyed Exercism.io. This is an online learning platform for coding. It’s similar to HackerRank or Codewars.

The main difference is that for a subset of the problems, you actually have a mentor give you feedback. You download a problem (with tests) and code up the solution however you like. Then you push the solution up to a central repository where a volunteer mentor reviews it. I’ve had a different mentor for every problem. The same mentor stays with you for the duration of one single problem, and gives you feedback on style and API usage. They also provide accountability in a way that a computer grading my code just doesn’t do it for me.

Each language has a track (as well as some specialized frameworks like React). I’m working on two tracks right now. The first is a language I’m intermediate in (ruby). I’ve learned a lot about the standard library as well as ruby idiom. The second is a language I’m at best a beginner at: javascript (specially modern javascript which has changed a lot).

But there are a lot of other languages I’m looking forward to looking at when I finish these tracks (functional languages like OCaml, for one).

A final benefit of Exercism is that each problem is a challenge but not an impossible one. I find coding up a naive solution to the challenge takes about 15 minutes, and I’ll often do that first before I try to refine it to make it a bit better. It’s kinda like TDD but someone has written all the tests for you, so you just get to do the fun part.

Check it out.


Ever felt like your codebase was out of control?

I certainly have. A couple of times in my career the combination of technical debt, business model shift and lack of time for a proper fix have left me feeling out of control.

But reading this post on Hacker News made me realize that it all could have been so so much worse. A couple of “best ofs”:

To give you some examples, I originally came on as a contractor because they had some refactoring they wanted done. The entire system was home built (including the programming language) and there was a file size limit of 32,767 lines. They had many functions that were approaching this limit and they didn’t know what to do, so they hired me.

and:

Once upon a time, there was a search product and one of the data sources that it could search was a Solr/Lucene database. This should be no problem, since search is what Solr does. It should be as simple as passing the user’s query through to Solr and then reading the response. The problem was, it was important to know exactly which parts of any matched records were relevant to the search.

 

The Guy Before Me™ decided that the best way to implement this would be to split the user’s search into individual words, perform a separate search query through Solr’s HTTP API for each individual word, and then do a bunch of very clever and complex post-processing on the result sets to combine them into a single set of results.

and (last one, I promise):

At my first gig I teamed up with a guy responsible for a gigantic monolith written in Lua. Originally, the project started as a little script running in Nginx. Over the course of several years, it organically grew to epic proportions, by consuming and replacing every piece of software that it interfaced with – including Nginx.

 

There were two ingredients in the recipe for disaster. The first is that Lua comes “batteries excluded”: the standard library is minimalist and the community and set of available packages out there is small. That’s typically not an issue, as long as one uses Lua in the intended way: small scripts that extend existing programs with custom user logic (e.g. Nginx, Vim, World of Warcraft). The second is that Lua is a dynamic language: it’s dynamically typed, and practically everything can be overridden, monkey patched and hacked, down to the fundamental iterators that allow you to traverse data structures.

shivers. There, but for the grace of God.


Trust the compiler

I loved this post not because I love reading assembler but because it just illustrates so perfectly how often, when writing software, we can easily stand on the shoulders of giants.

My point is not that we should take what we’ve learnt from the LLVM-generated code and write a new version of our hand-rolled assembly. The point is that optimising compilers are really good. There are very smart people working on them and computers are really good at this kind of optimisation problem (in the mathematic sense) in a way that humans find quite difficult. It’s the job of language designers to give us the tools we need to inform the optimiser as best we can as to what our true intent is, and larger integer sizes are another step towards that.


Who’s Afraid of Continuous Deployment?

Fish leaping to a larger pool

Leaping to larger pool

So, who’s afraid of continuous deployment? I am, for one. And I’m not alone. I taught hundreds of people in AWS courses over the past two years. We often discussed continuous delivery and deployment and I asked if this was practiced at their places of work. I’d say about 5-10% of folks said yes. I conducted a very informal survey across two technical slacks as well. Unfortunately I had my terms wrong and asked about continuous delivery:

Wanted to do a quick poll. Can you please give a thumbs up to this message if you or your team does continuous delivery of your software product, and a thumbs down if you don’t. And a :penguin: if it doesn’t apply?

The results were:

  • Did CD: 27
  • Did not do CD: 25
  • Does not apply: 3

In the poll, I defined continuous delivery as “if a change is merged to the mainline branch and passes all the tests, it is deployed to production (or whatever environment your customers see) without human involvement”. This was actually a source of discussion, as some folks were very close to this (they deployed to beta environments where only a few customers saw it, or required one human to push a button to actually release, but everything up to that point was automated). Also, someone shared this link about the difference between continuous delivery and continuous deployment. Turns out I was using the term continuous delivery incorrectly. What I defined as continuous delivery was actually continuous deployment. Whoops!

That said, it was interesting that a large number of folks did not deploy code automatically, almost half (note that I believe the poll had a bias because I asked in one slack on the #devops channel. The numbers from the other slack had less than half doing continuous deployment). I’ve worked at a number of small startups, some without paying customers, and I’ve never worked in a place with continuous deployment. I’ve been in jobs with continuous integration and continuous delivery (and this provides a lot of value) but not continuous deployment. I wanted to talk about some reasons why.

The first reason is that continuous deployment simply doesn’t apply. If you are building software that is deployed to customer sites (on-prem), or is tied to hardware, then it doesn’t make sense to work toward CD because there will always be a manual delivery component. Another reason why it might not apply is legal compliance. Folks in the slacks pointed out that in some regulatory regimes you legally are required to have a human ‘push a button’ to deploy because more than one person needed to be involved in a code deploy to satisfy the law and the auditors. These are totally legitimate reasons for not doing continuous deployment.

Next, let’s discuss the reasons based on fear or lack of software hygiene (automated tests or a robust type system). Before I step into this, I want to acknowledge that there may be times in the life of your business where such software hygiene is detrimental to your chances of survival–you need to get an MVP out and test your value in the market, for example. However, in my years of experience I find that following proper software hygiene is far easier to do if adhered to from the beginning. If you don’t, eventually the difficulty of changing the system will grow along with its complexity. You can bolt on testing later, but it is difficult.

I also want to emphasize that I’ve been in all these situations myself. In some ways this blog post is a warning for future me when I try to shirk these practices.

  • If you don’t have automated test coverage, continuous deployment is reckless. This often happens in systems where the testing was bolted on after the system had been developed for a while. The solution is to work towards having enough test coverage to give yourself confidence (it swaddles your code).
  • A system may have configuration deeply tied to a database. Many content management systems are in this boat, which makes it very difficult to roll new configuration forward automatically.
  • Not having an automated rollback strategy. If you are going to continuously deploy, you need to have a way to rollback with confidence, with one script. If you are on heroku, heroku rollbacks help here. If you are running rails code, you can use db:rollback but you’ll need to know how many steps to rollback (I couldn’t find anything that rolled all migrations back to a given timestamp) and you’ll want to be careful about losing data. It may make more sense to run migrations in a different release, and always have the code be backward compatible. Lots of interesting reading about that strategy in strong_migration’s docs. This solution will vary from application to application.
  • Not having enough users to safely canary. One way to know if your new release has problems is to do a blue/green deployment and send just a fraction of your traffic there (you could use a weighted DNS round robin solution). But if you only have a small number of users, the canary userbase won’t adequately run through all the code paths.
  • Fear of breaking key user flows. At a recent company we did basic manual regression tests just before deployment. These could have been easily automated via selenium and would have made sure that at least basic functionality was available. Also see this post from 2013 on smoke testing.

All of these are not really technical issues, they’re prioritization issues. At this point in time most web applications can be continuously deployed. The tooling and the knowledge is out there, given the business and technology teams commitment.

However, this in some ways sidesteps the real question. Why is continuous deployment a goal worth prioritizing, especially when the team has to spend time supporting that instead of giving customers more features? CD is extra work to set up, but once it is running then you can deliver features at a very rapid pace, and you never have a feature sitting around waiting for other orthogonal features. So, in a way, it will actually lead to more features and better development. There’s also the long term benefits of software hygiene for the ability of the system to evolve.


The Original Sin Of A Software System

AppleI had coffee with an acquaintance a while back who works in a software company. We were talking about their system and he referred to the “original sin”.

It immediately struck a nerve.

Whenever I (the “I” being a team, as it’s extremely rare to build anything by yourself) build a system, I make assumptions based on a number of factors. What I need to provide functionality immediately. Where I expect the system to go in the short term. Where I expect the system to go in the long term. What I need to do to make the system maintainable. What I need to do to make sure I can run the system.

All of these assumptions mix together into an architecture and a design. Similar to the way I imagine one would lay out a subdivision, the early choices (where to put roads and lots) effect later decisions (where trees, parks, and homes go, which homes get the best view).

Often, one or more of these assumptions is incorrect, and it is often one of the core decisions, rather than one of the ancillary ones. (Or, perhaps they both have an equal chance to be incorrect, but if it is a core assumption, I notice it more.) This is the “original sin”.

I can think back to all of the systems that I ran for any length of time in production (that is, with real customers) and name an original sin (at least one, oftentimes more).

What do you do with this mistake?

You can live with it

The mistake is embedded so deep, or the business is evolving so quick, or there are so many other features that it never makes sense to rectify the issue. It just sits in your system, bugging you when you see it, making new features more complicated to build.

You can rip it out

If you have the time and now know what you didn’t know when you first built the system, you can modify the system to remove the original sin. This can take time and effort, because it is embedded and probably touches many pieces of your system. It also may slow down or stop new development. When you are done, you’ll have a system that is better for the world your business is living in now. However, you may add a new original sin.

You can work around it

Instead of living with it (no change to it) or ripping it out (pulling it out at the roots), you can work around it. I’ve done this before by swapping out pieces of an original sin component. This is an iterative approach that that can be intermingled with feature development. This is the most pragmatic approach for long lived software.

You can abandon the system

If the system has a really fundamental original sin, and the technology world has moved on, you may consider abandoning the system, either for a new build or a new buy. Don’t think of the money you spent on the first system as wasted, think of it as tuition.

It is tough to make predictions about the future usage of your system. Building a system is a balance between getting it done and making it flexible enough to meet future needs. However, some decisions are so rooted in your system that they will have ramifications for as long as those systems are used. And some of the decisions you make will be wrong, and you’ll be confronted with that fact as long as you work on that system. It’s OK to make these mistakes, but when you see one, think long and hard about how best to rectify it, balancing development and business pain. Discuss it with your team, including non technical members, so that everyone understands the pain.


Zen of Code Review Series

Tree trunk with mushroomsI stumbled (thanks John!) on this post about the right way to review code. This is a key skill for working in a team and one that is underappreciated. A solid code review makes sure the code changes can be understood.

If code can’t be understood, it can’t be changed. If code can’t be changed, business processes can’t be changed (code is, after all, just business process made digital). If business processes can’t be changed, the organization can’t adapt to new opportunities or threats. Have you ever seen a tree grow around a fence? The fence constrains the tree, but the tree keeps trying its best to work around the fence. Neither end up serving their inner purposes. That’s what I think of when I see code that can’t be changed.

Here is a brief excerpt to give you a flavor of the post.

Your goal, then, is clear: question, probe, analyze, poke, and prod to make sure that you, the reviewer, could support the code presented to you for review. From an overall perspective, there are several questions to keep in mind as you begin your task…

It seems like this is a series, where the author discusses code reviews from a variety of different perspectives. Recommended.


“Choose boring technology”

Reading a boring bookThis article on choosing boring technology is so good. It’s from 2015 so some of the tech it references is more mature, but the thesis is that the technology underlying a business should be well known and boring to implement. It also discusses how important cohesive technology can be (I love the final footnote).

I think this is an interesting contrast to microservices, which is essentially solving repeatedly for local optimizations and then relying on the API or message boundary to allow scaling. Microservices, in my experience, often get a bit hand wavey when it comes to operationalization (“just put it on Kubernetes, it’ll be fine”).

I have definitely dealt with this in previous companies where I had to put the kibosh on new technologies being introduced (“nope, we’re going to do it in java”). This meant that as a small team we had a standard deployment stack and process and could leverage our logging knowledge and debugging tools.

Here are some arguments I hear for using the latest and greatest technologies:

  • “Our developers will get bored and we’ll have a hard time recruiting”
    • This is a great reason to have regular hackfests and or support developers working on open source or side projects.
  • “New tech will help us do things faster”
    • It’s definitely worth evaluating new technology periodically and adding it to your stack (especially if it is outsourced, a la RDS, or boring, a la memcached). As the original post mentions, if you get substantial benefits from the new technology, then use it full bore and plan for a migration. Don’t end up in a state where you are half in/half out. Also consider tooling and processes to get things done faster–these may have quicker iteration times with lower operational risk.
  • “Choose the right tool for the job”
    • Most turing complete languages can be used for any purpose. And you shouldn’t be optimizing for the right tool for the job, but rather the right tool for the business for the job.

Really, you should just go read the post. It’s good.

 



© Moore Consulting, 2003-2020