Open source is code escrow on the cheap

I read this article a while back about the VC backed open core playbook. Worth a read.

If you haven’t had a chance yet, the playbook is:

  • start an open source product
  • create a company around it
  • use the siren of open source to drive adoption
  • take VC money
  • continue to develop the open source solution
  • build out closed source functionality, typically as a hosted SaaS
  • over invest in the closed source edition
  • let the open source version wither on the vine
  • profit !?!

The crux of the argument is that the open source version of whatever software is being sold is a pure marketing play, and that all the focus will eventually shift to the closed source extensions or functionality. After all, that is what drives the revenue, and since the company took VC money, it needs outsized revenue.

There are a few flaws in the argument, including, but not limited to:

  • it is possible, and even likely if a project succeeds, that there is a community of other companies that will drive the open source project forward under another name (see OpenSearch or Valkey)
  • the marketing value of the open source project doesn’t necessarily recede as the closed source functionality drives more revenue; it may even grow
  • not every open source company is VC funded

But I’m going to set all those aside for now, and focus on the value of open source for the end developer. In particular, why is that so attractive to devs? Why is it such a powerful marketing tool to drive adoption?

There are two reasons OSS is important for dev focused tools:

  • permissionless access
  • derisking the future

Let’s dig into each of these.

Permissionless access

When a product is open source, a developer can access it, typically by downloading and running the code, without talking to anyone. They don’t have to fill out a ‘contact sales’ form or give any information to the vendor. By the way, marketing departments hate that, not because they want to spam developers, but because it’s really hard to do modern marketing when you have no idea who your users are.

Just as importantly, a developer downloading an OSS project does not have to ask for money or permission from their own organization either. They are spending time, which is an opportunity cost, but for typical developers that isn’t an expense that is tracked too closely. Even agency developers billing by the hour have time to explore and can justify investigating a tool if they think it will speed delivery.

Decreasing the friction of trying a tool means, all other things being equal, more people will try it. If the product is good, and by that I mean it solves a need, this is the first step to adoption.

Derisking the future

Whenever a developer picks up a tool, whether it is a SaaS product or a library, they do so with the knowledge that the tool, or the way they use it, will break something in the future. This is an unfortunate side effect of the fluidity of software, and anyone who has spent days or weeks upgrading from one version of a framework or library to another will understand that it is part of the job.

Using an OSS tool derisks this unpleasant task in two ways, and therefore makes the future better for the developer, increasing the attractiveness of using OSS.

The first is bug fixes. It is quite frustrating to be stopped by a bug in a software library you are using. I still remember decompiling Java classes two decades ago to characterize a bug in a software package we were using. I found the bug and then had to raise an issue with the vendor; in the meantime I had to code a workaround.

When you have access to the source as a developer, you can do the fix yourself. You typically want to upstream it to the vendor to ease the burden of maintaining a fork, but you are not stopped in your tracks. And finding the fix is easier because the source and build instructions are available.

The second is operations. If pricing gets punitive or the vendor has a hard time operating the software in a way that meets your availability needs, you can run it yourself. Or, if you don’t want to, you can pay someone else to do so. If the software is successful enough, a hyperscaler may offer a managed service (hello Elasticsearch!). Having competition for running the product makes it less likely you’ll be stranded engaging with a vendor that doesn’t meet your needs. It’s code escrow without paying Iron Mountain truckloads of cash.

Conclusion

I think the risk of relying on OSS companies that take funding is real. VCs aren’t in the business of giving away value, so there will eventually need to be a business model, and I think the author of the original post described what is unfortunately a pattern. But developers justifiably value the benefits of OSS highly too. Permissionless access lets them get on with doing their job, while source code availability derisks future problems.

I predict we’ll see more OSS companies started in the devtools space because of these factors. But the long term trend of successful companies moving from OSS licenses to more restrictive ones will also continue.

Multi-tenancy options

Multi-tenancy is a key part of building a SaaS product. You want to amortize your software investment across different paying customers. Customers should never be able to access any other customer’s data. And it is often the case that customers don’t want their data intermingled with other customers’ data, though that depends on the type of data and the customer needs.

By separating customers into tenants, you can achieve this data separation. There are a number of levels of multi-tenancy. In increasing order of isolation, they are:

  • no multi-tenancy. In this situation, all the customer data is co-mingled in the database. Any data that is unique to a customer is tied to that customer with an id.
  • logical multi-tenancy. Isolation is enforced in code and the database (every table has a ‘tenant id’ key, and you’re running joins). When you are using this type of isolation, you want to resolve the tenant as soon as possible. This is often done with a different hostname or path. You also want to ensure that any users accessing tenant data are part of the tenant. If you use this approach, mistakes in your code can be used to ‘escape’ the tenant limitation and read other customers’ data. However, one advantage of this approach is that you have operational simplicity: one version of the code to maintain and one version of the database. There may be support for this in your framework, or you may be rolling your own.
  • logical multi-tenancy supported by the database. Some databases support row level security isolation. In this scenario, each tenant has a different user, and the data isolation is enforced by the database. Your code is limited to looking up the correct user for a given tenant.
  • container level multi-tenancy. In this scenario, you run separate containers for each tenant. If you are using a solution like Kubernetes, you can run them in different namespaces to increase the isolation. The operational complexity increases (I did mention Kubernetes, did I not?) but it becomes far more difficult for an attacker to use the access of one tenant to get another tenant’s data. However, now you can have multiple versions of the codebase running. This can be a blessing and a curse, as it allows each client to control their version (if you enable it). This can increase support burden depending on the complexity of your application. You could also choose to run the latest code on every container, upgrading all containers every time a change is made to your software.
  • virtual machine multi-tenancy. Here you use different databases and virtual machines for each tenant. You can leverage common security defense-in-depth practices at the network level, using network access controls and firewalls. This machine-level isolation makes it even harder for an attacker to escape and view other tenants’ data. However, it increases your operational costs both in terms of complexity (are you going to force everyone to upgrade across the entire fleet?) and support (there may be configuration and/or code drift between the different VMs). If you pursue this, it behooves you to automate the creation of these virtual machines.
  • physical hardware isolation. With this choice, you actually run different hardware for each tenant, possibly in different data centers. This is the most secure, but the most operationally intensive. There are some options for API driven hardware setup, but the isolation, while a boon for security, makes updates and upgrades more difficult.
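
To make the logical multi-tenancy option above a bit more concrete, here is a minimal sketch in Python using the standard library’s sqlite3 module. It resolves the tenant from the hostname, checks that the user belongs to that tenant, and scopes every query by tenant id. The hostnames, table names, and function names are all made up for illustration; a real application would likely lean on its framework for this.

```python
import sqlite3
from typing import List, Optional

# Hypothetical hostname-to-tenant mapping; a real app would likely
# look this up in a database or configuration service.
TENANTS_BY_HOST = {
    "acme.example.com": 1,
    "globex.example.com": 2,
}


def resolve_tenant(hostname: str) -> Optional[int]:
    """Resolve the tenant as early as possible, here from the hostname."""
    return TENANTS_BY_HOST.get(hostname.lower())


def user_belongs_to_tenant(conn: sqlite3.Connection, user_id: int, tenant_id: int) -> bool:
    """Confirm the user is part of the tenant before touching tenant data."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE id = ? AND tenant_id = ?",
        (user_id, tenant_id),
    ).fetchone()
    return row is not None


def list_invoices(conn: sqlite3.Connection, hostname: str, user_id: int) -> List[str]:
    """Return invoice descriptions for the tenant behind this hostname."""
    tenant_id = resolve_tenant(hostname)
    if tenant_id is None:
        raise LookupError(f"unknown tenant host: {hostname}")
    if not user_belongs_to_tenant(conn, user_id, tenant_id):
        raise PermissionError("user is not a member of this tenant")
    # Every query on tenant data carries the tenant id filter. Forgetting
    # it is exactly the kind of mistake that lets data 'escape' a tenant.
    rows = conn.execute(
        "SELECT description FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database with two tenants (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL);
        CREATE TABLE invoices (
            id INTEGER PRIMARY KEY,
            tenant_id INTEGER NOT NULL,
            description TEXT NOT NULL
        );
        INSERT INTO users (id, tenant_id) VALUES (10, 1), (20, 2);
        INSERT INTO invoices (tenant_id, description) VALUES
            (1, 'acme widgets'), (2, 'globex gadgets'), (1, 'acme support');
        """
    )
    return conn
```

With the demo data, calling list_invoices(make_demo_db(), "acme.example.com", 10) returns only the two acme rows, while passing user 20 with the acme hostname raises PermissionError. The isolation here lives entirely in application code, which is why the higher levels in the list exist.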

What is the best option for your SaaS solution? It depends on the security needs of your customers as well as your cost structure and your operational maturity. The higher the level of isolation, the harder it is to run and upgrade the various systems.