Use managed services. Please.

“Use managed services.”

If there was one piece of advice I wish I could shout from the mountains to all cloud engineers, this would be it.

Operations, especially operations at scale, are a hard problem. Edge cases become commonplace. Failure is rampant. Automation and standardization are crucial. People with experience running software and hardware at this scale tend to be rare and expensive. The mistakes they’ve made and situations they’ve learned from aren’t easy to pick up.

When you use a managed service from one of the major cloud vendors, you’re getting access to all the wisdom of their teams and the power of their automation and systems, for the low price of their software.

A managed service is a service like AWS relational database service, Google Cloud SQL or Azure SQL Database. With all three of these services, you’re getting best of breed configuration and management for a relational database system. There’s configuration needed on your part, but hard or tedious tasks like setting up replication or backups can be done quickly and easily (take this from someone who fed and cared for a mysql replication system for years). Depending on your cloud vendor and needs, you can get managed services for key components of modern software systems like:

  • File storage
  • Object caches
  • Message queues
  • Stream processing software
  • ETL tools
  • And more

(Note that these are all components of your application, and will still require developer time to thread together.)

You should use managed services for three reasons.

  • It’s going to be operated well. The expertise that the cloud providers can provide and the automation they can afford to implement will likely surpass what you can do, especially across multiple services.
  • It’s going to be cheaper. Especially when you consider employee costs. The most expensive AWS RDS instance is approximately $100k/year (full price). It’s not an apples to apples comparison, but in many countries you can’t get a database architect for that salary.
  • It’s going to be faster for development. Developers can focus on connecting these pieces of infrastructure rather than learning how to set them up and run them.

A managed service doesn’t work for everyone. If you need to be able to tweak every setting, a managed service won’t let you. You may have stringent performance or security requirements that a managed service can’t meet. You may also start out with a managed service and grow out of it. (Congrats!)

Another important consideration is lock-in. Some managed services are compatible with alternatives (kubernetes services are a good example). If that is the case, you can move clouds. Others are proprietary and will require substantial reworking of your application if you need to migrate.

If you are working in the cloud and you need a building block for your application like a relational database or a message queue, start with a managed service (and self host if it doesn’t meet your needs). Leverage the operational excellence of the cloud vendors, and you’ll be able to build more, faster.


Load Testing Weirdness With AWS Aurora

Confused personSo I was doing a load test and saw behavior that reminded me that sometimes you just need to test.

Ran a test with 1500 requests/second with multiple servers (20ish) and smaller number of bigger servers (2-3). Saw some weird behavior with a number of 500 errors (bad gateway). Didn’t see these errors under a lower load.

Looked at the database (an aurora cluster with a single read and a single write instance) and saw that it was maxed out (cpu pegged, connections at max, couldn’t even connect at times.

Thought I need to upgrade the database. I upgraded the write instance. It was late and I failed to notice that that upgrade flipped the read and the write instances. So now the read instance was at the bigger server size and the write instance was at the smaller (original) server size. Then I re-ran the load test and everything went swimmingly (response time under 500 ms, where before it had spiked to 100 secs or more).

Great, problem solved. The larger instance size solved it.

But wait, it didn’t. The app was connecting to the primary endpoint, which is the master write node. I didn’t believe it, so I double checked and matched test times against connection spikes to the db.

So somehow, the flipping of the database to have a different primary Aurora instance (but no change in db size) caused a radical change in system behavior under heavyish loadfor a distributed php application.

Mysteries.


Let AWS RDS handle database scutwork

Amazon DatabasesRDS is a service I’ve mentioned in the past, but it’s fantastic. You can outsource large chunks of database administration to AWS. Tasks you can forget about include backups, failover, read only replicas, and OS and DB upgrades.

This is a great fit for spinning up databases for small scale to large scale systems and prototyping.

Things to keep in mind if you start using RDS:

  • The database is launched into a VPC and will have a security group around it. You’ll need to allow IP addresses or security groups access to the port the database is living on or your connections will time out.
  • The database RDS creates is a normal database that you can manage like you can any other database you have set up and installed, but there are certain limitations (for example, no MySQL UDFs). Read the documentation and understand the limitations, but be aware they are constantly changing. I suggest subscribing to the AWS Database blog RDS category for updates.
  • RDS uses EBS under the covers and has the performance constraints of that technology. For the largest scale production systems you’ll want to test before jumping in whole hog.
  • If you are using MySQL or PostgreSQL and are running into concurrency problems, Aurora may be worth evaluating.
  • If you want to have backups past thirty five days for peace of mind of compliance concerns, you’ll need manual snapshots.
  • RDS only supports certain RDBMS and limits databases to certain sizes. If you want to run anything else on AWS, you will need to self manage your DB on EC2 or look at other data management solutions. Here are some other gotchas.
  • When using RDS you aren’t freed from all database administration tasks. There are still users to manage, indices to add, and queries to tune. Most of your RDBMS skillset is applicable to RDS, however. You’ll also need to determine when to schedule DB and OS upgrades, backups and how to size your instances. You still need to set up the optimal architecture of an RDS system including standbys and read only replicas and do other configuration both at the network and database level.
  • You can manage RDS system attributes via cloudformation, terraform and the CLI in the same way you can manage other AWS infrastructure. That said, the RDS system is stateful so you can’t treat it entirely as “cattle”.

You can learn more about RDS in the extensive documentation.



© Moore Consulting, 2003-2020