Load Testing Weirdness With AWS Aurora

So I was doing a load test and saw behavior that reminded me that sometimes you just need to test.

Ran a test at 1500 requests/second, both with many servers (20ish) and with a smaller number of bigger servers (2-3). Saw some weird behavior: a number of 5xx errors (502 Bad Gateway). Didn't see these errors under a lower load.

Looked at the database (an Aurora cluster with a single read instance and a single write instance) and saw that it was maxed out (CPU pegged, connections at max, couldn't even connect at times).

Thought I needed to upgrade the database, so I upgraded the write instance to a bigger size. It was late and I failed to notice that the upgrade flipped the read and write instances. So now the read instance was at the bigger server size and the write instance was at the smaller (original) server size. Then I re-ran the load test and everything went swimmingly (response times under 500 ms, where before they had spiked to 100 seconds or more).

Great, problem solved. The larger instance size solved it.

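But wait, it didn't. The app was connecting to the cluster's primary endpoint, which always points at the write (primary) instance, and that instance was still the smaller, original size. I didn't believe it, so I double-checked by matching the load test times against connection spikes on each database instance.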
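Matching test times against connection spikes can be done by eye in the console, but it is also easy to script. Below is a minimal sketch, assuming you have exported per-instance connection counts (for example, the CloudWatch `DatabaseConnections` metric) as (timestamp, count) pairs and noted the start and end of each load test run. The function names and data shapes are hypothetical, chosen for illustration.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Sample = Tuple[datetime, int]            # (timestamp, connection count)
Window = Tuple[str, datetime, datetime]  # (test run label, start, end)


def peak_connections(windows: List[Window],
                     samples: List[Sample]) -> Dict[str, Optional[int]]:
    """Highest connection count seen inside each load test window (None if no samples)."""
    peaks: Dict[str, Optional[int]] = {}
    for label, start, end in windows:
        in_window = [count for ts, count in samples if start <= ts <= end]
        peaks[label] = max(in_window) if in_window else None
    return peaks


def busiest_instance(windows: List[Window],
                     samples_by_instance: Dict[str, List[Sample]]) -> Dict[str, Optional[str]]:
    """For each window, name the instance whose connection count peaked highest.

    If the app only talks to the cluster (writer) endpoint, the current
    writer should come out on top in every test window.
    """
    per_instance = {name: peak_connections(windows, samples)
                    for name, samples in samples_by_instance.items()}
    result: Dict[str, Optional[str]] = {}
    for label, _, _ in windows:
        candidates = [(peaks[label], name) for name, peaks in per_instance.items()
                      if peaks[label] is not None]
        result[label] = max(candidates)[1] if candidates else None
    return result
```

Run it with the instance names from your cluster and the timestamps from your load tool's logs. If the instance that wins every window is the one you expected to be idle, your assumption about which endpoint the app uses is wrong.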

So somehow, flipping the cluster to a different primary Aurora instance (with no change in that instance's size) caused a radical change in system behavior under heavyish load for a distributed PHP application.


Let AWS RDS handle database scutwork

RDS is a service I've mentioned in the past, and it's fantastic. You can outsource large chunks of database administration to AWS. Tasks you can forget about include backups, failover, read-only replicas, and OS and DB upgrades.

It's a great fit for spinning up databases for prototypes as well as for small- to large-scale systems.

Things to keep in mind if you start using RDS:

  • The database is launched into a VPC and will have a security group around it. You'll need to grant IP addresses or security groups access to the port the database listens on, or your connections will time out.
  • The database RDS creates is a normal database that you can manage like any other database you've installed yourself, but there are certain limitations (for example, no MySQL UDFs). Read the documentation and understand the limitations, but be aware that they are constantly changing. I suggest subscribing to the RDS category of the AWS Database blog for updates.
  • RDS (other than Aurora, which has its own storage layer) uses EBS under the covers and has the performance constraints of that technology. For the largest-scale production systems, you'll want to test before jumping in whole hog.
  • If you are using MySQL or PostgreSQL and are running into concurrency problems, Aurora may be worth evaluating.
  • Automated backups can be retained for at most thirty-five days. If you want backups older than that, for peace of mind or for compliance, you'll need manual snapshots.
  • RDS only supports certain database engines and limits databases to certain sizes. If you want to run anything else on AWS, you'll need to self-manage your database on EC2 or look at other data management solutions. Here are some other gotchas.
  • Using RDS doesn't free you from all database administration tasks. There are still users to manage, indices to add, and queries to tune. Most of your RDBMS skillset still applies to RDS, however. You'll also need to decide when to schedule DB and OS upgrades and backups, and how to size your instances. You still need to design the architecture of your RDS system, including standbys and read-only replicas, and handle other configuration at both the network and database level.
  • You can manage RDS attributes via CloudFormation, Terraform, and the AWS CLI, the same way you manage other AWS infrastructure. That said, a database is stateful, so you can't treat it entirely as "cattle".
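A security group that doesn't allow your client silently drops its packets, so a blocked connection tends to hang until it times out instead of failing fast. A quick TCP probe can help tell that apart from other problems. This is a minimal sketch using only Python's standard library; the function name, the result labels, and the 3-second default timeout are my own choices, and a "timeout" result is a hint, not proof, that a security group is the culprit.

```python
import socket


def probe_db_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection to host:port and classify the outcome.

    "open"    - something accepted the connection
    "refused" - the host answered, but nothing is listening on that port
    "timeout" - no answer at all; for an RDS endpoint this often means a
                security group (or routing / network ACL) is dropping traffic
    "error"   - anything else, e.g. the hostname did not resolve
    """
    try:
        # create_connection resolves the name and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        # Must come before OSError, since socket.timeout is an OSError subclass
        return "timeout"
    except OSError:
        return "error"
```

Run it from the machine that should be talking to the database, for example `probe_db_port("<your-db-endpoint>", 3306)` for a MySQL-compatible engine (5432 for PostgreSQL). If it reports "timeout" from an app server but "open" from a host you know is allowed, look at the security group rules first.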

You can learn more about RDS in the extensive documentation.

© Moore Consulting, 2003-2017