The cloud is amazing for load testing your system. If you design your system to be behind a load balancer (which, in many applications, means pushing state to a database and having stateless compute nodes), you can easily switch out those nodes in different scenarios.
I just load tested a system I’m working on, and changing out the compute nodes was fairly easy. Once I’d built a number of servers (something I scripted partially but didn’t fully automate, because the return on full automation wasn’t there) and troubleshot some horizontal scaling issues that popped up in the application, I was able to:
- take a server out of service behind the load balancer
- stop it
- change the instance type
- start it
- re-run any needed config changes on the server
- update DNS if needed (depending on whether the server has a pinned IP address; an instance without one typically comes back from a stop/start with a new public IP)
- add it back to the load balancer
Swap out a few instances and you have a new setup for your load test. When you are done, follow the process in reverse to save yourself some money.
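The steps above can be sketched as a script. This is a minimal sketch assuming the AWS CLI and an ALB target group; the target group ARN, instance ID, and instance type are placeholders, and `DRY_RUN` just prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of the instance-swap procedure, assuming the AWS CLI.
# TG_ARN, INSTANCE_ID, and NEW_TYPE are placeholder values.
set -euo pipefail

TG_ARN="${TG_ARN:-arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app/abc123}"
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
NEW_TYPE="${NEW_TYPE:-c5.2xlarge}"
DRY_RUN="${DRY_RUN:-1}"   # set to 0 to actually execute the AWS commands

# Helper: echo the command in dry-run mode, otherwise run it.
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1. Take the server out of service behind the load balancer
run aws elbv2 deregister-targets --target-group-arn "$TG_ARN" --targets "Id=$INSTANCE_ID"
# 2. Stop it (the instance type can only be changed while stopped)
run aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
run aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"
# 3. Change the instance type
run aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" \
    --instance-type "{\"Value\": \"$NEW_TYPE\"}"
# 4. Start it
run aws ec2 start-instances --instance-ids "$INSTANCE_ID"
run aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
# 5. Re-run any needed config changes and update DNS here, if needed
# 6. Add it back to the load balancer
run aws elbv2 register-targets --target-group-arn "$TG_ARN" --targets "Id=$INSTANCE_ID"
```

Run it in reverse order with the original instance type when the test is done, and you’re back to the cheaper setup.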
Incidentally, increasing the number and size of the compute nodes didn’t have the desired effect: the system couldn’t handle any more load.
What turned out to be the root issue? The database was pegged, both on CPU and on connections. It just goes to show that when you’re load testing, you really need to look at every aspect of the system, think about where your bottlenecks are, and follow the scientific method: hypothesis, experiment, result.