When I taught AWS certification courses, I’d often get questions about how a service behaved under load or other unusual circumstances. Frequently I could answer from personal experience or by asking other instructors; occasionally class members provided their insights. Sometimes I could dig up relevant vendor documentation.
However, my default answer was:
“Test it for yourself. There’s no substitute for testing.”
This is one of the great advantages of the cloud. When you have a question about the performance or behavior of a service or system, spin it up and test it. This will cost you some money and some time configuring the system, but it will almost certainly be cheaper than ordering hardware, racking it and then configuring it anyway. When you’re done with your testing, you can tear down the infrastructure and never worry about it again. Sure beats shipping a server back to the manufacturer.
Of course, no testing scenario can replicate production perfectly. But you can get pretty close (especially if you can reuse production traffic).
When you do test, start by documenting what you want to achieve. What is the question you are trying to answer? Make sure to seek feedback from other team members and/or search online, as it’s possible someone has already answered your question. If you do find answers, understand under what circumstances the tests were performed, as the cloud and the offered services change over time.
Some examples of cloud infrastructure questions you might want to answer:
- How do EBS volumes of different sizes and types perform under load?
- When a Kubernetes cluster running on GKE is under load, what happens when you add an additional node? An additional pod?
- What happens when you turn off a NAT gateway while a file is being uploaded to S3 from an EC2 instance in a private subnet (without an S3 VPC endpoint)?
- What is the cold start time for an empty Azure Function? What about a function loading your DLLs?
Next, write down the steps you will take to answer the question: what you will set up, what you will measure and what result would count as an answer.
With your question and methodology spelled out, spin up your testing environment. Having your infrastructure represented as code will make this quick, especially if you have a complicated environment. If you are creating the test environment manually, record settings and other configuration in a text file to be able to re-create the environment later.
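If you are building the environment by hand, even a tiny script can stand in for the text file. Here is a minimal sketch, assuming you are happy with JSON as the format; the function names and the settings keys in the usage note are made up for illustration and are not tied to any cloud provider's API.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_environment(settings: dict, path: str) -> dict:
    """Write the test environment's settings to a JSON file and return the record."""
    record = {
        # Timestamp the record, since cloud services change over time.
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "settings": settings,
    }
    Path(path).write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


def load_environment(path: str) -> dict:
    """Read back the settings saved by record_environment."""
    return json.loads(Path(path).read_text())["settings"]
```

You might call it as `record_environment({"region": "us-east-1", "instance_type": "t3.micro", "volume_size_gib": 100}, "test-env.json")`, where the keys and values are placeholders for whatever you actually configured. Keeping the file next to your results makes it easier to say later under what circumstances the test was run.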
Run your tests. If you are load testing, find an open source or commercial load testing tool. What you need depends on your goals: you need a different tool to test 100k+ simultaneous users on a website than you do when trying to understand how an internal API handles 100 requests/second.
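For the small end of that range, such as an internal API at around 100 requests/second, you may not need a dedicated tool at all. The following is a rough sketch of a fixed-rate load driver using only the Python standard library. The names `run_load` and `timed_call` are my own, and `target` is assumed to be any zero-argument function that performs one request. It is not a substitute for a real load testing tool at larger scales.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def timed_call(target: Callable[[], object]) -> float:
    """Call target once and return the elapsed time in seconds."""
    start = time.perf_counter()
    target()
    return time.perf_counter() - start


def run_load(
    target: Callable[[], object],
    requests_per_second: float,
    duration_seconds: float,
    max_workers: int = 32,
) -> List[float]:
    """Submit calls at a roughly constant rate and return per-call latencies.

    Latency is measured from when a worker starts the call, so time spent
    waiting for a free worker is not included. Raise max_workers if the
    target is slow relative to the request rate.
    """
    interval = 1.0 / requests_per_second
    total = round(requests_per_second * duration_seconds)
    started = time.perf_counter()
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i in range(total):
            # Sleep until this request's scheduled start time.
            delay = started + i * interval - time.perf_counter()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(timed_call, target))
        # Collect results; an exception raised by target is re-raised here.
        return [future.result() for future in futures]
```

To point this at a real service, `target` could be a small function that issues one HTTP request to your own endpoint, for example with `urllib.request.urlopen`. Something like `run_load(target, requests_per_second=100, duration_seconds=60)` would then approximate the internal API scenario.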
Review the data to see if your questions are answered. More questions or areas of interest may appear. Adjust your tests to answer them.
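When the data is a list of latencies, a short summary is often enough to tell whether the question is answered. This sketch assumes your measurements are plain numbers in a list. The choice of median, 95th and 99th percentiles is mine, not a rule, and the function name `summarize` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def summarize(latencies: Sequence[float]) -> Dict[str, float]:
    """Return count, min, median, p95, p99 and max for a set of measurements."""
    if len(latencies) < 2:
        raise ValueError("need at least two measurements to compute percentiles")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "count": len(latencies),
        "min": min(latencies),
        "median": statistics.median(latencies),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies),
    }
```

Looking at the tail (p95, p99 and max) alongside the median tends to surface the follow-up questions: a healthy median with a poor p99 usually means there is something worth adjusting your tests to chase.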
Once you have your answers to the desired level of certainty, tear down your testing infrastructure.
Document what you tested, how you tested it and what you found. Circulate this internally to help your team. If possible, publish it on your company blog, both to help others in the same boat and to boost your company’s standing in the community.
All the vendor documentation in the world is no substitute for rolling up your sleeves and testing.