{"id":3061,"date":"2018-05-19T14:23:44","date_gmt":"2018-05-19T20:23:44","guid":{"rendered":"http:\/\/www.mooreds.com\/wordpress\/?p=3061"},"modified":"2021-11-21T12:42:01","modified_gmt":"2021-11-21T18:42:01","slug":"obstacles-to-high-availability","status":"publish","type":"post","link":"https:\/\/www.mooreds.com\/wordpress\/archives\/3061","title":{"rendered":"Obstacles to building high availability software systems"},"content":{"rendered":"<figure id=\"attachment_3063\" aria-describedby=\"caption-attachment-3063\" style=\"width: 300px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-3063\" src=\"http:\/\/www.mooreds.com\/wordpress\/wp-content\/uploads\/2018\/05\/open-sign-1745436_640-300x181.jpg\" alt=\"Open sign\" width=\"300\" height=\"181\" srcset=\"http:\/\/edit.mooreds.com\/wordpress\/wp-content\/uploads\/2018\/05\/open-sign-1745436_640-300x181.jpg 300w, http:\/\/edit.mooreds.com\/wordpress\/wp-content\/uploads\/2018\/05\/open-sign-1745436_640.jpg 640w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-3063\" class=\"wp-caption-text\">Is your system available?<\/figcaption><\/figure>\n<p>I saw a discussion on a slack about obstacles to high availability systems and wanted to record the edited version for posterity (mostly for future me, as <a href=\"http:\/\/www.mooreds.com\/wordpress\/archives\/2188\">I blog for myself<\/a>). Any mention of high availability systems would be remiss if it didn&#8217;t point to the <a href=\"https:\/\/landing.google.com\/sre\/book\/index.html\">Google SRE book<\/a>, which is slow reading but free and full of <a href=\"https:\/\/landing.google.com\/sre\/book\/chapters\/service-best-practices.html\">great information<\/a>.<\/p>\n<p>First, what is high availability? 
I like <a href=\"https:\/\/www.digitalocean.com\/community\/tutorials\/what-is-high-availability\">this definition from DigitalOcean<\/a>:<\/p>\n<blockquote><p>In computing, the term availability is used to describe the period of time when a service is available, as well as the time required by a system to respond to a request made by a user. High availability is a quality of a system or component that assures a high level of operational performance for a given period of time.<\/p><\/blockquote>\n<p>System design considerations that hinder high availability fall into two categories.<\/p>\n<p>The first category is actions that you don&#8217;t take, but could take:<\/p>\n<ul>\n<li>single points of failure: if you have a piece of your system which is unique and it fails (and <a href=\"https:\/\/www.slideshare.net\/AmazonWebServices\/high-availability-websites-part-one\/12-Everything_fails_all_the_time\">everything fails, all the time<\/a>), the entire system&#8217;s availability will be affected.<\/li>\n<li>missing or incomplete automation: if you need human beings to resurrect failed parts of your system, recovery will take meaningful amounts of time and will be error prone.<\/li>\n<li>failing to build in elasticity and scalability of resources: when usage increases, new resources should be automatically brought online. Failure to do so will degrade system performance, which in turn could reduce system availability.<\/li>\n<li>missing or incomplete system instrumentation: if you don&#8217;t monitor your system, you won&#8217;t even know its availability (until you hear from your users).<\/li>\n<li>application statefulness (on the compute nodes): this impacts your ability to use elastic resources and to grow parts of your system that are under load. 
(If you aren&#8217;t designing a greenfield system, this may be an externally imposed requirement due to existing software.)<\/li>\n<\/ul>\n<p>The second category is actions you can&#8217;t take because of external requirements on the system:<\/p>\n<ul>\n<li>data sovereignty: if you are legally limited to certain data centers, you have fewer options for your system, which can make it harder to build.<\/li>\n<li>tenancy: if you need to have single tenancy for security or legal reasons, you may have fewer options for elastic solutions.<\/li>\n<li>data models and authority requirements: poorly performing data models can impact performance. If your application requires that certain operations be served from the source of record (permissions checks, for example), then a poorly performing source data model can impact performance, which can impact availability.<\/li>\n<li>latency: if you have a highly latency sensitive system, then you may need to trade availability for decreased latency. Since availability often means geographic dispersion (to avoid disasters impacting multiple pieces of a system), it can conflict with latency requirements.<\/li>\n<li>cost: high availability systems, because they need redundant components to avoid single points of failure, cost more.<\/li>\n<\/ul>\n<p>Again, this was a discussion from a slack of AWS instructors, but the commentary is mine, as are any mistakes. 
Thanks to <a href=\"https:\/\/twitter.com\/brightkey_cloud\">Chad<\/a>, <a href=\"https:\/\/www.safaribooksonline.com\/search\/?query=author%3A%22Richard%20A.%20Jones%22&amp;sort=relevance&amp;highlight=true\">Richard<\/a>, <a href=\"https:\/\/nubedehelado.com\/\">Jon<\/a>, <a href=\"http:\/\/linkedin.com\/in\/ryandymek\">Ryan<\/a> and everyone else!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I saw a discussion on a slack about obstacles to high availability systems and wanted to record the edited version for posterity (mostly for future me, as I blog for [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[79,3,39,84,37],"tags":[],"class_list":["post-3061","post","type-post","status-publish","format-standard","hentry","category-aws","category-books","category-cloud-computing","category-devops","category-tips"],"_links":{"self":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/3061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/comments?post=3061"}],"version-history":[{"count":7,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/3061\/revisions"}],"predecessor-version":[{"id":3069,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/3061\/revisions\/3069"}],"wp:attachment":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/media?parent=3061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/categories?post=3061"},{"tax
onomy":"post_tag","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/tags?post=3061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}