Rails Views Cached In Production Environment

I was troubleshooting a data issue in a production environment. It wasn’t Heroku, but rather a Rails environment hosted on AWS. It was Rails 4.2, Ruby 2.2.3.

First off, it’s worth noting that there were two or three bugs that were commingled and causing issues for our client. A number of folks had spent a long time trying to troubleshoot the issue. At this point, I was tasked with taking a look and had access to all the environments. The problem only seemed to appear on production, and appeared to be a data issue. I was editing views directly on production to track down where the data issue appeared, as well as running queries on the production database and using the rails console to see what rails thought was happening. In other words, it was a hot mess. However, this debugging story isn’t the point of this post. Rather, I ran into the most peculiar situation and wanted to document it so that if I ever ran into it in the future, I would remember it.

Basically, I had a view that looked something like this:

text
<% cache('[key]') do %>
other text
<% end %>

I changed text to be new text which included some useful debugging information. Debugged the problem and went on my merry way. The next day, early, I realized that I hadn’t changed it back, so logged back into prod and changed it back to text. Reloaded the page and didn’t see the change. What? Tried to clear the cache using the rails console and Rails.cache.delete(). No change.

After lots of googling, I realized that the view text, outside of cache tags, is cached in some other fashion. I finally figured out how to reset the cache by following these steps:

  • edit config/environments/production.rb
  • set config.cache_classes=false
  • restart passenger by touching tmp/restart.txt (see here for more on that)
  • reload the page, and now I could see text instead of new text
  • set config.cache_classes=true
  • restart passenger by touching tmp/restart.txt
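The steps above can be scripted. This is only a minimal sketch, assuming a standard Rails directory layout and Passenger's tmp/restart.txt convention. The helper names set_cache_classes and flip_cache_classes! are hypothetical, not part of Rails.

```ruby
require 'fileutils'

# Returns the config source with cache_classes set to the given boolean.
# Only the first occurrence of the setting is rewritten.
def set_cache_classes(source, enabled)
  source.sub(/config\.cache_classes\s*=\s*(true|false)/,
             "config.cache_classes = #{enabled}")
end

# Rewrites production.rb in place and asks Passenger to restart the app.
# Assumes app_root contains config/environments/production.rb and tmp/.
def flip_cache_classes!(app_root, enabled)
  config_path = File.join(app_root, 'config', 'environments', 'production.rb')
  File.write(config_path, set_cache_classes(File.read(config_path), enabled))
  # Passenger restarts the app when tmp/restart.txt gets a newer mtime.
  FileUtils.touch(File.join(app_root, 'tmp', 'restart.txt'))
end
```

Calling flip_cache_classes!(app_root, false), reloading the page, and then calling flip_cache_classes!(app_root, true) mirrors the manual sequence.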

This only happens when you both have a mutable production environment and are changing the view files in that environment. This won’t occur if you were using a platform like Heroku, or if you never troubleshot on production.


(Links To) Advice For Someone Selling a SaaS Business

I ran into someone at a meetup recently who’d built a SaaS that had a pretty decent MRR. Enough to support one person. Which is a huge achievement!

He was wondering what options he had to either grow or exit the business. This is something I’ve been reading about for a number of years, so I had some advice. I thought I’d write it down so that others could benefit (or chime in). These are resources I’ve found insightful.

This is a great first hand account by patio11 of selling a software business (it wasn’t SaaS, rather a one time digital product sale, but I think there are a number of common themes). He mentions the broker he uses, the due diligence process, and what you can do now to set yourself up for success (have separate accounts, for one).

I can’t even talk about SaaS without telling folks to raise their prices. It’s a reflex now. Amy Hoy has two great posts on this: grandfathering and new features (with a lot of communication mixed in). I experienced this myself at a previous startup, where we almost doubled our monthly subscription price in 18 months.

Finally, here’s an interesting post from a venture capitalist about how private equity is a new exit option for SaaS companies. In that vein, I chatted briefly with one such PE firm, SureSwift Capital, about part time work a year or so ago. I don’t know how they are to work with (the position wasn’t a fit) but at the time they were focused on acquiring SaaS companies with good MRR and helping them grow.


What’s New With Ruby?

For the past couple of months I’ve been doing a short segment at the beginning of the Boulder Ruby Meetup called “What’s happened in Rubyland?”

I basically look at 3-4 blogs and google searches and see what is happening. Of course, far more than what I can collate is happening (I don’t look at any major gem releases, for example) but this gives quick insight into major happenings.

Here are the past three editions. Enjoy the starkness of my presentation.

PS We’re always on the lookout for speakers. Let me know or tweet the organizers if you’re interested.

 


Load Testing Weirdness With AWS Aurora

So I was doing a load test and saw behavior that reminded me that sometimes you just need to test.

Ran a test at 1500 requests/second against two setups: many smaller servers (20ish) and a smaller number of bigger servers (2-3). Saw some weird behavior with a number of 500-level errors (bad gateway). Didn’t see these errors under a lower load.

Looked at the database (an Aurora cluster with a single read and a single write instance) and saw that it was maxed out (CPU pegged, connections at max, couldn’t even connect at times).

I thought I needed to upgrade the database, so I upgraded the write instance. It was late and I failed to notice that the upgrade flipped the read and the write instances. So now the read instance was at the bigger server size and the write instance was at the smaller (original) server size. Then I re-ran the load test and everything went swimmingly (response time under 500 ms, where before it had spiked to 100 secs or more).

Great, problem solved. The larger instance size solved it.

But wait, it didn’t. The app was connecting to the primary endpoint, which is the master write node. I didn’t believe it, so I double checked and matched test times against connection spikes to the db.

So somehow, the flipping of the database to have a different primary Aurora instance (but no change in db size) caused a radical change in system behavior under heavyish load for a distributed PHP application.

Mysteries.


Using AWS for load testing experimentation

The cloud is amazing for load testing your system. If you design your system to be behind a load balancer (which, in many applications, means pushing state to a database and having stateless compute nodes), you can easily switch out those nodes in different scenarios.

I just load tested a system I’m working on and changing out the compute nodes was fairly easy. Once I’d built a number of servers (something I scripted partially but didn’t fully automate because the return wasn’t there) and troubleshot some horizontal scaling issues that popped up in the application, I was able to:

  • take a server out of service behind the load balancer
  • stop it
  • change the instance type
  • start it
  • re-run any needed config changes on the server
  • update DNS if needed (depending on if you have a pinned IP address or not)
  • add it back to the load balancer
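The swap above can be sketched with the AWS SDK for Ruby. This is an illustration under several assumptions: a Classic Load Balancer, an EBS-backed instance, and client objects that behave like Aws::EC2::Client and Aws::ElasticLoadBalancing::Client. The function name resize_instance is hypothetical, and the server config and DNS steps are site specific, so they are left out.

```ruby
# Resizes one instance behind a Classic Load Balancer (an assumption; the
# original setup may have used a different load balancer type).
# `ec2` and `elb` are expected to respond like Aws::EC2::Client and
# Aws::ElasticLoadBalancing::Client from the aws-sdk gem.
def resize_instance(ec2:, elb:, load_balancer:, instance_id:, instance_type:)
  target = [{ instance_id: instance_id }]

  # Take the server out of service behind the load balancer.
  elb.deregister_instances_from_load_balancer(
    load_balancer_name: load_balancer, instances: target
  )

  # Stop it and wait until it has actually stopped.
  ec2.stop_instances(instance_ids: [instance_id])
  ec2.wait_until(:instance_stopped, instance_ids: [instance_id])

  # Change the instance type while the instance is stopped.
  ec2.modify_instance_attribute(
    instance_id: instance_id, instance_type: { value: instance_type }
  )

  # Start it again and wait for it to be running.
  ec2.start_instances(instance_ids: [instance_id])
  ec2.wait_until(:instance_running, instance_ids: [instance_id])

  # Server config changes and DNS updates would go here (not shown).

  # Add it back to the load balancer.
  elb.register_instances_with_load_balancer(
    load_balancer_name: load_balancer, instances: target
  )
end
```

With the aws-sdk gem loaded, the clients would be built with Aws::EC2::Client.new and Aws::ElasticLoadBalancing::Client.new. Running the function once per instance gives you a new setup for the next test run.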

Swap out a few instances and you have a new setup for your load test. When you are done, follow the process in reverse to save yourself some money.

Incidentally, increasing the number or size of compute nodes didn’t have the desired effect of being able to handle more load.

What turned out to be the root issue? The database was pegged, both in terms of CPU and connections. Just goes to show that when you’re load testing, you really need to be looking at different aspects of the system, thinking about where your bottlenecks are, and using the scientific method of hypothesis, experiment, result.


Follow the money, cloud edition

Clouds in the sky

No, not that kind of cloud

This post was really eye opening and shows who the real players in the public cloud space are. I especially enjoyed the metric of capex as percent of revenue. From the post:

As I keep repeating, CAPEX is both a prerequisite to play in the big boy cloud and confirmation of customer success. Both IBM and Oracle are tens of billions of dollars in cloud infrastructure CAPEX behind Amazon, Google, and Microsoft. Oracle’s spending has at least ticked up, but their spending is not enough to keep pace, much less to have any hope of catching up to the infrastructure of the big three.

The whole post is worth reading if you are interested in public cloud providers in any way.


Farewell Boulder New Tech Meetup

I, along with many others, received this email last week:

Participating and watching the Colorado tech community evolve has been an amazing experience. Over the past 12 years we have had so many people engage and support our efforts. This includes the attendee’s, presenters, organizers, and sponsors. Give yourselves a big hand, IMO you are the reason Colorado has such a vibrant startup ecosystem.

I’m saddened, but it is time for me step away and stop organizing/hosting the Boulder chapter of New Tech Colorado. I’ve attempted to find a replacement over the past year, but no one has stepped up. I think thats ok, we have many other pitch events happening throughout the front range, including other New Tech events.

The Boulder New Tech, starting with the June event will no longer accept reservations and I’m going to shut down BDNT.org.

Thank you for giving me your attention and for sharing your experiences over the past 12 years. See you around town.

It was from Robert Reich, the moving force behind the New Tech Meetups here in Colorado. After over a decade, there will be no more Boulder meetups (though it looks like other cities are going strong, at least from the meetup page). I totally understand where Robert is coming from. I’ve been to many of these meetups, but over the last couple of years attendance was definitely down. However, every time I attended I met interesting people and saw a different slice of the Boulder ecosystem. I will say that it seemed like BDNT was a welcoming initial introduction to that ecosystem, but once a newcomer understood the landscape, I think they were better served by a more focused meetup. I know Robert experimented with a number of different formats and concepts–I hope he writes them all down for future meetup organizers.

I also had the opportunity to speak a few times at BDNT. Once I presented on GWT, which was my first experience talking to a group of over one hundred people (note to self, don’t present a technology at a pitch night 🙂 ). I also spoke in Denver with Brian Timoney–that was fun because of the 3d google earth submarine navigation demo and because Josh Fraser met with us and gave us some tips. And in 2016 I presented the startup of which I was a co-founder. Each time the community was very supportive and helpful.

I want to thank a few folks:

  • Robert and all the other organizers over the years.
  • The speakers, who made the night interesting every time I went. I never left without learning something new.
  • The hosts. I know the event was hosted for many years by Silicon Flatirons, but I also attended events at Galvanize Boulder. I also appreciated the snacks!
  • The community. Always supportive and present.

Institutions don’t have to live forever (especially those that survive on the efforts of volunteers). It’s OK for them to end. I will miss the BDNT event, but I know the community of support for entrepreneurship in Boulder and the front range is larger than ever.

Fare thee well, Boulder New Tech.


Greenfield webapp data storage decision tree

Choice on a sign

Choices choices

I saw a discussion about storing data in one of my slack channels and saw a line too good to leave in the slack.

Here’s a data storage decision tree for 95% of applications. Do you have data to durably store?

1. Use an open source relational database

(The original poster specified PostgreSQL, but MySQL/MariaDB are viable alternatives in my mind. Each has different strengths.)

The modern, open source RDBMS is flexible and scalable (even web scale [warning, video has cursing]). It’s free from licensing fees (though you can pay for support). There are turnkey solutions for managing it in the cloud. There are plenty of developers who know how to use it (even more developers know how to use SQL) and many DBAs and sysadmins who know how to tune it. You can scale it out by using read replicas and scale up by buying better hardware (or VMs). Every language has a library that talks to an RDBMS. The database will maintain data integrity. Many tools that non technical users favor can connect to them (even Excel, if you install the right ODBC driver).

There are plenty of other solutions out there (filesystem, NoSQL variants, XML databases, data lakes). They exist for a reason. For certain problems and at certain scale they are better solutions than the Swiss army knife of an RDBMS. But the default decision should always be an RDBMS, and the onus should be on the other solution to justify its presence. For 95% of problems, your default should be MySQL/MariaDB or Postgres.


Fixing the RubyGems “Too Many Requests 429” error

A server on which I am working runs this command: /usr/bin/gem install --no-rdoc --no-ri aws-sdk to get the aws-sdk. I was seeing this error message:

Error: Execution of '/usr/bin/gem install --no-rdoc --no-ri aws-sdk' returned 1: ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    bad response Too Many Requests 429 (https://api.rubygems.org/api/v1/dependencies?gems=aws-sdk-elasticloadbalancingv2)

Every time I ran it I’d see a different gem that triggered the 429 response. There wasn’t much out there when searching, other than a note that I should update to a new version of bundler (which I wasn’t using).

Finally, I figured out how to get past this. What I did was manually run /usr/bin/gem install -f --no-rdoc --no-ri aws-sdk multiple times, and each time the command would get a little further. Finally all the dependencies had been downloaded. After that, I was able to run it without the -f switch.
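The manual re-running could be automated with a small retry loop. This is a sketch rather than what I actually did: the growing delay between attempts is my own assumption (a guess at being polite to a rate-limited API), and retry_with_backoff is a hypothetical helper name.

```ruby
# Runs the given block until it returns a truthy value or the attempts
# run out, doubling the pause between tries.
# Returns true on success, false if every attempt failed.
def retry_with_backoff(attempts: 5, base_delay: 2, sleeper: ->(s) { sleep(s) })
  attempts.times do |i|
    return true if yield
    # No need to pause after the final failed attempt.
    sleeper.call(base_delay * (2**i)) if i < attempts - 1
  end
  false
end

# Example (not executed here): Kernel#system returns true when the
# command exits with status 0.
#   retry_with_backoff do
#     system('/usr/bin/gem', 'install', '-f', '--no-rdoc', '--no-ri', 'aws-sdk')
#   end
```

The sleeper parameter exists only so the pauses can be swapped out when trying the helper without actually waiting.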


Obstacles to building high availability software systems

Open sign

Is your system available?

I saw a discussion on a slack about obstacles to high availability systems and wanted to record the edited version for posterity (mostly for future me, as I blog for myself). Note that I would be remiss in any discussion of high availability systems if I didn’t mention the Google SRE book, which is slow reading but free and full of great information.

First, what is high availability? I like this definition from Digital Ocean:

In computing, the term availability is used to describe the period of time when a service is available, as well as the time required by a system to respond to a request made by a user. High availability is a quality of a system or component that assures a high level of operational performance for a given period of time.

Design considerations of a system that will hinder high availability fall into two categories.

The first category is actions that you don’t take, but could take:

  • single points of failure: if you have a piece of your system which is unique and it fails (and everything fails, all the time), the entire system’s availability will be affected.
  • missing or incomplete automation: if you need human beings to resurrect failed parts of your system, it will take meaningful amounts of time and will be error prone.
  • failing to build in elasticity and scalability of resources: when usage increases, new resources should be automatically brought online. Failure to do so will impact system performance and that could impact system availability.
  • missing or incomplete system instrumentation: if you don’t monitor your system, you won’t be able to even know its availability (until you hear from your users).
  • application statefulness (on the compute nodes): this impacts your ability to use elastic resources and to grow parts of your system that are under load. (If you aren’t designing a greenfield system, this may be an externally imposed requirement due to existing software.)

The second category is actions you can’t take because of external requirements on the system:

  • data sovereignty: if you are legally limited to certain data centers, you have fewer options for your system, which can hinder building the system.
  • tenancy: if you need to have single tenancy for security or legal reasons, you may have fewer options for elastic solutions.
  • data models and authority requirements: poorly performing data models can impact performance. If your application requires that certain operations go to the source of record (permissions checks, for example), then a poorly performing source data model can impact performance, which can impact availability.
  • latency: if you have a highly latency sensitive system, then you may need to trade availability for decreased latency. Since availability often means geographic dispersion (to avoid disasters impacting multiple pieces of a system), it impacts latency requirements.
  • cost: high availability systems, because they have no single points of failure, cost more.
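
To put rough numbers on the single point of failure and cost items above, here is a back-of-the-envelope sketch. It assumes component failures are independent, which real systems rarely satisfy, and the function names are my own.

```ruby
# Availability of a chain of components that must all be up
# (each one is a single point of failure for the whole).
def serial_availability(availabilities)
  availabilities.reduce(1.0) { |acc, a| acc * a }
end

# Availability of several independent redundant copies of one component,
# where a single working copy is enough.
def redundant_availability(availability, copies)
  1.0 - (1.0 - availability)**copies
end

HOURS_PER_YEAR = 24 * 365

# Expected downtime per (non-leap) year for a given availability.
def downtime_hours_per_year(availability)
  (1.0 - availability) * HOURS_PER_YEAR
end
```

Under these assumptions, three 99% components in series come out to about 97.03% available, while two redundant copies of a 99% component come out to 99.99%. The redundancy is what you pay for in the cost item.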

Again, this was a discussion from a slack of AWS instructors, but the commentary is mine, as are any mistakes. Thanks to Chad, Richard, Jon, Ryan and everyone else!



© Moore Consulting, 2003-2017