Navigating new systems

Here are some tips and tricks I have for navigating new software systems, which can sometimes be like navigating a maze. If you’re truly unlucky, it’s a maze, but you’re blindfolded and the walls are covered in randomly placed razors.

The first is to get a clear set of expectations. Will I own the system? Who owns it now? How long have they owned it? How often is it modified? When will it need to be modified again? Is it shaky or stable? Getting these questions answered helps me understand and refine further steps.

The next step is to gain access. There are a lot of different pieces of most modern systems, so access can mean different things. Here are some kinds of access which it may be worth seeking:

  • shell
  • version control
  • database
  • ftp
  • http
  • app level (admin, user)
  • documentation
  • project planning
  • different CI environments (prod, UA, staging)
  • build system
  • admin users
  • end users

After I have access, I like to look at the front end and the back end. By the front end, I mean the user interface. And by the back end I mean the data store. Just looking around and seeing what tables and pages an application has can help.

If the system in question is not entirely custom built, googling for the user guide for the default version of the application can be helpful. Finding that user guide and skimming through it can give me more high level understanding, as well as teaching me key nomenclature. Of course if there is any local documentation, that’s helpful too, but I read that with a skeptical eye, as it doesn’t always keep pace with the system.

I also like to look at logfiles. This can help me determine something as simple as whether I’m on the correct server (if I reload the page and the access log file doesn’t change, I am looking at the wrong log file or am on the wrong server). It’s even better if the system aggregates logs into something like an ELK stack or papertrail.
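
The "reload the page and see if the log changes" check can be scripted. Here is a minimal sketch in Python, assuming you can read the log file directly; the function name and the idea of passing the page reload in as a callable are my own illustration, not part of any particular tool.

```python
import os
from typing import Callable


def log_changed(log_path: str, action: Callable[[], None]) -> bool:
    """Return True if the log file's size or mtime changed while action ran."""
    before = os.stat(log_path)
    action()  # e.g. request the page you think this server is serving
    after = os.stat(log_path)
    return (after.st_size, after.st_mtime_ns) != (before.st_size, before.st_mtime_ns)
```

If this returns False after a request you know you made, you are probably looking at the wrong log file or the wrong server. Servers that buffer their log writes may need a short pause before the second check.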

Setting up a local development environment can help. Again, this lets me gain an understanding of the big picture components, and also lets me poke at various parts of a system, possibly breaking them, without affecting other developers or, worse, customers.

Asking questions is really important, but this can be hard because often the folks with the most knowhow are the busiest.

I also like to see what files or database tables change as I move through the system. With a modest sized database, I do this by taking a database dump before and after taking some action. Then I diff the files, sometimes using sed to break the dump file apart even further (replacing all commas with commas and newlines, for example). If using mysqldump, you can target individual tables, and make sure not to use extended inserts (--skip-extended-insert), as that makes diffing harder.
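
As a rough sketch of the dump-and-diff idea, here is the same comma-splitting trick in Python using the standard difflib module. The function names are hypothetical, and it assumes the two dumps are small enough to hold in memory as strings.

```python
import difflib


def split_dump(dump_text: str) -> list[str]:
    """Break a SQL dump into short lines by adding a newline after each comma."""
    return dump_text.replace(",", ",\n").splitlines()


def diff_dumps(before: str, after: str) -> list[str]:
    """Return only the removed ("- ") and added ("+ ") lines between two dumps."""
    delta = difflib.ndiff(split_dump(before), split_dump(after))
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

Splitting on every comma also splits commas inside string values, which is usually fine for spotting which columns changed but means the output is not valid SQL.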

For the filesystem, it’s even easier. I touch a file (ts) and then take the action, then run find . -newer ts -print. This command will show me all the files the system has written that are newer than ts.
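
If you would rather do this from a script, here is a minimal Python sketch of the same marker-file technique. The function name is made up for illustration, and unlike find it only reports files, not directories.

```python
import os


def files_newer_than(root: str, marker: str) -> list[str]:
    """List files under root modified after the marker file was last touched."""
    marker_mtime = os.stat(marker).st_mtime_ns
    newer = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so a broken symlink does not raise
            if os.lstat(path).st_mtime_ns > marker_mtime:
                newer.append(path)
    return sorted(newer)
```

Usage mirrors the shell version: touch the marker, take the action in the application, then call files_newer_than(".", "ts").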

Hopefully some of these tips will be helpful to you as you navigate your next new system.


“Choose boring technology”

This article on choosing boring technology is so good. It’s from 2015 so some of the tech it references is more mature, but the thesis is that the technology underlying a business should be well known and boring to implement. It also discusses how important cohesive technology can be (I love the final footnote).

I think this is an interesting contrast to microservices, which is essentially solving repeatedly for local optimizations and then relying on the API or message boundary to allow scaling. Microservices, in my experience, often get a bit hand wavey when it comes to operationalization (“just put it on Kubernetes, it’ll be fine”).

I have definitely dealt with this in previous companies where I had to put the kibosh on new technologies being introduced (“nope, we’re going to do it in java”). This meant that as a small team we had a standard deployment stack and process and could leverage our logging knowledge and debugging tools.

Here are some arguments I hear for using the latest and greatest technologies:

  • “Our developers will get bored and we’ll have a hard time recruiting”
    • This is a great reason to have regular hackfests and/or support developers working on open source or side projects.
  • “New tech will help us do things faster”
    • It’s definitely worth evaluating new technology periodically and adding it to your stack (especially if it is outsourced, a la RDS, or boring, a la memcached). As the original post mentions, if you get substantial benefits from the new technology, then use it full bore and plan for a migration. Don’t end up in a state where you are half in/half out. Also consider tooling and processes to get things done faster–these may have quicker iteration times with lower operational risk.
  • “Choose the right tool for the job”
    • Most Turing complete languages can be used for any purpose. And you shouldn’t be optimizing for the right tool for the job, but rather the right tool for the business for the job.

Really, you should just go read the post. It’s good.

 


“The future is already here, but it’s only available as a managed AWS service”

This entire post about how Kubernetes could become the distributed operating system of choice is worth reading.  But one statement really struck me:

Well, as they say, the future is already here, but it’s only available as an AWS managed service.

The “they” in this is apparently not William Gibson, as I thought.  More details here.

For the past couple of years the cloud providers have matured and moved from offering infrastructure as a service (disk, compute) to platform as a service offerings (sqs, which is a managed message queue like activemq, or kinesis, a managed data ingestion system like kafka, etc).  Whenever you think about installing a proprietary or open source package, you should include the cloud provider offerings in your evaluation matrix.  Of course, the features you need may not be there, or the cost may be prohibitive, but including them in an evaluation makes sense because of the speed of deployment and the scaling available.

If you think a system architecture can benefit from a message queuing system, do you want to spend time setting up and maintaining such a system, or do you want to spin up an SQS queue in a few minutes?

And the cost may not be prohibitive, depending on the skillset of your internal team and your team’s desire to run such plumbing services.  It can be really hard to estimate running costs of infrastructure services, though you can estimate it by looking at internal teams and seeing similar services they run and how much money it takes.  The nice thing about cloud services is that the costs are very transparent.  The kinesis data streams pricing example walks through a scenario and concludes:

For $1.68 per day, we have a fully-managed streaming data infrastructure that enables us to continuously ingest 4MB of data per second, or 337GB of data per day in a reliable and elastic manner.
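
The numbers in that quote are easy to sanity check. This is a small sketch of the arithmetic, assuming 1 GB = 1024 MB (which is what makes 4MB per second come out to roughly 337GB per day); the dollar figure is taken from the quote above and may not match current pricing.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def gb_per_day(mb_per_second: float) -> float:
    """Convert a sustained MB/s ingest rate into GB/day, assuming 1 GB = 1024 MB."""
    return mb_per_second * SECONDS_PER_DAY / 1024


def cost_per_gb(daily_cost: float, mb_per_second: float) -> float:
    """Daily cost divided by the GB ingested per day at the given rate."""
    return daily_cost / gb_per_day(mb_per_second)


if __name__ == "__main__":
    print(gb_per_day(4))         # 337.5
    print(cost_per_gb(1.68, 4))  # about half a cent per GB
```

Running the same calculation against an estimate of what a self-managed cluster costs per day, payroll included, is one way to put a cloud service and an in-house service in the same evaluation matrix.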

Another AWS instructor made the point that AWS and other cloud services invert the running costs of IT infrastructure.  In a typical enterprise, the running costs of your data center and infrastructure are like an iceberg–10% is explicit (server costs, electricity, etc) and 90% is implicit (payroll, time spent upgrading and integrating systems).  In the cloud world those numbers are reversed and far more of your infrastructure cost is explicitly laid out for you and your organization.

Truly the future.


Thoughtworks Radar

Last night at the Boulder Ruby Meetup, Marty from Haught Codeworks walked through the Thoughtworks Technology Radar.  This is an informative way to keep on top of the latest technology trends without spending a lot of time researching and reading.  Sure, you could get the same information from reading Hacker News regularly, but the Radar is less haphazard.

The Radar is published every six months or so, and pulls from experts inside Thoughtworks (a leading software consultancy). You can view techniques, languages and frameworks, tools, and platforms.  Each area has a number of technologies and the proposed level of adoption (hold, assess, trial, adopt).  Within each radar you can dive in and learn more about the technology, like say, weex. You can also see the movement of a technology through the proposed levels of adoption over time.

You can also search for a given technology, if you are interested in seeing what the status is.  Sometimes technologies are archived, like Elm.

Note that this is not perfect.  The lag is real, but may help you avoid chasing shiny objects.  The Radar is also inherently biased because of the methodology, but given Thoughtworks’ size, scope and leadership, it’s probably one of the best technology summaries.  It’s certainly the best one I’ve run across.


Meetup talk outline

If you are thinking about doing a tech talk at a meetup, you should!  It’s a great way to deepen your experience, try a different skill and learn a lot.  It also has the benefit of making you a higher profile developer.

I was coercing a friend into talking at a meetup and he asked if I had any questions for his talk.  ‘X’ is what he was talking about.  (Where ‘X’ in this case was webhooks, but it could be any technology or protocol that is of interest to you.)

I rattled off the following set of questions that would be of interest.  I thought they might make a good template for any future meetup talks, so wanted to record them here for posterity.

  • what is X?
  • why does X exist?
  • what are prominent apps that use this tech?
  • how do you use it?
  • how would you test it?
  • how do you deal with dev/test/prod environments?
  • are there any gotchas?  Have any war stories?
  • how do you troubleshoot?
  • alternatives?  strengths and weaknesses of this solution or the alternatives?
  • any third party libraries that someone should be aware of?  How about tools?

What do you want to hear from presenters?


Boulder Startup Week

If you are into the tech scene in Boulder, Boulder Startup Week is a great set of events–it’s coming up May 15-19 this year.  This is a totally volunteer run set of events which highlight various aspects of startup and technology in the Boulder area.  You can learn more at the website.  It’s a great place to network and to learn about new things.

I’m lucky enough to be participating in two events this startup week.  I’ll be hanging out at the Engineering Leadership dinner.  And I’ll be presenting on bootstrapping a startup as a developer with a few other bootstrappers.  Most of my short presentation will cover lessons I’ve learned from joining The Food Corridor.  I’m especially looking forward to hearing about Brian and Inversoft that day, because I’ve been friends with him for a number of years and have followed along with some of his trials and triumphs.

Hope to see you there!


The Four Types of Slacks

I have been using slack for a few years now, but have really noticed an uptick in the last year or so.  (If you aren’t familiar with slack, here’s an intro to slack usage, and if you are, here’s a great code of conduct for public slacks.)
It seems to me that there are four main types of slack groups.

The first is the company/department slack.  This slack is long lived, contains many channels, and is multi purpose.  There are channels for ops, marketing, etc.  This slack is typically limited to the employees of a company, though contractors are also given access.  The main purposes of this slack are an ad-hoc knowledge base and to reduce email.  Depending on IT, this slack may be under the radar and compete with other solutions like hipchat, wikis or internal mailing lists.

The next type is a project slack. This is related to the company slack, but is less long lived, and has fewer members and channels.  It is used for coordination amongst disparate people, often a set of contractors.  May be maintained by the client or prime contractor, also serves as an ad-hoc knowledge base, but is primarily a means for coordination of effort.

Both of the above slacks may have other integrations with systems (CI/CD, monitoring, etc).  These integrations with external systems can make the slack a one stop shop for corporate knowledge and memory, especially if the members are on paid accounts.

The above types are obviously limited in membership.  The next two types of slacks are more public.

Another type of slack is the event slack.  This slack replaces or augments Twitter as a way for people at a conference to communicate.  May exist between events, but is quiescent while events are not happening.  Here channels may be related to aspects of the event or tracks, and the slack is typically owned by the event coordinator and provided as a service to the conference attendees.

Slack can also be an email list replacement.  I have been a member of several email lists for user groups/meetups in the front range, and it seems much of the activity on some of them has been driven to slack (the BDNT meetup is a good example). In addition, I see a lot of new slacks being created that would, a decade ago, have been email groups.  (Facebook groups are also a replacement for email groups, depending on your audience, but I have found slack to be far superior in searchability.) The number of channels is typically related to member list size and length of existence.  I have found these slacks to be on the free slack plan, with its limits. I have also heard of slacks of this type charging for membership.

What has been your slack experience?  What did I miss?


Bare minimum of ops tasks for heroku

Awesome, you are a CTO or founding engineer of a newborn startup.  You have a web app up on Heroku and someone is paying you money for it!  Nice job.

Now, you need to think about supporting it.  Heroku makes things way easier (no racking and stacking, no purchasing hardware, no configuring apache) but you still need to set up some operations.

Here is the bare minimum you need to do to make sure you can sleep at night.  (Based on a couple of years of heroku projects, and being really really cheap.)

  • Have a staging environment
    • You don’t want to push code direct to prod, do you?
    • This can be a free dyno, depending on the complexity of your app.
    • Pipelines are nice, as is preboot.
    • Cost: free
  • Have a one line deploy.
    • Or, if you like CD/CI, an automatic deploy or a one click deploy.  But make it really easy to deploy.
    • Have a deploy script that goes straight to production for emergencies.
    • Cost: free
  • Backups
    • User data.  If you aren’t using a shared object store like S3, make sure you are doing a backup.
    • Database.  Both heroku postgresql and amazon RDS have point and click solutions.  All you have to do is set them up.  (Test them, at least once.)
    • Cost: freeish, depending on the solution.  But, user data is worth spending money on.
  • Alerting
    • Heroku has options if you are running professional dynos.
    • Uptimerobot is a great free third party service that will check ports every 5 minutes and has a variety of alert options.  If you want SMS, you have to pay for it, but it’s not outrageous.
    • Cost: free
  • Logging
    • Use a logging framework (like slf4j or the rails logger), and mark error conditions with a string that will be easy to search for.
    • Yes, you can use heroku logs but having a log management solution like papertrail will make you much happier.  Plus, it’s free for 2 days of logfiles.
    • Set up alerts with papertrail as well.  These can be more granular.
    • Cost: free
  • Create a list of third party dependencies.
    • Sign up for status alerts from these.  If you have pro slack, you can have them push an email to a channel.  If you don’t, create an alias that receives them.  You want to be the person that tells your clients about outages, not the other way around.
    • Cost: free
  • Communication
    • Internal
      • a devops_alert slack channel is my preferred solution.  All deploys and other alerts go there.
    • External
      • create a mailing list for your clients so you can inform them of issues easily.  Google groups is fine, but use whatever other folks are using.  Don’t use an alias in your email–you’ll forget to add new clients.
      • do not use this mailing list for marketing purposes, unless you want to offload the burden of keeping the list up to date to the marketing department.
      • do make sure when you gain or lose clients you keep this up to date
    • Run through a disaster in your mind and make notes on how you would communicate the issue, both internally and externally.  How often do you update your team?  How often do you update your clients?  What about an internal issue (some of your code screwed up) vs an external issue.  This doesn’t need to be exhaustive, but thinking about it ahead of time and making some notes will help you in the crisis.
    • Cost: free
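
The logging advice above (mark error conditions with a string that is easy to search for) might look something like this in Python. This is only a sketch: the marker string, logger name, and helper function are hypothetical, and the same pattern works with slf4j or the rails logger.

```python
import logging

# Hypothetical marker: pick one string and use it for every alert-worthy error,
# so a single saved search (in heroku logs or papertrail) finds them all.
ALERT_MARKER = "APP_ALERT"

logger = logging.getLogger("myapp")  # hypothetical logger name


def log_alert(message: str, *args) -> None:
    """Log an error prefixed with the searchable marker string."""
    logger.error(ALERT_MARKER + " " + message, *args)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_alert("payment failed for order %s", 42)
```

A log management alert can then match on the marker alone, without needing to know every individual error message.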

All of this is probably a four hour project, max.

But once this is done, you’ll rest easier at night, knowing you have what you need to troubleshoot and recover from production issues.


Machine sympathy vs human constraint

I had beers with a work acquaintance recently. He’s a developer of a large system that helps with contact management. Talk turned, as it so often does in these situations, to the automation of development work. We both were of the opinion that it was far far in the future. This was three whole decades of experience talking, right? And of course, we weren’t talking our book–ha ha. I’m sure that artisan weavers in the 1800s were positive that their bespoke designs and craftsmanship would mean full employment no matter what kind of looms were developed.

But seriously, we each had an independent reason for thinking that software development would not be fully automated anytime soon.

My reason:

It’s very hard to fully think through all the edge cases of development. This includes failure states, exceptional conditions, and just plain human idiosyncrasies. Yes, this is what every system must do. That’s right. Anything you want handled by an automated system has two options: plan for every detail or bump exceptional cases up to human beings to make judgements. The former requires a lot of planning and exercising the system, while the latter slows the system down and introduces labor costs into the mix.

This system definition is hard to do and hard to automate. I’ve seen at least five new languages/IDEs/software platforms over the years that claimed to allow a normal human being to build such robust automatic systems, but they all seem to fail in the short term. I believe that is because normal human beings just don’t think through edge cases, but those edge cases are a key part of software.

His reason:

When systems reach a certain size, abstractions fail (I commented about this years and years ago). Different size, different failures. But just as an experienced car mechanic knows what kind of system failures are likely under what conditions, experienced software engineers, especially those who understand first principles, have insight into these failures. This intuition (he called it “machine sympathy”) is something that can only be acquired by experience, and, by its very nature, can’t be automated. The systems are so complex and the layers so deep that every failure is likely to be unique in some manner.

So, which one is more likely to remain a relevant issue? It depends on the organization and system size. Moore’s law (and all the corollaries for other pieces of software systems) works both for and against machine sympathy. For, because, as hardware gets better, the chances of system breakdown decrease, and against, because as hardware gets better, larger and larger systems get more affordable. Whereas I believe the human constraint is ever present at all sizes of system (though less present in smaller ones where there is less concern about ‘bumping up’ issues to humans, or even just not handling edge cases at all).

What do you think?


The Trouble with Snapchat

I joined Snapchat a while ago. I found tremendous value in the snapstorms by Mark Suster. And some value in the chats from Justin Kan and Gary Vaynerchuk. I’m no Snapchat expert–never made a snap. Just followed people for their stories. But I was interested and was checking the app a couple times a day for a while.

Yet, now I deleted the app from my phone.

Why?

Because even though I was getting value from the media I was consuming, there were two major issues.

  • I couldn’t share a great snapchat. Other than suggesting “hey, why don’t you get on snapchat and follow this person because they are talking about lots of interesting things”, you can’t share the knowledge. I didn’t think that was very important until the fifth or sixth time I thought “geez, XXX would really enjoy this” and then realized I couldn’t share it with them and felt a twinge of annoyance. I miss having a universal resource locator that I can share as I please.
  • I couldn’t consume a snapchat when I wanted to. I often will email myself an article, or leave a tab open, or even post it to Twitter or Hacker News if I scan it and know I’d like to come back and read it more fully later. Even in my Twitter or Facebook feeds, I can scroll back years if I want to. Snapchat forces you to consume content on their schedule. And that gets frustrating.

I can see why both of these attributes are good for content creators–they force the consumer to engage more. More on that here, from msuster. But this consumer is saying goodbye to Snapchat. At least until they give me URLs.



© Moore Consulting, 2003-2017