Skip to content

Five rules for troubleshooting an unfamiliar system

trouble photo
Photo by Ken and Nyetta

A few weeks ago, I engaged with a client who had a real issue.  They sold a variety of goods via a website (if this was the 90s, they would have been called an ‘e-tailer’), and had been receiving intermittent double orders through their ecommerce system.  Some customers were charged two times for one order.  This led, as you can imagine, to very unhappy customers.  This had been happening for a while and, unfortunately, due to some external obstacles, internal staff were not available to investigate the issue–they had their hands full with an existing higher priority project.

I was called in to see if I could solve this issue.  I had absolutely no familiarity with the system.  But in less than ten hours of time, I was able to find the issue and resolve it.  How I approached the situation can be summed up in five rules:

Number one: define the problem.  Ask questions, and capture the answers.  What is the exact undesired behavior?  When is the undesired behavior happening?  What seems to trigger it?  When did it start?  Were there any changes that happened recently?  Does the client have reproduction steps?

I gathered as much information as I could, but keep it high level.  I asked for architecture and system diagrams.  For the history of the application.  For access to all systems that could possibly be relevant (this will save you time in the future).  For locations of log files, source repositories, configuration files.  For database credentials and credentials for third party systems like CC processors.  It is important at this time to resist the temptation to dive in–at this point the job is to get a high level understanding so I can be efficient in the next steps.

You will get speculation about what the solution is when you are asking about the problem.  Feel free to capture that, but don’t be influenced by it.

Number two–find the finish line.  After getting a clear definition of the problem, I looked in the orders database and find out if the double orders were showing up there.  They were, which was a clue as to which part of the system was malfunctioning, but more importantly let me see the effectiveness of any changes I was making.  It also lets the customer know the objective end goal, which can be important if this is a t&m project, and it let me know the end state to which I was headed–important for morale.  (BTW, don’t do fixed bids for this type of project–overruns will be unpleasant, and there will be overruns.)

I was able to write a SQL script to find double orders over a given time frame.  I ended up writing a script which emailed the results of this query to myself and the client nightly, as an easy way to track progress.  The results of this query were a quantifiable, objective measure of the problem.

Number three–start where you are familiar.  I could have dove in and looked at the codebase, but due to my problem definition, I knew that there had been no changes to the checkout portion of the code base for years.  I also was unfamiliar with the particular software that managed the ecommerce site and could have wasted a lot of time getting up to speed on the control flow.  Instead, once I had the SQL query, I could find users that had been double charged, and look at their sessions in the web server logs.  I’ve been looking at apache http logs for over a decade and was very familiar with this piece of the system.

Number four–follow your nose. I followed a few of the user sessions using grep and noticed some weirdness in the logs.  There were an awful lot of messages that indicated the server had been restarted, and all the double orders I looked at had completed 5-6 seconds after the minute changed.  (It’s hard to define weirdness explicitly, which is why it behooved me to start with a portion of the system that I was experienced with–it made the “weirdness” more obvious.)  From here, I ended up looking at why or how the server was being restarted regularly.  Ended up finding an errant cron job which was restarting the server often enough that the ecommerce system was getting confused and double booking orders–once before the restart and once after.  This was easily fixed by commenting out the cron job.

Number five–know when to stop.  This ecommerce system obviously had a logic flaw–after all, restarting the web server shouldn’t cause an order to be entered twice, whether you restart it every hour or once a year.  I could have dug through the code to find that out.  But instead, I commented out the cron job, let the system run for a week or so and waited for more double orders.  There were none, indicating that the site was low traffic enough that whatever flaw was present didn’t get exercised often, if at all.  I confirmed with the client that this situation met his expectations of completeness, and called it good.

Being thrown into a new system, especially when troubleshooting, is a difficult task.  I am thankful the client was relatively responsive to my questions, and that pressure, while present, wasn’t intense.  These five steps should help you, if you are put in any troubleshooting situation.

What a pleasurable way to learn a language!

This site was recommended to me, and I have to say, it is a fun way to become more familiar with the syntax of a language. There’s the journey aspect:

things are not what they appear to be: nor are they otherwise
your path thus far [...X______________________________________________] 19/280

and the fact that when you see something you want to investigate further, you just write another unit test:

  def test_slicing_arrays
    array = [:peanut, :butter, :and, :jelly]

    assert_equal [:peanut], array[0,1]
    assert_equal [:peanut,:butter], array[0,2]
    assert_equal [:and,:jelly], array[2,2]
    assert_equal [:and,:jelly], array[2,20]
    assert_equal [], array[4,0]
    assert_equal [], array[3,0] # my addition
    assert_equal [], array[4,100]
    assert_equal nil, array[5,0]
  end

Now, running through these koans certainly isn’t going to make me a Ruby expert, but I will have passing familiarity with the language and be ready to use it on my next small project.

Apparently I’ve been living under a rock, because there appear to be koans projects for quite a few languages: java, haskell, erlang (cue whatsapp reference), and even bash. I was, however, unable to find a koans package for assembler.

How to use platform specific configuration in your Cordova app

This post comes out of a question I answered over at the guest post I did on Devgirl’s blog, ‘Three Hooks Your Cordova Project Needs’.

A commenter asked:

How do you retain the project level settings for cordova Android projects? Platforms folder removes project level setting when you run ‘cordova platform rm android’

I answered over there, but thought I’d expand a bit and write a post here.

When you are doing Cordova development, there are two main tooling paradigms. You can use native tooling (Eclipse, XCode) to manage your source, edit your javascript and CSS, etc–this is called ‘Native Platform Dev’. Or you can use tooling more typically used in web development (a text editor like Sublime or vi) plus Cordova CLI–this is called ‘Web Project Dev’. Here’s a bit more on the names of these paradigms.

In the first case, you are probably not removing the files under platform all that often–you are more likely to work out of that directory. In the second case, everything under platform is derived from your www directory, plus your plugins, so you can remove the platform directory easily.

I can’t really speak to Native Platform Dev, because it isn’t a Cordova workflow I’ve used. My book is entirely about Web Project Dev and how to do it most efficiently. If that is your paradigm, I imagine you won’t have much trouble with platform specific settings, because the native tooling is pretty good about capturing that in version control, so you can rely on it.

If, on the other hand, you are using Web Project Dev, then if you want to modify platform specific settings, you have three options:
You need to either:

  1. Only modify your project in ways that can be expressed in config.xml. Review the config.xml reference and the platform guides to figure out if your needed customization can be captured in this way.
  2. Write an after_platform_add hook which copies your changes over from elsewhere in your source tree (if you have a modified .java file for example).
  3. Write a plugin which modifies an XML files (AndroidManifest.xml) to insert your needed project level config, (an IntentFilter, for example), and add that plugin to your project in an an after_platform_add. Note you can only add XML nodes to config files with plugins, you can’t modify attributes or remove nodes.

Which of these is correct for you depends on exactly which platform specific feature you are trying to modify.

Finding company phone numbers

I ran into a situation recently at work where I was trying to find the phone number of a company. Checked the company site, including the footer and contact us page, and no phone number was available.

Now, I can see why, as a small startup (which is what this company was), you would not want a phone number available. But we were doing due diligence and I thought a phone call or two would be appropriate.

So, here’s a list of places to go to find a company’s phone number, if it isn’t available on their site:

  • Dun and Bradstreet. Especially useful if the company has an iphone app, because Apple requires DNB registration for a corporate developer account.
  • Facebook often has different information than a corporate website, if they have a facebook page at all.
  • The whois database sometimes has address and phone numbers associated with a domain.
  • The Secretary of State office, for whatever state they are in. This is where business documents are file, and is worth checking.
  • Twitter. Depending on your situation, you can also just tweet them directly: “@foo, I’m looking to call you, do you have a phone number”.
  • LinkedIn, to look for people who know people who work there. This may be more or less useful based on the spheres you run in and where you live.
  • The Wayback Machine, which lets you see how websites appeared at various points in time. This is useful if the company at one point had a phone number on their website, but now does not.
  • Use the above methods on any other names you have turned up during your search, as well.

It is simply amazing what you can find on the internet with some digging. So, if you are looking to find the phone number of that company, because you need to talk to them, don’t give up.

Trying out a habit

I have been wanting to practice meditation for the longest time.  Periodically, I would subscribe to newsletters, read articles, download apps (I love the Chakra Chime app) watch videos, and get fired up about the benefits.  Then I would meditate for one or two days, and then would have a tough day and fall into bed exhausted, meditation forgotten.  Having fallen off the bandwagon one day, it was easier to skip it the next day, then meditate the following day, then skip it the next three days, until I wasn’t meditating at all.

I mentioned this difficulty to Corey, a friend, and he recommended a different approach.  It has three components:

  1. A monthly calendar.  You can print one out from this site.  Write the activity at the top.  Put it by your bed.
  2. A sharpie.  Put it by your calendar.
  3. An agreement with yourself that no matter what, you’ll do what you want to do once a day.
Once you perform the activity, you can put a big fat X on that calendar.  I’ll tell you what, once you get four or five Xes, you start to gain some momentum.  Even when I’ve had some really long tiring days, I still want to keep the streak going, and the calendar provides that extra bit of motivation to do it.
I don’t know if I’ll continue to meditate once I’m done with the calendar, but at the least this method made it easy to try it out as a habit.  If you have a habit you’ve been wanting to try, but haven’t been able to make room in your life for, try Corey’s three step method.

 

Using Munin To Track Business Values

Munin is a great piece of software that we use at my company to track overall trends in disk usage, CPU and other system purposes.  Now, we don’t have a ton of servers, so I’m not sure how munin scales for many machines, but it has been invaluable in troubleshooting problems and giving us historic context.

One thing we’ve started to do is to incorporate business specific metrics into munin.  This is good because it ties the technical operations more tightly to the business, making us aware when there are issues.

Anything you can run a sql query or do a wget for, you can graph in munin.  (Here’s something I wrote about writing munin plugins a year ago.)

I don’t think that munin is acceptable as a general purpose dashboard.  I’d probably look at Google Analytics if I was web drivingdriven (updated Feb 25 2012), and at statsmix if I needed to integrate a bunch of disparate services.  But for bringing additional business awareness to a technical team, writing a few custom munin plugins that will graph key business metrics can be very useful.

How to connect a Jabra VBT2050 earpiece to a Palm Centro

Everyone should use an earpiece.  My SO is reading “Disconnect: the truth about cell phone radiation, what the industry has done to hide it, and how to protect your family” and it’s some scary stuff.

I’ve struggled with setting up my earpiece enough times that I want to document what I did just now.  There are a lot of instructions on the internet, but they all seem incomplete, or aimed at a different phone.  Here’s the Jabra manual (PDF), even though it isn’t much help.

So, here are my step by step instructions on how to connect my AT&T Palm Centro with my Jabra VBT2050 earpiece.

  • Turn off the earpiece by holding down the side button until you see 4 fast blinks.  Disconnect it from the charger.
  • Then, turn off bluetooth on the phone.  Disconnect it from the charger.
  • Turn on the bluetooth on your phone
  • Make sure your phone’s bluetooth setting is ‘visible’
  • On the phone, choose ‘setup devices’
  • Choose ‘trusted devices’
  • Choose ‘add device’
  • Turn on the Jabra earpiece by holding down the button you used to turn it off
  • Continue to hold the on/off button down until you see it blinking three times slowly.
  • Press the center button of the earpiece (the one you use to connect/disconnect calls) for about 10-20 seconds, until the light on the earpiece is steady.
  • Choose ‘find more’ on your phone.
  • Select the Jabra earpiece.
  • Choose ‘OK’
  • Enter the passcode: 0000 (I’ve not found a way to change this, which seems rather idiotic).
  • Choose ‘OK’  You should now be at the ‘Trusted devices’ screen.
  • Select the Jabra earpiece from the list
  • Press the menu key on your Centro.  It’s next to the ‘Alt’ key.
  • Select ‘Connect’.  The earpiece should be blinking one time slowly.
  • Click ‘OK’ on out.  You should see a blue set of headphones next to your bars when you get to the main screen.
  • You can change your phone’s bluetooth visibility to ‘hidden’

Hope this helps!

Gratitude List

Every so often, I just need to stop and say thanks.

Thanks for my family and parents, providing all kinds of education when they raised me.

Thanks to my wife, for being patient and loving, even when I’m not my best.

Thanks to my past and current clients, colleagues and employers, for giving me amazing opportunities, challenging me and trusting me.

Thanks to the vast number of people who contribute to making the internet a fantastic place (this is a software blog, after all).  Whether you create content or build plumbing, the internet is richer for your contribution (even the guy behind lolcats, but especially stuff like this and this).

Every so often, I get wrapped up in some small drama, but nothing knocks me back on my heels and makes me say ‘Damn, I’m lucky’ like making a gratitude list, even one like this which is incomplete.

Say thanks–it feels good!

Shuttering a blog

Not this one!  But recently, as part of an effort to simplify my life, I’ve shuttered a couple of blogs I had been running for a while.  Here’s my list of best practices to do so:

  • Forgive yourself.  Running a blog is a lot of work, so if you don’t have time to do it adequately, it’s better to admit it than to do a poor job.
  • Try to find someone else who is interested.  If you’ve built up a following (you know how many visits you’re getting a day, right) and want to continue to see whatever topic you are blogging about more fully explored, you may be able to find someone else.  If you do, make sure to add them as a contributor, and discuss where they see things going from there–do they want to fully take over the blog, just write the occasional post, or what?
  • Announce the hibernation on the blog.  Feel free to phrase is as a long pause, or a hibernation, or whatever, but make sure you tell your readers.
  • Leave the content up for as long as you can.  Consider moving it to a different server if you need to consolidate, but if you’ve put up useful posts, people will continue to visit via search engines.
  • Leave up your contact info as well.  I’ve had several emails about my shuttered blog, and they’ve actually spiked my interest to start writing again.
  • Shut down your CTAs; for instance, I had a newsletter signup on one blog.  To me, it’s not fair to ask people to give their email address if I’m never going to send them useful content.  Along with the previous two suggestions, this is really about being respectful of your readers.

Do you have any tips for shutting down a blog in a graceful manner?  Here’s an interesting post on the topic, but written from the perspective of a more professional blogger.