Five rules for troubleshooting an unfamiliar system

trouble photo

Photo by Ken and Nyetta

A few weeks ago, I engaged with a client who had a real issue.  They sold a variety of goods via a website (if this was the 90s, they would have been called an ‘e-tailer’), and had been receiving intermittent double orders through their ecommerce system.  Some customers were charged two times for one order.  This led, as you can imagine, to very unhappy customers.  This had been happening for a while and, unfortunately, due to some external obstacles, internal staff were not available to investigate the issue–they had their hands full with an existing higher priority project.

I was called in to see if I could solve this issue.  I had absolutely no familiarity with the system.  But in less than ten hours of time, I was able to find the issue and resolve it.  How I approached the situation can be summed up in five rules:

Number one: define the problem.  Ask questions, and capture the answers.  What is the exact undesired behavior?  When is the undesired behavior happening?  What seems to trigger it?  When did it start?  Were there any changes that happened recently?  Does the client have reproduction steps?

I gathered as much information as I could, but kept it high level. I asked for architecture and system diagrams, for the history of the application, and for access to all systems that could possibly be relevant (this will save you time in the future). I also asked for locations of log files, source repositories and configuration files, and for database credentials and credentials for third party systems like CC processors. It is important at this time to resist the temptation to dive in–at this point the job is to get a high level understanding so you can be efficient in the next steps.

You will get speculation about what the solution is when you are asking about the problem.  Feel free to capture that, but don’t be influenced by it.

Number two–find the finish line. After getting a clear definition of the problem, I looked in the orders database to find out if the double orders were showing up there. They were, which was a clue as to which part of the system was malfunctioning, but more importantly it let me measure the effectiveness of any changes I was making. It also let the customer know the objective end goal, which can be important if this is a time and materials (T&M) project, and it let me know the end state to which I was headed–important for morale. (BTW, don’t do fixed bids for this type of project–overruns will be unpleasant, and there will be overruns.)

I was able to write a SQL script to find double orders over a given time frame.  I ended up writing a script which emailed the results of this query to myself and the client nightly, as an easy way to track progress.  The results of this query were a quantifiable, objective measure of the problem.
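Here is a minimal sketch of that kind of check, in Python with SQLite. Everything specific in it is an assumption made for illustration: the `orders` table, its column names, and the two minute window for treating two orders as a double are hypothetical, not taken from the client's system.

```python
import sqlite3

# Hypothetical schema: orders(id, customer_id, total_cents, created_at),
# with created_at stored as 'YYYY-MM-DD HH:MM:SS' text.
DOUBLE_ORDER_SQL = """
SELECT a.id, b.id, a.customer_id, a.total_cents, a.created_at, b.created_at
FROM orders AS a
JOIN orders AS b
  ON a.customer_id = b.customer_id
 AND a.total_cents = b.total_cents
 AND a.id < b.id
 AND ABS(CAST(strftime('%s', b.created_at) AS INTEGER)
       - CAST(strftime('%s', a.created_at) AS INTEGER)) <= :window_seconds
WHERE a.created_at >= :start AND a.created_at < :end
ORDER BY a.created_at
"""


def find_double_orders(conn, start, end, window_seconds=120):
    """Return pairs of orders from the same customer, for the same amount,
    placed within window_seconds of each other during [start, end)."""
    params = {"start": start, "end": end, "window_seconds": window_seconds}
    return conn.execute(DOUBLE_ORDER_SQL, params).fetchall()
```

Calling `find_double_orders(conn, "2014-09-01 00:00:00", "2014-09-02 00:00:00")` returns one row per suspected pair, so the row count is the number to watch as fixes go in.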

Number three–start where you are familiar. I could have dived in and looked at the codebase, but thanks to my problem definition, I knew that there had been no changes to the checkout portion of the codebase for years. I was also unfamiliar with the particular software that managed the ecommerce site and could have wasted a lot of time getting up to speed on the control flow. Instead, once I had the SQL query, I could find users that had been double charged, and look at their sessions in the web server logs. I’ve been looking at Apache HTTP logs for over a decade and was very familiar with this piece of the system.

Number four–follow your nose. I followed a few of the user sessions using grep and noticed some weirdness in the logs. There were an awful lot of messages that indicated the server had been restarted, and all the double orders I looked at had completed 5-6 seconds after the minute changed. (It’s hard to define weirdness explicitly, which is why it behooved me to start with a portion of the system that I was experienced with–it made the “weirdness” more obvious.) From here, I looked at why and how the server was being restarted regularly. I ended up finding an errant cron job which was restarting the server often enough that the ecommerce system was getting confused and double booking orders–once before the restart and once after. This was easily fixed by commenting out the cron job.
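If grep gets unwieldy, the same log reading can be scripted. The sketch below is only an illustration and rests on several assumptions: the access log is in the common Apache log layout, a client IP address is a good enough stand-in for a user session, and "just after the minute changed" means the first ten seconds. The function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the start of an Apache common/combined log line:
# ip ident user [timestamp] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def requests_for_ip(lines, ip):
    """Yield (timestamp, method, path, status) for one client IP, in log order."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("ip") == ip:
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, match.group("method"), match.group("path"), int(match.group("status"))


def just_after_minute(requests, max_second=10):
    """Keep only requests logged in the first max_second seconds of a minute."""
    return [req for req in requests if req[0].second <= max_second]
```

Feeding it `open("access.log")` and the IP of a double charged customer gives a short, ordered list of that customer's requests, which makes a pattern like "always a few seconds past the minute" easier to spot.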

Number five–know when to stop.  This ecommerce system obviously had a logic flaw–after all, restarting the web server shouldn’t cause an order to be entered twice, whether you restart it every hour or once a year.  I could have dug through the code to find that out.  But instead, I commented out the cron job, let the system run for a week or so and waited for more double orders.  There were none, indicating that the site was low traffic enough that whatever flaw was present didn’t get exercised often, if at all.  I confirmed with the client that this situation met his expectations of completeness, and called it good.

Being thrown into a new system, especially when troubleshooting, is a difficult task. I am thankful the client was relatively responsive to my questions, and that pressure, while present, wasn’t intense. These five rules should help you if you are put in a similar troubleshooting situation.

Sometimes you just need a technical project manager

stacked blocks photo

Photo by A. Drauglis

As a developer, my skills are applicable across a wide variety of domains. However, I re-engaged recently with a prospective client that I wrote about a while ago. They had used another booking solution over the past few months and still had the same pain. After thinking and doing some research, I came to the conclusion that hiring a developer was not the right answer for them.

This company had an interesting set of constraints:

  • Wedded to a platform (Shopify) because of previous investment and its excellent shopping experience.
  • The platform doesn’t provide all the functionality needed.
  • Users (both internal and customers) are ill served by another system to login and manage.
  • No third party plugins seem to meet the needs, either alone or in combination (at least, no solutions that I could find).  It’s apparently a fairly unique problem set.
  • They were not interested in solving the problem in incremental steps.
  • They had budget limitations (don’t we all).

In this situation, the best solution is to look for someone who can undertake the following steps:

  1. Write up a clear description of the problem. This doesn’t have to be a detailed requirements doc, but should be a clear explication of the issues, needs, timelines (if any), and current systems.
  2. Post the description to the Shopify forums and contact the platform vendor directly, to see if anyone has encountered any of the same issues and ask how they solved them. The point of this isn’t to solve the problem, it’s to see if anyone else has solved pieces of the problem. This will help the company identify partners and/or adjust scope of the project. (If Shopify customer support says ‘whoa, we’ve never heard of this’, it’s a different size problem than if they say ‘well, you might want to bolt these three pieces together’.)
  3. After the client has more knowledge, send the requirements to Shopify focused dev shops (and possibly Elance contractors who work with Shopify, but not just Shopify themes). Work with at least two to three of them to see what a solution would cost, either custom or building on their current code. At this time, avoid getting development quotes from anyone who doesn’t have experience with Shopify development (like me!), simply because the integration with the platform is so critical.
  4. Evaluate the results of the RFP process, including following up any avenues that the experts or forums turn up. Consider whether the budget allows for a comprehensive solution or whether it makes more sense to look at point solutions for the high pain areas.
  5. If the results point to a comprehensive solution being within budget, engage the solution provider.  If not, identify the high pain areas and go back to step 1 with the smaller scope.

So, what this client really needs is a technical project manager (TPM). While most freelancers have this skill to some degree, as it is hard to survive without it, making a full time living as a contract project manager is difficult. I know of only one person in 15 years who was making a living as a contract project manager (and she’s not doing it anymore). This particular project doesn’t seem like a full time effort, so the client should be able to get by with a moonlighter, at least until the results of step 4 are known.

Good technical project managers are hard to find. A friend’s company is looking for a TPM, and its job req captures the skill set well. They want someone with a:

balance between hands-on technical knowledge, a ravenous appetite for order, and an understanding of humans and how they work.

Developers (or former developers) who want to project manage and have people skills aren’t quite the unicorn that someone who can both design and develop is, but they are almost as rare. Unless the company can find a contract project manager who has technical chops, they’ll want to either source this internally or find an external contractor with another skill set (developer or designer) who wants to PM this project.

This is a great chance for an employee to expand their skill set. Based on my experience managing vendors during my time at 8z (a 4 month website relaunch and a longer term, less time intensive data provisioning engagement), I would advise that this employee have technical skills. Challenging custom solution providers on technical grounds, or at least being able to follow along, ensures the company will get the best solution. This doesn’t work for “take it or leave it” SaaS apps, but this project appears custom enough that the TPM really needs to have both business and technical considerations in mind when managing the development shop. If they are looking for an outside contractor, engaging with a designer or developer who has PM experience would be an option.

As far as stretching budget, proposing a lower price to the development shop in exchange for shared ownership of the code might make sense. The partner could market this code to other clients and recoup some costs, while the client retains a perpetual license.

Sometimes, a generic developer isn’t the right answer for a software development problem. A technical project manager (or someone wearing that hat) can often stretch budget and leverage skill sets of focused development teams.

Helping a friend gather data and reach prospects with gentle intros

coffee photo

Photo by My Aching Head

I had coffee with a friend the other day, and he shared a business idea. I thought it was an awesome idea–I certainly saw the need in the marketplace and believed he had the skillset and resources to execute on the idea.

He’s still in the exploratory phase, so I offered to send gentle intros to people in my network who I thought would benefit from his idea. (The target market is anyone with a custom web application that makes money, or anyone who builds custom web applications and is looking for a way to provide ongoing support–if that is you, contact me if you would like to learn more.)  I asked him to write a small spiel that he’d feel comfortable with me sharing.  If you are thinking of doing this, make your friend write a spiel for you.  If they can’t write a spiel, chances are they won’t be good at follow up and your intros will be wasted.

Then, I went through my LinkedIn network and put contacts into categories:

  1. this person (or the company for which they work) might want to partner with my friend
  2. this person (or the company for which they work) is a possible client for my friend’s offering
  3. this person might know people who are in categories 1 or 2.
  4. this person (or the company for which they work) is not a good fit for what my friend is working on
  5. who is this person?

And then I sent soft pitch emails to almost everyone in categories 1, 2 and 3.  The content varied based on which category someone was in, but for category 1, the email was something like:

I have a friend who owns a hosting company who is looking to talk to consulting companies about a possible new product he is thinking about offering.  Here is his spiel:


[…spiel from friend …]


I wasn’t sure if this kind of software maintenance was something that your company wanted to keep inhouse, or if you would be interested in discussing this with him.  I wanted to check before I did intros.    Is this something you think is worth learning more about?

This way, my friends and contacts on LinkedIn don’t get spammed from someone they don’t know.  Instead, they get an informative email from me, asking if they want to learn more.  If they do (and about 10% did), I do mutual introductions, and then the ball is in their court.  (Side note: here’s a great intro email etiquette guide.)

Why did I do this?  Well, there were a couple of reasons.

First and foremost, because I thought it would be a win win for both sides.  My friend gets more data about his offering and how the market will react to it.  My contacts/friends on LinkedIn learn about a new product from a trusted source.

Second, I was able to do some social network housecleaning.  I was able to ‘unlink’ with all people in category #5–it’s always nice to clean up your social graph.

Third, I reached out to people and had some interesting conversations.  Some folks I hadn’t talked to in years.  It’s good to reach out to people, and always better to do so with something of use to them, rather than a plea for work.

This was a fair bit of effort (a couple of hours). I can’t imagine doing this monthly, but once a quarter seems reasonable, especially if I’m reaching out to a different segment of my network each time. And I don’t have to do the whole process every time–spiel, LinkedIn, soft pitch, intro. I actually like scanning news sites and simply sending interesting articles to old contacts: “Thought you might be interested in this <link> because of XXX and YYY”. Those are super simple to send, and again, provide value and raise your profile.

Next time you talk to a friend who has a great idea, who can execute on it, and who will follow up with anybody you introduce them to, consider reviewing your social graph for prospects.  Gentle intros can benefit all three of you.

#TBT: Precision and Accuracy in Software

I originally wrote this in Dec of 2004. I still think that having someone who can answer engineers’ questions authoritatively increases productivity (of the engineer). However, now I’d emphasize that developers need to spend some time learning their domain to gain some intuition, and truly great business software engineers will learn when to bump a question up to a business person and when their intuition can be trusted.


Back in college, when I took first year physics lab, there was a section of the course that focused on teaching the difference between precision and accuracy in measurement. This distinction was crucial in experimental physics, since measurement is the bedrock of such experimentation. Basically, precision is how many digits of a measurement actually mean something. If I measure the length of a room with my stride (and find it to be 30 feet long), the precision is less than if I measure the length of the room with a tape measure (and find it to be 33 feet, 6 and ¾ inches long). However, it’s possible that the stride measurement is more accurate than the length found with the tape measure, that is, it better reflects how long the room actually is. (Perhaps there’s clothing on the floor which adds to the tape measurement, but which I stride over.)

These concepts aren’t just valid in physics; I think they’re also useful in software. When building a piece of software, I am precise if I build what I say I am going to build, and I am accurate if what I build actually meets the client’s business needs, that is, it solves the business problem. Almost every development tool either makes development more precise or more accurate.

The concept of precision lends itself easily to automation. For example, unit testing is rapidly gaining credence as a useful software technique. With unit testing, a developer writes test cases for each part of their code (often at the method level). The running of these tests ensures that code is actually doing what the developer thinks it is doing. I like writing unit tests; it gives me comfort to know that corner cases are taken care of and that changes to code can be fairly easily regression tested. Other techniques besides unit testing that help ensure precision include:

Round tripping: using a tool like TogetherJ, I can ensure that the model (often described in UML) and the code are in sync. This makes it easier for me to verify my mental model against the code.

Specification writing: The more precise a spec is, the easier it is to translate into code.

Compilers: the checking that occurs at compilation time can be very helpful in ensuring that the code is doing what I think it is doing–at a very low level. Obviously, this technique depends on the language used.
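To make the unit testing technique concrete, here is a small sketch using Python’s standard `unittest` module. The `order_total_cents` function is a hypothetical helper invented for this example; the point is only that each test pins down one thing I believe the code does, including the corner cases.

```python
import unittest


def order_total_cents(unit_price_cents, quantity):
    """Hypothetical helper: total for one line item, in cents."""
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return unit_price_cents * quantity


class OrderTotalTest(unittest.TestCase):
    def test_typical_order(self):
        self.assertEqual(order_total_cents(250, 3), 750)

    def test_zero_quantity_corner_case(self):
        self.assertEqual(order_total_cents(250, 0), 0)

    def test_negative_quantity_rejected(self):
        with self.assertRaises(ValueError):
            order_total_cents(250, -1)
```

Saved in a file, these run with `python -m unittest`. Note that they only check precision: the tests pass whether or not a negative quantity should really be an error in the client’s business.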

Now, precision is needed, because if I am not confident that I understand what the code is doing, then I’m in real trouble. However, accuracy is much more important. Having a customer onsite is a great example of a technique to ensure accuracy: you have a business domain expert available all the time for developers’ questions. In this situation, when a developer stumbles across a part of the business problem that they don’t quite understand, they don’t do what developers normally do (in order of decreasing accuracy):

  1. Ask another developer, which works great if the target audience is developers, but not so well otherwise.
  2. Best approximation (read: guess at the correct answer).
  3. Ignore the issue. (‘I’ve got a lot more code to write before I can go home today, and we’re shipping in two weeks. We’ll just let the customer discover it and deal with it as a bug.’)

Instead, they have a real live business person, to whom this software really matters (hopefully), who they can ask. Doing this makes it much more likely that the final solution will actually solve the business problem. Other techniques to help improve accuracy include:

Issue tracking software (I use Bugzilla): Having a place where questions and conversations are recorded is truly helpful in making sure the mental model of the business user and the programmer are in sync. Using a web based tool means that non-technical users can participate and contribute.

Specification writing: A well written spec allows both the business user and developer to have a sense of what is being built, which means that the business user can correct invalid notions at an early stage. However, if a spec is too detailed, it can be used to justify precision at the cost of accuracy (‘hey, the code does exactly what’s specified’ is the excuse you’ll hear).

Spring and other dependency injection tools, as well as IDEs: These tools help accuracy by decreasing the costs of changing code.

Precision and accuracy are both important in software engineering. Perhaps the best way to characterize the two concepts is that precision is the mapping of the programmer’s model of the problem to the computer’s model, whereas accuracy is the mapping of the business’ needs to the programmer’s model. However, though both are needed, accuracy is much harder to obtain. Knowing that I’m building precisely what I think I’m building is beneficial only insofar as what I think I’m building is actually what the customer needs.

Be an Informal Recruiter

link photo

Photo by elcovs

As I kick start my consulting business, I’m talking to many people–everyone in my network, anyone that I am referred to, and random people from Hacker News.  Employment is far less fungible than other purchases (even housing), so it behooves anyone selling labor to cast a wide net.

These introductions and conversations have not just given me the chance to talk about my skills and knowledge, but also to learn about other needs of the company.  Even if they don’t have need for a senior developer who can talk business and learn new languages, they may have a need for someone else in my network.

I’ve been able to do a few such intros since early August. It actually is quite fun to do this, and it is good for karma. It’s also a great way to stand out from the pack–if I am helpful to the organization before I take a job/contract with it, imagine how helpful I will be when I am engaged day to day.

During initial contact with the interesting organization, I talk about my skills and how that might fill the organization’s needs–after all, they are interested in meeting and learning about me.  But I also note any other needs, either via postings on their websites, needs they imply or mention in the conversation, or by simply asking them: “do you have any other needs at this time?  I like to help and would be happy to ping my network”.  I take notes.

Once I know some needs, I consider who in my network might help fulfill them. Then I reach out to the members of my network and see if they can help. Typically, I send an email with the details of the need, and ask if they know anyone who might be a fit for the company. Network members don’t have to be looking to make a move. They will probably know of folks who can be a good fit. For example, a marketer will typically know far more marketers than I will. Therefore, if an organization I am interested in helping needs a marketing assistant, reaching out to marketers in my network and asking if they know anyone looking will be helpful. This interaction is useful to my network contacts–it lets them reach out to their network, opens a conversation with me, informs them of labor market conditions with minimal work on their part, and could end up in a new job if the fit is right.

This technique also lets me have a soft touch point with the prospect in a week or so.  I say something like “I reached out to my network about the position X you posted, and haven’t heard back from anyone, just wanted to let you know.”

If I don’t have a specific person who could help fill this role (either themselves or via their contacts), there are other ways to add value. I’ve passed on recruiting tips (or interesting tech articles), helpful employment sites, or general labor market advice such as “based on what I’ve seen, you might have a hard time finding someone expert in tech X for pay Y”, or “in my experience, rates for tech X are $Y/hour”. All of these add value to the interaction at very little cost to me.

Connecting people with openings is great for hiring managers on the other side of the hiring/contracting process.  If I am casting a wide net looking for contracts, I have recent data as well as a network and perspective worth sharing.  Since this is low cost to me and has benefits for me, the organization I am interacting with and members of my network, it is worth the extra effort to be mindful of needs and to send that intro email.

“Where there’s muck, there’s brass”

hydra photo

Photo by Andrew Jian

I was reading this Ask HN post about a consulting arrangement that seemed a Sisyphean task. Here’s an excerpt:

I have been asked to consult a company in how they should speed up their development process.


Today the application which in the end of it all is a web application, consists of a lot of old ASP classic code, a COM+ bridge for being able to function within a mix of a lot of different .NET libraries written in .NET 3.5 (CLR 2). COM+ acts as a bridge between the two technologies.


The teams cannot compile code inside the development tools, it can’t debug unless they do it with some obscure hacks and workarounds and it seems that no one is really in control of what is going on in the core code base and no one really want to touch the original code base to clean it up and refactor/re-write it.

Seems pretty hopeless, right?  “Teams can’t compile code”?!  Unfortunately, this type of task is more typical of consulting engagements than not. After all, if there was a simple solution, why would the company have engaged a high priced consultant?  (If you are consulting and not ‘high priced’, well, that’s a problem, but a different one better left for another post.)

The comments on the post are interesting and worth reading. I left one, but wanted to expand on it. Now, I’m not familiar with the tech stack at all, but I am familiar with a large codebase (where large is relative to the size of the team supporting it–100K LOC can be large to a two person team) with a lot of technical debt that was crucial to the business. I have also consulted for years.

Whenever you are consulting, the first task is always to ascertain the real problem. Hint–it’s often not what you were explicitly hired to do. In this case, I’m guessing the real issue is that the web application needs to change to meet business needs, and that it can’t do so fast enough because of the accretion of complexity. But a guess isn’t good enough–you need to find out what you are being hired to do. It could be you are being hired to provide cover to spend money to rewrite the app, or to be blamed when a development team misses dates, or to actually speed up compilation.

Then, you need to learn who wants the task done, and who is writing the checks.  They are sometimes the same person, but not always.  You also need to learn how these folks want to be communicated with, including method, verbosity and frequency.

Finally, you can start to dig into the (software) problem.  (This process assumes you are doing time and materials billing.)  Do a preliminary investigation and look at some of the following:

  • given the end goal, what are intermediate steps that can get you there?  How long would it take to get one or two of these steps?
  • are there third party solutions that can get you 90% of the way to solving the business problem–this can include framework upgrades?
  • are there subsystems of the current hand coded solution that are isolated and can be reworked with minimal impact on the system?
  • are there one or two huge issues that would be a relatively easy win (version control, big bugs, moving configuration from code to a database, etc)?

Prepare ballpark estimates on the level of effort to accomplish some of these.  After that, you need to sit down with whoever wants the task done and whoever is paying for it.  Present your options, making it clear that any time estimates are truly SWAGs.

Let them decide.

If they ask for a recommendation, be prepared to make one, but the decision must be theirs.  They have the business context to know how much to invest in this system.

After that, either withdraw or start executing against the plan you and the decision makers have decided on.

Simple, right?

Build your capital

I was working on a post about how important it is to have a side project, but then read this post by patio11: “Don’t End the Week with Nothing”, which could be more accurately titled “Don’t End the Week with Nothing except your Paycheck”. Not that there is anything wrong with just having a paycheck, but Patrick’s point is that when you work on something you own, rather than something you are paid for, you can (in the right circumstances, with hard work and luck) get accumulating returns.

He did such a good job explaining how to move your career forward as a software developer (a superset of the topic I was covering with my “have a side project” post), that I wanted to call your attention to it. The whole article is worth reading, but here’s my favorite part:

Telling people you can do great work is easy: any idiot can do it, and many idiots do. Having people tell people you do great work is an improvement. It suffers because measuring individual productivity on a team effort is famously difficult, and people often have no particular reason to trust the representations of the people doing the endorsements.

This is one of the reasons I blog, it’s why I have spoken at several user’s groups, it is why I wrote a book, and it is why I have a side project.

Q and A: Unexpected technical issues when contracting

I got a question from a friend who is doing some freelancing.

Perhaps an odd question, but when you do web work and run into issues that are only showing up in IE browsers, do you bill the client for the extra time it takes to try to figure out how to make the site work on that crappy browser? I know the web developer(s) we used for the farm calculator [a project for which he was the client] bill for everything, even if they are redoing something they screwed up… but I’m curious as to your way of handling things like this. I want to be fair to my client, and myself!

This is a great question, and goes beyond just “IE browser” issues.  Here was my answer:

When I run into an IE problem, I will usually stop and ask the client if making it work perfectly on IE is really important to them. It would be useful to have stats for IE on their website (or the broader internet), so they can know if 10% of their users are on IE6 or 0.01%. It also would be useful to have an estimate for how long it will take (as long as you’re clear that it is an estimate).

If I’m billing time and materials, and I’ve had this conversation, I absolutely bill the client, but try to keep them informed as to how long this is taking me.

If it is a fixed bid, then I might go to the client and say ‘I’ve run into this issue, for this browser, which is x% of your website traffic.  There’s solution A and solution B, but both of them are things I didn’t expect.  Can we talk about this additional work’.  If they say no, I grit my teeth and deal.

So, to make it more broadly applicable, if you run into issues that you didn’t expect, here’s my advice:

  • Stop work and identify the issue.  Don’t keep spinning your wheels.
  • Gather useful facts to help the client make an informed decision (IE browser % in the example above).  Include a rough estimate if you can, but make sure the client knows it is an estimate.
  • Talk to the client about the issue and find some kind of resolution.
  • If the resolution is you doing the work, then, if you are on a fixed bid, explain how you didn’t consider this particular issue and see if the client is flexible about paying for it.
  • If the resolution is you doing the work, and you are on a time and materials contract, then bill for the extra work.
  • In either case make sure you keep the client in the loop about time spent and schedule changes due to the issue.
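Gathering the facts can be cheap. As a rough sketch, assuming you can pull a list of user agent strings out of the client’s web logs or analytics export, a few lines of Python give the IE share. The function name is made up, and the matching is a heuristic: IE through version 10 identifies itself with `MSIE`, IE11 with `Trident/`, and some other browsers have spoofed those tokens.

```python
def ie_share(user_agents):
    """Return the fraction of user agent strings that look like Internet Explorer."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    # Heuristic match: 'MSIE ' covers IE 10 and earlier, 'Trident/' covers IE11.
    ie_count = sum(1 for ua in agents if "MSIE " in ua or "Trident/" in ua)
    return ie_count / len(agents)
```

Multiply the result by 100 and you have the “x% of your website traffic” number for the conversation with the client.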

Surprises come up all the time.  What is important is that you come to a fair accommodation with your client.

My experience implementing GWO for a non profit, part 2

I was finally able to get access to the server via FTP (previous cliffhanger resolved).  After cautioning me to be very careful (“please proceed with caution and help keep from blowing up…warnings from my IT guy”) Emily handed over FTP access.

I proceeded very carefully.

We had already had discussions about what we were going to vary to test the donation button.  Pictures, location of button, text of button, and text around button were the major variables.  One of the hard things about GWO is deciding what to test–the possibilities are infinite.  Even with our handful of variables, we ended up dropping some options and still have 100 variations to test!

Actually installing GWO was pretty easy.  The only wrinkle was the fact that the goal was a click of a button and not another page.  This post was helpful.  One item that that post didn’t cover was validation–GWO doesn’t let you start an experiment if the program can’t verify that the script tags are installed correctly.  Since we were doing a non standard install, I gimmicked up a goal page for validation, then added the goal tracking to the onclick event as described in the post.

So, the experiment is currently running on The WILD Foundation homepage.  It’s been running for about a week, and has only 1 non test conversion.  I worry that we are not testing big enough changes (a donate lightbox, rework the entire front page), but I think it makes sense to let the test run for a few weeks and see what kind of data we get.

© Moore Consulting, 2003-2015