
The Fight Against Ambiguity

Can dreadful medical mistakes teach us something about software engineering?  I think so.

In my book “Must-Have Skills for Software Engineers”, I referred to assumption as “the mother of all evil” in engineering.

If that is true, then ambiguity may be the father of all evil.  Assumption plus ambiguity is a recipe for failure.  The software industry is particularly prone to failures due to these factors, because of the abstract nature of the domain.

We are not alone in this regard.

The Ugly Effects Of Ambiguity

I learned something interesting in a recent discussion with some professionals from the healthcare industry.

There is a term in healthcare called “wrong-side/wrong-site, wrong-procedure, and wrong-patient adverse events (WSPEs)”.  This term is applied to situations where a patient’s problem is treated incorrectly for one reason or another.

According to the National Center for Biotechnology Information web site [1], there were 5,940 WSPEs (2,217 wrong-side surgical procedures and 3,723 wrong-treatment/wrong-procedure errors) over a 13-year period, as recorded in the National Practitioner Data Bank (NPDB).

Think about that for a moment.  Go ahead, read it again.  I’ll wait.

That means 2,217 cases where a patient had a wrong-side surgical procedure!

Although this seems to be a very small number compared to surgery counts of the period, it is common enough to have its own acronym.  According to the US Patient Safety Network [2], “Root cause analyses of WSPEs consistently reveal communication issues as a prominent underlying factor.”

Wait a minute!  How can a surgical team of highly educated, very intelligent people still have communication issues that result in this type of tragic problem?  Scary?  You bet!

Fortunately for us, the medical community has a vested interest in improving their own processes.  There are some interesting techniques that have evolved to reduce the occurrences of such mistakes.

One in particular should be very interesting to engineers, since it is directly applicable to a large number of situations.

Surgical Timeouts

The surgical timeout is a technique that is now required before any surgery in the United States.  This technique is considered to be important enough that there is a “National Timeout Day”, to encourage adherence to its proper use.

The surgical timeout involves a set of structured, direct, verbal and written confirmations of the patient identity and the procedure(s) to be performed.  These confirmations are done with the patients before the start of procedures, and again immediately before the surgery itself.  Here is an example of the interactions in such a timeout, derived from a real scenario:

  • Before surgery, a patient’s medical forms are presented.
  • At this point, everyone involved must stop whatever they are doing: doctors, nurses, scrub techs – everyone.  The purpose is for the entire group to acknowledge the timeout and mentally engage with each other on activities that must be performed.
  • The patient’s name and identifying information are called out.
  • The procedure that is about to occur is verbally enunciated, and it is written out on a whiteboard in the surgery room.
  • The surgical group narrates verbally through what they are about to do.  As an example: “This is patient <first name> <last name>, their identification number is <id>.  Today we will be performing a <procedure name>, with the possibility of <other procedure name>.  The incision site is marked and visible……”
  • All participants on the team must verbalize agreement with the statements – in other words, each involved stakeholder must say “Yes, I agree.”
  • Interestingly, the actual site of the surgical incision is to be marked with the word “Yes”, and not “X”, since “X” may be considered ambiguous [4].

As a software engineer, you can see that this timeout practice is meant to directly reduce or remove any surrounding errors that might be present due to assumption, ambiguity, or paperwork mishaps.

The structured statements reinforce that the operating team has the same understanding of the problem being solved.  There are examples of very detailed checklists of this procedure as shown in [3].

Engineering Corollaries

Learning about the Surgical Timeout prompted me to consider how this practice might be used in some common software situations.

Sprints

In iterative development methodologies such as Scrum, it is very common to lose sight of what is truly important in a sprint, amidst the overhead that often plagues projects.  Sprint planning can become an exercise in juggling stories, prioritizing, estimating, fighting the associated project tools, and so forth.

I have found it valuable for each sprint or iteration to be prefaced by something like the Surgical Timeout, so that there is a clear purpose and mission to the iteration.

This way, when decisions must get made, or functionality must be re-considered, the decisions are made within the context of a known and well-understood goal.  Each iteration should have a “theme” or a goal, if possible.

For example, “Iteration 6: version 1 of user survey functionality.  By the end of this Iteration, we should be able to survey users,” or a similar objective.  This goal can be re-stated by the product owner at the start of each standup, so the group is always aware of the context and value of their work.

Code Review and Pair Programming Practices

Despite the great collaboration tools we have today, such as Fisheye/Crucible for on-line code reviews, many organizations go through the motions, taking a passive approach to reviewing artifacts of work.

Often this is due to schedule pressure, lack of training in code review best practices (yes, there really are code review best practices), and so forth.

However, a code review is a learning opportunity of the highest order for a technical person, and it is a great way to practice one’s ability to communicate with one’s peers.

For code reviews of high complexity, or of large size, I always recommend giving an in-person overview similar to the Surgical Timeout approach; the idea is to give a direct, in-person talk about the code changes, the impact of the changes, and so forth.

Some organizations benefit from having a checklist of items to look for in a code review.  Roles should be clearly assigned, and the end expectations should be enunciated (I will touch on this further in a future blog post.)  Leaving a complex code change up to tools and wishful thinking is not the way to ensure a good result.
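As a rough illustration of what "roles assigned and expectations enunciated" could look like in practice, here is a minimal Python sketch of a review "timeout" gate.  It borrows the rule from the surgical version that every participant must explicitly agree before work proceeds.  The class name, fields, and participant names are all hypothetical; this is one possible shape under those assumptions, not a description of any existing review tool.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewTimeout:
    """Hypothetical 'timeout' gate for a code review (illustrative names only)."""

    objective: str                 # the stated purpose of the change
    roles: dict[str, str]          # participant -> role, e.g. {"ana": "author"}
    agreed: set[str] = field(default_factory=set)

    def confirm(self, participant: str) -> None:
        # Only people with an assigned role may confirm.
        if participant not in self.roles:
            raise ValueError(f"{participant} has no assigned role")
        self.agreed.add(participant)

    def missing(self) -> list[str]:
        # Participants who have not yet said "Yes, I agree."
        return sorted(set(self.roles) - self.agreed)

    def clear_to_proceed(self) -> bool:
        # Silence is not agreement: everyone must confirm explicitly.
        return bool(self.roles) and not self.missing()


if __name__ == "__main__":
    timeout = ReviewTimeout(
        objective="Add retry logic to the payment client",
        roles={"ana": "author", "ben": "reviewer", "chi": "moderator"},
    )
    timeout.confirm("ana")
    timeout.confirm("ben")
    print(timeout.missing())           # who still needs to speak up
    print(timeout.clear_to_proceed())  # stays False until everyone confirms
```

The design choice worth noticing is that the gate defaults to "stop": a participant who has not spoken counts as a blocker, which mirrors the requirement that each stakeholder verbalize agreement.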

Pair-programming, if one’s organization practices it, is another way to apply the Surgical Timeout approach; discuss the objective and approach with your colleague up front; hold each other to account for both items.  Put the coding on hold and get to a whiteboard to diagram things out if needed.

Don’t approach it passively, or disengage from the objective.

Meeting Effectiveness

Meeting effectiveness is often ignored in the technical realm, but a meeting is an opportunity to apply a structured approach similar to the Surgical Timeout.  If a meeting is worth having, it is always better to install a structure, clear purpose, and follow-up.

In the absence of clear meeting goals, I will usually ask the question of the organizer, “What would you like to achieve in this meeting?”

If I cannot see that the meeting is productive for whatever reason, I will again query the organizer, “Are you getting the information that you need from this meeting?” or “What do we need to do, in order to make sure you have the support that you need from this group?”

Customer Interactions

I once had surgery to repair a torn rotator cuff.  It was my first surgery, so I was understandably nervous about the procedure.

Thankfully, the medical team involved me in the pre-Surgery Timeout process; they re-stated the purpose of the operation, I was allowed to ask specific questions, and my injured shoulder was marked with a “Yes” using permanent marker.  I was asked at the end whether I had any additional questions or concerns.

The overall process was very reassuring, since I felt that the team had taken the time, energy, and diligence to correctly interpret the problem and to communicate their understanding back to me.

This method of interacting is directly applicable to any customer conversation.  When you interact with a customer, that customer will naturally have a better impression if you are asking good questions, taking notes, enunciating the problem back for clarification, following up, and putting your diligence into understanding the scope of the overall objective.

Challenge

I invite you to talk with other professionals, or learn how other professions correct themselves in cases of failure, errors, or omissions.  Are there other practices that can be adapted for use in the software industry?

Let me know if you find something interesting.

References

[1] http://www.ncbi.nlm.nih.gov/pubmed/16983037

[2] http://psnet.ahrq.gov/primer.aspx?primerID=18

[3] http://www.aorn.org/Secondary.aspx?id=20867

[4] http://www3.aaos.org/member/safety/guidelines.cfm

  • linuxster

    amen.

  • Hi, I read your blogs regularly. Your story-telling style is awesome,
    keep it up!

    • cory.berg

      Thanks! I will do my best to keep it up.