Planning Poker and User Story Sizing

Many Scrum teams that I’ve coached use Planning Poker as a mechanism for sizing user stories on their Product Backlogs. Often the technique was mandated as part of standardizing Scrum practices across teams on a corporate program (as opposed to something that teams cherry-picked on their own journeys of process discovery). Developers are inclined to view non-coding time as non-productive time, so the story sizing session may come to be viewed with skepticism and ‘ceremony disdain’.

In my opinion, Planning Poker is really good at what it was intended to do! What follows here is a fun training exercise aimed at coaching teams around the answer to two questions:

  1. Why do we bother with user story sizing?
  2. Why do we use Planning Poker and its weird abstract scale?

Note that this exercise and the article content will benefit teams who are already practicing Scrum and have already attempted planning poker.

What you will need:

  • 4 to 6 players and a facilitator
  • Bag of marbles
  • Some cups
  • 4 small items of vastly varying weight
  • A kitchen scale
  • Stopwatch or timer
  • 4 to 6 sets of planning poker cards (one for each participant)

The coaching is achieved in two parts: a lecture preceding a fun, practical exercise.

Part 1: Why do we bother with story sizing and planning poker?

The audience is typically composed of Scrum Pigs so the explanation is presented around indisputable truths (i.e. a logical argument).

Truth 1: The Project Sponsor needs an understanding of how long funding needs to continue on a particular initiative. This requirement is agnostic of the methodology being followed and is always present.

What sponsors want to know


Truth 2: We use Scrum when we have constraints around requirements stability and/or practice on the technology.

The concept below was first explained to me on a Scrum Master course with Ken Schwaber in Miami in 2012. It’s a Stacey diagram and shows the type of situation where it makes sense to use Agile techniques (as opposed to predictive techniques like Waterfall). Put simply, when UNKNOWNS > KNOWNS, any long-term plan is likely to need frequent updates (replanning sessions). Under these circumstances, Agile techniques, which use past performance data to predict the future, make more sense. As a developer on an Agile project or program, the corollary is also true: you are probably in a situation that has a lot of requirement flux or tricky delivery technology (or both).


In Scrum, there is typically one predictor mechanism that answers the question: “How much more time?”…and that is the trend of the Product Backlog burndown curve. Where that extrapolation intersects the horizontal ‘time’ axis is the most likely end date based on the current user story burn rate.
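To make the extrapolation concrete, here is a minimal sketch of one way to do it. It assumes that a straight least-squares line through the remaining story points per Sprint is a reasonable trend; the function name and the sample numbers are hypothetical and not taken from any real project.

```python
def projected_end_sprint(remaining_points):
    """Fit a straight line to the remaining story points recorded at each
    Sprint boundary and return the Sprint number where it crosses zero.

    Returns None when the trend is flat or rising (no projected end).
    """
    n = len(remaining_points)
    if n < 2:
        raise ValueError("need at least two Sprints of data")

    xs = range(n)  # Sprint 0, 1, 2, ...
    mean_x = sum(xs) / n
    mean_y = sum(remaining_points) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, remaining_points))
    slope = sxy / sxx
    if slope >= 0:
        return None  # the backlog is not shrinking
    intercept = mean_y - slope * mean_x

    # The line hits the horizontal 'time' axis where intercept + slope * x == 0.
    return -intercept / slope


# Hypothetical backlog: 200 points at the start, 20 points burned per Sprint.
burndown = [200, 180, 160, 140]
print(projected_end_sprint(burndown))
```

The slope of the fitted line is the (negative) velocity, which is why story sizes are needed at all: without them there is nothing to plot on the vertical axis.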

So, the answer to “why do we bother with story sizing?”:

  1. The team has consensus on the relative size of things effort-wise (it’s a ‘Wideband Delphi’ mechanism, for you inquisitive pedants). The exercise to follow exposes how uncomfortable things can get without a consensus tool!
  2. The discussion of story scope and what it entails is a valuable entrance to Sprint planning (we usually do Product Backlog grooming and story sizing in advance of task breakdown in Sprint planning)
  3. We have just enough data to be able to answer sponsorship’s question regarding project timelines (we use velocities and a Product Backlog burndown to predict the end of a project).

…ok but you’re still looking at me skeptically so the next part aims to do some practical convincing!

Part 2: Why do we use a funny abstract scale? Why not just use ‘man hours’?

Truth 3: Planning is a time-boxed affair. We stop planning before the accuracy starts leveling off (because of a lack of requirements stability and/or practice on the technology).

Plan accuracy is sensitive to:

  1. Level of understanding of user requirements
  2. Level of practice on the use of the delivery technology
  3. Enough time to process this knowledge and decompose the work into atomic planned tasks

It would be efficient to continue the task decomposition process only until you get to diminishing returns on accuracy. I have drawn the following picture (sometimes in anger…) on several occasions. Usually I do it to explain to someone in management that more time spent planning is not going to improve the accuracy of the plan because our constraints are knowledge (i.e. execution practice) and data!

Stop planning when accuracy starts to plateau


As I mentioned in Part 1, agile delivery mechanisms are selected because the requirements are not well understood and/or the technology is new, to the extent that frequent new discoveries would result in frequent refactoring of the plan. That refactoring is like a tax: it’s not productive work and it involves contributions from the most experienced team members. In an agile planning exercise the philosophy goes along the lines of: “We concede that the processes and/or technology are a journey of discovery so we will spend the barest minimum effort on planning before we get to diminishing returns on accuracy”.

Because it’s hard to tell when we are going to get into the space of diminishing returns accuracy-wise (much easier in retrospect), we use a ‘timebox’: a preset, agreed-upon duration of time which may not be exceeded.

In a time-limited planning session sensitive to unknowns in requirements and technology, the best we can do is to get down to a relative size of things and give it a rudimentary estimate:

Shoes < skateboard < bicycle < motorcycle < car < dump truck < combine harvester

“…on that scale, this story looks like a ‘skateboard’ and the other one is a ‘dump truck’ ”

The practical exercise that follows affirms the value of Planning Poker’s consensus mechanism and also shows why the alternative of measuring in hours (an absolute scale) is just not practical given the constraints of available data and time.


1)      Four players sit around a table. On the table is a bag containing 4 small items of vastly differing weight, some polystyrene cups and a bag of marbles.

2)      The facilitator takes one of the items (not the heaviest or lightest), presents it to the group and says that this is the ‘benchmark story’ and it has a value of “8” story points. Facilitator explains that in this exercise, the objects represent user stories and that their weight is (metaphorically) equivalent to the amount of delivery effort.

3)      The facilitator now pulls another item out of the bag and starts a timer. The team has 8 minutes strictly to do 3 planning poker estimates on the remaining three objects. They are supplied one at a time. The team can refactor their story points after all three objects have been supplied (provided that they stay within the 8 minutes timebox). At the end they will have a relative sense of the weights. It is assumed that these are Scrum pigs who are already familiar with planning poker so they can facilitate the poker estimates themselves. The facilitator can give them updates as to how much time they have left (or do it on a kitchen timer that they can see). Typically team members pass the item around and feel the weight in their hands relative to other objects before voting.

4)      The facilitator creates the table (example shown below) based on the team’s planning poker estimates:

Object          Story points
Calculator      5
Sampler         8
Impact Driver   20
Hammer          40

Of course, your objects will be different (these happened to be what was lying around my workshop when I concocted the exercise)! Note that the teams I’ve run this exercise with usually reach consensus on three planning poker estimates relative to a supplied benchmark inside of 8 minutes. Because they only have to do 3 estimates and the scale is quite crude, there aren’t too many debates.

5)      Now the Players are asked to estimate the equivalent weight in marbles of the four small items one at a time taking no more than 8 minutes for the estimate (i.e. exactly the same timebox). The facilitator explains that this is equivalent to doing an absolute estimate, such as estimating the stories in hours (as opposed to the initial relative estimate).

6)      After each estimate is given, the facilitator uses an actual kitchen scale to verify the accuracy of the estimate (by weighing the marbles) and logs the variance against the true value (percentage error). I suggest that you weigh your four objects in advance so that you have their true weights. At this stage the facilitator does not allow the team to see their accuracies. Watch carefully for team members who say “Take a few marbles out” or “Add two or three marbles” *.

Object          Percentage error (estimate vs actual)
Calculator      29%
Sampler         -8%
Impact Driver   15%
Hammer          -32%
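For facilitators who would rather log the variance in a script than on paper, the following is a rough sketch of the arithmetic. The sign convention (positive means the marbles weighed more than the object) and the function names are my assumptions, and the gram values in the first example are made up purely for illustration. The four percentages in the second example are the ones from the table above.

```python
def percentage_error(estimate, actual):
    """Signed error of an estimate relative to the true value, in percent.

    Assumed convention: positive = overestimate, negative = underestimate.
    """
    if actual == 0:
        raise ValueError("actual weight must be non-zero")
    return (estimate - actual) / actual * 100


def mean_absolute_error(errors):
    """Average size of the errors, ignoring their direction."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical weights in grams: marbles weighed 129 g, the object 100 g.
print(round(percentage_error(129, 100)))

# Percentage errors from the example table (Calculator, Sampler,
# Impact Driver, Hammer).
print(mean_absolute_error([29, -8, 15, -32]))
```

Averaging the absolute values matters here: the signed errors partly cancel each other out and would make the team look more accurate than it was.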

7)      When the team is done, the facilitator plots their result on a distribution graph. The distribution will be fairly random and not very accurate (certainly not at the ‘take out two marbles’ level of accuracy).

8)      Facilitator points out that:

  • Spending more time on the exercise would probably not result in more accurate results
  • It’s not that easy to weigh something with just your hands…nor is it easy to predict how long something is going to take to build when you don’t know the ‘something’ nor how to build it (typical of an agile project). * Now would be a good point to make the observation that ‘adding (or removing) a few marbles’ gave you an illusion of estimation accuracy when in fact you were 40% inaccurate in reality (or whatever the case was – I’ve yet to encounter a team that had any level of consistent accuracy here).
  • The marble represents a commonly understood unit of measure (like ‘hours’) yet it’s very difficult for four people to agree on a common value with any reasonable level of accuracy given crude estimation mechanisms. In comparison there was very little debate or discomfort with the first part of the exercise using story points and planning poker.
  • Coming up with a relative value is far easier than an absolute value (in marbles or hours) for a timeboxed exercise.

Conclusion: the planning poker exercise exposes the estimating ‘crudeness’ for what it really is and forces people to agree on the barest minimum accuracy that is possible under the circumstances. 
