
Forget Commitment, Make Reliable Forecasts

There is widespread confusion about the meaning and relevance of “commitment” for teams that develop software according to Scrum. “Commitment” is one of the five Scrum values, as introduced in the book “Agile Software Development with Scrum”. However, since 2011 the word no longer appears in the Scrum Guide, which has not helped to clear up any ambiguity. Plus, there is no really good translation of the word into German, complicating things even more for me in discussions with co-workers in Hamburg.
So, what is commitment and how can it, or an improved version, be a useful concept? And finally, how can this abstract value be brought to life by actual Scrum teams? I believe it is best to replace “commitment” with a more precise term that addresses more issues, and put that to work.

The Origins

In ye olden times, before the 2011 Scrum Guide, development teams used to commit to a sprint goal or even to a list of user stories. The future is unpredictable, estimates are way off, impediments show up, developers are easily distracted, and managers from other parts of the organization add to the distraction; nevertheless, the dev team promise really hard to deliver the sprint results the product owner ordered. The Scrum Master is in place to remove at least some of these issues.
The Agile Manifesto, in contrast, demands that teams be formed by motivated individuals who just give it a go. These individuals already have a basic inclination toward the project goal and sprint goal, and they do not need an additional appeal to their integrity in the fashion of “but you promised!” Also, they are an empowered team and therefore are capable of removing impediments of any kind for the higher purpose of customer value.
A commitment resembles a contract: The dev team promise to deliver a certain service (sprint goal) by a certain date (sprint review meeting). However, there are no provisions in place to reward a successful delivery, or to punish a failed delivery, other than the dev team feeling embarrassed in public when they have nothing to show at the sprint review. Also, only one party is asked to sign it, not both. In fact, Scrum’s commitment is almost, but not entirely different from a contract.

Building Trust

Now that we have had a look at the use of commitment, how is that useful to anyone? Some people seem to believe that, as it is really hard to keep a commitment due to the various slings and arrows of outrageous fortune, it is a wise move to only commit to learning, to adhering to the Scrum process, or to the team, but never to a specific outcome. Most times your commitment will be broken anyway (never mind whether on your account or by adverse circumstances), so pressure will increase, quality will degrade, control will increase, and your team will be in a place they really wanted to avoid.
I disagree and argue that it is rather important to make commitments to outcomes in the general direction of your customers, i.e., to your product owner. Trust is the cornerstone of the customer relationship, and not only in the Agile Manifesto. It is also the foundation of the teamwork pyramid in Lencioni’s book “The five dysfunctions of a team”, and we need to see the dev team plus product owner as one team in order to avoid the biggest risk, namely that of building the wrong product. In order to improve teamwork, you need to build trust, writes C. Avery, and to do that, he advises repeatedly making small promises and keeping them, building a track record of reliability and thereby expanding your freedom. Above all, Avery recommends making only promises you can definitely keep, because a single failed commitment can instantly wipe out the positive effect of a hundred kept promises and destroy your client’s trust in you.
To me, a forecast you can count on most of the time is good enough, and close enough to a commitment that one can replace the other. When the dev team trust their own ability to execute and the product owner trusts the developers, they should work together smoothly even when the going gets tough, and resolve issues instead of shifting the blame about. Replacing “commitment” with “reliable forecast” works for me.
The obvious opportunities to repeatedly make and keep promises, and so build trust, are sprints and sprint deliverables. This brings us to the question of how a dev team can actually keep their promises regarding sprint outcomes to the client.

Ready, Willing and Able

Meeting a sprint commitment (or better, “reliable forecast”) depends on three aspects that are easily confused:

  1. The organization must be ready.
  2. The dev team must be willing to do all they can for a sprint success.
  3. They must be able to perform all tasks that might turn out to be necessary to reach the sprint goal.

Many authors, including Schwaber in his 2002 book and this more recent article, focus on the organization as the biggest obstacle to sprint success. A team can only commit to a sprint goal when they are empowered to meet it and to blast through any impediments that might occur, even if doing so hurts feelings and disturbs processes in other parts of the organization. The underlying assumption is that the biggest challenge for Scrum teams is a lack of organizational support when they ruthlessly turn towards customer value. If the organization is not ready to accommodate this change in mindset, the team is doomed from the sprint planning meeting on and is better off not entering into any kind of commitment.

Now, on toward the willingness of the dev team to work as hard as they can (see, e.g., this article). Asking the team for a commitment to the sprint goal may or may not be helpful. The hope is that, in order to meet their promise, the dev team are now a bit more motivated to focus, remove obstacles, and build the important things first. This way, the chances of achieving the sprint result are supposed to be higher than without commitment. The Agile Manifesto starts out with the assumption that developers are motivated individuals, as discussed above, so there is no need for an additional act of commitment. However, there is a real danger that a lackluster interest in success becomes a self-fulfilling prophecy, whereas a “play to win” attitude can help bring success about. The (fictional) Yoda has (fictional) success with his famous “do, not try” approach. But the best way of having the dev team’s full attention on the sprint goal is to tap into their intrinsic motivation. In Appelo’s Management 3.0 (Chapter 5, “How to energize people”), there is a lot of good advice on how to help people give their best at work. Only one of these is appealing to people’s sense of honor and integrity by having them make a promise they are reluctant to break. Asking for commitment is just one of many ways to increase the willingness of the dev team to deliver.

On to the third aspect of keeping commitments, or forecasting reliably: the ability to do so because the dev team have the necessary time, know-how, tools, and supply from upstream stations in the value stream. This is the classic area of team-level impediments and their removal. It is not enough to have a supportive company and a motivated team; they also need to be able to remove all impediments that slow them down. They need to notice them, attack them, and remove them for good. Noticing impediments is a science of its own, but good books have already been written on retrospectives, so this is not the place to dive in further.

Commitment and Sprint Planning

Let me wrap up this inspection of the notion of commitment by suggesting how to use “commitment” or “reliable forecast” in a sprint planning meeting.
The point of a Scrum Master is to address impediments from within and without the team, and to detect looming obstacles on the team’s likely path.

  1. When the organization is not ready to tackle the work in the sprint, the dev team plus Scrum Master must call this out and secure the required infrastructure, helping hands, or authority.
  2. When the team is not willing to run for the sprint goal, it is necessary to notice that not everyone supports the sprint backlog, and to bring possible side agendas or doubts about the general direction to the table before they blow up in your face during the sprint.
  3. When the team is not able to deliver, it is important to raise any concerns right away, whether they concern limited availability of team members, missing know-how, technical uncertainty, or any other risk threatening sprint success.

The idea here is to educate people to act proactively (first habit of Covey) and to assume responsibility for the sprint goal and sprint backlog instead of evading a clear position (Avery).
So, in my role as Scrum Master, I usually ask two questions at the end of the sprint planning meeting: 1. Is this a good plan? 2. Can you do it?

  1. Look at the taskboard. Given this sprint goal, the stories planned here, and other context like the product vision on the wall over there: do you think this is a sensible plan?
  2. Look at the table of who is available in this sprint for how long, at the tasks on the taskboard, at our list of impediments next to the taskboard, the recent sprint velocities in this chart here, etc. In light of that, can you do all the things on the board and thereby achieve the sprint goal, or is it wishful thinking?

The first question serves to draw out any discrepancies between individual goals and the team goal, and bring any doubts about the goal to the surface. Through the second question, I want the team to discuss all risks on the way to that goal, including the question of whether they are simply too optimistic.
These two questions work for me to replace the request for this mysterious commitment, which is, as mentioned above, especially awkward to ask for in German. Also, there is no waterfallish smell of a mini-contract, while the questions of whether the organization is ready and whether the dev team is willing and able to achieve the sprint goal are still addressed.
The sprint may still fail, but I believe that this is the best way to get a forecast that is mostly reliable.


Phantomjs crashes in CI

Lately, we were facing many failed builds because of phantomjs crashing when executing our joounit tests. Grepping through the build logs did not give us much information. All we found was:

[WARNING] PhantomJS has crashed. [...]

Phantomjs did not do us the favor of crashing when executing the same tests again, not even when testing the same modules. In most cases, phantomjs did not even crash when running the next build. Fortunately, there are many others out there facing similar problems, see https://github.com/ariya/phantomjs/issues/12002, for example. Using the reproducer given in that issue, we derived a wrapper script that automatically retries the tests a few times. We now replace the phantomjs executable with a wrapper script that evaluates the phantomjs exit code like this:

#!/bin/sh
BIN=/usr/local/phantomjs-1.9.7/bin/phantomjs
RET=1
MAX=5
RUN=0

until [ ${RET} -eq 0 ]; do
  ${BIN} "$@"
  RET=$?
  RUN=$((RUN + 1))

  # exit immediately after the maximum number of crashes
  if [ ${RUN} -eq ${MAX} ]; then
    echo "got ${RUN} unexpected results from phantomjs, giving up ..."
    exit ${RET}
  fi

  # allowed values are 0-5
  # see https://github.com/CoreMedia/jangaroo-tools/blob/master/jangaroo-maven/jangaroo-maven-plugin/src/main/resources/net/jangaroo/jooc/mvnplugin/phantomjs-joounit-page-runner.js
  if [ ${RET} -lt 5 ]; then
    if [ ${RET} -eq 1 ]; then
      # exit code 1: phantomjs crashed or is misconfigured, so retry
      echo "phantomjs misconfigured or crashed, retrying ..."
    else
      # exit codes 2-4: a regular test result, pass it on
      exit ${RET}
    fi
  else
    # anything else: phantomjs died in an unexpected way, retry
    echo "got unexpected return value from phantomjs: ${RET}. Retrying ..."
  fi
done

Fortunately, we have set up the joounit phantomjs runner to use exit code 1 only in the rare case that it is completely misconfigured, so any other valid test outcome, e.g., timeout, is still captured correctly.

Automated Documentation Check with LanguageTool

Introduction

Here at CoreMedia we write our documentation in DocBook using IntelliJ Idea as an editor for the XML sources. From this XML we generate PDF and WebHelp manuals.

The documentation is part of our source code repository and is also integrated in CoreMedia’s continuous integration process with Jenkins, Sonar and the like. Naturally, the demand for a Sonar-like quality measurement for documentation arose.

Solution

The first task is to determine the metrics that we want to monitor. Unfortunately, there is, at least for now, no way to automatically test for accuracy and completeness of the information, so we have to stick to more obvious features, such as:

  • Size of the manual measured through the number of chapters, tables, figures…
  • Spelling errors
  • Grammar errors
  • CoreMedia style guide errors

The first point is easy: simply count the corresponding DocBook tags in the manual using XPath, as sketched below. The others require a checker that can be integrated into the build process and that delivers a usable format for further processing.
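For illustration, the counting step might look roughly like this (a sketch using the JDK’s javax.xml.xpath API; the file name is hypothetical, and DocBook 5 sources would additionally need namespace handling):

// Parse the DocBook source of one manual (hypothetical file name).
Document manual = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new File("manual.xml"));

// Count the relevant DocBook elements; the same pattern works for chapters, tables, figures, ...
XPath xpath = XPathFactory.newInstance().newXPath();
Number chapterCount = (Number) xpath.evaluate("count(//chapter)", manual, XPathConstants.NUMBER);
Number tableCount = (Number) xpath.evaluate("count(//table)", manual, XPathConstants.NUMBER);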

After searching the web we stumbled upon LanguageTool (www.languagetool.org). LanguageTool is an open source tool that offers a stand-alone client, a web front-end and a Java library for all the checks we want to do.

Integrating the Java library into our adapted version of the docbkx-maven-plugin was easy: we added the Maven dependency to the project and created a new Maven goal that instantiates the LanguageTool object:

langTool = new JLanguageTool(new AmericanEnglish());
langTool.activateDefaultPatternRules();

The second line shows the big strength of LanguageTool: the rules. Spell checking is done with hunspell, but all of the grammar and style checks are defined in rules, written either in Java code or in XML. A simple XML rule that checks for the correct usage of “email” would look like this:

<rule id="mode" name="Style: Do not write e-mail">
  <pattern>
    <token>e-mail</token>
  </pattern>
  <message>CoreMedia Style: It's <suggestion>email</suggestion>, not e-mail</message>
  <example type="correct">Send an <marker>email</marker></example>
  <example type="incorrect">Send an <marker>e-mail</marker></example>
</rule>

More complicated rules are possible using regular expressions and POS (part of speech, see http://en.wikipedia.org/wiki/Part_of_speech) tags. LanguageTool comes with a large set of predefined rules for common grammar errors and can be extended with your own rules. So we implemented our style guide as XML rules.

When we start the check we get the results as a list of RuleMatch objects:

List<RuleMatch> matches = langTool.check(textString);

From a RuleMatch object we can get all the interesting information, such as the error message, the position, a suggested correction, and more. In our HTML result pages we show, for instance, the following information from a predefined rule:

[Screenshot: example error output from a predefined rule]
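A minimal sketch of how these details can be read from each match (the accessors shown are part of the LanguageTool Java API, although exact names may vary between versions):

for (RuleMatch match : matches) {
    String ruleId = match.getRule().getId();          // which rule fired
    String message = match.getMessage();              // human-readable error message
    int from = match.getFromPos();                    // position of the error in the checked text
    int to = match.getToPos();
    List<String> suggestions = match.getSuggestedReplacements();
    System.out.printf("%s [%d-%d]: %s %s%n", ruleId, from, to, message, suggestions);
}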

In the build process we generate an overview site for all manuals:

[Screenshot: result overview page for all manuals]

False Positives

In the beginning we got a lot of errors that were not real errors but shortcomings of the checker. There were mainly three reasons for this:

  • Words not known by the spellchecker (all of these acronyms used in IT writing, for example)
  • Grammar rules not applicable to the format of our text
  • Words like file names or class names that can’t be known by the spellchecker

We applied three measures to overcome the false positives:

  • Creating a list of ignored words for the spellchecker. The list is managed in the repository so everyone can add new words.
  • Deactivating rules in LanguageTool with langTool.disableRule(deactivatedRule);. The list of deactivated rules is also managed in the repository.
  • Tagging all specific words with the appropriate DocBook element and filtering the DocBook sources.

With this approach we were able to remove nearly all false positives.

Conclusion

Having an overview page for the documentation enhances visibility and leads to better documentation quality. LanguageTool is a great product for this. It’s easy to integrate and to use and is very powerful. Questions in the forum or on the mailing lists were answered quickly. So, give it a try when you want to monitor the quality of your documentation.

API Design Kata


For our fortnightly coding dojo I recently suggested focusing on API design instead of implementation – at least for one session. The idea was that APIs live much longer than API implementations and that consequently flaws in the API design hurt much more than flaws in the actual algorithms. And because developers code much more often than they design APIs, the need for practice should be expected to be much more urgent.

The Task

Our goal was to design a generic caching API. Some use cases were given:

  • Look up a value from the cache.
  • Compute a value that is not currently cached.
  • Let a value computation register dependencies. A dependency is a representation of a mutable entity.
  • Invalidate a dependency. All values whose computation registered that dependency must be removed from the cache.
  • Configure a maximum cache size.
  • Let one cache be responsible for fundamentally different classes of value at the same time.

The approach was to write only the API and to provide test cases simply to evaluate how a client would use it. No implementation of the actual API was allowed, only implementations of callback interfaces that are normally provided by clients of the cache.

Under this constraint the tests would not run, but the test code used the API and had to look natural and understandable. Of course, the crucial aspect was to make the API convenient for clients. API documentation snippets were written only as far as absolutely necessary.

Our Experience

We decided to work on a single laptop connected to a projector so that all participants could take turns commenting and implementing improvements. It turned out to be surprisingly difficult to build an API without building the implementation. There is a temptation to let the intended internal data structure shine through in the API (“But how are dependencies stored after all?”) when the client of the API couldn’t care less.

There is also a tendency to skip the ‘test-driven’ design and write down the cache interface immediately, when in fact the tests give you a good feeling for which information has to be provided to the cache somehow.

It was observed that some upfront drawing would have helped a lot. It wouldn’t have to be proper UML, but an overview of the entities involved and their relationships would have given us a quicker start. Caching is more complex than the above use cases might suggest.

Java generics were a recurring topic. While we are all used to instantiating generic classes, actually defining the right type parameters for an interface is a different matter.
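For illustration only, here is one possible parameterization of such a cache; the names are hypothetical and this is not the design our group arrived at, just a sketch of where the type parameters might go:

/** A cache key that knows how to compute its value; it may register dependencies while computing. */
interface CacheKey<T> {
    T evaluate(DependencyCollector dependencies);
}

interface DependencyCollector {
    void dependsOn(Object dependency);
}

interface Cache {
    /** Returns the cached value for the key, calling evaluate() on a cache miss. */
    <T> T get(CacheKey<T> key);

    /** Removes all values whose computation registered this dependency. */
    void invalidate(Object dependency);
}

Putting the type parameter on the key rather than on the cache itself is one way to let a single cache hold fundamentally different classes of values at the same time.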

We talked a bit about code style. The @Nonnull annotation sparked the most intense discussion.

Your Turn

If you decide to repeat the kata, also think (after the API is done) about possible performance implications of the design choices. Look for further missing features. On the other hand, look for redundant features that only make the API harder to understand.

Is the Backlog an Unnecessary Proxy?

High Priority Mail

Last week a letter reached me that had the announcement IMPORTANT DOCUMENTS on the envelope. When I eagerly tore open the letter, awaiting some life-changing documents inside, it turned out they were not. It is a common pattern: we all receive e-mails with URGENT in the subject line, but they’re not urgent. And how many e-mails are flagged as important with a “!”, yet they’re not? When I judge the importance of a communication to me, its outward appearance is not the only thing I take into consideration; the sender counts very much as well. If I know the author to be trustworthy, based on my previous experiences with her, I tend to treat those messages as much more relevant than messages from an insurance company (that usually wants to sell more insurance policies) or from a business I never dealt with before.

An Intermediate Artifact

This thought crossed my mind when, at #lkce13, “the” David https://twitter.com/lkuceo stated that a prioritized backlog introduces an unnecessary proxy variable. Agreed, stakeholders and dev team are better off talking to each other instead of capturing conversations in an artifact with only the product owner talking to each of the sides. Also, a backlog may imply a commitment to a path, when in fact it might just show alternative future directions in which the product may evolve.

Backlog for Focus

On the other hand, if you have n stakeholders and m developers, and every developer is to talk directly to every stakeholder, you will have n * m conversations taking place. When the product owner acts as an information hub, only n + m conversations take place, which might be the only feasible way.

Also, the product owner is supposed to act as the business value expert, condensing the multiple voices of the stakeholders, and even opening up new options. It makes sense to me to have a domain expert drive this and not spread accountability for priorities all over the team.

The third and final point brings me back to the story about the “important” letter: it is about trust. When the dev team trusts the product owner enough to make good decisions on priority, there is less urgency to discuss priorities directly with stakeholders. Maybe the PO has shown good judgment in the past. Or the decision process is transparent enough to show that all relevant stakeholders are involved and that all sides have been taken into consideration. If you do not trust your product owner to make good priority decisions, you might need to ask why, and address that issue.

Trust the Product Owner

The backlog is not to be used as a contract artifact on paper, inhibiting face-to-face conversation and feedback loops. There is a balance to strike: the team needs to alternate between an opening, questioning stance involving all stakeholders, and a focusing, deciding stance where the product owner is responsible for narrowing down all possibilities to a manageable number. The product owner is supposed to lead by reducing uncertainty about the future, providing clarity for the team. She can achieve this task better when all parties trust her decisions, based on her openness, her focus, and her commitment to the job. A prioritized backlog is a tool to achieve this, but it is abused when just used for written “THIS IS IMPORTANT” statements.

So, watch out for backlogs that are used to hide information like value, options, or risk, inhibiting collaboration. Make your backlog a dual-use tool both to start conversations and to provide focus.

The State of Agile (Enterprise) UX – A Blog Review

Introduction

In our last post, we discussed a method for quantitative user research. In contrast to the specific topic of the last post, this post gives pointers on how products with a great user experience (UX) can be achieved in the context of an agile development process.

Nowadays, 84% of software companies apply agile methods – yet how UX can be integrated into the process is the topic of ongoing discussions among UX professionals and with other stakeholders within the organization.

Recently, a number of blog posts about UX in different settings were written and caught my attention. As the area of UX – especially in combination with agile development of enterprise software – is developing quickly, these articles are definitely worth reading if you are a UX professional (or work together with UX professionals). So I thought it worth sharing some pointers.


Blog-Post Reviews

A must-read is Aviva Rosenstein’s article “The UX Professionals Guide to Working with Agile Scrum Teams”, published on Boxes and Arrows. As the title says, the article is focused on the Scrum methodology (a popular framework for organizing agile development). The article discusses a survey about the challenges UX professionals face in Scrum and gives detailed practical advice on how these challenges can be addressed. In particular, the article discusses the problems faced in creating a UX vision when UX work is aligned strictly with the Scrum teams’ sprints or brought in too late in the process. Among the suggestions are the creation of user personas (and using these personas in the user stories or scenarios) and including the team in the design process.

The article “Fitting Big-Picture UX Into Agile Development” by Damon Dimmick was published on Smashing Magazine. For me, the important ideas in this article are 1) the idea of including UX-focused sprints in the agile development process and 2) the idea of a design owner whose role is to foster design and ensure consistency.

Jon Innes’ post “Integrating UX into the Product Backlog” on Boxes and Arrows suggests a UX Integration Matrix that allows including UX efforts in a product’s backlog. Jon highlights the necessity of a good feedback loop for agile and UX work. I do love Jon’s idea to estimate UX complexity using Fibonacci numbers and to use task completion rate as a metric for the usability of a UI implementation!
The article also discusses the cause for the (perceived) different value of UX in consumer-oriented vs. enterprise-oriented businesses. While an enterprise company makes money even if users cannot effectively leverage the software, or by selling professional services to fix problems that occur, a consumer-oriented business cannot operate that way. I would add to this argument that the perception of the value of UX is shifting. Buyers increasingly understand that an efficient interface design saves time and money by generating more and higher-quality output faster. As a consequence, even analysts with a technical focus are integrating usability efforts in their reviews of market offerings as an evaluation criterion. Therefore, the perception that UX is less important in enterprise-oriented businesses than in consumer-oriented businesses is misleading.

Paul Bryan’s text “Is UX Strategy Fundamentally Incompatible with Agile or Lean UX?” on UX Matters focuses on the field of UX strategy and its compatibility with agile. It discusses two approaches for fitting UX better into modern development processes: Lean UX and Agile UX. The author identifies three key questions discussed around the agile UX topic: 1) participating in an agile team structure, 2) modifying UX activities and deliverables so that interdisciplinary work becomes more effective, and 3) scheduling UX activities to fit more easily into the development team’s sprints. Especially with respect to the third topic, Paul Bryan offers a number of quotes discussing the importance of doing UX strategy and planning before development starts.
The compatibility of lean UX with large companies is discussed, and the author concludes that “[t]hey are fundamentally incompatible”. The reason is that “[t]he lean UX form of UX strategy is too lightweight, too reliant on small data sets, too malleable, and too subject to the assertiveness of just a few individuals. For a large company that has a deep understanding of its market, its customers, and its competition, and the resources to fund the best path forward, it’s not the optimal approach.”

An older but valuable post is Jeff Patton’s “Twelve emerging best practices for adding UX work to Agile development”, published on AgileProductDesign.com. Among the suggestions Jeff makes are the use of low-fidelity prototypes in communication and the use of prototypes as documentation.

Conclusion

While there is still a lot of uncertainty about how UX is best integrated in the agile development process, especially in large enterprise businesses, some factors are beginning to become clear.
The answer to the question whether UX resources should be part of the agile (e.g., Scrum) team or a central resource depends, from my perspective, largely on the structure of the organization: in smaller organizations and agencies, the need for holistic tasks and planning (e.g., developing a UX vision and a style guide) is smaller than in larger organizations with multiple product teams, so allocating UX resources in the agile team is less problematic. In larger businesses, in which multiple development teams work towards a common goal, the need for holistic tasks is much greater. As it is hard to integrate such long-term holistic tasks into a sprint rhythm (mostly two weeks), having UX as a shared resource is more beneficial in such a setting.
A very lean approach to UX with largely reduced deliverables might be just too lean for large enterprises. Still, communicating with both development teams and decision makers to create a shared understanding and a common goal, based on a visualization of the UX team’s product vision, is the most important success factor and matters more than highly polished deliverables.
Making the design and research process open and transparent also reduces or eliminates problems with the perception of the work of the UX staff. To achieve efficient communication, it is very important to establish working processes and a language that everybody understands, for example by establishing design personas that both product owners and UX designers use in their stories and scenarios (tip: make your personas available in your intranet or wiki and allow stories and scenarios to be linked to them!).
Interdisciplinary thinking is beneficial, too. It does not hurt if UX designers have an understanding of the technical concepts of the software product, but it also does not hurt if developers and other stakeholders take pen and paper to sketch their ideas in visual brainstorming sessions rather than staying in the abstract.
Having somebody fulfilling the role of a UX design owner seems to be especially important in B2B (business-to-business) settings. Here, the results of a bad UX are not as immediately visible as in B2C (business-to-consumer) settings, so it is all the more important that somebody advocates the importance of UX. A bad UX in the check-out process of an e-commerce site will immediately result in a lower conversion rate and a loss of customers, and can straightforwardly be identified with methods such as A/B testing. In contrast, the results of a bad UX in B2B software are much less directly visible (which is not to say they are less drastic!).

Thanks

… go to Jochen Toppe and Jesco von Voss for review and valuable comments.

System Usability Scale (SUS) – An Improved German Translation of the Questionnaire

written by Kris Lohmann and Jörg Schäffer

Introduction

Usability professionals have a variety of choices when it comes to picking a questionnaire to evaluate and measure the usability of a software product or a website in a quantitative manner.

A particularly often-used scale is the System Usability Scale (SUS). This scale was developed and made available by Brooke (1). The SUS consists of ten items. Ratings are performed on a 5-point scale on which 1 corresponds to “strongly disagree” and 5 to “strongly agree”. The evaluation results in an easy-to-interpret score from 0 to 100, similar to a percentage. This intuitive score is a definite plus when communicating results to stakeholders and management.
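The score follows Brooke’s standard calculation rule: each odd-numbered (positively worded) item contributes its rating minus 1, each even-numbered (negatively worded) item contributes 5 minus its rating, and the sum is multiplied by 2.5. A minimal sketch of that computation:

/**
 * Computes the SUS score (0-100) from the ten ratings of one participant,
 * each given on the 1-5 scale, using Brooke's standard scoring rule.
 */
static double susScore(int[] ratings) {
    if (ratings.length != 10) {
        throw new IllegalArgumentException("The SUS has exactly ten items");
    }
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        // Items 1, 3, 5, ... (even index) are positively worded: rating - 1.
        // Items 2, 4, 6, ... (odd index) are negatively worded: 5 - rating.
        sum += (i % 2 == 0) ? ratings[i] - 1 : 5 - ratings[i];
    }
    return sum * 2.5;
}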

Aside from being a de-facto standard, the SUS has some advantages over other scales. It is scientifically validated and has been shown to outperform other scales in the sense of validity (it measures what it is supposed to measure) and sensitivity (the result shows small changes) (2, 3, 10). The SUS has also been shown to be reliable (that is, if a study is repeated, the same outcome can be expected) (10).

The SUS is free, and has been used in a lot of usability evaluations. In the words of the original author: “… [B]ecause it was made freely available has been picked up and used in many, many usability evaluations. (Cheapskates! But I’m glad so many people have found it useful).” (4)

Meta-studies allow for comparison with other systems; the average SUS score measured by Tullis and Albert is 66 (3). Furthermore, a study by Bangor and colleagues has shown which semantic labels people associate with a given SUS score: for example, a score around 20 is associated with an “awful” system and a score around 90 with the “best imaginable” system (9). Because it has only ten items, the SUS is lightweight, so participants are quickly done with it.

Yet, the SUS has some drawbacks to be considered when it is used. To start with, the SUS focuses just on pragmatic quality. Hedonic quality (5) is not really in the scope of the evaluation (though we are not saying that hedonic aspects do not influence the result!). Furthermore, there is no tested German translation of the scale available – this issue is addressed in this post.

German Translations of the System Usability Scale (SUS)

The original author made the scale available in English. It was not officially translated into German. Recently there have been approaches towards a translation (6, 7). The most notable result was based on crowdsourcing the translation (8). We at CoreMedia reviewed the results and decided to develop our own version based on these approaches, as we were not completely happy with the wording. This is the solution that we came up with:

  1. Ich denke, dass ich dieses System gerne regelmäßig nutzen würde.
  2. Ich fand das System unnötig komplex.
  3. Ich denke, das System war leicht zu benutzen.
  4. Ich denke, ich würde die Unterstützung einer fachkundigen Person benötigen, um das System benutzen zu können.
  5. Ich fand, die verschiedenen Funktionen des Systems waren gut integriert.
  6. Ich halte das System für zu inkonsistent.
  7. Ich glaube, dass die meisten Menschen sehr schnell lernen würden, mit dem System umzugehen.
  8. Ich fand das System sehr umständlich zu benutzen.
  9. Ich fühlte mich bei der Nutzung des Systems sehr sicher.
  10. Ich musste viele Dinge lernen, bevor ich mit dem System arbeiten konnte.

Why should you prefer the suggested German translation of the SUS over other ones? Our translated version of the SUS has been used in a comparatively large on-site study with 89 participants. Overall, there were no indications of problems.

Furthermore, there is an important difference to the version proposed in the crowdsourced translation (8). Item 10, which is formulated “I needed to learn a lot of things before I could get going with this system” in the original SUS, is translated to “Ich musste eine Menge lernen, bevor ich anfangen konnte das System zu verwenden.” in the crowdsourced translation.

From our point of view, this solution is suboptimal: In contrast to the original English version, the translation does not put any focus on the users’ goal. Therefore, we suggest a slightly different translation of this item, namely “Ich musste viele Dinge lernen, bevor ich mit dem System arbeiten konnte.” This puts more focus on the user being able to accomplish a goal with the system. Thus, the semantics of the translation of this item is closer to the original item.

Some Practical Hints for Augmenting the SUS

We augmented the SUS in two respects:
First, we used two free-text fields in which participants were asked to indicate what they liked and what they disliked about the software, to obtain qualitative input in addition to the quantitative measurement. The results were really insightful; hence, if a study has a formative aspect (finding problems) in addition to the summative aspect (evaluating an existing solution), augmenting the SUS with free-text items should be considered.

Second, we added some in-detail items to evaluate certain aspects of the software that are not directly covered by the SUS (e.g., quality of icons, microcopy, and visual appeal). This augmentation provided valuable additional information as well.

Summary

The System Usability Scale (SUS) is a lightweight and scientifically validated questionnaire and an effective tool to assess perceived usability. On the basis of existing attempts for a translation, we created an improved translation, which we successfully used in an on-site study with 89 participants.

As the SUS is a lightweight questionnaire with only 10 items, augmenting it with other items is unproblematic with respect to the time participants need to complete the questionnaire. Such an augmentation is suggested if a study has formative aspects and in circumstances where specific hypotheses need to be tested.

References

(1) Brooke, J.: SUS: A “quick and dirty” usability scale. In: Jordan, P. W., Thomas, B., Weerdmeester, B. A., McClelland (eds.) Usability Evaluation in Industry pp. 189—194. Taylor & Francis, London, UK (1996)
(2) A Comparison of Questionnaires for Assessing Website Usability
(3) Tullis and Albert: Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Morgan Kaufmann (2008)
(4) http://ux.stackexchange.com/questions/5211/sus-scores-and-doubts
(5) http://www.flamelab.de/article/pragmatic-and-hedonic-quality-of-software-systems/
(6) http://ux.stackexchange.com/questions/10181/what-is-the-standardized-german-version-of-the-system-usability-scale-sus
(7)  http://www.sapdesignguild.org/resources/sus.asp
(8) http://isitjustme.de/2012/01/crowdsourcing-the-translation-of-sus/
(9) A Bangor, P Kortum, J Miller - Journal of usability studies (2009)
(10) Brooke: SUS: A Retrospective. Journal of Usability Studies, Volume 8, Issue 2, pp. 29–40 (2013)

Solved: Temporary File Flood in JUnit Tests in Maven Build

Have you ever stumbled across unit tests that flood your temporary folder with lots of temporary files? Perhaps you have even observed disk space problems because of this? This post provides a quick fix for tests triggered during a Maven build.

Problem

Most tests requiring temporary files rely on java.io.File.createTempFile() to create them. This creates a temporary file at a location defined by the system property java.io.tmpdir, which on Linux systems, for example, points to /tmp.

If the tests (or their authors) had been nice, they would at least have marked the files to be deleted on JVM exit. But as experience shows (I currently have tests at hand that create 70 MB of temporary files), not everyone is aware of this testiquette.
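The courtesy in question is a single extra call; a minimal sketch:

// Creates a file below java.io.tmpdir and at least asks the JVM
// to remove it when the process exits normally.
File tempFile = File.createTempFile("my-test", ".tmp");
tempFile.deleteOnExit();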

Approaching

The best solution would be to refactor the tests and force them to use the JUnit rule TemporaryFolder instead. Gary Gregory has described this nicely in his post JUnit Tip: Use rules to manage temporary files and folders, 2010-01-20. The rule can create not only temporary files but also folders.
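For reference, a test using the rule looks roughly like this (a sketch; class and method names are made up):

import java.io.File;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class TempFolderExampleTest {

    // JUnit creates a fresh folder before each test and deletes it afterwards.
    @Rule
    public TemporaryFolder temporaryFolder = new TemporaryFolder();

    @Test
    public void writesToManagedTempFile() throws Exception {
        File scratch = temporaryFolder.newFile("scratch.txt");
        // ... exercise the code under test with 'scratch' ...
    }
}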

But if you have masses of tests to adapt, this is not feasible – and it may not solve all of your problems if it is not only the tests that create temporary files but also the software under test (SUT).

To detect which files get created in the temporary folder, the Linux tool iwatch was helpful to me. With the following command line I could easily see which files got created:

$ iwatch -r -e create /tmp

If you are building your workspace with Apache Maven, you most likely want the temporary files placed into the target/ folder, as this is the folder that gets cleaned by mvn clean and is most likely also excluded from being committed to your version control system (VCS).

Solution

Maven runs the tests through the Maven Surefire Plugin. This can be configured to pass system properties to the JVM executing the tests. And you might already have guessed it: you can also set the property java.io.tmpdir here. My solution goes a little further: I also set the working directory for tests, as some SUTs and some tests tend to write to the current directory.

Here is a template for how to configure the Surefire Plugin in your plugin management section:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.16</version>
  <configuration>
  <!-- ... -->
    <workingDirectory>${project.build.directory}</workingDirectory>
    <systemPropertyVariables>
      <java.io.tmpdir>${project.build.directory}</java.io.tmpdir>
    </systemPropertyVariables>
  </configuration>
</plugin>


Reality Check: Legacy Code Kata with Mockito

I recently blogged about the Legacy Code Kata at the Softwerkskammer Hamburg Dojo. Now it’s about time that the kata passes the reality check. So I took some legacy code that colleagues had cursed about and started to write tests, relying strongly on the mocking framework Mockito.

Mockito is great, …

Unlike in the coding kata, this time I relied on a mocking framework – my favorite one: Mockito. So I simply mocked a page and a navigation object like this:

page = Mockito.mock(Page.class);
navigation = Mockito.mock(Navigation.class);
Mockito.when(page.getNavigation()).thenReturn(navigation);

This mocking was required deep in the callstack of the class under test.

but… The Problem

If this code is ever refactored (actually, I know that it soon will be), it might happen that the mocked call is not made anymore, which invalidates the assumptions and makes the result of the test completely meaningless.

The Recommended Solution

If you ever need to mock method calls, you should verify that they actually got called. As this is a check of the assumptions made, I recommend placing it before your test assertions (which are meaningless when these checks fail):

Mockito.verify(page, VerificationModeFactory.times(1))
  .getNavigation();

If this fails, it most likely means that you have to adapt your test rather than your code.

Note: times(1) could have been omitted here, as it is the default. For this example, atLeastOnce() would have been more appropriate, as it only matters that the stubbing got used – not how many times.
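Putting stubbing, verification, and assertions together, a test could be structured like this (a sketch; TeaserRenderer and getTitle() are hypothetical stand-ins for the real code under test):

@Test
public void rendersNavigationTitle() {
    Page page = Mockito.mock(Page.class);
    Navigation navigation = Mockito.mock(Navigation.class);
    Mockito.when(page.getNavigation()).thenReturn(navigation);
    Mockito.when(navigation.getTitle()).thenReturn("Home");

    String result = new TeaserRenderer().render(page);

    // Check the assumption first: the stubbing must actually have been used.
    Mockito.verify(page, Mockito.atLeastOnce()).getNavigation();
    // Only then assert on the outcome of the code under test.
    Assert.assertTrue(result.contains("Home"));
}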

Side Note: AssertionError

What is bad (at least in this use case) is that Mockito.verify() raises an assertion error if the verification fails. But an assertion error is perceived as a signal that something is wrong with the code under test. It would be better to have a “normal” exception here, which would mark the test as “error” rather than “failure” – more appropriate, as the test did not fail; it just did not match the assumptions.

Convenience

If you have a huge set of stubbed method calls like the ones above, you can easily convert a copy of them into verifications using this regular expression replacement:

From: when\(([^.]+)\.(.*(?=\)\.then)).*
To: verify($1, atLeastOnce()).$2;

The above example assumes static import for atLeastOnce().

SOKAHH: Coding Dojo – Legacy Code Kata


Yesterday I participated in the monthly SOKAHH meeting – this time we met for a Coding Dojo. The kata we performed was a so-called Legacy Code Kata, originally authored by Sandro Mancuso as part of a conference talk: Testing and Refactoring Legacy Code.

What the Kata is about

The central class of interest in this kata is TripService. It is, of course, untested, following the definition of legacy code by Michael Feathers (Working Effectively with Legacy Code).

In the workspace we found an empty test stub waiting to be filled. Obviously, the central point of the kata was to write that test.

Spoiler Warning! Some Results

I think it’s most interesting to perform this kata without knowing much more. For us it ended in a discussion on the different approaches we followed. So I recommend not reading on if you want to perform this kata yourself.

It was interesting to see the different approaches to dealing with TripService. Some started to read TripService’s code to understand it and to write a test matching their understanding of the code. Others tried the code out, i.e., without reading the code they used only the interface and tried how it would behave, for example with null values.

The latter was the approach taken by our pair. We didn’t want to do much mocking or guessing about what the code does. So we just started with a first test, without actually knowing what would happen, by calling the (one and only) method of TripService with the argument null. The result was an exception (no, not a NullPointerException). We took this result as the intended behavior, so our first test stated that it expects this exception to be thrown. Actually, this is the most central rule I learned about legacy code:

Whatever the behavior of the legacy code is – it is the intended one.
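In JUnit terms, such a characterization test is only a few lines; a sketch, with the method and exception names standing in for whatever the legacy code actually exposes and throws:

// SomeLegacyException is a placeholder for the exception we actually observed.
@Test(expected = SomeLegacyException.class)
public void documentsBehaviorForNullArgument() {
    // Whatever the legacy code does here is, by definition, the intended behavior.
    new TripService().getTripsByUser(null);
}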

The last missing tests were added using code coverage to detect which parts of the code had not yet been covered by a test.

All of us learned about the code while testing. Judging from the statements in the discussion, no one grasped the idea of the code at first glance when we started. But in the end, some of us even dared to refactor the code so that it reflects what we had learned it is about.

Feeling Familiar

The thoughts that came to my mind while doing the kata were very familiar: “What the heck is this code about?” “We get an empty list as the result of this test setup? I didn’t expect that.” So I recommend this kata to everyone who has to deal with legacy code, and I bet you will gain some interesting insights.
