Devoxx – Day two – Caching

2010-12-07 21:22
Day two started with a really good talk on caching architectures by Greg Luck. He first motivated why caching works: even with SSDs available now, there is still a huge performance gap between RAM access times and having to go to disk. The issue is even worse in systems architected in a distributed way, making frequent calls to remote systems.

When sizing systems for typical load, what is oftentimes forgotten is that there is no such thing as typical load: usually the load distribution observed over one day for a service used mainly in one time zone has the shape of an elephant – most queries are issued during lunch time (the head of the elephant), with another, smaller peak during the afternoon. This pattern repeats when looking at the weekly distribution, and again when looking at the yearly distribution. At the peak hour of the peak day of the year, your load may be several orders of magnitude above average load.

Although query volume may be high in most applications that turn to caching, these queries usually exhibit a power law distribution. This means that a few queries are issued very frequently, while many more are issued only rarely. This pattern allows for high cache hit rates, reducing load substantially even during very busy times.

The speaker went into more detail on different architectures: usually projects start with one cache located directly on the frontend server. When scaling horizontally, adding more and more frontends means each cached item must be fetched from the database once per frontend per expiry period, so database load keeps growing. The first idea employed to remedy this is to link the different caches to each other, increasing cache hit rates. The problem here is updates racing to the various caches when the same query is issued to the backend by more than one frontend. The usual next step is a distributed remote cache such as memcached. Of course this has the drawback of requiring a network call for each cache access, slowing down response times by several milliseconds. Another problem with distributed caching systems is a theorem well known to people building distributed NoSQL databases: CAP says that you can get only two of the three desired properties of consistency, availability and partition tolerance. Ehcache with a Terracotta backend lets you configure where your priority lies.
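
To make the starting point of that progression concrete: the single frontend-local cache usually amounts to little more than a bounded LRU map. Below is a minimal, hypothetical Java sketch (not from the talk – real setups would use Ehcache or memcached) built on LinkedHashMap's access-order mode:

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal frontend-local LRU cache - a stand-in for the first caching
// stage described above. Not thread-safe; wrap or synchronise for real use.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder=true: iteration order reflects recency of access
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once capacity is exceeded
        return size() > maxEntries;
    }
}

Given the power law distribution described above, even a small maxEntries tends to yield high hit rates, because the handful of hot queries dominates the traffic.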

Devoxx University – Productive programmer, HBase

2010-12-04 21:17
The first day at Devoxx featured several tutorials – most interesting to me was the one on the productive programmer. The speaker is also the author of the O'Reilly book of the same name. The book was the result of the observation that developers today are more and more IDE bound, no longer able to use the command line effectively. The result is developers who are unnecessarily slow when creating software. The goal was to bring usage patterns of productive software development to juniors who grew up in a GUI-only environment. However, half-way through the book it became apparent that a book on command line wizardry alone is barely interesting at all. So the focus was shifted and now includes more general productivity patterns.
The first goal was to accelerate development – mostly by avoiding time consuming usage patterns (minimise mouse usage) and by automating repetitive tasks (computers are good at doing dull, repetitive tasks – that's what they are made for).
The second goal was increasing focus. The main ingredient is switching off anything that disturbs the development flow: no more pop-ups, no more mail notifications, no more flashing side windows. If you have ever had the effect of thinking “So late already?” when your colleagues were going out for lunch – then you know what is meant by being in the flow. It takes up to 20 minutes to get into this mode – but just a fraction of a second to be thrown out. With developers being significantly more productive in this state it makes sense to reduce the risk of being thrown out.
The third goal was about canonicality, the fourth about automation.
During the morning I hopped on and off the Hadoop talk as well – the tutorial was great for getting into the system, and Tom White also went into detail on several of the most common advanced patterns. Of course not that much new stuff if you sort-of know the system already :)

First steps with git

2010-10-30 19:47
A few weeks ago I started to use git not only for tracking changes in my own private repository but also for Mahout development and for reviewing patches. My setup probably is a bit unusual, so I thought I'd first describe it before diving deeper into the specific steps.

Workflow to implement

With my development I wanted to follow Mahout trunk very closely, integrating and merging any upstream changes whenever I resumed work on the code. I wanted to be able to work with two different machines on the client side, located at two distinct physical locations. I was fine with publishing any changes or intermediate progress online.

The tools used

I set up a clone of the official Mahout git repository on github as a place to check changes into and as a place to publish my own changes.

On each machine used, I cloned this github repository. After that I added the official Mahout git repository as upstream repository to be able to fetch and merge in any upstream changes.

Command set

After cloning the official Mahout repository into my own github account, the following set of commands was used on a single client machine to clone and set up the repository. See also the Github help on forking git repositories.

#clone the github repository
git clone git@github.com:MaineC/mahout.git

#add upstream to the local clone
git remote add upstream git://git.apache.org/mahout.git


One additional piece of configuration that made life easier was to set up a list of files and file patterns to be ignored by git.
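
As an illustration, for a Maven based project like Mahout such an ignore list might contain entries along these lines (a hypothetical sketch – the exact patterns depend on your IDE and build setup):

#build output
target/
*.class

#Eclipse metadata
.classpath
.project
.settings/

#IntelliJ IDEA metadata
*.iml
.idea/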

Each distinct changeset (be it a code review, code style changes or steps towards my own changes) would then live in its own local branch. To share branches with other developers, as well as make them accessible to my second machine, I would use the following commands on the machine used for initial development:

#create the branch
git branch MAHOUT-666

#publish the branch on github
git push origin MAHOUT-666


To get all changes both from my first machine and from upstream into the second machine all that was needed was:

#select correct local branch
git checkout trunk

#get and merge changes from upstream
git fetch upstream
git merge upstream/trunk

#get changes from github
git fetch origin
git merge origin/trunk

#get branch from above
git checkout -b MAHOUT-666 origin/MAHOUT-666


Of course pushing changes into the Apache repository is not possible. So I would still end up creating a patch, submitting it to JIRA for review and in the end applying and committing it via svn. As soon as these changes finally made it into the official trunk, all branches created earlier were rendered obsolete.

What still makes me stick with git, especially for reviewing patches and working on multiple changesets, is its capability to create branches quickly and completely locally. This feature totally changed my established workflow for keeping changesets separate:

With svn I would create a separate checkout of the original repository from a remote server, make my changes or even just apply a patch for review. To speed things up or be able to work offline I would keep one svn checkout clean, copy that to a different location and only there apply the patch.

In combination with using an IDE this workflow would result in me having to re-import each different checkout as a separate project. Even though both Idea and Eclipse are reasonably fast with importing and setting up projects it would still cost some time.

With git all I do is one clone. After that I can create branches locally without contacting the server again. I usually keep trunk clean of any local changes - patches are applied to separate branches for review, and the same happens for any code modifications of my own. That way all work can happen while disconnected from the version control server.
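
As a sketch of that review workflow (the issue number and patch file name are made up for illustration):

#start a throw-away review branch off a clean trunk
git checkout trunk
git checkout -b review-MAHOUT-123

#verify the patch applies cleanly, then apply it
git apply --check MAHOUT-123.patch
git apply MAHOUT-123.patch

#once the review is done, drop the branch without touching trunk
git checkout trunk
git branch -D review-MAHOUT-123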

When combined with IntelliJ IDEA the fun becomes even greater: the IDE regularly scans the filesystem for updated files. So after each git checkout I'll find the IDE automatically adjusting to the changed source code - avoiding project re-creation. The same is of course possible with Eclipse - it just involves one additional click on the Refresh button.

For me git helped speed up my work processes and supported use cases that would otherwise have involved sending patches to and fro between separate mailboxes. Working with patches and changesets felt way more natural and better supported by the version control system itself. In addition it of course is a great relief to be able to check in, diff, log, checkout etc. even when disconnected from the network - which for me still is one of the biggest advantages of any distributed version control system.

Update
Lance Norskog recently pointed out one more step that is helpful:
You didn't mention how to purge your project branch out of the github fork. From http://help.github.com/remotes/: Deleting a remote branch or tag

This command is a bit arcane at first glance… git push REMOTENAME :BRANCHNAME. If you look at the advanced push syntax above it should make a bit more sense. You are literally telling git “push nothing into BRANCHNAME on REMOTENAME”. And you also have to delete the branch locally.
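
Applied to the branch from the example above, that would look like this:

#delete the branch in the github fork
git push origin :MAHOUT-666

#delete the local branch as well
git branch -d MAHOUT-666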

Scientific debugging

2010-10-28 20:09
Quite some years ago I read Why Programs Fail - A Guide to Systematic Debugging, a book on the art of debugging programs written by Andreas Zeller (professor at Saarland University in Saarbrücken, researcher working on mining software archives, automated debugging, mutation testing and mining models, and one of the authors of the famous Data Display Debugger).

One aspect I found particularly intriguing about the book was until then known to me only from the book Zen and the Art of Motorcycle Maintenance: scientific debugging, a way to find bugs in a piece of software in a structured way using the scientific method.

When working together with software developers, one theme that often struck me as very strange is other developers' - especially junior developers' - way of debugging programs: when faced with a problem they would usually come up with some hypothesis as to what the problem could be. Without verifying the correctness of that hypothesis they would jump directly to implementing a fix for the perceived problem. Quite often this led either to half baked solutions somewhat similar to the Tweaking the Code behaviour on "The Daily WTF", or - even worse - to no solution at all after days of implementation time were gone.

For the rest of this posting I'll make a distinction between the terms

  • defect for a programming error that might cause unintended behaviour.
  • failure for an observed misbehaviour of a program. Failures are caused by specific defects.


For all but trivial bugs (for your favourite changing and vague definition of "trivial") I try to apply a more structured approach to identifying causes for failures.


  1. Identify a failure. Reproduce the failure locally whenever possible. To get the most out of the process, write an automated test that reproduces the failure (see the sketch after this list).
  2. Form a hypothesis. Usually this is done sub-consciously by most developers. This time around we explicitly write down what we think the underlying defect is.
  3. Devise an experiment. The goal here is to show experimentally that your hunch about the defect was correct. This may involve debug statements, more extensive logging, or using your favourite debugger with a breakpoint set exactly at the position where you think the defect lies. The experiment could also involve wireshark or tcpdump if you are debugging distributed systems, even simple ones. On the other hand, for extremely trivial defects the experiment could just be to fix the defect and see the test run through successfully.
  4. Observe results.
  5. Reach a conclusion. Interpret the results of your experiment. If they reject your hypothesis, move on to your next potential cause for the failure. If they don't, you can either devise more experiments in support of your hypothesis, if the last one didn't convince you (or your boss), or fix the defect just found. In most cases you should tidy up any remains of your experiment before moving on.
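
To make step 1 concrete, here is a minimal JUnit 4 sketch (class and method names are invented for illustration): capture the failure in an automated test first, so every later experiment has a reproducible starting point.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ParserRegressionTest {

    // Step 1: pin the observed failure down in an automated test.
    // The test documents the expected behaviour and keeps failing
    // for as long as the defect is present.
    @Test
    public void trailingWhitespaceShouldNotChangeTheResult() {
        Parser parser = new Parser(); // hypothetical class under test
        assertEquals(parser.parse("42"), parser.parse("42 "));
    }
}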


When taking a closer look at the steps involved, it's actually pretty straightforward. This makes the method easy to use while still being most effective. When combined with automated testing it even helps when squashing bugs in code not written by the person fixing it. One way to make the strategy even stronger is to support the process by manually writing down a protocol of the debugging session with pen and paper. Not only does this help avoid checking the same hypothesis over and over again, it's also a way to quickly note down all hypotheses: in the process of debugging, the person doing the analysis may be faster at thinking of new possible causes than at actually checking them. Noting them down helps keep one's mind free and open as well as remembering all possible options.

Part 4: Constant evaluation and improvement: Finding sources for feedback.

2010-10-24 08:13
In recent years demand for shorter feedback cycles, especially in software development, has increased. Agile development, lean management and Scrum are all for short feedback cycles: coming from the dark ages when software projects would last for months or even years before any results could be delivered to customers, we are transforming development into a process that integrates the customer into the design and evolution of his own product. Developers have learned that planning ahead for years does not work: it's not only customers changing their minds quickly, it's the requirements themselves. The only achievement of months-long cycles is getting input on your work later and being able to hide deficiencies longer.

However it does not only make sense to install fast feedback loops for planning and management. A few days ago I finished reading "Apprenticeship Patterns", a book that gives an overview of various patterns that help improve software development skills.

One major theme was about getting fast feedback constantly. On the (agile) development side, automated tests (unit and integration) and continuous integration systems are technical aids that can help. Pair programming and code review take the idea of fast feedback one step further by having humans give you feedback on what cannot possibly be evaluated automatically.

There is just one minor glitch: With really short feedback loops any mistake you make gets revealed instantly. This is not particularly special to agile development. Another area with these kinds of fast feedback loops are projects developing free software - see also the last paragraph in Bertrand's blog on This is how we work at Apache.

There are developers who have a hard time exposing their work to a wider audience quickly. However it has the huge benefit of revealing any errors as early as possible - ideally at a point in time where fixing them is still cheap. Examples of ways to spot mistakes early in various environments are listed below:


  • Open source development: Mistakes are revealed during code review of patches (or checkins).
  • Scrum: Error spotting can be sped up by closely integrating the product owner during each sprint. As soon as a feature is done - according to the developer team - it gets reviewed by the product owner, thereby reducing the risk of features getting rejected during the sprint review.
  • In the team: Get each changeset mailed to all developers, allowing for fast review of incoming code.


These are all good ways to get feedback - however, what about non-coding areas? Are there ways to build fast feedback into tasks that do not involve coding? I'll pick just one example to highlight some ways to facilitate fast feedback in a non-coding environment.

From time to time even hard-core coders have to meet face-to-face to discuss new designs or learn more about customers' requirements. This may involve going to conferences, giving talks, or organising internal workshops, public meetups or even whole conferences.

Usually people doing the organisation are too busy to "watch what happens": They already drown in tasks to do and have no time (and are in no good position) to objectively evaluate the conference's quality.

Still there are ways to build feedback loops even into this kind of setup. Most of them have to do with communication:

  • Ask people to give you feedback in an official feedback form: Don't expect more than a fraction of the actual attendees to answer the form. Still it can be a source of honest feedback when done correctly. Include a few open questions, and don't ask people to rate each and every talk - they'll never find the time to do that. Read the free-form comments - usually they provide way more insight than any rating question anyway.
  • Talk to people, ask for proposals on what you should do differently next time.
  • Watch what people are saying on the net - but keep in mind that those statements usually are a bit biased, showing only the extreme ends of the spectrum.


The same applies to people giving presentations: talk to people from the audience after your presentation is over. If your talk was video-taped you are in a really great position, as you can then judge for yourself what should be improved and where there are glitches in your arguments.

In my experience people very rarely give explicit feedback - except when they are really disappointed or extremely positively surprised. However, when asked for precise feedback on specific aspects, people are usually more than willing to share their experience, tell you where to improve and what to change. It usually turns out to be a good idea to actively seek people out to evaluate your projects - to get better at what you do, and to encourage peers to tell you what you do wrong or where you could get slightly better.

A Get Together Checklist

2010-10-06 19:38
Still on the list of potentially interesting books: The Checklist Manifesto - explaining why checklists can still be valuable - especially for complex problems and tasks.

Though running one is not very complex, I chose to come up with a checklist for the Hadoop Get Together in Berlin as an exercise. I'm trying to stick with the advice provided by the Checklist for Checklists.

Parties involved


  • Find two to three speakers two months in advance.
  • Find a sponsor for the videos.

Gathering information


  • Double check time and date with all speakers and newthinking store.
  • Get name, title, abstract from the speakers.
  • Get logo and exact conditions from sponsor.

Spreading the word


  • Put together an announcement text including thanks to video and venue sponsors.
  • Publish the event on Upcoming.
  • Publish the event on Xing.
  • Augment the announcement text by the Xing event and Upcoming links.
  • Send a newsletter to the Meetup Xing group.
  • Send the text to the Get Together mailing list, and if appropriate to the Hadoop, HBase, katta, Lucene, Solr and Mahout mailing lists.
  • On event day send a reminder to the Get Together mailing list.
  • Create meetup intro slides including thanks to the sponsors, the schedule, and announcements of future events.

During the meetup


  • Mention newthinking bar during introduction.
  • Self-introduction of all participants.
  • Get mail addresses of future mailing list subscribers.
  • Keep presentations at 30 to 40 minutes.
  • Get speakers' slides.

After the event


  • Publish talks' slides.
  • Publish links to videos.


The more meetups you have run, the larger the chance of the main organiser getting sick on the day the meetup takes place. To avoid having to re-schedule the event, make sure there are people who are capable and willing to take over moderation.

Are devs contributing to OSS happier?

2010-09-24 20:18
When talking to fellow developers or meeting with students it happens from time to time that I get the question of why on earth I spend my free time working on an open source project. Why do I spend weekends at developers' conferences like FOSDEM? Why do I spend afternoons organising meetups? Why is it that I am reviewing and writing code after work, for free?

Usually I point people to a post by Shalin explaining some of his reasons to contribute to open source. The post quite nicely summarises most of the reasons why I contribute back as well.

On the Apache Community mailing list Grant Ingersoll asked whether devs who work on or use open source are happier in their employment.

In his response Mike posted a link to a video on what motivates people. It adds another piece to the question of why work on open source software can be perceived as very rewarding even though no money is involved: for people doing cognitively challenging tasks, motivation via payment can get you only so far. There are other motivational factors that may play an equal if not larger role in getting people to perform well in their day-to-day work:


  • Autonomy: If people are supposed to be engaged with their project they need time and freedom to choose how to solve their tasks. Many large engineering-driven companies like Google or Atlassian have gone even further by giving people a day a week to work on what they want, how they want, provided they share their results. These so-called 20% projects have been shown to have high potential for turning into new, creative project ideas, but also into bugs and problems getting fixed.
  • Mastery: Great developers strive to get better at what they do - simply because realising that you are actually learning something and improving can be very satisfying. One way of achieving that goal is to work together with peers on common projects. The larger the pool of peers to draw from, the higher the probability of finding mentors to help you out and point out the mistakes you make.

    There is one more factor, not to be underestimated, why working on open source improves your coding. Grant Ingersoll nicely described it in the thread mentioned above: "I was just talking with a friend yesterday, and fellow committer, who said he is a much better programmer since contributing. Of course, it makes sense. If your underwear is on display for all to see, you sure better make sure it is clean!"
  • Purpose: People like to work on projects for a purpose, be it to make all information accessible to the world or to turn earth into a better place by making cheap calls available to everyone. As a counter-example, deploying some software only for the purpose of selling a license, without making your client's life better by recommending the best solution to his problem, may not be half as satisfying.


There is quite some documentation out there on what drives people to contribute to open source projects. The video shared by Mike nicely summarises some motivations that are independent of open source work but closely related to it.

Apprenticeship patterns (O'Reilly)

2010-09-23 08:17
A few days ago I finished reading the book "Apprenticeship Patterns - Guidance for the Aspiring Software Craftsman" by Dave Hoover and Adewale Oshineye. The book is addressed to readers who have the goal of becoming great software developers.

One naive question one could ask is why there is a need for such a book at all: students are trained in computer science at university, then enter some IT department and simply learn from their peers. So how is software development any different from other professions? Turns out there are a few problems with that approach. At university students usually don't get the slightest idea of what professional software development looks like. After four years of study they still have a long way to go before writing great software. When entering your average IT shop these juniors usually are put on some sort of customer project with tight deadlines. However learning implies making mistakes; it implies having time to try different routes to find the best one. Lucky are the very few who join a team that has a way of integrating and training junior developers. Last but not least, at least in Germany, tech career paths are still rare: as soon as developers excel they are offered a promotion - which usually leads straight into management before they even had a chance to become masters of their profession.

So what can people do who love writing software and want to become masters in their profession? The book provides various patterns, grouped by task:

  • Emptying the cup deals with setting up an attitude that enables learning: to be able to learn new skills the trainee first has to face his ignorance and realise that what he knows already is just a tiny fraction of what differentiates the master from the junior.
  • In the second chapter, "Walking the long road", the book deals with the problem of deciding whether to stick with software development or to go into management. Both paths provide their own hurdles and rewards - in the end the developer himself has to decide which one to take. Deciding on a technical career however might involve identifying new kinds of rewards: instead of being promoted to senior super duper manager, this may involve benefits like getting a 20% project, setting up a company-internal user group, or getting support for presenting one's projects at conferences. The chapter also deals with the motivational side of software development: let's face it, professional development usually is way different from what we'd do if we had unlimited time. It may involve deadlines that cannot be met, it may involve customers that are hard to communicate with. One might even have to deal with unmotivated colleagues who have lower quality standards and no intention to learn more than what is needed to accomplish the task at hand. So there is the problem of staying motivated even if times get rough. Getting in touch with other developers - external and internal - can be a great help here: attending user groups (or organising one), being part of an open source project, and meeting regularly with other developers in one's geographical area all may help to remember the fun things about developing software.
  • The third group of patterns has been put under the headline "Accurate self-assessment" - as people get better and better it gets ever harder to remember that there are techniques out there one does not yet know. Being the best in a team means that there is no more room to learn in that environment. It's time to find another group to get in touch with others again: being the worst in a team means there is a lot of room for learning, and finding mentors helps with getting more information on which areas to explore next. Especially helpful is working on a common project with others - doing pair programming can help even if it's just to pick up minor optimisations in one's work environment.
  • The fourth chapter, "Perpetual learning", deals with finding opportunities to learn new technologies - for instance in a toy project that, in contrast to professional work, is allowed to break and can be used to try out new techniques and learn new languages. Other sources for learning are source code itself, tech publications in magazines, books (both new and classic), blogs and mailing lists. Reflecting on what you learned helps remember it later - one option is writing up little summaries of what you read and keeping them in a place where you can easily retrieve them (for me this blog has turned into such a resource - yeah, I guess writing this book summary is part of the exercise; it even was a proposal in the book itself). One of the best resources for reflection and continued learning is to share knowledge - though you may feel there are others out there way better than you are, you are the one who just went through all the initial loops that no master remembers anymore. You can explain concepts in easy to understand words. Sharing and teaching means quickly finding gaps in your own knowledge and fixing them as you go. Last but not least it is important to create feedback loops: it does not help to learn after three years of coding that what you did does not match a customer's expectations. As an apprentice you need faster feedback: on a technical level this may involve automated tests, code analysis and continuous integration. On a personal level it involves finding people to review your code, and discussing your ideas with peers.
  • The last chapter, on "Constructing your curriculum", finally deals with the task of remaining up to date, e.g. by following renowned developers' blogs, but also by studying the classic literature - there are various books in computer science and software development that were written back in the 60s and 70s but are still highly relevant.


The book does not give you a recipe for turning from junior to master in the shortest possible time. However it successfully identifies situations many a software developer has encountered in his professional life that made him question his current path. It provides ideas on what to do to improve one's skills even if the current IT industry may not be best equipped with tools for training people.

My conclusion from the book is that the most important thing is getting in touch with other developers, exchanging ideas and working on common projects. Open source gets several mentions in the book, and for me too it has turned out to be a great source of feedback, help and input from the best developers I've met so far.

In addition meeting people who are working on similar projects face-to-face provides a lot of important feedback as well as new ideas to try out. Talking with someone over a cup of coffee for two hours sometimes can be more productive than discussing for days over e-mail. Hacking on a common project, maybe even in the same location, usually is the most productive way not only to solve problems but also to pick up new skills.

Part 3: A polite way to say no - and why there are times when it doesn’t work.

2010-09-07 23:05
After having shared my thoughts on how to improve focus and how to track tasks eating up time, this post will explain how to keep the time invested at a more or less constant level. The goal of this exercise is to keep obligations at a reasonable level - be it at work or during one's spare time.

In recent times I have collected a small set of techniques to reduce what gets to my desk - I don't claim this list to be exhaustive. However some of it did help me organise conferences and still have a life besides that.

Sharing and delegating tasks



Sharing and delegating are actually two different ways of integrating other people. Sharing for me means working together on a topic. That could be as easy as setting up a CMS, or more involved, as in publishing articles on Lucene in some magazine. The advantage is that both of you can contribute to the task and possibly even learn from each other: when I was doing the article series on Lucene together with Uwe, it was a great learning experience for me to have someone take the time to explain to me - well, not only to me - what flexible indexing, local search and numeric range queries are really all about, as in how they are technically implemented. So it was not only an enormous time-saver, as the alternative would have been me reading through documentation, code and mailing lists to get up to date; it also gave me the unique opportunity to learn from the very developers of these features how they work and how they are meant to be used.

The disadvantage of sharing is that part of the work still remains on your desk. That's where delegation helps: Take the task, find someone who is capable and willing to solve it and give it to them. There are two problems here: First you have to trust people to actually work on the task. Second you probably cannot avoid checking back from time to time to see if there is progress, if there are any impediments etc. So it means less work than with sharing. But there is more risk in not getting your results and more work to be done for co-ordination. However it is a very powerful technique if applied correctly to scale what can be achieved: Telling people what you need help with and letting them take over some of that work does scale way better than micro-managing people or even trying to be part of every piece of a project. It means giving up some of your control, in return you can turn to other potentially more involved tasks. Note to self: Need to build up more trust in that area.

Both concepts however are not actually about saying no, but about being able to say yes even if you have very little time left.

Prioritisation



Prioritising tasks can be done on a scale from zero to any arbitrarily large number. Obviously it helps with deciding whom to say no to: it's going to be those projects rated very low - those you could easily do without. That's the simplest case as it is easiest to explain. The strategy I usually use is to be honest with people: if there are conflicting conferences, it's easy to reject invitations. If some publication does not pay off for you, it's easiest to be open and honest and tell them. Usually they will understand.

A second reason for a rating of zero is that the task is one of those "Does not belong on my desk" tasks. My advice for those would be to get rid of them as quickly as possible: They draw away your energy without giving back any value. This issue plays nicely with the "patches welcome" theme from open source: People working on open source projects are most successful if they are driven by their own needs. So if you want something implemented, either implement and submit it yourself - or find someone you can pay to do so. People will not work for you. You can jump up and down, complain on the mailing lists - but if the feature you would like to see is something that no-one else in the existing community needs, it won't get done until someone needs it.

Introduce barriers



A nice way of rejecting favours that works at least sometimes is to raise the barrier. The example here would be getting an invitation to give an introductory talk for a closed audience. So what I tried was to raise the bar by asking for funding for travel and accommodation.

Keep in mind though that there is the risk that the one inviting you actually accepts your conditions - no matter how high you think you have set them. Especially the example given above has the problem of being too low a bar in most cases. So be prepared to have to keep your promise. As a result the conditions you set really should lead to the task turning into something that is fun to do.

Cut off early



Imagine you have committed to some task. Later on you realise you won't actually make it: you have no time, priorities have changed, the task is too involved - or any other reason you could potentially imagine.

The important thing for reducing the load on your desk is to communicate this as early as possible. It's clear that people will be more disappointed the later they learn that something they probably depend on won't arrive in time or will never happen: they'll never be extremely happy, but the sooner they learn, the more time they have on their part to react. And actually, most people don't react all that disappointed, simply because they may have counted some risk into the equation when giving you the task - which is not to say you should lower the reliability of your commitments just because no-one expects you to meet all your goals anyway. Usually the amount of trouble expected is way higher than what actually happens. Second note to self: don't forget about this option.

Patches welcome



At least in open source: if it's nothing that helps make your world better, there are other people out there to help out. Patches being welcome may seem obvious. However in some areas it really is not: if someone asks a project member to be present at some conference, the asker may not consider himself capable of representing the project or even just making an impact by talking to people about it. That is the point to encourage people that any input is welcome - not only code, but also documentation, communication and marketing work.

Of course, as with any pattern there are boundaries: cases where not to apply it, or where applying it would mean too much effort or loss. If that is the case and you have committed and cannot step back, then you should think about what could be a great reward for going through with the task: what would it take to make you happily comply and still gain energy through what you are doing? Basically it isn't about doing what you like but about loving what you do (Leo Tolstoy).

There is also valuable advice on managing one's energy from the Apache Software Foundation that is especially targeted at new committers. If you have not done so yet, take the time to read it.

Part 2: Tracking tasks, or - Where the heck did my time go to last week?

2010-09-03 18:22
After summarising some strategies for not losing track of tasks, meetings and conferences in the last post, this one is going to focus on looking back at achievements. If at some point you have asked yourself "Where the heck did my time go?" - maybe after two busy weeks of work - this article might have a few techniques for you.

Usually when that happens to me it's either a sign that I've been on vacation (where that is totally fine) or that too many different - often small but numerous - tasks have sneaked into my schedule.

Together with Thilo I have found a few techniques helpful in dealing with this kind of problem. The goals in applying them (at least for me) have been:

  • Limit the planned workload to a manageable amount.
  • Make transparent and trackable (to oneself and others) which and how many tasks have been finished.
  • Track over time any changes in the number of tasks accomplished per time slot.


After hearing about Scrum and its way of planning tasks I thought about using it not only for software development but for task planning in general. Scrum comprises some techniques that help achieve the goals described above:


  1. In Scrum, development is split into sprints: Iterations of focussed software development that are confined to a fixed length. Each sprint is filled up with tasks. The number of tasks put into one sprint is defined by the so-called velocity of the team.
  2. Tasks are ordered by priority by the product owner. Priority here is influenced by factors like risk (riskier tasks should be attacked earlier than safe ones), ROI (those tasks that promise to increase ROI most should of course be done and launched first) and a few more. After prioritisation, tasks are estimated in that order - that way the tasks most important to the product owner are guaranteed to have an estimated complexity even if there was not enough time to estimate all items.
  3. Complexity here does not mean "amount of time to implement a feature" - it's more like: how much time do we need, how much communication overhead is involved, how complex is the feature? A workable way to come up with reasonably sensible numbers is to choose one base item, assign a complexity of one to it, and estimate all following items in terms of "is as complex as the base item", "has double the complexity" - and so on - along the Fibonacci series. Fibonacci is well suited for that task as its numbers do not increase linearly: an item slightly bigger than a 2 simply becomes a 3, and the next sizes up are 5 and 8, so no time is wasted debating whether something is a 6 or a 7. Similarly, humans are better at estimating small things (be it distances or complexities); as soon as items get too big, estimates tend to be way off the real number.
  4. To come up with a reasonable estimate of what can be done in any week, I usually just look back at past weeks and use those as an estimate - if the last few weeks each saw about a dozen items finished, planning roughly a dozen for the coming week is a sensible default. That technique is close enough to the real number to be a working approach.


To track what got done during the past week we use a whiteboard as a Scrum board, putting tasks into the usual categories of todo, checked out and done. That way, when resetting the board after one week and adding tasks for the following week, it is pretty obvious which actions ate up most of the time. The amount of work that goes onto the board is restricted to no more than what got accomplished during the past week.

So what goes onto the whiteboard? Basically anything that we cannot track as working hours: The Hadoop Get Together can be found just next to doing the laundry. Writing and sending out the long deferred e-mail is put right next to going out for dinner with potential sponsors for free software courses at university.

Now that weekly time tracking is set up - is there a way to also come up with a nice longer term measure? Turns out there are actually three:

First and most obviously, the whiteboard itself provides an easy measure: by tracking weekly velocity and plotting it against time it is easy to identify weeks with more or less free time. As a second source of information, a quick look into one's calendar quickly shows how many meetings and conferences one attended over the course of a year. Last but not least it helps to track talks given on a separate webpage.

It helps to look back from time to time: to evaluate the benefit of the respective activities, to not lose track of the tasks accomplished, to prioritise and maybe re-order stuff on the ToDo list. It would be great if you'd share some of your techniques for tracking and tracing time and tasks - either in the comments or as a separate blog post.