Apache dinner wrap-up

2010-09-21 22:19
Today three Lucene committers, two Mahout committers (one of them also a Lucene committer: Hi Karl, it was great having you here - see you at the next Hadoop Get Together, or maybe some day for lunch), several users of Lucene and Hadoop together with their families (including a very cute, unbelievably quiet three-week-old baby) met at Jamerica - a restaurant offering American as well as Jamaican food in Schöneberg.

Daniel promised to organise the next dinner - looks like in October we meet somewhere close to his place in Potsdam. If you are one of the attendees please do feel invited to organise one of these evenings. It's really simple: Set up a Doodle with some proposed dates and send a mail to our Apache Dinner Berlin mailing list with the link included. After that we simply vote on the dates; the date with the most votes wins. It's then up to you to book a table in your favourite restaurant, send the address and time to the mailing list and that's about it. Don't be shy about standing up - this meeting really is intended to be community driven, getting Apache people, friends and relatives at one table. Please also invite any friends that you know are in town - there is no problem adjusting the schedule to visitors.

For those of you who are currently not in Berlin: There will be an Apache Dinner Paris very soon. Would be great if you could let people know if you want to attend - makes booking a table way easier.

Apache Dinner this evening

2010-09-21 08:00
This evening the September Apache Dinner takes place in Jamerica Schöneberg. I have booked a table for ten to fifteen people - we'll see whether that is sufficient this time :)

Looking forward to seeing you there at 7 p.m.

Apache Hadoop Get Together Berlin - October 2010

2010-09-15 07:31
This is to announce the next Apache Hadoop Get Together sponsored by JTeam that will take place in newthinking store in Berlin.

October 7th, 5p.m.
Newthinking store Berlin, Tucholskystr. 48

As always there will be slots of 30min each for talks on your Hadoop topic. After each talk there will be a lot of time to discuss. You can order drinks directly at the bar in the newthinking store. If you like, you can order pizza. We will go to Cafe Aufsturz after the event for some beer and something to eat.

Talks scheduled so far:

Max Heimel: "Hidden Markov Models for Apache Mahout"

Abstract: In this talk I will present and discuss an implementation of a powerful statistical tool called Hidden Markov Models for the Apache Mahout project. Hidden Markov Models allow one to mathematically deduce the structure of an underlying - and unobservable - process based on the structure of the produced data. Hidden Markov Models are thus frequently applied in pattern recognition to deduce structures that are not directly observable. Examples of applications of Hidden Markov Models include the recognition of syllables in speech recordings, handwritten letter recognition and part-of-speech tagging.

Sebastian Schelter: "Distributed Itembased Collaborative Filtering with Apache Mahout"

Abstract: Recommendation Mining helps users find items they like. A very popular way to implement this is by using Collaborative Filtering. This talk will give an introduction to an approach called Itembased Collaborative Filtering and explain Mahout's Map/Reduce based implementation of it.

Please do indicate on Upcoming if you are coming so we can more safely plan capacities.

JTeam is looking for Java developers and search enthusiasts. Check out their jobs page for more info!

As always a big Thank You goes to newthinking store for providing the venue for free for our event.

Looking forward to seeing you in Berlin as well,

Part 3: A polite way to say no - and why there are times when it doesn’t work.

2010-09-07 23:05
After having shared my thoughts on how to improve focus and how to track tasks eating up time, this post will explain how to keep time invested at a more or less constant level. The goal of this exercise is to keep obligations at a reasonable level - be it at work or during one's spare time.

Recently I have collected a small set of techniques to reduce what gets to my desk - I don't claim this list to be exhaustive. However some of it did help me organise conferences and still have a life besides that.

Sharing and delegating tasks

Sharing and delegating are actually two different ways of integrating other people: Sharing for me means working together on a topic. That could be as easy as setting up a CMS or it could be more involved as in publishing articles on Lucene in some magazine. The advantage is that both of you can contribute to the task, possibly even learn from each other: When I was doing the article series on Lucene together with Uwe it also was a great learning experience for me to have someone take the time to explain to me - well, not only to me - what flexible indexing, local search and numeric range queries are really all about, as in technically implemented. So it was not only an enormous time-saver for me, as the alternative would have been me reading through documentation, code and mailing lists to get up to date. It also gave me the unique opportunity to learn from the very developers of these features about how they work and how they are meant to be used.

The disadvantage of sharing is that part of the work still remains on your desk. That's where delegation helps: Take the task, find someone who is capable and willing to solve it and give it to them. There are two problems here: First you have to trust people to actually work on the task. Second you probably cannot avoid checking back from time to time to see if there is progress, if there are any impediments etc. So it means less work than with sharing. But there is more risk in not getting your results and more work to be done for co-ordination. However it is a very powerful technique if applied correctly to scale what can be achieved: Telling people what you need help with and letting them take over some of that work does scale way better than micro-managing people or even trying to be part of every piece of a project. It means giving up some of your control, in return you can turn to other potentially more involved tasks. Note to self: Need to build up more trust in that area.

Both concepts however are not actually about saying no but about being able to say yes even if you already have very little time left.


Prioritising tasks can be done on a scale from zero to any arbitrarily large number. Obviously it helps with deciding whom to say no to: It's going to be those projects rated very low, that is, those you could easily do without. That's the simplest case as it is easiest to explain. The strategy I usually use is to be honest with people: If there are conflicting conferences, it's easy to reject invitations. If some publication does not pay off for you, it's easiest to be open and honest with people and tell them. Usually they will understand.

A second reason for a rating of zero is that the task is one of those "Does not belong on my desk" tasks. My advice for those would be to get rid of them as quickly as possible: They draw away your energy without giving back any value. This issue plays nicely with the "patches welcome" theme from open source: People working on open source projects are most successful if they are driven by their own needs. So if you want something implemented, either implement and submit it yourself - or find someone you can pay to do so. People will not work for you. You can jump up and down, complain on the mailing lists - but if the feature you would like to see is something that no-one else in the existing community needs, it won't get done until someone needs it.

Introduce barriers

A nice way of rejecting favours that works at least sometimes is to raise the barrier. The example here would be getting an invitation to give an introductory talk for a closed audience. So what I tried was to raise the bar by asking for funding for travel and accommodation.

Keep in mind though that there is the risk that the one inviting you actually accepts your conditions - no matter how high you think you have set them. Especially the example given above has the problem of being too low a bar in most cases. So be prepared to have to keep your promise. As a result the conditions you set really should lead to the task turning into something that is fun to do.

Cut off early

Imagine you have committed to some task. Later on you realise you won't actually make it: You have no time, priorities have changed, the task is too involved or any other reason you could potentially imagine.

The important way to reduce the load on your desk is to communicate this issue as early as possible. It's clear that people will be more disappointed the later they learn that something they probably depend on won't arrive in time or will never happen: They'll never be extremely happy, however the sooner they learn the more time they have on their part to react. And actually, most people don't react that disappointed at all, simply because they may have counted some risk into the equation when giving you the task - which is not to say you should lower the reliability of your commitments, simply because no-one is expecting you to meet your goals anyway. However usually the amount of trouble expected is way higher than what actually happens. Second note to self: Don't forget about this option.

Patches welcome

At least in open source: If it's nothing that helps make your world better - there are other people out there to help out. Patches being welcome may seem obvious. However in some areas it really is not: If someone asks a project member to be present at some conference, that member may not consider himself capable of representing the project or even just making an impact by talking to people about it. That is the point at which to encourage people that any input is welcome - not only code, but also documentation, communication and marketing work.

Of course as with any pattern there are boundaries when not to apply it or when applying it would mean too much effort or loss. If that is the case and you have committed and cannot step back, then you should think about what could be a great reward if you went through with the tasks: What would it take to make you happily comply and still gain energy through what you are doing? Basically it isn't about doing what you like but about loving what you do (L. Tolstoi).

There is also valuable advice on managing one's energy from the Apache Software Foundation that is specifically targeted at new committers. If you have not done so yet, take the time to read it.

Part 2: Tracking tasks, or - Where the hack did my time go to last week?

2010-09-03 18:22
After summarising some strategies for not losing track of tasks, meetings and conferences in the last post, this one is going to focus on looking back at achievements. If at some point in time you have asked yourself "Where the hack did time go to?" - maybe after two busy weeks of work - this article might have a few techniques for you.

Usually when that happens to me it's either a sign that I've been on vacation (where that is totally fine) or that too many different, sometimes small but numerous tasks have sneaked into my schedule.

Together with Thilo I have found a few techniques helpful in dealing with these kinds of problems. The goals in applying them (at least for me) have been:

  • Configure the planned work load to a manageable amount.
  • Make transparent and trackable (to oneself and others) which and how many tasks have been finished.
  • Track over time any changes in number of tasks accomplished per time slot.

After hearing about Scrum and its way of planning tasks I thought about using it not only for software development but for task planning in general. Scrum comprises some techniques that help achieve the goals described above:

  1. In Scrum, development is split into sprints: Iterations of focussed software development that are confined to a fixed length. Each sprint is filled up with tasks. The number of tasks put into one sprint is defined by the so-called velocity of the team.
  2. Tasks are ordered by priority by the product owner. Priority here is influenced by factors like risk (riskier tasks should be attacked earlier than safe ones), ROI (those tasks that promise to increase ROI most should of course be done and launched first) and a few more. After prioritisation, tasks are estimated in order - that way those tasks most important to the product owner are guaranteed to have an estimated complexity defined even if there was not enough time to estimate all items.
  3. Complexity here does not mean "amount of time to implement a feature" - it's more like how much time do we need, how much communication overhead is involved, how complex is the feature. A workable way to come up with reasonably sensible numbers is to choose one base item, assign a complexity of one to it and estimate all coming items in terms of "is as complex as the base item", "has double the complexity" - and so on - according to the Fibonacci series. Fibonacci is well suited for that task as the numbers do not increase linearly - similarly humans are better at estimating small things (be it distances or complexities). As soon as items get too big, estimation also tends to be way off the real number.
  4. To come up with a reasonable estimate of what can be done in any week, I usually just look back to past weeks and use that as an estimate. That technique is close enough to the real number to be a working approach.
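
As a rough illustration of the estimation and planning steps above, here is a minimal Python sketch. The task names, the point values and the helper names (`snap_to_fibonacci`, `velocity`, `plan_week`) are made up for this example, and averaging the last three weeks is just one assumed way of "looking back" - it is not prescribed by Scrum.

```python
# Complexity scale; the cut-off at 21 is an arbitrary choice for this sketch.
FIBONACCI = [1, 2, 3, 5, 8, 13, 21]


def snap_to_fibonacci(raw_estimate):
    """Round a raw complexity guess to the nearest value on the scale."""
    return min(FIBONACCI, key=lambda f: abs(f - raw_estimate))


def velocity(points_done_per_week, window=3):
    """Average number of points finished over the most recent weeks."""
    recent = points_done_per_week[-window:]
    return sum(recent) / len(recent)


def plan_week(backlog, capacity):
    """Walk the backlog in priority order and take each task that still fits."""
    planned, used = [], 0
    for name, points in backlog:
        if used + points <= capacity:
            planned.append(name)
            used += points
    return planned, used


# Hypothetical data: points finished in past weeks and a prioritised backlog.
history = [13, 8, 12]
backlog = [
    ("Hadoop Get Together", 8),
    ("laundry", 1),
    ("deferred e-mail", 2),
    ("sponsor dinner", 5),
]

capacity = velocity(history)
planned, used = plan_week(backlog, capacity)
print(f"capacity {capacity}, planned {planned} using {used} points")
```

Skipping a task that does not fit and trying the next smaller one is a design choice of this sketch; one could just as well stop at the first task that exceeds the remaining capacity.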

To track what got done during the past week, we use a whiteboard as Scrum board, putting tasks into the known categories of todo, checked out and done. That way when resetting the board after one week and adding tasks for the following week it is pretty obvious which actions ate up most of the time. The amount of work that goes onto the board is restricted to not be larger than what got accomplished during the past week.

So what goes onto the whiteboard? Basically anything that we cannot track as working hours: The Hadoop Get Together can be found just next to doing the laundry. Writing and sending out the long deferred e-mail is put right next to going out for dinner with potential sponsors for free software courses at university.

Now that weekly time tracking is set up - is there a way to also come up with a nice longer term measure? Turns out, there are actually three:

First and most obviously the whiteboard itself provides an easy measure: By tracking weekly velocity and plotting that against time it is easy to identify weeks with more or less free time. As a second source of information a quick look into one's calendar shows how many meetings and conferences one attended over the course of a year. Last but not least it helps to track talks given on a separate webpage.

It helps to look back from time to time: To evaluate the benefit of the respective activities, to not lose track of the tasks accomplished, to prioritise and maybe re-order stuff on the ToDo list. Would be great if you'd share some of your techniques of tracking and tracing time and tasks - either in the comments or as a separate blog post.

Berlin Scrumtisch - open discussion

2010-08-25 21:29
This evening the Berlin Scrumtisch took place in Friedrichshain. More than thirty participants followed Marion's invitation for discussions on Scrum, wine and pizza at Vecchia Trattoria.

As there were several new participants, Felix started out with a very brief summary of the very core concepts of Scrum itself: Most important to know is the basic assumption of Scrum, that is, planning ahead of time in a very detailed way is impossible. Defining goals and letting those who do the actual work take the decisions on how to reach that goal is way easier and more promising. The whole process relies on fast feedback loops enabling developers and business people to run experiments on how to improve their work in a controlled environment.

Scrum comes with three roles: The development team responsible for delivering quality software, the product owner responsible for defining development goals that maximise return on investment and the scrum master as the moderator and facilitator who takes care that the roles and rituals are not broken.

Scrum comes with three plus one rituals: The daily standup (about 15min) used by the development team to get everyone up to date on a daily basis on everyone's status, the Scrum Review and the Scrum Planning. In addition, and very importantly, each sprint includes a retrospective that serves the purpose of improving the scrum team's processes.

Scrum comes with three artifacts: The sorted product backlog of all user stories, the sprint backlog and the burndown chart showing the team's progress.

However Scrum is just a framework - it tells you more on the goals, but less on exactly how to reach them. It should serve as a basis to adapt one's processes to the project's needs.

In the usual meetup planning phase we collected potential topics for discussions and ranked them by voting on them in the end. The topics proposed were:

  • Applications of Scrum for non-software-development projects.
  • How to convince teams of Scrum?
  • Awareness of the definition of done.
  • How to integrate testers in a team, extended by discussing the values of cross-functional teams.
  • How to be a tech PO.
  • Adding agile/XP to Scrum.
  • How to keep the team on focus.
  • Decision making in self organising teams.
  • Bonus HR in Scrum.

The topics rated highest were on raising the awareness for the definition of done and on decision making in self organising teams.

Definition of done

To introduce the topic, Felix repeated the goal of Scrum teams to deliver potentially quality - ahem - shippable software. ;) The problem of the guest and his co-workers was described as follows: Teams have definitions that are inconsistent not only across teams but also within teams.

Ultimately the goal of the definition of done is to enable teams to produce shippable software. One option to make the team aware of the need for better quality software might be to make them feel the pain their releases cause. It does not help to dictate a company-wide definition of done: It's up to the team to define it. However to learn more on what shippable means the team must be allowed to make mistakes. They will fail - but learn from that failure as soon as they feel and see what gets influenced by their mistakes. As a result, they will refine their definition of done.

As the person to make happy in Scrum iterations is the PO, this could mean that the PO simply does not accept features; after all he is the one to define what shippable means. One factor that is a pre-condition for teams to be able to learn is to keep them stable. Learning needs time - teams need to be allowed to evolve. If yesterday's team does not feel the pain their mistakes caused just because the team does not exist anymore or has been reconfigured - how would people be able to learn at all?

Decision making in self organising teams

The person proposing this question has the problem that in his teams some developers turn into leaders dictating the way software gets implemented. Other team members rarely join in that discussion and close to never take decisions. The result is endless discussions without real results.

The first idea that came up was for the Scrum Master to act as moderator. Marion came up with the proposal to use well known mediation techniques. She promised to share the links - would be great to have them published on the Scrumtisch Berlin blog as well. Thilo mentioned there are courses on mediation and moderation that can help the Scrum Master play that role.

As for long discussions: Felix mentioned a few typical patterns (or anti-patterns) that tend to lead to developers discussing endlessly:

  • Fear: fear of punishment for taking the wrong decision usually leads developers to avoid decisions altogether. Fix for that would be establishing an open culture that allows for failure and that enables people to learn from failure.
  • Striving for the 100% solution: Developers are not used to incremental thinking and try to solve all problems at once. Fix would be to teach them they get time for refactoring and are thus not punished for adhering to the YAGNI principle.
  • Personal conflicts in teams can lead to the described situation as well and can only be fixed by double-checking the team configuration, potentially changing it.

There is a very good book by Cohn on "Succeeding with Agile" that has a whole chapter on what makes a good Scrum Master. Checking these properties against your chosen Scrum Master might help as well.

When discussing this topic we soon discovered one problem with the team configuration as-is: Scrum Masters used to be system architects or senior software developers - that is, highly respected, influential people. Maybe simply re-configuring teams might help already.

Thanks to Marion for organising the evening - and thanks to all attendees for your questions and input on discussion topics. Looking forward to the next edition of the Scrumtisch.

Disclaimer: I usually just take notes on an old-fashioned paper-notebook, typing stuff into the blog after the meeting is over. Only reason I do it the same evening is the goal of keeping the list of draft postings as short as possible.

Apache Dinner DUS

2010-08-17 19:10
The evening after FrOSCon - that is on August 22nd 2010 at 7:30 p.m. CEST - a combined "FSFE Fellowship meetup/ Apache dinner*" takes place in Tigges in Düsseldorf (Brunnenstraße 1, at Bilker S-Bahnhof). Provided it doesn't rain, we'll be sitting outside.

Would be great to meet you there for tasty food, interesting discussions on Apache in general, as well as projects like Lucene, Hadoop or Tomcat in particular. Anyone interested in either the FSFE or Apache is welcome to join us.

One personal request: Somehow, Rainer (Kersten, FSFE) talked me into preparing a talk on what the ASF is all about - would be really great to have more people around share their experience.

See you in Düsseldorf

Scrumtisch August Berlin

2010-08-14 16:26
Just seen it - the next Scrumtisch Berlin has been scheduled for 25th August 2010 at 18:30. So far, no official talk has been scheduled, so please expect two topics on Scrum and its application to be selected for discussion according to Marion's agile topic selection algorithm.

Please talk to Marion Eickmann if you would like to attend the next meetup.

Some statistics

2010-08-11 20:03
Various research projects focus on learning more on how open source communities work:
  • What makes people commit themselves to such projects?
  • How much involvement from various companies is there?
  • Do people contribute during working hours or in their spare time?
  • Who are the most active contributors in terms of individuals and in terms of companies?

When asked to fill out surveys, especially in cases where that happens for the n-th time with n being larger than say 5, software developers usually are not very likely to fill out these questionnaires. However knowing some of the processes of open source software development it soon becomes obvious there are way more extensive sources of information - albeit not trivial to evaluate and prone to at least some error.

Free software tends to be developed "in the open": Project members with various backgrounds get together to collaborate on a common subject, exchanging ideas, designs and thoughts digitally. Nearly every project with more than two members at least has mailing list archives and some sort of commit log to some version control system. Usually people also have bugtrackers that one can use as a source of information.

If we take the ASF as an example, there is a nice application to create various statistics from svn logs:

The caveats of this analysis are pretty obvious: Commit times are set according to the locale of the server, however that may be far off compared to the actual timezone the developer lives in. Even when knowing each developer's timezone there is still some inaccuracy in the estimations as people might cross timezone boundaries when going off for vacation. Still the data available from that source should already provide some input as to when people are contributing, how many projects they work on, how much effort in general goes into each project etc.
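
To make the timezone caveat concrete, here is a minimal Python sketch that shifts server-side commit times into each developer's assumed home timezone before deciding whether a commit falls into working hours. Everything in it is hypothetical: the authors, the timestamps, the assumption that the server records UTC, the fixed offsets (which ignore daylight saving changes and vacations) and the 9-to-18 weekday definition of working hours.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical commit log: (author, commit time as recorded by the server).
# Assumption for this sketch: the server stores times in UTC.
COMMITS = [
    ("alice", datetime(2010, 8, 9, 7, 30, tzinfo=timezone.utc)),
    ("alice", datetime(2010, 8, 9, 19, 45, tzinfo=timezone.utc)),
    ("bob", datetime(2010, 8, 10, 1, 15, tzinfo=timezone.utc)),
]

# Assumed fixed home offsets per developer.
HOME_OFFSET = {
    "alice": timezone(timedelta(hours=2)),
    "bob": timezone(timedelta(hours=-7)),
}


def local_hour(author, server_time):
    """Hour of day in the developer's assumed home timezone."""
    return server_time.astimezone(HOME_OFFSET[author]).hour


def classify(author, server_time, start=9, end=18):
    """Label a commit as made during working hours or spare time."""
    local = server_time.astimezone(HOME_OFFSET[author])
    if local.weekday() < 5 and start <= local.hour < end:
        return "working hours"
    return "spare time"


summary = Counter(classify(author, when) for author, when in COMMITS)
print(dict(summary))
```

The third commit shows why the conversion matters: on the server it is dated August 10th at night, while for the developer it is still the evening of August 9th.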

Turning the analysis the other way around and looking at mailing list contributions, one might ask whether a company indeed is involved in free software development. One trivial, naive first shot could be to simply look for mailing list postings that originate from some corporate mail address. Again the raw numbers displayed below have to be normalised. This time company size and the fraction of developers vs. non-developers in a company have to be taken into consideration when comparing graphs and numbers to each other.
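
A minimal Python sketch of that naive first shot could look as follows. The sender addresses, the company domains and the developer head counts are invented for illustration, and dividing by the number of developers is only one assumed way of normalising.

```python
from collections import Counter

# Hypothetical sender addresses taken from a mailing list archive.
SENDERS = [
    "anna@example-corp.com",
    "ben@example-corp.com",
    "anna@example-corp.com",
    "carl@Example-Corp.com",
    "dora@example-startup.com",
    "dora@example-startup.com",
    "erin@apache.org",
]

# Assumed number of developers per company, used for normalisation.
DEVELOPERS = {"example-corp.com": 200, "example-startup.com": 10}


def domain(address):
    """Return the lower-cased domain part of a mail address."""
    return address.rsplit("@", 1)[-1].lower()


def postings_per_domain(senders):
    """Count raw postings per sender domain."""
    return Counter(domain(sender) for sender in senders)


def postings_per_developer(counts, developers):
    """Normalise raw counts by the number of developers at each company."""
    return {d: counts[d] / n for d, n in developers.items()}


counts = postings_per_domain(SENDERS)
normalised = postings_per_developer(counts, DEVELOPERS)
print(counts.most_common())
print(normalised)
```

In this made-up data the larger company posts more often in absolute terms, but the smaller one is far more active per developer - which is exactly why the raw numbers should not be compared directly.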

Yet another caveat is mailing lists that are not archived in the mail archiving service that one may have chosen as the basis for comparison. In addition people may contribute from their employer's machines but not use the corporate mail address (I personally am one of these outliers, using the apache.org address for anything at some ASF project).

[Graph: Lucid Imagination]

Easily visible even from that trivial 5min analysis, however, is the general trend of involvement in free software projects. In addition, the projects that employees are working with and contributing to most actively are displayed prominently - it comes as no surprise that for Yahoo! that is Hadoop. In addition, if graphs go back in time far enough, one might even see the timeframe of when a company changed its open source strategy (or was founded (see the graph of Lucid), or got acquired (see Sun's graph), or acquired a company with a different strategy (see Oracle's graph) ;) ).

General advice might be to first use the data that is already out there as a starting point - in contrast to other closed communities free software developers tend to generate a lot of it. And usually it is freely available online. However when doing so, know your data well and be cautious about drawing premature conclusions: The effect you are seeing may well be caused by some external factor.

NoSQL summer Berlin - this evening

2010-08-11 06:38
This evening at Volkspark Friedrichshain, Café Schoenbrunn the next NoSQL summer Berlin (organised by Tim Lossen) is meeting to discuss the paper on Amazon's Dynamo "Dynamo: Amazon's Highly Available Key-value Store". The group is planning to meet at 19:30 for some beer and discussions on the publication.