Apache Dinner August Berlin recap

2010-08-09 21:54
This evening yet another Apache Dinner took place in Berlin (this time in Schöneberg), with the location booked by Simon Willnauer. As it was announced less than a week ago (see post below) we were expecting no more than some 7 people ... we ended up being a group of 15 attendees: Michi Busch from Twitter came together with Tanja, and Uwe Schindler from Bremen joined us. With Matt and Josh some of our local Hadoop users from Nokia joined our group. We had Sebastian Schelter from Mahout. In addition there were the usual suspects, that is Jan Lehnardt, Simon Willnauer and Torsten Curdt.

Indian food at Yogi Haus was great and very tasty - though we should introduce a sharing algorithm for the various dishes next time around. Speaking of next time: If you would like to be part of the dinner, subscribe to our Apache Dinner mailing list. Best way to make the location suit your needs is to simply send out the next proposal yourself.

As usual the Lucene guys are the last to leave: Currently they are on their way to X-Berg for further drinks, some food and lots of fun. Looking forward to the pictures you promised, Simon ;)

Update: Images added. Thanks for forwarding them.

Apache Dinner Berlin - August 2010

2010-08-04 14:52
Simon (Willnauer) just sent around the following e-mail. If you have some time left next Monday evening, come join us in Yogihaus: For tasty Indian food, geeky discussions and a generally beautiful evening.

Unlike the other dinner mails this one is not a poll, it's an announcement. Some Apache folks are in town next monday (9th of August) so we decided to have a Apache Dinner with a short term notice. If you plan to come please shoot me a quick heads-up and I count you in!

We will meet at http://www.restaurant-yogihaus-berlin.de/ on Monday 9th of August at 7:30 pm. I will reserve a table for about 14 people (the average size of the last two meetings while gstein and his gang wasn't counted :D)

Looking forward to meet you there on Monday!!


Looking forward to seeing you on Monday evening next week. Please do not forget to give Simon a quick heads-up if you are coming: Would be nice if our estimated number of guests would at least be close to the real number this time (instead of somewhere at 50% ;) ).

In the unlikely event that you can't make it next Monday, please subscribe to our Apache Dinner mailing list to receive further announcements. If you are not living in Berlin but are still interested in dropping in from time to time: Don't worry, we do take into account that schedules of people travelling here are tight and organise meetings accordingly.

Update: Corrected year - must have mixed that up with another conference's kick off meeting that takes place today...

Part 1: Travelling minds

2010-08-03 06:00
In the last post I promised to share some more information on techniques I came across and found useful under an increasing work load. Instead of taking a close look at my professional calendar I decided to use my private one as an example - first because spare time is even more precious than working hours, simply because there is so little of it, and secondly because I am free to publicly scrutinize not only the methods for keeping it in good shape but also the entries in it.

I am planning to split the article into five pieces as follows, as keeping all information in one article would lead to a text longer than I could possibly expect to be read from beginning to end:

  1. Part 1: Travelling minds - how to stay focussed in an always-on community.
  2. Part 2: Tracking tasks, or: Where the heck did my time go to last week?
  3. Part 3: A polite way to say no - and why there are times when it doesn't work.
  4. Part 4: Constant evaluation and improvement: Finding sources for feedback.
  5. Part 5: A final word on vacation.

Several years ago, I had no problem with tasks like going out reading a book for hours, working on code for hours, answering mails only from time to time, thinking about one particular problem for days. As the number of projects and tasks grew, these tasks became increasingly hard to accomplish: Writing code, my mind would wander off to the mailing list; when reviewing patches my mind would start actually thinking about that one implementation that was still lingering on my hard disk.

There are a few techniques for getting back to that state of thinking about just one thing at a time. One article I found very insightful was an essay by Paul Graham. He gave a pretty good analysis of thoughts that can bind your attention and draw it away from what should actually be the thing you are thinking about. According to his analysis a pretty reliable way to discover ideas that steal your attention is to observe what thoughts your mind wanders to when you are taking a shower (I would add cycling to work here, basically anything that leaves your mind free to dream and think): If it is not in line with what you would like to think about, it might be a good time to think about the need to change.

There are a few ways to force your mind to stay "on-topic". Some very easy ones are explained in a recent blog post on attention span (Thanks to Thilo for the link):

  • Organising your virtual desktops such that applications are sorted according to tasks (one for communication, one for coding project x, another one for working on project y) helps to switch off distraction that would otherwise hide in plain sight. Who wants to work on code if TweetDeck is blinking at you next to your editor? In contrast to the original author I would not go so far as to switch off multiple monitors: It's great to have your editor, some terminals and documentation in the browser open all at the same time in one workspace. However I do try to keep everything that has to do with communication separate from coding etc.
  • Train to work for longer and longer periods of time on one task and one task only: The world does not fall apart, if people have to wait for an answer to your mail for longer than 30min - at least they'll get used to it. You do not need to take your phone to meetings: If anything is starting to melt down there will be people who know where you are and who will drag you out of the meeting room in no time. Anything else can well wait for another 60min.
  • When working with tabbed browsing: Don't open more tabs than you can easily scan. You won't read that interesting blog post you found four weeks ago anyway. In modern browsers it is possible to detach tabs. That way you can follow the first hint of keeping even the web pages sorted on desktops according to activity: You do not need your time tracking application next to your editor. Having only documentation and the application under test open there does help.
  • Keep your environment friendly and supportive. Whoever has shared an office (or a lecture at university back when I was a student) with me knows that close to my desk the probability of finding sweets, cookies, drinks and snacks approaches one. Being hungry when trying to fix a bug does not help, believe me.

One additional trick that helps with staying just focussed enough for debugging complex problems is to make use of systematic debugging by Andreas Zeller (also explained in Zen and the Art of Motorcycle Maintenance). The trick is to explicitly track your thoughts on paper: Write down your hypothesis of what causes the problem. Then identify an experiment to test the hypothesis - you should know how to use your debugger, when to use print statements, which unit tests to write and when to simply take a very close look at the code and potentially make it simpler for that. Only when your experiment confirms that you have found the cause of the problem have you really identified what you need to fix.
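For those who prefer a text file over paper, the hypothesis/experiment log described above could be sketched roughly like this. This is only an illustration of the bookkeeping, not Zeller's tooling: the class names, method names and the example bug are all made up for this post.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    hypothesis: str                   # what I believe causes the problem
    experiment: str                   # how I intend to test that belief
    confirmed: Optional[bool] = None  # None until the experiment has been run


@dataclass
class DebugLog:
    """Hypothetical stand-in for the sheet of paper next to the keyboard."""
    entries: List[Entry] = field(default_factory=list)

    def propose(self, hypothesis: str, experiment: str) -> Entry:
        entry = Entry(hypothesis, experiment)
        self.entries.append(entry)
        return entry

    def record(self, entry: Entry, confirmed: bool) -> None:
        entry.confirmed = confirmed

    def open_hypotheses(self) -> List[str]:
        # Hypotheses whose experiment has not been run yet.
        return [e.hypothesis for e in self.entries if e.confirmed is None]

    def confirmed_causes(self) -> List[str]:
        # Only these are candidates for an actual fix.
        return [e.hypothesis for e in self.entries if e.confirmed is True]


log = DebugLog()
first = log.propose("The cache returns stale entries", "Disable the cache and re-run the failing test")
second = log.propose("The input file is read with the wrong encoding", "Print the raw bytes of the first line")
log.record(first, False)   # test still fails without the cache: hypothesis rejected
log.record(second, True)   # raw bytes show a different encoding: hypothesis confirmed
print(log.confirmed_causes())
```

The point of keeping rejected hypotheses in the log is the same as on paper: you do not test the same idea twice an hour later.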

There are a few other techniques for getting things out of your head that are just there to distract you: If you have ever read the book "Getting things done" or seen the Inbox zero presentations you may already have an idea of what I am hinting at.

By now I have a calendar application that works like a charm: It reminds me of meetings ahead of time, it warns me in case of conflicts, it accepts notes, it has an amazing life span of one year and is always available (provided I do not forget it at home):
- got mine here ;) That's for organising meetings, going to conferences, getting articles done in time and not forgetting about family birthdays.

For week to week planning we tend to use Scrum including a scrum board. However that is not only for planning as anyone using Scrum may have expected already.

For my inbox the rule is to filter any mailing list into its own folder. Second rule is to keep the number of messages in my inbox to something that fits into a window with fewer than 15 lines: Anything I need for further reference (conference instructions, contacts, addresses that did not yet go into my little blue book, phone numbers not yet stored in my mobile phone) goes into its own folder. Anything that needs a reply is not allowed to stay in the inbox for longer than half a week. For larger projects mail gets sorted into their own project folders. Anything else simply goes to an archive: There are search indexes available, even Linux supports desktop search, search is even integrated in most mail clients. Oh and did I mention that I managed to search for one specific mail for an hour just recently, though it was filed into its own perfectly logical folder - simply because I had forgotten which folder it was in?
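Two of these rules are mechanical enough to sketch in a few lines of Python: the per-list folder and the reply deadline. This is not my actual mail setup, just an illustration. It assumes that list mail carries a List-Id header; the folder naming, the function names and the reading of "half a week" as 3.5 days are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta
from email.message import EmailMessage
from typing import Optional

# "Half a week" taken literally; the exact value is an assumption.
REPLY_DEADLINE = timedelta(days=3.5)


def list_folder(msg: EmailMessage) -> Optional[str]:
    """Return a per-list folder name, or None if the mail did not come via a list."""
    list_id = msg.get("List-Id")
    if list_id is None:
        return None
    # List-Id usually looks like: Some description <list.example.org>
    match = re.search(r"<([^>]+)>", list_id)
    name = match.group(1) if match else list_id.strip()
    return "lists/" + name


def reply_overdue(received: datetime, now: datetime) -> bool:
    """True if a mail that needs a reply has been sitting in the inbox too long."""
    return now - received > REPLY_DEADLINE


mail = EmailMessage()
mail["List-Id"] = "Example dev list <dev.example.org>"  # hypothetical list
print(list_folder(mail))
print(reply_overdue(datetime(2010, 8, 1, 9, 0), datetime(2010, 8, 5, 9, 0)))
```

Everything that needs judgement (is this reference material, does it need a reply at all) stays manual, which is also where most of the time goes.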

To get rid of things I have to do "some time in the near future but not now" I keep a list in my notebook - just so my mind knows the note is there for me to review and it knows I won't forget about it. So to some extent my notebook is my personal swap space. One thing I learnt at Google was to not use loose paper for these kinds of notes - a bound book is way better in that it keeps all notes in one place. In addition you do not run the risk of throwing notes away too early or misplacing them.

The only thing missing is a real product backlog that keeps track of larger things to do and projects to accomplish - something like "I really do need to find a weekend to drive these >250km north to the eastbaltic sea (Thanks to Astro for pointing out the typo to me - hey, that means there is at least one guy who actually did read that blog post from beginning to end - wow!) and relax" :)

Series: Getting things done

2010-07-30 07:07
Probably not too unusual for people working on free software mostly (though no longer exclusively) in their spare time, the number of items that appear in my private calendar has increased steadily in the past months and years:

  • Every three months I am organising the Apache Hadoop Get Together in Berlin.
  • I have been asked (and accepted the offer) to publish articles on Hadoop and Lucene in magazines.
  • There are various conferences I attend - either as speaker or simply as participant: FOSDEM, Froscon, Apache Con NA, Devoxx, Chemnitzer Linuxtag - to name just a few.
  • For Berlin Buzzwords I did get quite a bit of time for organisation, still some issues leaked over to what others would call free time.
  • I am mentoring one of Mahout's GSoC students which is a lot of fun.
  • At least I try to spend as much time as possible on the Mahout mailing lists keeping up with what is developed and discussed there.

There are various techniques to cope with increased work load and still find enough time to relax. Some of them involve simply remembering what to do at the right time, some involve prioritization, others deal with measuring and planning what to do. In this tiny series I'll explain the techniques I employ - or at least try to - in the hope of getting your feedback and comments on how to improve the system. After all, the most important task is to constantly improve one's own processes.

Apache Hadoop in Debian Squeeze

2010-07-17 12:04
After using Mandrake for quite a while (still blaming my boyfriend Thilo for infecting not only my computer but also myself, first with that system, then with the more general idea of Free Software - but that's another story), I started using Debian GNU/Linux after finishing my master's thesis (back then in the version code-named Woody). Since then I have always had Debian on my private box as my main operating system - I even installed it on my MacBook following the steps in the Debian Wiki.

As I am also a committer on Apache Mahout, a project closely related to Apache Hadoop, I always found it kind of sad that there were no Hadoop packages in the official Debian repositories. I tried multiple times to find some time to get into Debian packaging myself, I learned what "debian/rules" is all about and discovered some of the intricacies of packaging Java based software. However I have to admit that I never was able to find enough time to really finish that task.

A few weeks before this year's FOSDEM I learned on the Apache Hadoop as well as on the Debian Java lists that a guy called Thomas Koch was working on solving bug 535861 - ITP to package Hadoop. We met at FOSDEM where I tried to raise some attention in the audience for Thomas' plans (back then he was in need of help with a few last missing pieces). In addition I invited him to Berlin Buzzwords to get in touch with other Hadoop developers and users for further input.

I am really happy that by now Hadoop has made it into the official Debian package repositories - as soon as Debian Squeeze is released, it is just an apt-get install [Hadoop component you need] away (see the Debian package search).


If you want to speed up the process of Squeeze being released as stable version: Help fix the remaining bugs in that distribution. There are various Debian Bug Squashing Parties being organised around the world. The next one in Berlin is next Monday, the one for Munich is running this weekend. Just got the information that Fefe posted in his blog a link to the Mozilla bug bounty.

The packages are based on the upstream Apache Hadoop distribution. Being comparatively new, they are intended for development machines at the moment. If you are using Debian and want to work with Hadoop - this is a great opportunity to help make the packages more stable by simply using them and reporting your experiences back to the Debian community.

In addition Debian now also provides packages for Zookeeper as well as HBase - though the HBase version is not yet production ready as the HDFS-append patch is still missing.

To follow the general state and progress of these packages feel free to watch the package pages for Hadoop, HBase and Zookeeper respectively.

Thomas currently plans to work more closely with upstream e.g. to tidy up the chaos in the start-up scripts and other minor glitches. So watch out for further improvements.

In addition I just saw another interesting ITP in the Debian bugtracker: Wishlist: katta. I am sure there are quite a few others as well.

Apache Lunch in Portugal

2010-07-15 13:37
Just read on the Apache community mailing list that, inspired by our Apache Dinner Berlin, people in Porto are organising an Apache Lunch event. As with the dinner here in Berlin, anyone who is interested in Apache is welcome to join - no need to be a committer or even ASF member ;)

If you are living close to Porto, or always wanted to visit the city - after all it's a very beautiful place, there is a beach close by, there are many tasty restaurants - don't hesitate to get in touch with the organisers:

My xmpp is: fdmanana@gmail.com. Feel free to add me.

People interested in coming, let us known your availability during the 2
first weeks of August.

So, if you are interested in Apache head over to Filipe - I'd love to be there, however my summer vacation ended one week ago. Wish you guys a lot of fun.

Teddy in Portugal

2010-07-08 20:32
During the past two weeks my teddy was on vacation. As destination he chose to fly to Portugal. One day was reserved for a visit to Lisboa, the capital city of the country. He also took a few really nice pictures there:

On his return, he was no longer alone. Seems like he found a cute little Portuguese girlfriend:

In addition he brought the following image. However he promised that he was not in California, but explained that the bridge actually does exist in Lisboa, being constructed by the same company according to the same blueprints that already were used for the Golden Gate Bridge:

Scrum - prepare your meetings

2010-06-27 15:45
"But Scrum involves so many meetings - we already have meetings like all day and don't get to coding anything." - "However we do need transparency and communication to build great software." - "Actually scheduling and re-prioritising items during scrum planning takes so much time." Does that sound familiar to you?

What if you could fit Sprint Review and Planning I for a team of three people doing three week sprints into one hour? Impossible? Well, not quite so. As always there is a very easy trick: Be prepared.

Before the review re-visit all items you have accomplished. Get the application into a state that makes it easy to demonstrate all new features to your product owner. If you are a team working on internal projects with many internal clients - get one of them to test out the new features early on (as in way before review) and get their input onto the table.

Get a list of all items up on a whiteboard. Then with the PO work your way through these items, either demonstrating them or referring to what the "real client" said about the feature. After that compute the velocity of your past sprint.

Then go over to the list of still to be done items. (You did estimate them in separate estimation meetings already, didn't you? You also got them prioritised by your PO beforehand, didn't you?) According to the computed velocity simply check out what you can do in the upcoming sprint.
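The "compute velocity, then check what fits" step is simple enough to write down as a sketch. We do this on the whiteboard, not in code, so treat the following as an illustration only. It assumes that estimates are plain numbers (story points or whatever unit your team uses) and that you stop at the first item that no longer fits instead of pulling smaller items forward; the item titles are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacklogItem:
    title: str
    estimate: int  # from the estimation meeting, in whatever unit the team uses


def compute_velocity(accepted_estimates: List[int]) -> int:
    """Velocity of the past sprint: sum of the estimates of all accepted items."""
    return sum(accepted_estimates)


def plan_sprint(backlog: List[BacklogItem], velocity: int) -> List[BacklogItem]:
    """Walk the backlog in the PO's priority order until the velocity is used up.

    Stopping at the first item that does not fit keeps the PO's ordering intact;
    this is an assumption of the sketch, other teams skip ahead to smaller items.
    """
    selected = []
    remaining = velocity
    for item in backlog:
        if item.estimate > remaining:
            break
        selected.append(item)
        remaining -= item.estimate
    return selected


velocity = compute_velocity([3, 5, 2, 8])  # 18
backlog = [  # already prioritised by the PO, highest priority first
    BacklogItem("Index new document type", 8),
    BacklogItem("Expose search endpoint", 5),
    BacklogItem("Nightly re-index job", 8),
    BacklogItem("Tidy up logging", 2),
]
for item in plan_sprint(backlog, velocity):
    print(item.title, item.estimate)
```

With both the estimates and the priorities prepared beforehand, this really is all that is left to do in the planning meeting itself.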

After that team and PO are done. Guess what: Working with pen, whiteboard and paper did speed up our process considerably. After all, these are meetings of at least four people: Don't want to waste working hours of four people by not using the fastest and most flexible planning method.

So - who's responsible for getting all this information up? And - if using a digital planning tool, who is supposed to get it back into the tool? Trivial: It's the scrum master's job to provide all help and tools to the team to make it a high performing team. It's way better to spend 2 extra hours preparing and "tidying up" the sprint planning than to have four people spend 2 hours in a sub-optimal meeting: 2 hours of preparation plus four people in a one-hour meeting add up to 6 hours in total, versus 8 hours for four people in a two-hour meeting.

Together with estimation meetings for me this means that each sprint one to two days go into organisational work. Still this is very low overhead compared to what the team is able to accomplish in that time.

Using Scrum for software development

2010-06-25 15:31
A few months ago I joined a team of until then two software developers. Being so small and with a rather busy product owner, until then people had followed the rituals of Scrum only loosely. When starting to work on a new component several weeks ago, the three of us decided to change a few things:

  • We would set up infrastructure to be somewhat similar to what we knew from Apache projects: All issues to be accomplished were to be tracked in our issue tracker. We would have a dev list that mirrors whatever goes to JIRA, a commit list to mirror whatever goes into svn and a user list.
  • We would try to follow Scrum rituals more closely: Dailies were re-introduced, we set up estimation meetings and got a cooking timer to stick with 60min per meeting, review and planning were supposed to be prepared and done with paper and pens on a whiteboard. After each planning we would have a planning II as well as a retrospective to improve our processes.

After the first weeks of following these ideas we did notice several improvements that are going to be described in upcoming blog posts.

Let's start with the first (trivial) change we made: Introducing a commit mailing list. With developers sitting all together in one room, communicating openly and regularly, this may seem like huge overkill. However there are a few changes it brought to how we develop:

People would suddenly start to think about what goes into svn: Being very obviously publicly readable, commit messages became much more readable, explaining what's going on. Changes would be checked-in only if they belong to one logical change set. In return others would review what was checked in sort of automatically - spotting problems or questionable code and configuration very early. Changes that got done were made transparent to other team members very easily.

Of course the information is available anyway. However to be honest - I never really closely check each change set that got committed after issuing an svn up. Getting the stuff pushed to my inbox hugely improved the situation while still keeping a faster development cycle than when working with a review-before-commit model. This is especially important for cases where only smaller adjustments are being made. Where larger refactoring steps were necessary we would still get the code reviewed by our colleagues before checking it in.


2010-06-23 11:17
For Berlin Buzzwords we concentrated pretty heavily on scalable systems and architectures: We had talks on Hadoop for scaling data analysis; HBase, Cassandra and Hypertable for scaling data storage; Lucene and Solr for scaling search.

A recurring pattern was people telling success stories about projects that deal with either large amounts of data or growing user numbers. Of course the whole topic of scalability is extremely interesting for ambitious developers: Who would not be happy to solve internet-scale problems, have petabytes of data at their fingertips or tell others that their "other computer is a data center"?

There are however two aspects of scalability that people tend to forget pretty easily: First off, if you are designing a new system from scratch that implements a so far unknown business case - your problem most likely is not scalability. It's way more likely that you have to solve marketing tasks, just getting people to use your cool new application. Only after observing what users actually do and use do you have the slightest chance of spotting the real bottlenecks and optimising with clear goals in mind (e.g. reduce database load for user interaction x by 40%).

The second issue people tend to forget about scalability is that the term is about scaling systems - some developers easily mix that up with high performance. The goal is not to be able to deal with high work load, but to build a system that can deal with increasing (or decreasing) work load. Ultimately this means that not only your technology must be scalable: Any architecture can only scale to a certain load. The organisation building the system must be willing to continuously monitor the application they built - and be willing to re-visit architectural decisions if the environment changes.

Jan Lehnardt had a very interesting point in his talk on CouchDB: When talking about scalability, people usually look into the upper right corner of the application benchmark. However to be truly scalable one should also look into the lower left corner: Being scalable should not only mean being able to scale systems up - but also being able to scale them down. In the case of CouchDB this means that not only large installations at BBC are possible - but running the application on mobile devices should be possible without problems as well. It's an interesting point in the current "high scalability" hype.