Part 1: Travelling minds

2010-08-03 06:00
In the last post I promised to share more information on techniques I came across and found useful under an increasing workload. Instead of taking a close look at my professional calendar, I decided to use my private one as an example: first, because spare time is even more precious than working hours, simply because there is so little of it; and second, because I am free to publicly scrutinize not only the methods for keeping it in good shape but also the entries in it.

I am planning to split the article into five pieces, as keeping all the information in one article would lead to a text longer than I could reasonably expect anyone to read from beginning to end:

  1. Part 1: Travelling minds - how to stay focussed in an always-on community.
  2. Part 2: Tracking tasks, or: Where the hack did my time go to last week?
  3. Part 3: A polite way to say no - and why there are times when it doesn't work.
  4. Part 4: Constant evaluation and improvement: Finding sources for feedback.
  5. Part 5: A final word on vacation.

Several years ago, I had no problem with things like going out to read a book for hours, working on code for hours, answering mail only from time to time, or thinking about one particular problem for days. As the number of projects and tasks grew, these became increasingly hard to accomplish: While I was writing code, my mind would wander off to the mailing list; when I was reviewing patches, my mind would start thinking about that one implementation that was still lingering on my hard disk.

There are a few techniques for getting back to the state of thinking about just one thing at a time. One article I found very insightful was an essay by Paul Graham. He gave a pretty good analysis of thoughts that can bind your attention and draw it away from what you should actually be thinking about. According to his analysis, a pretty reliable way to discover ideas that steal your attention is to observe which thoughts your mind wanders to when you are taking a shower (I would add cycling to work here, basically anything that leaves your mind free to dream and think): If they are not in line with what you would like to think about, it might be a good time to think about the need for change.

There are a few ways to force your mind to stay "on-topic". Some very easy ones are explained in a recent blog post on attention span (Thanks to Thilo for the link):

  • Organising your virtual desktops such that applications are sorted according to tasks (one for communication, one for coding project x, another one for working on project y) helps to switch off distractions that would otherwise hide in plain sight. Who wants to work on code if TweetDeck is blinking at you next to the editor? In contrast to the original author, I would not go so far as to switch off multiple monitors: It's great to have your editor, some terminals and documentation in the browser open all at the same time in one workspace. However, I do try to keep everything that has to do with communication separate from coding etc.
  • Train yourself to work for longer and longer periods of time on one task and one task only: The world does not fall apart if people have to wait longer than 30 minutes for your reply to their mail - at least they'll get used to it. You do not need to take your phone to meetings: If anything starts to melt down, there will be people who know where you are and who will drag you out of the meeting room in no time. Anything else can well wait for another 60 minutes.
  • When working with tabbed browsing: Don't open more tabs than you can easily scan. You won't read those interesting blog posts you found four weeks ago anyway. In modern browsers it is possible to detach tabs. That way you can apply the first hint to web pages as well and keep them sorted on desktops according to activity: You do not need your time tracking application next to your editor. Having only documentation and the application you are testing open there does help.
  • Keep your environment friendly and supportive. Anyone who has ever shared an office (or, back when I was a student, a lecture at university) with me knows that close to my desk the probability of finding sweets, cookies, drinks and snacks approaches one. Being hungry when trying to fix a bug does not help, believe me.

One additional trick that helps with staying focussed enough to debug complex problems is systematic debugging as described by Andreas Zeller (also explained in Zen and the Art of Motorcycle Maintenance). The trick is to explicitly track your thoughts on paper: Write down your hypothesis of what causes the problem. Then design an experiment to test the hypothesis - you should know how to use your debugger, when to use print statements, which unit tests to write and when to simply take a very close look at the code, potentially simplifying it along the way. Only when an experiment confirms your hypothesis have you really identified what you need to fix.
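The paper log described above has a simple structure: a hypothesis, the experiment chosen to test it, and the outcome. Here is a minimal Python sketch of that structure, assuming you want to mirror the notebook in code. The class and method names (`Hypothesis`, `DebuggingLog`, `propose`, `record`, `cause`, `ruled_out`) are invented for illustration and are not taken from Zeller's work. The key rule is that only a confirmed hypothesis names the cause, while rejected ones stay on record so you do not chase them twice:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    """A suspected cause of the bug and the experiment chosen to test it."""
    claim: str
    experiment: str
    confirmed: Optional[bool] = None  # None until the experiment has been run


@dataclass
class DebuggingLog:
    """Notebook-style log: hypotheses are only ever appended, never erased."""
    entries: List[Hypothesis] = field(default_factory=list)

    def propose(self, claim: str, experiment: str) -> Hypothesis:
        hypothesis = Hypothesis(claim, experiment)
        self.entries.append(hypothesis)
        return hypothesis

    def record(self, hypothesis: Hypothesis, confirmed: bool) -> None:
        hypothesis.confirmed = confirmed

    def ruled_out(self) -> List[str]:
        """Claims an experiment has rejected, so you do not chase them twice."""
        return [h.claim for h in self.entries if h.confirmed is False]

    def cause(self) -> Optional[str]:
        """The first hypothesis an experiment confirmed; None means keep going."""
        for h in self.entries:
            if h.confirmed:
                return h.claim
        return None


if __name__ == "__main__":
    log = DebuggingLog()
    h1 = log.propose("off-by-one when reading the last line", "unit test with a one-line file")
    log.record(h1, confirmed=False)
    h2 = log.propose("trailing blank line is not skipped", "unit test with a trailing newline")
    log.record(h2, confirmed=True)
    print(log.ruled_out())  # ['off-by-one when reading the last line']
    print(log.cause())      # trailing blank line is not skipped
```

The log is append-only on purpose, like a bound notebook: a rejected hypothesis is marked with `confirmed=False` rather than deleted.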

There are a few other techniques for getting things off your mind that are only there to distract you: If you have ever read the book "Getting Things Done" or seen the Inbox Zero presentations, you may already have an idea of what I am hinting at.

By now I have a calendar application that works like a charm: It reminds me of meetings ahead of time, it warns me in case of conflicts, it accepts notes, it has an amazing life span of one year and is always available (provided I do not forget it at home):
- got mine here ;) That's for organising meetings, going to conferences, getting articles done in time and not forgetting about family birthdays.

For week-to-week planning we tend to use Scrum, including a Scrum board. However, as anyone using Scrum will have expected, the board is not only used for planning.

For my inbox, the first rule is to filter every mailing list into its own folder. The second rule is to keep the number of messages in my inbox small enough to fit into a window with fewer than 15 lines: Anything I need for further reference (conference instructions, contacts, addresses that did not yet go into my little blue book, phone numbers not yet stored in my mobile phone) goes into its own folder. Anything that needs a reply is not allowed to stay in the inbox for longer than half a week. For larger projects, mail gets sorted into dedicated project folders. Anything else simply goes to an archive: There are search indexes available, even Linux supports desktop search, and search is integrated into most mail clients. Oh, and did I mention that just recently I spent an hour searching for one specific mail, though it was filed into its own perfectly logical folder - simply because I had forgotten which folder it was?
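The inbox rules above are mechanical enough to sketch in code. The following is a minimal Python illustration, not my actual mail setup. It assumes mailing list messages carry a `List-Id` header (RFC 2919), which most list software sets. The `lists/...` folder naming and the functions `folder_for` and `inbox_warnings` are hypothetical. "Fewer than 15 lines" is read as at most 14 messages, and "half a week" as 3.5 days:

```python
from datetime import datetime, timedelta
from email import message_from_string
from email.message import Message
from typing import List, Tuple

MAX_INBOX_MESSAGES = 14               # "fits into a window with fewer than 15 lines"
REPLY_DEADLINE = timedelta(days=3.5)  # "not longer than half a week"


def folder_for(msg: Message) -> str:
    """Rule 1: route every mailing list message into its own folder.

    Uses the List-Id header; the 'lists/...' naming is a made-up convention.
    """
    list_id = msg.get("List-Id")
    if not list_id:
        return "INBOX"
    # List-Id usually looks like 'Description <list.example.org>'
    start, end = list_id.find("<"), list_id.find(">")
    name = list_id[start + 1:end] if 0 <= start < end else list_id.strip()
    return "lists/" + name


def inbox_warnings(inbox: List[Tuple[datetime, bool]], now: datetime) -> List[str]:
    """Rules 2 and 3: keep the inbox short and answer mail within half a week.

    Each inbox entry is a (received_at, needs_reply) pair.
    """
    warnings = []
    if len(inbox) > MAX_INBOX_MESSAGES:
        warnings.append(f"inbox has {len(inbox)} messages, limit is {MAX_INBOX_MESSAGES}")
    overdue = [t for t, needs_reply in inbox if needs_reply and now - t > REPLY_DEADLINE]
    if overdue:
        warnings.append(f"{len(overdue)} message(s) waiting for a reply for more than half a week")
    return warnings


if __name__ == "__main__":
    raw = "List-Id: Example developers <dev.example.org>\nSubject: hello\n\nbody\n"
    print(folder_for(message_from_string(raw)))  # lists/dev.example.org
    now = datetime(2010, 8, 3, 6, 0)
    inbox = [(now - timedelta(days=5), True), (now - timedelta(days=1), True)]
    print(inbox_warnings(inbox, now))
```

In practice, the first rule is better left to the filters built into the mail client or server. The sketch only makes the thresholds explicit.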

To get rid of things I have to do "some time in the near future but not now", I keep a list in my notebook - just so my mind knows the note is there for me to review and that I won't forget about it. So to some extent my notebook is my personal swap space. One thing I learnt at Google was not to use loose paper for these kinds of notes - a bound book is way better in that it keeps all notes in one place. In addition, you do not risk throwing notes away too early or misplacing them.

The only thing missing is a real product backlog that keeps track of larger things to do and projects to accomplish - something like "I really do need to find a weekend to drive these >250km north to the eastbaltic sea (Thanks to Astro for pointing out the typo to me - hey, that means there is at least one guy who actually did read that blog post from beginning to end - wow!) and relax" :)

Series: Getting things done

2010-07-30 07:07
Probably not too unusual for people working on free software mostly (though no longer exclusively) in their spare time, the number of items that appear in my private calendar has increased steadily over the past months and years:

  • Every three months I am organising the Apache Hadoop Get Together in Berlin.
  • I have been asked (and accepted the offer) to publish articles on Hadoop and Lucene in magazines.
  • There are various conferences I attend - either as speaker or simply as participant: FOSDEM, Froscon, Apache Con NA, Devoxx, Chemnitzer Linuxtag - to name just a few.
  • For Berlin Buzzwords I did get quite a bit of working time for the organisation, but some issues still leaked over into what others would call free time.
  • I am mentoring one of Mahout's GSoC students which is a lot of fun.
  • I also try to spend as much time as possible on the Mahout mailing lists, keeping up with what is developed and discussed there.

There are various techniques to cope with an increased workload and still find enough time to relax. Some of them involve simply remembering what to do at the right time, some involve prioritization, others deal with measuring and planning what to do. In this tiny series I'll explain the techniques I employ - or at least try to - in the hope of getting your feedback and comments on how to improve the system. After all, the most important task is to constantly improve one's own processes.


2010-06-23 11:17
For Berlin Buzzwords we concentrated pretty heavily on scalable systems and architectures: We had talks on Hadoop for scaling data analysis; HBase, Cassandra and Hypertable for scaling data storage; Lucene and Solr for scaling search.

A recurring pattern was people telling success stories about projects that involve either large amounts of data or growing user numbers. Of course the whole topic of scalability is extremely interesting for ambitious developers: Who would not be happy to solve internet-scale problems, have petabytes of data at their fingertips or tell others that their "other computer is a data center"?

There are, however, two aspects of scalability that people tend to forget pretty easily: First of all, if you are designing a new system from scratch that implements a so far unknown business case, your problem most likely is not scalability. It's way more likely that you have to solve marketing tasks, like simply getting people to use your cool new application. Only after observing what users actually do and use do you have the slightest chance of spotting the real bottlenecks and optimising with clear goals in mind (e.g. reduce database load for user interaction x by 40%).

The second issue people tend to forget is that scalability is about scaling systems - some developers easily mix that up with high performance. The goal is not to be able to deal with a high workload, but to build a system that can deal with an increasing (or decreasing) workload. Ultimately this means that it is not only your technology that must be scalable: Any architecture can only scale to a certain load. The organisation building the system must be willing to continuously monitor the application it built - and to revisit architectural decisions if the environment changes.

Jan Lehnardt had a very interesting point in his talk on CouchDB: When talking about scalability, people usually look at the upper right corner of the application benchmark. However, to be truly scalable one should also look at the lower left corner: Being scalable should not only mean being able to scale systems up - but also being able to scale them down. In the case of CouchDB this means that not only large installations at the BBC are possible - running the application on mobile devices should work without problems as well. It's an interesting point in the current "high scalability" hype.

Teaching Free Software Development

2010-06-20 12:55
In summer last year I was invited to give a presentation on Apache Mahout at TU Berlin. After the talk, some of the research group members asked me to design and give a course on scalable machine learning with open source software during the winter semester.

The project attracted four to five students - not very many, but then again it is a course people take voluntarily. During the first semester, participants were asked to integrate Mahout into a system that crawls web pages, assigns them to clusters and makes the content searchable with Lucene. The intention was to get students to contribute any patches they had to make back to Mahout. In addition, the code behind the system was supposed to be published after the project was over.

This setup turned out to be sub-optimal: The participants never grew confident enough to publish their ideas and design on the mailing list, or even to share the access data for the SCM system that hosted the project source code.

A similar setup was run at HPI Potsdam by Christoph Böhm: He had students implement various information retrieval and machine learning algorithms on top of Apache Hadoop. After the course was over, he tried to motivate the students to publish their code at Apache Mahout. So far I have not seen any submissions.

Aware of these problems, I chose a slightly different model when I set up the course again for the summer semester at TU: As there were only four students, none of whom had enough free cycles to work on the project full time, I set the goal to implement an HMM - including tests, an example and documentation. Roughly aligned with GSoC, I asked the students to publish their timeline in JIRA. As soon as coding started, I urged them to publish even incremental progress and to ask the community for feedback.

Now we do have an open JIRA issue with a patch attached to it. The students have also already received some code review feedback. With Berlin Buzzwords in town while the course was still running, I used the chance to get the students in touch with other Mahout developers. It looks like at least one of them is planning to stay with the project a little longer. For me it would be a great success if at least one student could be turned into a longer-term contributor to the project.

So far it looks like applying the general principle of releasing code early and often helps people integrate into a project. My own lesson learned from these experiences, however, is to urge students early on to get in touch and release their code: It was not particularly easy to get them to send e-mails to public mailing lists. However, once they had done this just once, the feedback usually was very positive - and they were surprised by how friendly and helpful people in the free software community generally are.

Linus Torvalds on the Linux kernel community

2010-06-15 18:10
A few days ago, Linus sent a very interesting mail on why he considers C the programming language most suitable for the Linux kernel. Apart from the language-specific arguments, the text contains quite a few insights into how the Linux kernel community works and communicates that might be interesting to non-kernel-hackers as well:

People working for free still doesn't mean that it's fine
to make the work take more effort - people still work for
other compensation, and not feeling excessively
frustrated about the tools (including language) and getting
productive work done is a big issue.

When attending open source conferences or contributing to free software projects, I have made the exact same observation multiple times: Developers may not get money out of contributing to a free software project (though often they do), but there are other rewards that keep them working on a particular project: Learning from excellent peers usually is one reason. Being able to work on a topic you like at any time that suits you may be another one.

Developing free software is largely different from your usual professional work: People work voluntarily, putting as much effort into the code as they need to be satisfied with the end result. There may not be any deadlines fixed by contracts, yet developers honour release cycles with the goal of providing a reliable product to their end users.

In the end it all boils down to being passionate about what you work on. Being involved in any open source project takes a huge amount of energy - but usually you get back more than you are even able to invest. However, if passion is a prerequisite for working on free software, this also means it is extremely hard to pay developers to work on a free software project: You just cannot buy passion or love with money.

But the thing is, "lines of code" isn't even remotely close
to being a measure of productivity, or even the gating
issue. The gating issue in any large project is pretty much
all about (a) getting the top people and (b) communication.

I can only quote that - I totally agree with the analysis. This applies to any software project, free or proprietary. So if you own a software development business: What is your strategy for getting the top people and facilitating communication in your company? What is your measure of productivity?

Getting an Ubuntu Laptop set up for my Mum

2010-05-17 19:15
With DSL contracts getting ever cheaper in recent years in Germany – even outside larger cities – my mom decided to get a faster internet connection (compared to the former 56k modem) including a telephone landline flatrate.

As sitting in the garden while surfing the internet is way cooler than only having a dedicated computer in an office we decided to get a notebook while at it. As both Thilo and myself are very familiar with Linux, the plan was to get a Linux-compatible netbook, install Ubuntu on it, get wireless up and running, pre-configure the necessary applications and hand it over after a short usage introduction.

Well – first idea: Mom lives close to Chemnitz, so we drove to the Media Markt in Chemnitz Center. They had a nice, not too small and not too large Acer netbook. The only open question: Does that thing perform well with Linux? Easily solved: We had a bootable USB stick with the latest Ubuntu version with us. We asked one of the shop assistants for permission to boot Linux on the netbook – telling him that we wanted to buy it and only wanted to make sure everything works fine. Answer: “No, sorry, that is not possible. There could be a virus on that stick.” Knowing from my favourite Mac shop in Berlin that there are hardware suppliers that allow testing their products, we left Media Markt – disappointed, but with the plan to repeat the experiment at various other shops in Berlin.

On Monday afternoon the following week, Thilo went to a MediMax in Berlin. The experience was very different: The assistant was most helpful, offering various machines to try out – unfortunately none of them had an Intel graphics card, that is, none could be run with a free graphics driver.

At the end of the same week we went to Media Markt in Steglitz: Asking the assistant there for permission to boot Linux from our USB stick actually made him happy. As the machine not only matched our target specifications but was even cheaper than the one in Chemnitz and worked well with Ubuntu, we finally bought the notebook (Acer Timeline 3810T). Yeah: Finally not only a working machine (with 8 hours of battery time) but also a shop that cares about its customers.

For two weeks now, mom has been a happy user of the Ubuntu Netbook Edition – learning step by step how to write e-mails, chat and use the internet. As usual, the first thing we tried out was searching for vacation destinations, and of course also for my name. The latter searches seemed to be the most interesting – at least on Google, YouTube, flickr ... ;)

Chemnitzer Linuxtage

2010-03-05 12:32
Title: Chemnitzer Linuxtage
Location: Chemnitz
Start Date: 2010-03-13
End Date: 2010-03-14

Next week the Chemnitzer Linuxtage take place in - well - Chemnitz. It is the second largest Linux event after Linuxtag Berlin. Something that is perhaps most obvious to speakers and exhibitors: It is one of those events known for their fantastic organisation, with nearly no problems, be it WiFi, admission to the exhibitors' area, food or help in general.

I will be at the event again. You can find me at the FSFE booth, telling people what the FSFE is all about and trying to convince them to become fellows (and yes, since last summer, I am a fellow myself and own one of those really cool green crypto cards).

FSFE Happy Valentine

2010-02-14 07:05
Today I got woken up with a friendly hug and roses waiting for me:

I do not really care about presents for sort-of-artificial celebration days like Valentine's Day. However, FSFE had a very nice idea: The proposal was to use Valentine's Day to show your love for free software. The website proposed, e.g., to hug a free software developer or to make a gift to a team of free software developers:

I love Free Software!

Happy Valentine!

FOSDEM 2010 - part 3

2010-02-10 21:02
Sunday started in Janson with Adrian Bowyer's talk on RepRap machines, that is, manufacturing machines that are able to replicate themselves. After that I went over to the Mono dev room to listen to Miguel de Icaza on Mono Edge: a great talk on the history of Mono, the way the community interacts with Microsoft, the C# language itself and special features only available in Mono.

After this talk we went over to Janson for Andrew Tanenbaum's talk on Minix. We knew quite a bit of the talk already from Froscon two years ago, however Andrew is an awesome speaker, so it's always fun to catch up on the news on Minix.

The scalability session started with an introduction to Hadoop by myself and continued with a talk on the Facebook infrastructure by David Recordon. According to the feedback I got after the talk, laughing with Thilo helped quite a bit to calm me down. Before the talk, one of the audio guys gave me a very good recommendation: Imagine you are giving the talk to one of your best friends - and forget about the microphone. Though I had way more slides than minutes, we had enough time for a Q&A session after the talk. I started the talk by learning more about the audience - this time not by handing the microphone to the listeners (the room was too large), but by simply asking questions. Have you heard about Hadoop? Half of the audience. Are you Hadoop users? One quarter maybe. How large are your clusters? Mostly 10 to 100 nodes. Have you heard of Zookeeper? Some. Hive? Some more. Pig? A few. Lucene? A lot. Solr? A little less. Mahout? Maybe 5. Mahout users? 1.

Turns out the Mahout user in the audience was Olivier: It's so nice to meet people you know are active on the mailing lists for real and have a chat with them. Hope to see you more often on the lists - and meet you face to face again.

I used the chance to announce the Berlin Buzzwords 2010, a two day event on search and scalability buzzwords like cloud computing, Hadoop, Lucene, NoSQL and more. It takes place on June 7th and 8th in the center of Berlin. Follow this blog for further information. Judging from the input I got after the announcement there is quite some need for such a conference in Europe.

The slides of my talk are soon to be available online.

After my talk I could stay in Janson: A talk on the Facebook infrastructure (not only the Hadoop side of things) followed. After that I met Lars George at the NoSQL dev room - unfortunately I did not manage to actually talk to Steven Noels, who organised the room.

The afternoon was reserved for Greg Kroah-Hartman on how to "Write and submit your first Linux Kernel Patch" - my personal conclusion: git is really awesome. I really, really need to find a few spare minutes to learn how to effectively use it.

In the evening we met with Pieter Hintjens for dinner - a perfect way to round off an awesome weekend in Brussels and a great 10th anniversary FOSDEM. A huge thank you to all volunteers and organisers of FOSDEM: You did a great job this year putting together an awesome schedule, and a fantastic job making the now pretty huge event (with 306 talks and about 5000 hackers attending) run smoothly. Even the wireless was working from minute one. See you again at FOSDEM 2011.

FOSDEM 2010 - part 2

2010-02-09 21:00
The event itself featured 306 talks - so pretty hard to choose what to watch on two days. This time, not only the main tracks were awesome, but also several dev rooms featured very interesting talks by well known FOSS developers.

Saturday started with a FOSDEM birthday dance performed by all attendees. The first keynote speaker, Brooks Davis, explained his experiences promoting open source methods at a large company. After that Richard Clayton gave an amazing talk on evil on the internet. He explained not only how phishing works on a technical level but also the economics behind these attacks, including how the money flows from victims to attackers.

In the afternoon Bernard Li gave an introduction to the cluster monitoring tool Ganglia. Directly after that, Lindsay Holmwood gave an overview of the monitoring and notification tools flapjack and cucumber-nagios.

The evening was filled with the speakers' dinner. Thanks to the organisers for providing it. We had a really nice evening together with some of the organisers, Andrew Tanenbaum and Elena Reshetova at our table.