Open Source Expo

2009-10-29 07:38
Title: Open Source Expo
Location: Karlsruhe
Description: There will be a booth at the Open Source Expo introducing interested visitors to the Apache projects Lucene and Mahout. Of course, we are also happy to answer any questions on the ASF in general.
Start Date: 2009-11-15
End Date: 2009-11-16

ApacheCon US - Program up

2009-10-29 07:35
The final program is available for download. The schedule is packed with interesting talks on Hadoop, Lucene, Tomcat, httpd, web services and OSGi. For those less tech-savvy, there is a business track explaining how best to use open source software in an enterprise environment. There is also a community track explaining what makes open source projects successful.

Looking forward to seeing you in Oakland.

Lucene 2.9 White Paper

2009-10-28 21:51
Lucid recently published a white paper that explains the changes and improvements in the new 2.9 release. It is interesting for anyone who is thinking about upgrading to the new Lucene version or who generally wants to know what is going on at Lucene.

NoSQL Berlin Meetup

2009-10-23 13:24
Yesterday evening the NoSQL Berlin Meetup took place at the newthinking store in Berlin Mitte. We had planned for some 50 to 70 people. It quickly became clear that the room would be full - at the start I counted about 80 guests interested in NoSQL topics, some local to Berlin and others traveling here from New York.

Some pictures are available on Flickr - thanks to @langalex for sending the URL to me.

The meetup started with an introduction to the basic principles of consistency and agreement protocols that are the basis of many scalable storage solutions, including Scalaris. Monika Moser explained why one can have only two of the three goals of consistency, availability and partition tolerance. After that she gave an introduction to Paxos - a scalable, partition tolerant agreement protocol.

In the second talk, Mathias Meyer introduced Redis - a wicked fast key-value store that supports strings, lists and sets as values. It is implemented in C and comes with a persistence mechanism. The only problem: all the data stored in Redis needs to fit in memory for this store to work.

After a short break Jan Lehnardt gave an overview of building P2P applications with CouchDB. He showed how CouchDB can be scaled to large deployments with modules that build distribution and sharding on top of CouchDB. But CouchDB can also be scaled down to run on mobile devices. As synchronization is so simple with that DB it is a perfect fit for Ubuntu One - the initiative of Canonical that brings a personal cloud to everyone for sharing and distributing your data.

Martin Scholl gave an overview of Riak - a highly distributed key-value store with support for map-reduce style queries, sharding of data and a REST interface.

The last session included a talk by Mathias Stearn on MongoDB - a document store that does not keep its documents as JSON text but uses BSON for document encoding. This allows for compact and fast object (de-)serialization.

The final talk was given by Prof. Stefan Edlich on object oriented databases.

After the event, speakers and attendees switched over to Cafe Aufsturz for some drinks, beer and food - and of course for further discussions.

Big thanks go to the sponsors: Versant, Peritor (drinks at newthinking), StudiVZ (videos), Sociomantic (drinks at Aufsturz) and Soundcloud (food at Aufsturz). Another big thanks to Jan Lehnardt and Thomas Nicolai for helping me set up this event.

Looking forward to seeing you guys either in Oakland this November or probably next year at the next NoSQL conference in Berlin.

Videos are up

2009-10-22 07:31
As of yesterday the videos of the last Apache Hadoop Get Together Berlin are available online.

Thanks to the speakers for providing insight into their projects and thanks to Cloudera for sponsoring the videos.

The next meetup will be announced soon - three talks have already been proposed. In addition, StudiVZ offered to sponsor video taping of the next Get Together. Looking forward to seeing you in Berlin in December.

Scrumtisch with Mary and Tom Poppendieck

2009-10-12 15:07
Yesterday evening the Scrumtisch Berlin hosted a talk by Mary Poppendieck on Lean Development. Mary started the session with a talk on what lean development is all about and why it goes further than Scrum ever did. Some of the core principles she explained:

The first goal of every lean project should be to strive for customer satisfaction. The low-hanging fruit is to do exactly what the customer asked for. The second step is to also deliver the features he took for granted - think performance, think extensibility. The ultimate goal should be to fulfill what the customer never knew he even wanted. Customers don't want software - they want their problems solved, and software is simply the only way they know of to solve them. Show them how to really solve their problems: before the iPhone was invented, no one could have predicted what a great smartphone could look like.

Show technical excellence: Testing is not optional. To be a good developer you should always be able to prove that your code is correct. As proving the correctness of software in the mathematical sense is hard, if not impossible, today, the recommendation was to use unit testing and integration testing as a poor man's proof of correctness. Without (automated) testing, any agile project is doomed to develop into a big ball of mud that cannot be easily extended.
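To make the "poor man's proof" idea a bit more concrete, here is a minimal sketch of what such a unit test can look like in Python. The function apply_discount and its rounding rule are made up purely for illustration and were not part of the talk.

```python
import unittest


def apply_discount(price_cents, percent):
    """Return the price after a percentage discount, rounded down to whole cents.

    Hypothetical example function, only here so that there is something to test.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    """Each test pins down one property the code has to keep across changes."""

    def test_no_discount_keeps_price(self):
        self.assertEqual(apply_discount(1999, 0), 1999)

    def test_full_discount_is_free(self):
        self.assertEqual(apply_discount(1999, 100), 0)

    def test_result_is_rounded_down(self):
        # 999 * 90 // 100 == 899
        self.assertEqual(apply_discount(999, 10), 899)

    def test_invalid_percentage_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 150)


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

None of this proves correctness in the mathematical sense, but every property written down this way is checked automatically on each change.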

The third principle is to deliver reliably, to design the system such that it meets its constraints. Constraints here are not only technical requirements - constraints include budget, time, features, stability and scalability.

The fourth observation was that no development process can fix the problems that teams consisting only of junior developers experience. Every lean organisation must provide for mentoring. The mentor's job is to climb the same path as the mentee and pull him along - instead of pushing. In contrast to many setups, not only should new hires be mentored, but basically every employee should have a mentor. Only that way is learning - which is essential in current project environments (not only in IT) - made possible.

A very interesting observation came from the audience: Every single piece of advice given above is well known and appreciated by any reasonably good software engineer. The problem is: why don't we adopt these rules in practice? The guy asking the question gave the answer himself: "The fish starts smelling from its head" (German proverb). So the goal of lean developers in the end would be to talk to their management: Do short iterations of a few weeks. Deliver early, deliver often, deliver on time. Usually that alone is enough to exceed expectations. Choose the best solution (from both a business and a technical perspective). Your results should speak for themselves and change the way your organisation works step by step.

Slides are online on the Scrumtisch blog.

katta @ Berlin

2009-10-10 20:46
After finishing the slides for next week's Mahout course at TU Berlin (if you are not subscribed yet: Subscribe now!) I spent half of the day in Tierpark Berlin: watching polar bears, taking pictures of tigers. On my way through the park I met those cute little guys:

The sign next to the enclosure gave them away as ... kattas - so that is what they look like!

Oh - just in case you were searching for the real Katta, the distributed Lucene ... that is available over on Sourceforge and not to be confused with those little animals ;)

Lucene 2.9 @ Heise

2009-10-06 18:13
After last week's Hadoop Get Together, Heise published an in-depth article on the changes and improvements that come with the latest Lucene 2.9 release.

Thanks to Simon Willnauer for helping me write this article and patiently explaining several new features. Thanks also to Uwe Schindler for kindly proof-reading the article before it was sent out to Heise.

Scrum Day Düsseldorf

2009-10-06 05:24
On December 1 and 2 the Scrum Day is going to take place in Düsseldorf.

If you are working in a non-Scrum company and would like to use agile methods both for development and management, I would like to recommend going to the talk by Thilo Fromm: "Scrum in a waterfall". He explains how he moved his project to an agile way of working within a waterfall environment.

Getting Hadoop trunk up and running from source

2009-10-04 20:18
After I told Thilo about the possibility of writing Hadoop jobs in Python with Dumbo, we spent some time getting Dumbo 0.21 up and running over the past weekend. The first option the wiki proposes is to take a pre-0.21 release and patch that to work with the current Dumbo release. The second option described takes the not-yet-released version of Hadoop, which can be used without any patches.

We decided to follow the latter suggestion. After the latest split of the project, we downloaded common, hdfs and mapreduce. Building each project was easy - assuming that ant, Sun JDK 6 (for Hadoop), Forrest (for the documentation pages) and Sun JDK 5 (for Forrest) are installed.
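For context, a Dumbo job is essentially a Python module with a mapper and a reducer function. The following is a minimal word-count sketch, not the job we actually ran. The helper run_locally is my own addition for illustration: it imitates the map, sort and reduce phases in-process so the two functions can be tried without Hadoop or Dumbo installed. As far as I know, a real job hands the functions over with dumbo.run(mapper, reducer), which is only mentioned in a comment here.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Emit (word, 1) for every word in one line of input text."""
    for word in value.split():
        yield word, 1


def reducer(key, values):
    """Sum up all counts that were emitted for one word."""
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical local driver: map, sort by key, then reduce per key.

    Only an illustration of the data flow, not how Hadoop executes a job.
    """
    mapped = [pair
              for offset, line in enumerate(lines)
              for pair in mapper(offset, line)]
    mapped.sort(key=itemgetter(0))  # stands in for the shuffle/sort phase
    result = {}
    for word, group in groupby(mapped, key=itemgetter(0)):
        counts = (count for _, count in group)
        for out_key, out_value in reducer(word, counts):
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    # With Dumbo installed, the job would instead be started with:
    #     import dumbo
    #     dumbo.run(mapper, reducer)
    print(run_locally(["to be or not to be"]))
```

The nice part of this style is that mapper and reducer are plain generator functions, so they can be exercised locally before being submitted to a cluster.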

Deviating from the documentation, the distributed filesystem and map reduce are now started from separate scripts. These scripts are located in the common project. In addition, the variables HADOOP_HDFS_HOME and HADOOP_MAPRED_HOME must be set to point to the respective projects for cluster setup to work. Other than that, the setup is currently identical to the previous version.
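Since forgetting one of the two variables is an easy mistake to make, a small pre-flight check can help. The sketch below is only an assumption about what is useful to verify (that each variable is set and points to an existing directory); the function name missing_hadoop_vars is made up and is not part of Hadoop.

```python
import os

# The two variables mentioned above; each should point to the checkout
# of the respective project.
REQUIRED_VARS = ("HADOOP_HDFS_HOME", "HADOOP_MAPRED_HOME")


def missing_hadoop_vars(env=None):
    """Return the required variables that are unset or not a directory."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS
            if not os.path.isdir(env.get(name, ""))]


if __name__ == "__main__":
    problems = missing_hadoop_vars()
    if problems:
        print("Please set these to the project checkouts first: "
              + ", ".join(problems))
    else:
        print("HADOOP_HDFS_HOME and HADOOP_MAPRED_HOME look fine.")
```

Running it before starting the cluster prints which of the two variables still needs attention.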