Students begin coding for their GSoC projects

2009-04-05 09:54
Title: Students begin coding for their GSoC projects
Description: GSoC time.
Date: 2009-05-26

Apache Con Europe 2009 - part 3

2009-03-29 19:56
Friday was the last conference day. I enjoyed the Apache pioneers panel with a brief history of the Apache Software Foundation as well as lots of stories on how people first got in contact with the ASF.

After lunch I went to the testing and cloud session. I enjoyed Wendy Smoak's talk on Continuum and its uses: she gave a basic overview of why one would want a CI system and a brief introduction to Continuum. After that Carlos Sanchez showed how to use the cloud to automate interface tests with Selenium. The basic idea is to automatically (initiated through Maven) start up AMIs on EC2, each configured with a different operating system, and run Selenium tests against the application under development inside them. A really nice system for running automated interface tests.

The final session for me was the talk by Chris Anderson and Jan Lehnardt on CouchDB deployments.

The day ended with the Closing Event and Raffle. A big thank you to Ross Gardler for including the Berlin Apache Hadoop Get Together at the newthinking store in the announcements! I will send the CfP to concom really soon, as promised. Finally, I won a package of caffeinated sweets at the raffle - does that mean less sleep for me in the coming weeks?

Now I am finally back home and have had some time to do a quick writeup. If you are interested in the complete notes, go to http://fotos.isabel-drost.de (the default login is published in the login window). Looking forward to the Hadoop User Group UK on the 14th of April. If you have not signed up yet - do so now: http://huguk.org

Apache Con Europe 2009 - part 2

2009-03-29 19:42
Thursday morning started with an interesting talk on open source collaboration tools and how they can help reduce some of the collaboration overhead on commercial software projects. Four goals can be reached with the help of the right tools: sharing the project vision, tracking the current status of the project, finding places to help the project, and documenting the project history as well as the reasons for decisions made along the way. The exact tool used is irrelevant as long as it helps solve these four tasks.

The second talk was on cloud architectures by Steve Loughran. He explained the reasons to move into the cloud and what a typical cloud architecture looks like. Steve described Amazon's offering, mentioned other cloud service providers and highlighted some options for a private cloud. However, his main interest is in building a standardised cloud stack. Currently, choosing one of the cloud providers means vendor lock-in: your application uses a special API, and your data are stored on special servers. Quite a few of the tools necessary for building a cloud stack are available at Apache (Hadoop, HBase, CouchDB, Pig, ZooKeeper...). The question that remains is how to integrate the various pieces, and extend them where necessary, to arrive at a solution that can compete with AppEngine or Azure.

After lunch I went to the Solr case study by JTeam - basically one great commercial for Solr. They even brought a happy customer to Apache Con to talk about his experience with Solr. Great work, really!

The Lightning Talk session ended the day - with a "Happy birthday to you" from the community.

After having spent the last four days from 8 a.m. to 12 p.m. at Apache Con I really did need some rest on Thursday and went to bed pretty early: at 11 p.m. ...

Apache Con Europe 2009 - part 1

2009-03-29 18:41
The past week members, committers and users of Apache software projects gathered in Amsterdam for another Apache Con EU - and to celebrate the 10th birthday of the ASF. One week dedicated to the development and use of Free Software and the Apache Way.

Monday was BarCamp day for me, the first BarCamp I ever attended. Unfortunately not all participants proposed talks, so some of the atmosphere of an unconference was missing. The first talk, by Danese Cooper, was on "HowTo: Amsterdam Coffee Shops". She explained the ins and outs of going to coffee shops in Amsterdam and gave both legal and practical advice. There were presentations of the OpenStreetMap project and of several Apache projects. One talk discussed transferring the ideas of Free Software to other parts of life. Ross Gardler started a discussion on how to advocate contributions to Free Software projects in science and education.

Tuesday for me meant having some time for Mahout during the Hackathon. Specifically, I looked into enhancing matrices with meta information. In the evening there were quite a few interesting talks at the Lucene Meetup: Jukka gave an overview of Tika, and Grant introduced Solr. After Grant's talk some of the participants shared numbers on their Solr installations (number of documents per index, query volume, machine setup). To me it was extremely interesting to gain some insight into what people actually accomplish with Solr. The final talk was on Apache Droids, a crawling framework still in incubation.

The Wednesday tracks were a little unfair: the Hadoop track (videos available online for a small fee) ran right in parallel with the Lucene track. The day started with a very interesting keynote by Raghu from Yahoo! on their storage system PNUTS. He went into quite some technical detail. Obviously there is interest in publishing the underlying code under an open source license.

After the Mahout introduction by Grant Ingersoll I changed rooms to the Hadoop track. Arun Murthy shared his experience on tuning and debugging Hadoop applications. After lunch Olga Natkovich gave an introduction to Pig - a higher-level language on top of Hadoop that lets you specify filter operations, joins and the basic control flow of map-reduce jobs in just a few lines of Pig Latin code. Tom White gave an overview of what it means to run Hadoop on the EC2 cloud and compared several options for storing the data to process. It seems very likely that there will soon be quite a few more providers of cloud services in addition to Amazon.

Allen Wittenauer gave an overview of Hadoop from the operations point of view. Steve Loughran finally covered the topic of running Hadoop on dynamically allocated servers.

The day finished with a pretty interesting BOF on Hadoop. There are still people who do not clearly see the differences between Hadoop-based systems and database-backed applications. The best way to find out whether the model fits: set up a trial cluster and experiment yourself. No one can tell which solution is best for you except yourself (and maybe Cloudera, setting up the cluster for you :) ).

After that the Mahout/UIMA BOF was scheduled - there were quite a few interesting discussions on what UIMA can be used for and how it integrates with Mahout. One major take-home message: we need more examples integrating both. We developers do see the clear connections, but users often do not realize that many Apache projects are meant to be used together to get the most value out of them.

Cloud Camp Berlin

2009-03-23 13:29
Title: Cloud Camp Berlin
Date: 2009-04-30

FSFE booth at the Chemnitzer Linux Tage

2009-03-23 08:30
This year the "Linux Tage" were organized at the University of Chemnitz for the 11th time. Each year in March this means two days devoted to the topic of open and free software - an event that is very well organized by a pretty professional team of volunteers.

For the third time the FSFE had its booth at the event - this time run by Rainer Kersten, Uwe Zemisch and me. Recurring questions at the booth were

  • "What the heck is FSFE, and in which ways do you actually support free software?"
  • "I am already a fellow, and you keep telling me there are these great Fellowship meetups. Do you know whether there is one near my town? How do these events start? How are they organized?"


It was interesting to see that the FSFE is one of the few organizations trying to fill the gap between those who write open source software and those who make decisions relevant to the developers but know nothing about writing software whatsoever.

Besides running the booth there was some time left for a few talks. I decided to go to the OpenMP talk. The idea of OpenMP is to provide a high-level API for marking code passages for parallel execution. It is designed not for parallel programming on clusters but for multi-core machines - somewhat related to the Java concurrency package, but far more high-level.
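To make that comparison concrete, here is a rough sketch of what an OpenMP-style parallel loop looks like when spelled out with the Java concurrency package instead; the sum-of-squares computation is just an arbitrary example workload, not anything from the talk:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        final int n = 1_000_000;
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Split the iteration space into one chunk per core - roughly what
        // an OpenMP "parallel for" pragma does implicitly for you.
        List<Future<Long>> parts = new ArrayList<>();
        int chunk = n / cores;
        for (int c = 0; c < cores; c++) {
            final int from = c * chunk + 1;
            final int to = (c == cores - 1) ? n : (c + 1) * chunk;
            parts.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;
                    for (int i = from; i <= to; i++) sum += (long) i * i;
                    return sum;
                }
            }));
        }

        // Combine the partial results.
        long total = 0;
        for (Future<Long> part : parts) total += part.get();
        pool.shutdown();
        System.out.println(total);
    }
}
```

Where OpenMP infers the chunking and joining from a single pragma, here both have to be written out by hand - which is exactly the "far more high-level" difference mentioned above.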

The second talk I went to was on personal data protection laws in Germany. One funny piece of information: even the Ministry of Justice was successfully sued for storing too much information about the visitors of its webpage.

The last talk I went to was on Google Android. To me it looks like a nice mix of completely open source (like OpenMoko) and completely closed source. If you need a phone you can use for making phone calls but still want to play with it and be root on the phone ($399 for the dev phone, SIM-unlocked, only available to registered developers; registration is $25), the Android G1 probably is the way to go. The phone is highly integrated with Google applications. The assumption when building it seems to have been that people are online with that phone all the time.

For coding: only a Java API is available, no C or C++. The SDK is available for Linux/Mac/Windows. The emulator does work; the only thing it does not reflect is the real speed of the device itself. Each app gets its own VM; Dalvik supports sharing memory between processes, which makes this less expensive. In case of memory shortage, apps are killed in order of user impact (empty/pre-created, background, services such as an mp3 player, visible apps, foreground apps). The idea is to kill those apps that are least visible to the user first. The programmer needs to take care that apps constantly store their state, so that restarting them brings them up in the same state they were in when killed.
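That store-and-restore pattern can be sketched in plain Java. Note that the class and method names below are invented for illustration only (on real Android the hooks are `onSaveInstanceState` and `onCreate`, and state goes into a `Bundle`):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "constantly store state" pattern described above.
// CounterActivity, saveState and restoreState are made-up names,
// not the real Android API.
class CounterActivity {
    private int clicks;

    // Called whenever the app might be killed: persist all state.
    Map<String, Integer> saveState() {
        Map<String, Integer> state = new HashMap<>();
        state.put("clicks", clicks);
        return state;
    }

    // Called on restart: restore the state saved before the kill.
    void restoreState(Map<String, Integer> state) {
        clicks = state.getOrDefault("clicks", 0);
    }

    void click() { clicks++; }

    int clicks() { return clicks; }
}
```

The point of the kill order above is precisely that this pair of methods may be invoked at any time: an app that checkpoints its state on every pause can be killed and restarted without the user noticing.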

More information online: http://www.htc.com and android.git.kernel.org.

All in all: a really nice weekend. Looking forward to returning next year.

Books I found particularly helpful

2009-03-12 18:44
During the last few years I have read quite a few books that one could easily file under the category "hacking books". Some of them were particularly interesting to me and have influenced the way I write code. The following list certainly is not complete - but it is a nice starting point.


  • Effective C++ - I have comparably little experience with C++, but this book really helped me understand some of its particularities.
  • Effective Java - even though I have been developing in Java for a few years now, reading and revisiting Effective Java helps me understand and deal with some of the quirks of the language and the JVM.
  • The Mythical Man-Month - although it is classic literature for people dealing with software projects, very well known and easy to understand, it is scary to see that the exact same mistakes are still common in today's software projects.
  • Concurrent Programming in Java - a quick start on concurrent programming patterns, primarily focussed on Java. Fortunately it is no mere collection of recipes but provides thorough background information.
  • Working Effectively with Legacy Code - I really like to have a look into this book from time to time. It shows great ways of untangling bad code, refactoring it and making it testable.
  • The XP books by Kent Beck - if you ever had any questions on what XP is and how you should implement it, these are the books to read. Don't trust what people call XP in practice as long as they are not willing to refine and improve their "agile processes". Keep working on whatever stops you from delivering great code.
  • Why Programs Fail - A Guide to Systematic Debugging - if you ever had to debug complex programs - and I bet you have - this is the book that explains how to do so systematically, and even how to have fun along the way.
  • Zen and the Art of Motorcycle Maintenance - not particularly about software development, but the techniques described map stunningly well onto software development.
  • Release It! - I am just about to read that one, but already the first few pages are not only valuable and interesting but also entertaining.
  • Implementation Patterns - one I forgot yesterday.
  • Presentation Zen - another one I forgot. It really helped me make better presentations.


There are still quite a few good books on my list. If you have any recommendations - please leave them in the comments.

There are a few other book lists online in various blogs. Two examples are the ones below:
http://www.codinghorror.com/blog/archives/000020.html
http://www.joelonsoftware.com/navLinks/fog0000000262.html

Scrum Roundtable Berlin

2009-03-09 14:51
Title: Scrum Roundtable Berlin
Location: DiVino Restaurant, Grünberger Str. 69, Friedrichshain
Start Time: 18:00
Date: 2009-04-22


The next Scrum Roundtable is scheduled already. Thoralf will present his talk from the Orlando Scrum Gathering:


  • Agile Creation of Multi-Product Solutions
  • Motivation for Network Solutions and their Agility
  • Scaling Single Product Creation
  • Product Solutions using Scrum
  • Customizing Projects
  • Outlook


      Please let Marion know if you are coming, so that we are able to organise the space.

      Please find more information on upcoming events and the organization of the Scrum roundtable at: http://www.agile42.com/cms/blog/categories/scrumtisch/

Erlang User Group - Scala

2009-03-09 12:25
What: Scala Presentation by Stefan Plantikow.
Where: Cockpit of the Box119 http://boxhagener119.de/ (Ring at UPSTREAM)
When: Wednesday, 11.03.2009, 8:00 p.m.

Yesterday the Erlounge, organised by Jan Lehnardt, took place in the Cockpit of Box119 in Berlin. Topic of the evening was an introduction to Scala.

Scala is a functional language that compiles to Java bytecode and runs on the JVM. It tries to combine the best of two worlds: object-oriented languages and functional programming. So every function is an object and every object is a function.

Some interesting bits of information:

  • Scala is a statically typed language - but you can omit the types most of the time, as type inference in the compiler is pretty good.
  • Everything is an object - there is no distinction between primitives and objects.
  • There are packages for distributed computing - but spawning processes and sending messages is not as fast as in Erlang, so there is still room for improvement.
  • The developers are currently tidying up the syntax and taking care of corner cases.
  • It is easy to get started with Scala, as you can begin with a subset of the language and extend your knowledge as you need it.
  • Scala stands for "scalable language" - scalable in terms of the projects and tasks you can accomplish with it.


If you want to see another nice presentation, one slightly less focussed on comparing Scala to Erlang, you might also find this year's FOSDEM presentation interesting: http://www.slideshare.net/Odersky/fosdem-2009-1013261 (the video should be up soon as well).

Basic statistics of a set of values

2009-03-09 11:21
Just so that I can find it the next time I search for it:

Problem: You have a set of values (for instance, the time it took to process various queries). You want a quick overview of how the values are distributed.

Solution: Store the values in a file, one per line, read the file into R and output summary statistics.

R: times <- read.table("times.txt")
   Read 30000 records
R: summary(times[[1]])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   6.00   12.00   13.00   16.75   14.00 8335.00

That's it.
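For the record, roughly the same overview (minus the quartiles, which the JDK does not provide out of the box) can also be produced on the JVM; the file name below is just a placeholder for whatever file holds your newline-separated values:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.DoubleSummaryStatistics;

public class SummaryStats {
    public static void main(String[] args) throws IOException {
        // One value per line, as in the R example above.
        DoubleSummaryStatistics stats = Files.lines(Paths.get("times.txt"))
                .mapToDouble(Double::parseDouble)
                .summaryStatistics();
        System.out.printf("Min %.2f  Mean %.2f  Max %.2f  (n=%d)%n",
                stats.getMin(), stats.getAverage(), stats.getMax(), stats.getCount());
    }
}
```

For median and quartiles you would still sort the values yourself or reach for a statistics library, so R remains the more convenient tool for this particular job.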