GSoC - one day to go for your application

2010-04-08 14:37
If you are a student interested in participating in Google Summer of Code: Registration closes tomorrow (as in "April 9, 19:00 UTC"). You hopefully published and discussed your proposal at your favourite project already so you have a clear plan of where to go and which milestones to achieve in summer.

If you are interested in Apache Mahout: Yes, as in previous years, we are again looking for students willing to work on awesome student projects this summer. Several core Mahout developers have signed up as mentors for GSoC. With Robin, one of our former GSoC students has now turned into a mentor: It's always amazing to watch students stick with the project and continue contributing valuable input.

So in case you would love to learn more about machine learning, train your software development skills and work with great people on your favourite problem, do not forget to submit your project proposal by tomorrow.

Coaching self-organising teams

2010-03-30 21:58
Today, the Scrumtisch organised by Marion Eickmann from Agile 42 met in Berlin Friedrichshain. Though no talk was scheduled for this evening, the room was packed with guests from various companies and backgrounds interested in participating in discussions on Scrum.

As usual we started by collecting topics (timeboxed to five minutes). The list was rather short; however, it contained several interesting pieces:

  • (6) Management buy-in
  • (6+) CSP - Certified Scrum Professional - what changes compared to the practitioner?
  • (4) Roles of Management in Scrum - how do they change?
  • (13) Coaching self-organising teams - team buy-in.


Team buy-in


As prioritised by the participants, the first topic discussed was coaching self-organising teams - with a heavy focus on team buy-in. The problem described dealt with transforming waterfall teams that are used to receiving their work items into self-organising teams that voluntarily accept responsibility for the whole project instead of just their own little work package.

The definition of self-organising here really is about teams that have no (and need no) dedicated team leader. On the contrary: leadership is automatically transferred to the person who - based on their skills and experience - is best suited for the user story that is being implemented at any given time.

The problem the question targets is about teams that really are not self-organising, where developers do not take responsibility for the whole project, but just for their little pieces: They have their gardens - with fences around them that protect them from others but also keep them from looking into other pieces of the project. Even worse - they tend to take these fences with them as soon as work items change.

Several ways to mitigate the problem were discussed:

  • Teams should work in a collaborative environment, should have clear tasks and priorities, and should get some pressure from the outside to get things done.
  • Some teams need to learn what working in a team - together - really means. It may sound trivial, but what about solving problems together: Spending one day climbing hills?
  • Commitments should not happen on tasks (which by definition are well defined and small) but rather on features/user stories. Task breakdown should happen after the commitment.
  • There are patterns to break user stories that are too large into multiple user stories. (Marion: Would be great, if I could add a link here ;) )
  • Teams need to be coached - not only the scrum master should get education, but the complete team. There are people interested in management that tend to read up on the topic after working hours - however these are rather rare...
  • Teams must be empowered - they must be responsible for the whole project and for the user stories they commit to. In return they must get what they need to get their tasks done.
  • Newly formed teams inexperienced with Scrum have to get the chance to make mistakes - to fail - and to learn from that.


A great way to explain Scrum really is as a two-fold process: The first half is about getting a product done, reviewing its quality at the end of each sprint during the review. The second half is about improving the process used to get the product done. The meeting to review the process quality is called the retrospective.

Management buy-in



The second topic discussed was the role of management in Scrum - and how to convince management of Scrum. To some extent, Scrum means losing power and control for management. Instead of micro-managing people it's suddenly about communicating your vision and scope. To get there, it helps to view lean management as the result of a long transformation:

  • First there is hierarchical management - with the manager at the top and employees underneath.
  • Second there is shared management - with the manager sitting between his employees enabling communication.
  • Third there is collaborative management - here the manager really is part of the team.
  • Fourth comes empowering management - this time the manager is only responsible for defining goals.
  • Last but not least there is lean management - where managers are merely coordinating and communicating the vision of a project.


To establish a more agile form of management, there are several tasks to keep in mind: First and foremost, do talk to each other. Explain to your manager what you are doing and why you are working in pairs, for instance. Being a manager, do not be afraid to ask questions - understanding what your developers do helps you trust their work. Scrum is established; however, there needs to be clear communication of what managers lose - and what they win instead.

Scaling can only be done via delegation - however, people need to learn how to delegate tasks. In technology we are used to learning new stuff every few years. In management this improvement cycle is not yet very common. However, especially in IT, it should be.

Being able to sell Scrum to customers is yet another problem: You need good marketing to sell Scrum to your customers. "Money for nothing change for free" is a nice read on formulating agile contracts. Keep in mind that the only way to really win all benefits is by doing all of Scrum - cherry picking may work to some extent, however you won't get the full benefit from it. In most cases it works worse than traditionally managed projects.

After two very interesting and lively discussions moderated by Andrea Tomasini we finally had pizza, pasta and drinks - taking some of the topics offline.

Looking forward to seeing you in F-Hain for the next Scrumtisch in April.

Some pictures

2010-03-25 11:00
Uwe and Simon were so kind as to take some pictures of the last Hadoop Get Together in Berlin:

Image Hadoop Get Together Berlin


Thanks for the pictures.

Bob Schulze on Tips and patterns with HBase

2010-03-24 03:41
At the last Hadoop Get Together in Berlin Bob Schulze from eCircle in Munich gave a presentation on “Tips and patterns with HBase”. The talk has been video recorded. The result is now available online:

HBase Bob Schulze from Isabel Drost on Vimeo.



Feel free to share and distribute the video. Thanks to Bob for an awesome talk on eCircle’s usage of HBase - and for providing some background information on how HBase was applied to solve your problems.

Another thanks to Nokia for sponsoring the video taping - and to newthinking for providing the location for free.

Looking forward to Berlin Buzzwords in June. Early registration is open already. Several great talk proposals have been submitted already. If you are a Hadoop Get Together visitor (or even speaker) and would like to have a community ticket, please contact me.

Dragan Milosevic on Product Search and Reporting with Hadoop

2010-03-19 20:30
At the last Hadoop Get Together in Berlin Dragan Milosevic from zanox in Berlin gave a presentation on "Product Search and Reporting powered by Hadoop". The talk has been video recorded. The result is now available online:

Hadoop Dragan Milosevic from Isabel Drost on Vimeo.



Feel free to share and distribute the video. Thanks to Dragan for a fantastic talk on Zanox' usage of Hadoop - and for providing some background information on why and how you introduced Hadoop into your systems.

Another thanks to Nokia for sponsoring the video taping - and to newthinking for providing the location for free.

One more video to go. It will be available early next week.

Apache Mahout 0.3 released

2010-03-18 15:22
This week, Apache Mahout 0.3 was released. First of all thanks to all committers and contributors who made that possible: Thanks for all your hard work on making the code even faster and integrating even more algorithms.

To the highlights:
  • New: math and collections modules based on the high performance Colt library
  • Faster Frequent Pattern Growth (FPGrowth) using FP-bonsai pruning
  • Parallel Dirichlet process clustering (model-based clustering algorithm)
  • Parallel co-occurrence based recommender
  • Parallel text document to vector conversion using LLR based ngram generation
  • Parallel Lanczos SVD (Singular Value Decomposition) solver
  • Shell scripts for easier running of algorithms, utilities and examples


... and much much more: code cleanup, many bug fixes and performance improvements. Check out the new release and watch for further news on Apache Mahout to come in the next days and weeks.

Details on what's included can be found in the release notes.

Downloads are available from the Apache Mirrors.

Seminar on scaling learning at DIMA TU Berlin

2010-03-17 21:10
Last Thursday the seminar on scaling learning problems took place at DIMA at TU Berlin. We had five students give talks.

The talks started with an introduction to map reduce. Oleg Mayevskiy first explained the basic concept, then gave an overview of the parallelization architecture and finally showed how jobs can be formulated as map reduce jobs.

His paper as well as his slides are available online.
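
To make the idea of formulating a job as map reduce a bit more concrete, here is a minimal single-machine sketch of the classic word count example. It is not taken from Oleg's talk or slides: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework such as Hadoop would distribute these phases across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop get together", "get hadoop"]))
```

The important property is that map_phase only ever looks at one document and reduce_phase only at one key, which is what allows a framework to run many of them in parallel.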

Second was Daniel Georg - he was working on the rather broad topic of NoSQL databases. As the topic is too fuzzy to be covered in one 20-minute talk, Daniel focussed on distributed solutions - namely Bigtable/HBase and Yahoo! PNUTS.

Daniel's paper as well as the slides are available online as well.

Third was Dirk Dieter Flamming on duplicate detection. He concentrated on algorithms for near duplicate detection needed when building information retrieval systems that work with real world documents: The web is full of copies, mirrors, near duplicates and documents made of partial copies. The important task is to identify near duplicates to not only reduce the data store but to potentially be able to track original authorship over time.

Again, paper and slides are available online.
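
As a rough illustration of what near duplicate detection can look like, here is a small sketch based on word shingles and Jaccard similarity. This is one common textbook approach and an assumption on my side, not necessarily the algorithm covered in Dirk's talk; the function names, the shingle size k and the threshold are all hypothetical choices.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words form a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.5):
    """Treat two documents as near duplicates if their shingle sets overlap enough.

    The threshold of 0.5 is an arbitrary value chosen for this example.
    """
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    mirror = "the quick brown fox jumps over the lazy cat"
    print(is_near_duplicate(original, mirror))
```

Comparing every pair of documents this way does not scale to web-sized collections, which is where techniques such as hashing the shingle sets come into play.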

After a short break, Qiuyan Xu presented ways to learn ranking functions from explicit as well as implicit user feedback. Any interaction with search engines provides valuable feedback about the quality of the current ranking function. Watching users - and learning from their clicks - can help to improve future ranking functions.

A very detailed paper as well as slides are available for download.
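
One way to turn clicks into training data is to derive pairwise preferences from them: a clicked result is assumed to be preferred over every non-clicked result ranked above it. The sketch below shows this heuristic only; it is an assumption for illustration and not necessarily the method presented in Qiuyan's paper, and the function name preference_pairs is made up.

```python
def preference_pairs(ranking, clicked):
    """Derive (preferred, less_preferred) pairs from one result list.

    ranking: document ids in the order they were shown to the user.
    clicked: document ids the user clicked on.
    A clicked document is taken to be preferred over each skipped
    document that was ranked above it.
    """
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for above in ranking[:position]:
            if above not in clicked:
                pairs.append((doc, above))
    return pairs


if __name__ == "__main__":
    print(preference_pairs(["d1", "d2", "d3", "d4"], ["d3"]))
```

Pairs like these can then serve as training examples for a ranking function that learns to order the preferred document first.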

The last talk was by Robert Kubiak on topic detection and tracking. The talk presented methods for identifying and tracking upcoming topics e.g. in news streams or blog postings. Given the amount of new information published digitally each day, these systems can help with following interesting news topics or send notifications on new, upcoming topics.

Paper and slides are available online.

If you are a student in Berlin interested in scalable machine learning: The next course IMPRO2 has been set up. As last year, the goal is to not only improve your skills in writing code but also to interact with the community and, if appropriate, to contribute back the work created during the course.

Chris Male on spatial search with Lucene

2010-03-16 20:42
Last week the March 2010 Hadoop Get Together took place in Berlin. Last speaker was Chris Male on spatial search with Lucene and Solr. The video is now available online:

Lucene Chris Male from Isabel Drost on Vimeo.



Feel free to share and distribute the video to anyone who might be interested. Thank you Chris, for traveling over from Amsterdam for an awesome talk on spatial search.

If you want to learn more about what people over at Lucene and Solr are currently working on, head over to Berlin Buzzwords - a conference on scalable search, storage and data analysis. If you yourself have interesting projects - feel free to submit a talk.

Thanks to Nokia for sponsoring the video taping - and again as always thanks to newthinking for providing the location for free.

Definition of a Blogger

2010-03-12 19:17
While at lunch yesterday the topic of what bloggers do, how they earn money and most important of all - what the heck a blogger really is - came up. Well, those who went to a restaurant nearby came up with the following criteria:


  • Blog is read by more than 5 people. (Well, in my opinion a very low barrier, really.)
  • Bloggers tend to get invited to give talks at conferences. (Yeah, well, not only people with blogs get those invitations?)
  • Over time bloggers tend to get contracts, do consultancy and the like to earn money. (Hmm, yeah, blogs do help to get visibility...)
  • Bloggers tend to be involved with traditional media people. (Phew - finally something that disqualifies myself as a blogger. Though, come to think of it - no, having published one or two articles does not count. Period.)
  • They are those people you tend to see in cafes with MacBooks surfing the web. (Oh, well, who hasn't done that once in a while?)


Judging from that very unscientific case study, and even taking into account its very informal nature, the result still appeared very scary to me: I had fought the temptation of actually creating a blog for years until publishing content for the Hadoop Get Together the very old-fashioned way (vim + scp) became too much of a burden. Now I have to realise that, against my own judgement, I do tend to use this blog not exactly seldom. Still trying to avoid becoming one of these funny new media types and remain a typical free software developer :)

Third Apache Dinner Berlin

2010-03-11 22:04
Today the third Dinner for Apache committers and friends took place in Berlin. We met in Schöneberg at Marcello Berlin for pizza, pasta, wine, beer ... and lots of discussions.

It always surprises me to see how many Apache related people there are in Berlin. This time we had Peter from Tomcat, Daniel and Simon with Vera from Lucene, Eric from http components, four guys from the svn project (welcome again to the ASF), Oswald, Thilo and myself.

We scheduled the meetup comparatively early - at about 6:30 p.m. - giving the parents among us the chance to attend with their children: Looks like some projects recruit their new project members pretty early ;)

Next time we will meet some time before or after Apache Retreat in April. The dinner will then again be organised (as in "set a date", "reserve a table at your favourite restaurant", "call your friends") by another Apache committer in Berlin. If you would like to join us, or are a committer yourself and interested in finding out about other people in town with the same affiliation, do not hesitate to contact me: I'll make sure you are included in the next vote on the dinner date.

PS: Why on earth do user meetups in Berlin always have the tendency to keep growing and growing and growing? ;)