Building a Hadoop Job Jar with Maven

2010-03-11 19:16
Put here as a reminder, so I do not forget about it. There is a really nice tutorial online on Building Hadoop Job with Maven.

Call for presentations - Berlin Buzzwords

2010-03-11 15:09

Call for Presentations Berlin Buzzwords
Berlin Buzzwords 2010 - Search, Store, Scale
7/8 June 2010

This is to announce the opening of the Berlin Buzzwords 2010 call for presentations. Berlin Buzzwords is the first conference on scalable and open search, data processing and data storage in Germany, taking place in Berlin.

The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

  • Information retrieval, search - Lucene, Solr, katta or comparable solutions
  • NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
  • Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

Closely related topics not explicitly listed above are welcome. We are looking for presentations on the implementation of the systems themselves, real world applications and case studies.

Important Dates (all dates in GMT +2)

  • Submission deadline: April 17th 2010, 23:59
  • Notification of accepted speakers: May 1st, 2010.
  • Publication of final schedule: May 9th, 2010.
  • Conference: June 7/8, 2010.

High quality, technical submissions are called for, ranging from principles to practice. We are looking for real world use cases, background on the architecture of specific projects and a deep dive into architectures built on top of e.g. Hadoop clusters.

Proposals should be submitted at no later than April 17th, 2010. Acceptance notifications will be sent out on May 1st. Please include your name, bio and email, the title of the talk, and a brief abstract in English. Please indicate whether you want to give a short (30min) or long (45min) presentation and indicate the level of experience with the topic your audience should have (e.g. whether your talk will be suitable for newbies or is targeted at experienced users).

The presentation format is short: either 30 or 45 minutes including questions. We will be enforcing the schedule rigorously.

If you are interested in sponsoring the event (e.g. we would be happy to provide videos after the event, free drinks for attendees as well as an after-show party), please contact us.

Follow @hadoopberlin on Twitter for updates. News on the conference will be published on our website at

Program Chairs: Isabel Drost, Jan Lehnardt, and Simon Willnauer.

Schedule and further updates on the event will be published on

Slides are available

2010-03-11 00:49
Slides for the last Hadoop Get Together are available online:

Videos will follow as soon as they are ready. Watch this space for further updates.

Apache Hadoop Get Together March 2010

2010-03-11 00:40
Today (or more correctly, yesterday) the March 2010 Hadoop Get Together took place in newthinking store. I arrived rather early to have some time to do some planning for Berlin Buzzwords - got there nearly one hour before the meetup. However it did not take very long until the first guests came to the store. So I quickly got my introductory slides in place - Martin from newthinking already had the room set up, camera in place and audio working.

When starting the meetup the room was already packed with some 60 people - we ended up having over 70 people interested in the mix of talks on Hadoop, HBase and Spatial search with Lucene and Solr. Doing the regular "Who are you"-round, we learned that there were people from nurago, Xing, StudiVZ, *lots and lots* of people from Nokia, Zanox, eCircle, and many others.

The meetup was kindly supported by newthinking store (venue for free) and Nokia (sponsored the videos). Steffen Bickel took his chance during the introduction to give a brief overview of Nokia and - guess - explain that Nokia is a great place to work and yeah - they are hiring!

The first talk was given by Bob Schulze who joined the meetup coming from eCircle in Munich. Given his previous experience with scaling their infrastructure from a regular database/data warehouse setup, he explained how HBase helped when processing really large amounts of data. Being an e-mail marketing provider, eCircle does have quite a bit of data to process. And yes, eCircle is hiring.

The second talk was by Dragan Milosevic from Zanox on scaling product search and reporting with Hadoop. Just like eCircle, Zanox came from a regular RDBMS setup that became too expensive and too complex to scale before switching over to a Hadoop/Lucene stack. He used his chance to make the Lucene developers aware of the fact that there are users who were actually using Lucene's compression features. Zanox, as well, is looking for people to hire.

The last talk was by Chris Male from JTeam in Amsterdam on the developments in Lucene and Solr to support spatial search. There are various development routes being followed: Cartesian tiers as well as numeric range searches. He also explained that most of the features are still under heavy development. He finished his talk with a demo on what can be done with spatial search in Lucene/Solr. You already guessed so, JTeam is hiring as well ;)

After the talks we went to Cafe Aufsturz for beers, drinks and some food. People enjoyed talking to each other exchanging experiences. A Lucene focussed table quickly formed - main topics: Spatial search, Lucene/Solr merge threads, heavy committing, Mike McCandless (is this guy real or just an alter-ego of the Lucene community?).

At some time around 11p.m. the core of the guests (well - the Lucene part of the meetup, that is Simon, Uwe and the guys from JTeam) moved over to a bar close by next to cinema central for some more beer and drinks. At about 1a.m. it finally was time to head home.

I'd like to say thanks: First of all to the speakers. Without you the meetup would not be possible. Second to newthinking and Nokia for their support. And of course to all attendees for having grown the meetup to its current size.

I had a really nice evening with people from the Hadoop, HBase and Lucene community. Special thanks to you guys from JTeam for traveling 6h to Berlin just for a "little", though no longer that tiny, Hadoop meetup. The promise stands to visit one of your next Lucene meetups in Amsterdam and present Mahout there - however I need some help finding affordable accommodation ;)

Hope to see you all in June at Berlin Buzzwords.

Google Summer of Code starting

2010-03-10 19:10
As published on the Google Open Source blog, the application period for mentoring organizations for GSoC starts now. The ASF is already in the process of applying. If you are a student looking for an interesting project to work on during the coming summer, you might consider participating in GSoC. It gives you a great opportunity to get in touch with successful free software projects, learn how to work in global teams, improve your communication skills and last but not least show and publish your fantastic coding skills.

If you want to learn more on Why you should contribute to open source, the article by Shalin Shekhar Mangar is a great summary of some of the reasons why people work on open source projects.

Learning to Rank Challenge

2010-03-09 19:49
In one of his recent blog posts, Jeff Dalton published an article on currently running machine learning challenges. Especially interesting for those working on search engines and interested in learning new rankings from data should be the Yahoo! Learning to Rank Challenge, to be held in conjunction with ICML 2010 in Haifa, Israel. The goal is to show that your algorithm scales on real-world data provided by Yahoo!. Tasks are split in two. The first one focusses on traditional learning to rank procedures, the second one on transfer learning. Tracks are open to participants from industry and research.

A second challenge was published by the machine learning theory blog. The challenge is hosted by Yahoo! as well and deals with Key scientific challenges in statistics and machine learning.

Both programs look pretty interesting - would be great to see lots of people from the community participating and comparing their systems.

Early bird registration for Berlin Buzzwords on June 7th/8th open

2010-03-09 18:34
Registration for Berlin Buzzwords quietly opened in the past few days - a conference on scaling search, data processing and storage taking place on June 7th/8th in Berlin, Germany. The first 100 tickets will be sold for 250 Euros + tax. Registration is possible at later dates as well, however expect prices to rise shortly before the conference starts.

If you clicked on it earlier this week and were wondering what those strange German terms were all about: We have put online an English version as well, so language shouldn't be much of a problem anymore.

To avoid any confusion: Conference talks will be in English - no German language skills needed for that. It is perfectly possible to get around in Berlin w/o speaking German, however knowing a few words will, as always, make it easier to make friends with people in shops and hotels ;)

Chemnitzer Linuxtage

2010-03-05 12:32
Title: Chemnitzer Linuxtage
Location: Chemnitz
Start Date: 2010-03-13
End Date: 2010-03-14

Next week the Chemnitzer Linuxtage take place in - well - Chemnitz. It is the second largest Linux event after Linuxtag Berlin. Something that is mainly obvious to speakers and exhibitors: It is one of those events that are known for their fantastic organisation. Nearly no problems, be it WiFi, admission to the exhibitors area, food or any help in general.

I will be at the event again. You can find me at the FSFE booth, telling people what the FSFE is all about and trying to convince them to become fellows (and yes, since last summer, I am a fellow myself and own one of those really cool green crypto cards).

Mahout at Berlin ignite

2010-03-01 22:24
This evening the first Berlin ignite event took place in the "Festsaal" in Berlin X-Berg. Organiser of the event was Matt Biddulph from Nokia Gate 5. We had eleven fantastic talks (ok, to be more precise: At least ten fantastic ones, my own can only be judged by the audience ;) ).

Topics included things you can learn when starting to collect data, themes from (agile) project management, RepRap machines (see also the Rep Rap FOSDEM 2010 talk), bots and robots. The talks finished with a presentation of a Part time scientist's vision of getting to the moon - an article on the project is available on heise newsticker.

The room was filled with more than 120 people, resulting in a location packed with interested attendees. It was great seeing the talks on such diverse topics. Hope to have more events of this format here in Berlin. Thanks go to Matt, all speakers and everyone involved in generally making the event a big success.

For those who didn't make it to the event, slides and audio should go online soon. At least the slides on Mahout are available online.

Preliminary schedule online for ignite Berlin

2010-02-23 19:13
Today the first talks scheduled for ignite Berlin were published. If you yourself would like to give a talk: Submission seems to still be open.