Music in Berlin early June

2012-04-18 18:20
A little bit of inspiration on what to do the weekend before and after Buzzwords in Berlin:

With just a tiny bit of luck there is no need to pre-book your tickets - in most cases there are several seats left even an hour before the official starting time. Pre-ordering tickets does have an advantage, though, when it comes to pricing. One easy way to get your ticket is to book via

If you happen to be younger than thirty, consider buying yourself a Classic Card: it costs 15 Euros but gives you access to several venues for only 8 Euros (no pre-booking; tickets can only be purchased an hour before the official start).

Berlin Buzzwords scheduling - behind the scenes

2012-04-17 21:23
The Berlin Buzzwords schedule has been available online for roughly a week now. Tickets are still available at the regular rate - make sure to book yours now: you have another three weeks to purchase tickets at the regular rate, and starting May 20th the last-minute rate will raise the price by another 100 Euros.

I thought it might be interesting to share some background on how Berlin Buzzwords scheduling worked out this year. We changed it quite a bit: we added more people to the conference committee and raised the acceptance rate, while at the same time reducing speaking time for quite a few talks. Below are some of the reasons for that, along with some detail on how rating was done.

Let me first state some constraints:

  • We are hosting the conference in a venue where we can have 3 tracks at most - there aren't any other large rooms. We don't want another round of well-informed (or rather not-so-well-informed) random guessing about which talks will be unpopular, only to stash them in the small room. Switching the schedule during the conference itself isn't particularly professional, nor is it simple when you have to move about 200 people to a different room than the one the printed schedule says.
  • We are trying to keep the price of the conference as low as possible so that we can attract the average developer who is not able to pay some 1.5k Euros to go to a conference. We are tech focused, not business focused - our attendees don't have big budgets for travelling to expensive conferences. With current attendee numbers, each conference day costs every attendee roughly 50% of the current regular ticket price to make the budget work out. That means three things: a) We need all of you to pay for all days to make the budget work. b) If you would like us to add another conference day because the talks are so interesting, add another 50% of the current ticket price and decide whether you'd be willing to pay that extra money. c) Increasing the number of tracks would obviously mean increasing the ticket price, which we would rather avoid.
  • Berlin Buzzwords was established as an event for professionals: the quality of talks is high, and attendees joining the conference know what they are talking about. We are happy to have students as well (did you notice there's a student ticket?). However, that focus means that we are different from pure open source community events. If you think there is too little coverage of scalability topics at existing community-only events, please talk to them about increasing that coverage, or lead the effort of establishing such an event yourself - that isn't easy, but neither is it impossible. You could get started by hosting one of our meetups/workshops/hackathons, or alternatively run e.g. one of FOSDEM's DevRooms.
  • Buzzwords is organised by a team of several people. On the one hand there are volunteers (as in people who don't make a profit from the conference and who, at best, work on it during working hours donated by their employer - Thank You Nokia**! Thank You Searchworkings!). They are familiar with what's going on in the search/store/scale space - you can find them on the program committee page. On the other hand, all administrative work is done by newthinking communications, whose people are very dedicated to what they do (one of them even joined a Ruby on Rails beginners' course last weekend to learn more about what Buzzwords people are working on*). Their main focus is making the whole conference run as smoothly as possible.

Some of the constraints above mean that we have to limit the number of talks we accept. Last year's acceptance rate was roughly 30%. Doing that again this year would have meant sending rejection mails to quite a few key developers, many of them committers on the project they were talking about. That's not because the talks were bad or anything; there were simply way too many good talks. So we did an experiment this year: we raised the acceptance rate to 50%, but in turn had to shorten many of the talks that were submitted as 40min versions in order to fit more talks into the same space and time. I did a bit of math this morning: had we kept the old schedule format without shortening submissions, we would have had to reject 70% of the talks that were reduced to 20min.

Talk selection followed a very simple algorithm:

Each talk was reviewed by at least three members of our program committee. Talks were assigned to reviewers using a pseudo random number generator - more precisely this one. Reviewers assigned scores ranging from 5 (want to have and am going to fight for it) to 1 (don't want to see and am going to fight against it). After looking at the schedule constraints we decided to accept n talks in total: x of them as 40min talks, y as 30min talks and z as 20min talks.

We sorted all talks by mean score and selected the top n for acceptance. Of those we took the first x/3 tagged as search, x/3 tagged as scale and x/3 tagged as store to be accepted as 40min talks. The same was done for the 30min and the 20min slots. A mixture of sort, grep, awk, head and cut was quite helpful here and gave us n - 2 accepted talks. In our list of scores the next 5 talks had equal scores, so we chose 2 of those at (pseudo-) random. Finally, acceptance notifications were sent out (thanks to Python's mail support, that made things easier!). We asked speakers to confirm that they would still be available. Most got back right away; about 12 needed another nag mail or SMS a week later to actually confirm.
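For readers who want to replay the selection steps above, here is a rough Python sketch. It rests on a few assumptions. The real run used sort/grep/awk/head/cut and a separate random tie-break. The PRNG here is Python's `random` module rather than the generator linked above. Every function and parameter name is hypothetical. Ties are broken by shuffling before a stable sort, which picks among equally scored talks at random. If a track has too few talks among the top n, its leftover slots simply stay empty.

```python
import random
from statistics import mean

TRACKS = ("search", "store", "scale")


def assign_reviewers(talk_ids, reviewers, per_talk=3, seed=2012):
    """Pick `per_talk` distinct reviewers for every talk using a seeded PRNG."""
    rng = random.Random(seed)
    return {talk: rng.sample(reviewers, per_talk) for talk in talk_ids}


def select_talks(tracks, scores, slots, seed=2012):
    """Return {talk_id: minutes} for the accepted talks.

    tracks: {talk_id: track}; scores: {talk_id: [scores from 1 to 5]};
    slots:  {minutes: number of slots}, each count divisible by len(TRACKS).
    """
    rng = random.Random(seed)
    ranked = list(tracks)
    rng.shuffle(ranked)  # random order breaks ties, because list.sort() is stable
    ranked.sort(key=lambda talk: mean(scores[talk]), reverse=True)
    accepted = ranked[:sum(slots.values())]  # the top n talks by mean score

    schedule = {}
    for minutes in sorted(slots, reverse=True):  # fill the longest slots first
        per_track = slots[minutes] // len(TRACKS)
        for track in TRACKS:
            remaining = [t for t in accepted
                         if tracks[t] == track and t not in schedule]
            for talk in remaining[:per_track]:
                schedule[talk] = minutes
    return schedule


tracks = {"a": "search", "b": "search", "c": "store", "d": "store",
          "e": "scale", "f": "scale", "g": "scale"}
scores = {"a": [5, 5, 4], "b": [3, 3, 3], "c": [4, 4, 4], "d": [2, 3, 2],
          "e": [5, 4, 4], "f": [3, 4, 3], "g": [1, 1, 2]}
print(select_talks(tracks, scores, {40: 3, 20: 3}))
# {'a': 40, 'c': 40, 'e': 40, 'b': 20, 'd': 20, 'f': 20}
```

With made-up scores for seven talks and six slots, the lowest-rated talk ("g") is dropped. The best remaining talk in each track gets a 40min slot.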

Scheduling itself was done in a purely analog way: take a pen, write all n talks on little pieces of paper and add information on track and length. Those pieces of paper were then arranged into the pre-defined schedule grid on a kitchen table. Re-arranging paper is just so much faster than anything you can do digitally - if only it weren't for writing all those post-it notes beforehand ;)

Finally, the schedule went out earlier this week, together with an appropriate press release, tweet etc. Once again Buzzwords is a two-day conference only, and most likely we won't grow the main conference beyond that any time soon. However, you can effectively extend the conference to any length you want yourself: we have asked local companies to provide us with meeting space for at least 20 people each, free of charge, and several community members are already organising workshops, meetups, hackathons, code retreats and barcamps in these spaces. If you think your topic is not covered well enough at the main conference, or you'd like to learn more about a particular topic, please talk to us about organising one of those meetups yourself. You don't need to give a talk there if you don't want to - all you need to do is put together an interesting schedule that draws people to your meetup. Also, if you think your talk should have been accepted, talk to us about getting a meetup going on your topic and related themes to get them covered.

The main goal of Berlin Buzzwords is to involve you. We are very open to any ideas on how to collaborate or grow the conference. We have several partner events throughout Europe this year. We offer companies the option to co-locate and co-promote their trainings after Buzzwords. We offer community members the option to co-locate and co-promote their meetups with the conference. However, we do need your time and dedication to make this work. Or, to use a phrase that is well known at least in the Apache world: Patches welcome!

* Her conclusion: even without prior coding knowledge the course was easy enough to follow, and it at least made the difference between frontend and backend work clear to her. Observation: Buzzwords is very clearly backend. :)

** In particular Hannes Kruppa and the whole search recommendations team!

Clojure Berlin - March 2012

2012-03-07 22:37
In today's Clojure meetup Stefan Hübner gave an introduction to Cascalog - a Clojure library based on Cascading for hassle-free large-scale data processing on Apache Hadoop.

After a brief overview of how he uses the tool for log processing at his day job, Stefan went into some more detail on why he chose Cascalog over other projects that provide abstraction layers on top of Hadoop's plain map/reduce library. Both Pig and Hive provide easy-to-learn SQL-like languages for quickly writing analysis jobs. Their major disadvantage, however, shows when domain-specific operators are needed, in particular when these turn out to be needed just once: developers end up switching back and forth between e.g. Pig Latin and Java code to accomplish their analysis. These kinds of one-off analysis tasks are exactly where Cascalog shines. There is no need to leave the Clojure context; you just program your map/reduce jobs on a very high level. Cascalog's syntax is quite similar to Datalog, which makes queries easy to read and makes it simple to forget about all the nitty-gritty details of writing map/reduce jobs.

Writing a join that looks up each person's age and gender in a trivial data model is as simple as typing:

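;; Persons' age and gender: ?person appears in both predicates, so Cascalog joins on it.
;; age and gender are assumed to be two-field generators, e.g. the cascalog.playground samples.
(?<- (stdout) [?person ?age ?gender]
  (age ?person ?age)
  (gender ?person ?gender))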
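Cascalog never spells out a join key. The shared `?person` variable is enough, and the query is compiled into map/reduce jobs via Cascading. The plain in-memory Python sketch below only shows what result the inner join produces. The sample data and the function name are made up, and nothing here reflects how Cascalog runs the query on Hadoop.

```python
# Hypothetical in-memory stand-ins for the age and gender generators.
age = [("alice", 28), ("bob", 33), ("chris", 41)]
gender = [("alice", "f"), ("bob", "m"), ("dora", "f")]


def join_age_gender(age, gender):
    """Inner join on person: keep only people present in both inputs."""
    gender_by_person = {}
    for person, g in gender:
        gender_by_person.setdefault(person, []).append(g)
    return [(person, a, g)
            for person, a in age
            for g in gender_by_person.get(person, [])]


print(join_age_gender(age, gender))
# [('alice', 28, 'f'), ('bob', 33, 'm')]
```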

Several kinds of input generators are already implemented: reading text files and using files in HDFS as input are both common use cases. Of course you can also provide your own implementation to integrate any other type of data input.

In my view Cascalog combines the speed of development brought by Pig and Hive with the flexibility of being able to seamlessly switch to a powerful programming language for anything custom. If you have been using or even contributing to either Cascalog or Cascading: I'd love to see your submission to Berlin Buzzwords - remember, the submission deadline is this Sunday (CET).

Visiting Berlin Buzzwords - where to go for drinks and food

2012-03-07 19:39
There are literally hundreds of bars and restaurants within easy walking distance of the conference venue. And if that is not enough for you, hop on the U-Bahn and head east to either Kreuzberg or Friedrichshain to find more. For inspiration check out Tip Berlin - they have a decent, reliable restaurant list.

For quick orientation: Berlin doesn't have one single city center but many districts, each with its own look and feel. The most interesting ones for eating and drinking:

  • Schöneberg is a bit calmer, well suited for eating out until late in the evening. The most interesting areas are around Akazien-/Golzstraße (head north from Hauptstraße up to Nollendorfplatz), Crellestraße, and the area around Bayerischer Platz.
  • Friedrichshain is the area to go for drinks in the evening and to see young, urban Berlin. Get lost in the famous "Simon-Dach" quarter ("Simon-Dach-Kiez" as we say in Berlin) with its cobblestone streets, wide sidewalks, bars, restaurants and cool little shops. If the weather is as nice as it was last weekend, it might be worth walking or cycling a little farther to Holzmarktstraße: between the streets "An der Schillingbrücke" and "Michaeliskirchstr." there are a few really nice outdoor beach bars right on the banks of the River Spree.
  • Kreuzberg comes in several flavours: for coffee and food head over to Bergmannstraße, for drinks at night go see Oranienstraße, for young and vibrant head over to Wrangelstraße (do not miss Heinz Minki, Freischwimmer and Club der Visionäre), and for a relaxed "down by the river" evening head over to Maybachufer (do not miss Van Loon, and also check out Bethanien close by).
  • Prenzlauer Berg - young, family friendly, slowly being turned into a German Kleinstadt (small town) ;)
  • Mitte - a bit fancier, gentrified, great if you love culture, museums, ballet and concerts. Remember to explore the city by boat. If you are hungry head over to Linienstraße and explore the little streets around it. There is tasty cheese fondue at Nolas am Weinberg. Go dancing at Clärchens Ballhaus, or grab a coffee and code at the web2.0 café Sankt Oberholz.

Two special recommendations for breakfast:

On the weekend before the conference, days are best started with a long and tasty brunch. My personal recommendation if you love tea is to head down to TTT: apart from serving the best tea in town, they also have really tasty food. And best of all, you can buy tea, tea cups and pots there. I tend to take keynote speakers to that place - so far none has complained ;)

Another option is to start your day on top of the Bundestag: enjoy the view of the city, take an audio tour, eat breakfast in the Käfer Restaurant and maybe add a brief lecture on German legislation afterwards. Make sure to book about a month in advance!

For burgers there is no better place than Burgermeister in Kreuzberg. The best falafel is sold at Habibi. Judging where to get the best ice cream is a bit harder: Aldemir is the place in Kreuzberg, Pinguin Club is the place in Schöneberg (Inka Eis beats it only if you prefer unusual flavours), and if you are in Mitte close to Brandenburger Tor consider visiting Der Eisladen - lots of different flavours and really tasty.

When it comes to cocktails there are plenty of places, large and small, that people tend to frequent. Some places to start with where you'll feel welcome: Salut, Green Door and Stagger Lee.

Walking through Berlin

2012-03-06 18:27
Ever made the mistake of booking a flight to a city and trying to decide on what to do only after you arrived? That type of planning does work for Berlin - though you may end up with quite a different schedule than originally intended.

The only thing that needs a bit of planning ahead (about a month) is visiting the Bundestag - the fastest way to discover it is to simply go up to its dome. You can book a table at the restaurant up there if you want to have breakfast above Berlin. In addition, the visitor service offers various free presentations that can be booked from their web page.

Some hints in addition to visiting a tourist information after your arrival:

When I have guests I usually recommend buying a day (or week) BVG ticket - you can use public transport as often as you like with these tickets. That includes S-Bahn, tram, buses, tube and ferries (but not the tourist round-trip boats with commentary). If you know you'll be going to several museums, a Welcome ticket might be worth its price. Alternatively, just get a bike: unless you want to reach destinations outside the S-Bahn ring or want to visit in winter (don't), all distances should be easy to do by bike. To plan your trips use - they know road conditions and let you e.g. exclude larger streets or prefer green routes.

Your best bet for seeing most of the attractions for less than five Euros is to take the regular bus line 100 from Bhf. Zoo train station down to Alexanderplatz and line 200 back. I don't know of any audio guide for that route, but there should be guides for sale in the local tourist information offices.

For guide books: Lonely Planet is a good start. If you speak German, the city box might serve you well. It contains 30 cards with proposed walking tours, including brief explanations. Also, the book "Die schönsten Berliner Stadtspaziergänge" ("The most beautiful Berlin city walks") has been great for discovering lesser-known areas.

The city has two bi-weekly magazines that feature listings of concerts (both modern and classical), exhibitions, markets and more: Zitty and Tip Berlin. Both are quite good; which one to prefer depends on personal taste. Both also publish restaurant guides, books on where to go shopping, and special issues on where to go and what to do, and their online restaurant reviews are quite decent.

Two final hints: if you happen to know locals (or anyone who moved there a while ago), make sure to ask them for recommendations. Also, try to stay at one of the many B&Bs - in general your host will know several local recommendations.

Berlin Hadoop Get Together - videos are up

2012-03-02 20:08

Apache Hadoop Get Together - February 2012

2012-02-23 00:14
Today the first Hadoop Get Together Berlin of 2012 took place. David got the event hosted by and at Axel Springer, who kindly also paid for the (soon to be published) videos. Thanks also to the unbelievable Machine company for the tasty buffet after the meetup, and to Open Source Press for donating three of their Hadoop books.

Today's selection was quite diverse. The event started with a presentation by Markus Andrezak, who gave an overview of Kanban and how it helped him change the development workflow over at eBay/mobile. Kanban is well suited to environments that require flexibility, and it decreases the risk associated with any single release by bringing the number of features released down to an absolute minimum. His team at eBay/mobile got release cycles down to once a day, and more than ten releases a day aren't unheard of either. His general goal was to reduce the risk associated with releases by reducing the number of features per release, reducing the number of moving parts in one release and, as a result, reducing the number of potential sources of problems. If anything goes wrong, rolling back is no issue - nor is narrowing down the bugs introduced in that particular release.

This development- and output-focused part of the process is complemented by an input-focused Kanban cycle for product design: products move from idea to vision to a more detailed backlog to development and finally go live, just as issues in development itself move from todo to in progress, under review and done.

With both cycles the main goal is to keep the number of items in progress as low as possible. This results in more focus for each developer and greatly reduces overhead: don't do more than one or two things at a time. The only catch: most companies are focused on keeping development busy at all times - their goal is to reach 100% utilization. This, however, is in no way correlated with actual efficiency: at 100% utilization there is no way to deal with problems along the way, because there is no buffer. Instead, the idea should be to concentrate on a constant flow of released, live features.
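The talk did not include code. To make the "keep the number of items in progress low" rule concrete, here is a toy Python sketch of a board that refuses to pull new work into a column once that column's work-in-progress (WIP) limit is reached. The class, column names and limits are all hypothetical.

```python
class WipLimitExceeded(Exception):
    """Raised when pulling an item would exceed a column's WIP limit."""


class KanbanBoard:
    """Toy board: ordered columns, optional work-in-progress limit per column."""

    def __init__(self, columns, limits):
        self.order = list(columns)
        self.columns = {name: [] for name in self.order}
        self.limits = dict(limits)  # column -> max items; absent means unlimited

    def add(self, item):
        """New work always enters the first column (e.g. the backlog)."""
        self.columns[self.order[0]].append(item)

    def pull(self, item):
        """Move an item one column to the right if the target has capacity."""
        src = next((c for c in self.order if item in self.columns[c]), None)
        if src is None:
            raise KeyError(item)
        idx = self.order.index(src)
        if idx == len(self.order) - 1:
            raise ValueError(f"{item!r} is already in the last column")
        dst = self.order[idx + 1]
        limit = self.limits.get(dst)
        if limit is not None and len(self.columns[dst]) >= limit:
            raise WipLimitExceeded(f"{dst!r} is at its limit of {limit}")
        self.columns[src].remove(item)
        self.columns[dst].append(item)


board = KanbanBoard(["todo", "in progress", "review", "done"],
                    {"in progress": 2, "review": 1})
for feature in ["login", "search", "export"]:
    board.add(feature)
board.pull("login")
board.pull("search")
try:
    board.pull("export")  # a third item in progress would exceed the limit of 2
except WipLimitExceeded as err:
    print("blocked:", err)  # blocked: 'in progress' is at its limit of 2
board.pull("login")   # moving "login" on to review frees a slot ...
board.pull("export")  # ... so "export" can now be started
```

The limit turns "finish something before starting something new" into a hard rule rather than a good intention. That is the buffer the 100%-utilization mindset lacks.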

Now what is the link between all that and Hadoop? (Hint: no, this is not a pun on the project's slow release cycle.) Kanban allows for frequent releases and therefore for frequent feedback. This enables a model of development that starts out from a model of your business case (no matter how coarse that may be), builds some code, measures performance with that code based on actual usage data and adjusts the model accordingly. Kanban lets you iterate on that loop very quickly, eventually getting you ahead of competitors. On the technology side, one strong tool for really analysing the incoming business data at scale is Hadoop.

In the second talk Martin Scholl started out by comparing printed sheet music to the actual performance of musicians in a concert. The former represents static, factual data. The latter represents a process that may be recorded but cannot itself be copied, as it lives from the interaction with the audience. The same holds true for social networks: their current state, and the way you look at them, is deeply influenced by the way you interact with the system in realtime.

So in addition to data storage solutions for static data, he argues, we also need a way to process streaming data efficiently and fault-tolerantly. The system he uses for that purpose is Storm, which was open-sourced by Twitter late last year. Built on top of ZeroMQ, it allows for flexible and fault-tolerant messaging. Example applications mentioned were event analysis (filtering, aggregation, counting, monitoring) and parallel distributed RPC based on message passing.

Two concrete examples were a live A/B testing environment that is dynamically reconfigurable based on its input, and event handling in a social network environment, where interactions might trigger messages sent by mail and instant message but also trigger updates to a recommendation model.
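The sketch below is not Storm code. Storm topologies are wired together from spouts and bolts in Java or Clojure, and none of that API appears here. It only mimics, in plain Python, the filter-then-count shape of event analysis on an A/B test stream. Generators stand in for processing steps, and the event schema and variant names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, str, str]  # (user, variant, action) - hypothetical schema


def click_filter(events: Iterable[Event]) -> Iterator[Event]:
    """Filtering step: pass on click events only."""
    for event in events:
        if event[2] == "click":
            yield event


def running_counts(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Counting step: emit the updated click count for a variant per event."""
    counts = Counter()
    for _user, variant, _action in events:
        counts[variant] += 1
        yield variant, counts[variant]


stream = [("u1", "A", "view"), ("u1", "A", "click"),
          ("u2", "B", "click"), ("u3", "A", "click")]
for variant, count in running_counts(click_filter(stream)):
    print(variant, count)
# A 1
# B 1
# A 2
```

Each step processes one event at a time and emits results immediately instead of waiting for a complete batch. That is the difference from a map/reduce job over static data that the talk emphasised. Fault tolerance and distribution, which are what Storm actually adds, are out of scope here.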

In the last talk Fabian Hüske from TU Berlin introduced Stratosphere - an EU-funded research project working on an extended computational model on top of HDFS that provides more flexibility and better performance. Unfortunately, as it was developed before the rise of Apache Hadoop YARN, the team essentially had to re-implement the whole map/reduce computational layer and put their system into it. It would be interesting to see how a port to YARN performs and what advantages it gives in production.

Looking forward to seeing you all in June for Berlin Buzzwords - make sure to submit your presentation soon, call for presentations won't be extended this year.

HowTo: Meetups in Berlin

2012-02-14 20:23
I get this question once in a while, and I need the list below myself every now and then: how to actually set up a meetup in Germany. Essentially it all boils down to three questions: Which channels to use for PR? Where to hold the meeting? What other benefits to offer attendees?

When it comes to PR there are several options:

  • Announce the meetup on relevant mailing lists
  • Use social networking sites relevant to your project - in Germany Xing works best; Twitter, Facebook, LinkedIn and Google+ are other options
  • Ask anyone you know personally for help with spreading the word
  • If you have one, post information on your personal blog

Where to go for the meetup:

The venue usually is the biggest question mark. After deciding how big you'd like to go initially, you can start looking for a location. For your first meetup don't rent a room - with a bit of creativity there are lots of options that are free of charge.

  • If you are a student or have active ties to a university, holding the meetup there is usually the cheapest and least complicated option.
  • Another option is to just book a table in a restaurant that has a reasonably large room. Simply choose your favourite one - knowing the owner helps in getting extra space.
  • A third option is to go to a co-working space that also has a meeting area. In general they are very open to hosting community events - co-up Berlin and Betahaus are just two options.
  • If you are planning a less formal event, your local hacker spaces might be an option: c-base Berlin and in Berlin e.V. are two Berlin examples; Hackers Dojo and Noisebridge are two Bay Area examples.
  • Last but not least look out for local startups that are currently hiring new people: They tend to be very open to hosting events. See Berlin Buzzwords Hackathon providers list for some examples.

What else?

  • Make sure attendees can register themselves - Xing works for that, and so do Google Forms
  • Setup a mailing list or some other notification service to help people track future events (Google Groups works, so does a dedicated Twitter Account)
  • Provide some background online - works but does charge a small fee. Setting up a blog on wordpress or blogger works as well, though it is not quite as interactive as the site.
  • Get in touch with attendees and local companies - usually they are quite happy to provide some financial support to your meetup for free drinks or videos.
  • If you want videos: recording audio is trivial, and putting it online is extremely simple if you use SoundCloud's app. Recording video is also rather simple but can be time consuming. Finding sponsors to pay for it is reasonably simple if you offer to brand the videos. For the Hadoop Get Together we usually hire Martin Schmidt. Sites to put videos online: Vimeo works but has rather low upload limits, is a bit better in this respect.
  • Sponsoring in general: Companies looking for developers related to the meetup's technology as well as those providing consulting for that technology tend to be open to supporting local events. What works best is to contact people you already know there - they will know best who to ask internally.

One final note: Being the organiser of such a meetup puts you at the center of a local community. Over time people will start remembering your face and name. Make sure you do the same - you should at least be able to remember faces, affiliations and names of your regular attendees.

Happy Valentine

2012-02-14 06:24
Free Software developers can be very critical: Every single line of code gets scrutinized, every design is reviewed by several often opinionated people. Even the way communities are supposed to work sometimes gets restricted. Sometimes a simple Thank You can make all the difference for any contributor or committer.

I love Free Software!

FSFE proposed a really nice campaign: celebrate "I love Free Software" Day on February 14th. In the hope that some readers of this blog actively develop or contribute to free software projects: this is a thank you to you! It's your contributions that make all the difference - be it code, documentation, help for users or code reviews.

February 14th: "I love free software day"

2012-02-13 21:07
This year FSFE is once again running their "I love free software" campaign on February 14th. Their goal is to have more love reports, hugs and Thank You messages sent out than bug reports filed against projects.

They have put a few ideas on what to do that day online. I'd like to add one more option: if you are using any free software and feel the urgent need to file a bug report on that day, use the opportunity to submit a patch as well. Don't just describe what is going wrong; also attach a patch containing a test that shows the issue and a code change that fixes it, follows the project's coding guidelines and doesn't break anything else in the project. Any other contribution (documentation, increased test coverage, help for other users) is welcome as well, of course.