Dear lazy web: Travel recommendations

2011-10-21 21:21
Finally getting to spend some days of vacation in the bay area soon - so I thought I might try to leverage the lazy web on recommendations on what not to miss while there. So far on our list:


  • Rent a canoo in Sausalito
  • Take a tour to Alcatras
  • Get to drive down along the coast towards Half Moon Bay
  • Spend one or two days strolling through the city to take photos
  • Do one of these touristic bike trips across the Golden Gate bridge
  • Spend a day or two to get hiking in one of the red wood tree forests
  • Got a recommendation to drive up to Lake Tahoe for two days
  • Get to see Napa valley
  • Spend a day on whale watching


So that already makes up for ten days - however if you spot anything important and beautiful missing, please let me know. If you have any recommendation on where to spend Halloween - please also get in touch. If you are up to meeting for a cup of coffee or dinner and talk about Mahout, HBase, Hadoop and friends - we certainly can fit that into the schedule somewhere.

Night Trains

2011-10-21 05:40
It was several years ago that a frequent traveller told me about it being more comfortable and time saving to travel mid-sized distances (e.g. from Berlin to Amsterdam) by train - night train that is - instead of flying. Back then without decent knowledge of which combination of booking early and discount card make prizes of trips by train somewhat comparable to those offered by airlines that wasn't really an option for me.



It took me years and a conference in Vienna to re-visit to his proposal: Trying to find ways to reduce the amount of times I fly I looked for alternatives. Going by car clearly is not an option as it's more time consuming and on such long distances also more stressful. Going by regular train seemed like a waste of time as well. So I re-checked the offers for night trains. The idea of going to bed in Berlin and waking up at my destination just seemed too good.

Currently sitting in a CNL to Amsterdam I have to admit that for now this is the most relaxing way to travel I've found so far: Get on board, sleep heavenly (though not as comfortable as in your preferred hotel, beds are still ok), get your breakfast brought to bed in the morning. Mix in meeting friendly people (so far maybe I was just lucky) that are either discovering Europe as backpackers (met one from Australia and one from Canada yesterday) or happen to be taking that train as part of their weekly commute.

I think at least for most European cities I'm converted now :)

GoTo Con

2011-10-10 20:49
Location: Amsterdam

Link out: Click here
Start Date: 2011-10-12

End Date: 2011-10-14


This week late Tuesday night I am going to leave for GoTo con in Amsterdam. Train tickets are already booked - this is going to be my first trip with City Night line, will see how great they are.

GoTo Amsterdam features a special Apache track as well as several talk on scaling up, searching, but also includes stuff in general architectural decisions. If you have not registered yet - use dros200 as promotion code to get a discount on the registration prize.

Looking forward to seeing you in Amsterdam later this week.

Design Thinking @ Scrumtisch Berlin

2011-09-26 21:38
This evening Mary Poppendieck gave a presentation on Design Thinking in Berlin. "What does the US Military have in common with Apple?" In her talk Mary went into more detail about how increasing complexity calls for a need to rethink design. Slides are available online. Thanks for sharing them.

Mary started with a question for the audience on who in your company decides what is to be developed: Is it the PO? Is it a "them" or is it "us"? In a world of growing complexities and faster innovation cycles, thinking of them doing the design more and more becomes a huge business risk. It's time to start thinking of development teams as groups of people solving a business need - forgetting about the "that's none of my business" attitude.

Lets first dig deeper into the three steps of design (according to the book The design of design) the three steps of design are:


  • Understanding the problem: Actually finding out what is the problem, what is to be designed is the hardest part. It's best done in a small team as opposed to by a single person.
  • Design the solution: This step involves uncovering specific requirements and identifying alternative solutions - before actually being able to judge whether an alternative is a viable way of solving a given problem, the harder part is identifying possible alternative solutions.
  • Implementing the design: This step must involve getting users and developers of the resulting system work closely together.


To come up with a truly successful design these three steps most likely need to be iterated as what we learn in later steps has to influence earlier steps.

When it comes to design, at traditional corporations there are various job titles for people involved with design. The graph distinguishes teams developing products vs. supporting internal processes. It makes a difference between software and hardware development. In addition there is a difference between people deciding on what to design (denoted in red) and those deciding how to develop (denoted in green). When looking at Scrum all of these roles are combined in the role of the product owner. When taking a closer look this seems to be a large over-simplification - especially when taking into account that the product owner in general is realized not as a role that has to be provided for but as a job description of one single person. It seems like this is too much work and too many hats for one single person.

In contrast Mary advocates a far more hollistic definition of the designers role: In current teams design is mostly being done up-front, in a very detailed way with results in the form of small user stories being handed over to the team afterwards. In contrast to that developers should be integrated in the design process itself - they need to participate, provide their input to come up with really innovative solutions.

Step one: Understanding the problem



The presentation went into more detail on three different approaches to solving the first step in design - that is understanding the problem. The first way was coined as the military approach. In the The operations process a chapter on design was added to cope with complex situations that do not match the expected configuration.



In such situations a combination of collaboration, dialog, critical thinking and creative thinking seems more important than blindly following command. According to the US military's procedures design again is an iterative three step process: From framing the problem, to experimenting and building prototypical solutions, to making sense of the situation and possibly going back to earlier steps based on what we have learned. One important feature of successful design teams is that the are composed of people with diverse backgrounds to not miss details about the situation.

The second approach to design is the Ethnographic approach. Mary showed a video introducing design company IDEO. The interesting take away for me was that again teams need to be diverse to see all details, that designers need not only find statistics on existing products but also go out to the customer using the product and listen to those "experts". The need to work under time constraints, work in a very focused manner, build prototypes - not to deploy them but to learn from them, merge them and throw away what does not work early in the process.

Coming from a data driven, Hadoop-y background, the third approach was the one most familiar to me: The data based approach focuses on finding success metrics and optimising those. When going back to the "Mythical man month" example of IBM building the IBM PCjr - the team spending 2 years developing a product that was a horrible failure, a product that was judged very badly in common it magazines: What could they have changed to do better? Well the solution almost is in the question: If there are people writing articles to judge products, people that have criteria for judging product success - why not optimise for their criteria? Another example would be Toyota developing the Lexus: Early in the process they decided to optimise for a top score on each of the car magazine reviews. Turned out though hard it was, it was not impossible. And lead to huge success on the sales side.

So instead of being caught in our little development bubble maybe it's time to think bigger - to think not only about what makes our project successful but to think about what makes business successful. And while at it - why not measure success or failure of a piece of software by how successful it is monetarily? This leads to a whole new way of thinking about development:


  • Instead of a product roadmap we start thinking about a business model canvas: What metric gives us information on the business success of a new feature? How can we measure that? This can be as easy as click conversion rate. It's closely tied to the way you make money.
  • Instead of a product vision we turn to measuring a product to market fit.
  • Instead of a release plan we start thinking about the minimal viable product: What is the least you can do to generate more revenue?
  • Instead of an on-site customer we start having a on-site developer. This is about getting developers out of the building, get them to talk to your users and get this experience back in. Note: To some extend this is even possible by looking over the user's shoulder by analysing your log files. Ever thought of taking a closer look at your search engines queries that return no results? Maybe instead of cursing the users that are too stupid to use your interface it's time to re-think your interface design or add more features to your search engine.
  • Instead of iterations we have a loop of building - measuring - learning about our product.
  • Instead of an iteration backlog at the end of each loop we have to decide about whether to preserve the current development direction or pivot it a bit.
  • Instead of a backlog of things to do we have a list of things to learn about our users.
  • Instead of detailed tiny user stories we have hypothesis of how our product is used that are validated or falsified by actual data.
  • Instead of continuous integration you have continuous deployment that is much more interesting.
  • Instead of acceptance tests we may have split tests, A/B tests or similar.
  • Instead of a definition of done including only development it suddenly also comprises validating the data - did we actually convert more users?
  • Instead of costumer feedback we are having a cohort-based metric, we start watching our users to learn - they are rarely willing to give explicit feedback, but their actions are very telling.
  • Instead of a product owner we have entrepreneurs that have the business as a whole in mind.


Design a solution



Good industrial design is made up of ten principles: It's innovative, makes the product useful, understandable, aesthetic, unobtrusive, honest (does not fool users), long lasting, thorough through, environmentally friendly - and most important of all there is as few design in there as possible.

As one example Mary mentioned the flight booking site Hipmunk.com that gives users the chance to sort flights by least agony. It optimises for the best flying experience - not the best buying experience.

In the end it's all about building what the user needs - not selling what your build. It's about watching your users and learning about their needs.

Implementing design



One grave warning: Do not divorce design from development: Ford was a driver and racer himself, making him a great car designer. Wright flew the aircrafts he designed. Tom Edison thought through the whole process. Also in the tech industry the companies that are nowadays most successful were founded by people who are or at least were themselves very actively using the stuff they are building - think Bill Gates, Steve Wozniak and Steve Jobs, Sergey Brin and Larry Page. Takeaway is to do the job you are trying to automate, to have incremental development and delivery.

In the end it all boils down to how fast a learner you are: Not the most intelligent nor the strongest will survive. It's the one that is fastest to adapt to change that will be successful. One commonly used analogy for that is the so-called OODA cycle coming from John Boyd: The tighter your cycle of observing, orienting, deciding and acting is, the likelier you are to be successful among your competitors.

Speaking of release cycles Mary gave a very nice graph of various release cycle lengths.



In her experience teams that have deployment cycles of six months and above are likely to spend one third of their time hardening their software. If you are down to a quarter it's likely that the bottleneck is your business model. Maybe it is weird contracts or business partners expecting support for the exact version you shipped years ago. When you are down to a month per release you likely are running a software as a service model. As clarified later after a question from the audience for Mary this does include setups where customers basically buy a software including all future updates for a set period of time - including an automated roll-out. If you are down to daily or even just weekly releases you are likely running the product yourself. Mose likely then everyone is part of the team.

After her talk Mary answered various questions from the audience: According to her experience Scrum defines the team way too narrow by focusing on development only - with Scrum we are possibly getting perfect at doing the wrong thing very well - forgetting all about design and business.

One very general advise that leads to an interesting conclusion is to not waste your life, to not spend years of your life working on projects that are not needed by anyone. In the end this leads to realizing that failed projects are also caused by the developers not raising concerns and issues.

Mary is a huge proponent of trusting your team: Do not hand them detailed lists of user stories. You have hired great, senior people (or haven't you??) - give them an idea what you want and let them figure out the details by talking to the right people.

When asked for tools to support the lean startup process in a startup Mary emphasized the need for communication. Tools are likely to only slow you down: Get people to communicate, to sit at one table. Have a weekly or bi-weekly meeting to get everyone in sync. Otherwise let developers figure out their way to collaborate.

The talk was kindly hosted by Immobilien Scout, organised by agiliero with some tickets handed over to the Berlin Scrumtisch. Thanks guys.

Talking people into submitting patches

2011-09-21 20:22
In November I am going to attend Apache Con NA. This year I decided to do a little experiment: I sumitted a talk on talking people into contributing to free software projects. The format of the talk is a bit unusual: Drawing from my - admittedly limited and very biased - experience explaining free software to others and talking people into contributing patches this talks tries to initiate a discussion on methods to get awesome developers to consider contributing their work back to free software projects.

As a precursor to the talk I have created a public Google docs document - it already contains the title of each slide I will use in my presentation. Of course the content does not get disclosed ahead of time.

If any of the readers of this blog post has experience with either explaining why they contribute, how to contribute, issues and questions new users have - please feel free to fill them into above document. I'll try to integrate as much feedback as possible into my final slides.

Machine learning problem settings

2011-08-06 07:10
Together with Sebastian Schelter I held a Nokia sponsored (Thank you!) lecture on large scale data analysis and data mining during the past few months.

After supervising a few successful university projects based on Apache Mahout the goal of this lecture was to introduce students to some of the basic concepts and problems encountered today in a world where huge datasets are generally available and are easy to process with Apache Hadoop. As such the course is targeted at an entry level audience - thorough treatment of the mathematical background of latest machine learning technology is left to the machine learning research groups in Potsdam, at TU Berlin and the neural information processing group at TU.

Slides and exercises are available online via git. Please let me know if you want to re-use them in your lecture.



The very first problem that users of machine learning algorithms usually come across is mapping their application problem to one of the various machine learning problems. In 2010 Michael Br├╝ckner gave a lecture on Intelligent Data Analysis with Matlab (slides and videos in German) including a simple taxonomy of algorithms:

According to


  • the types of input data an algorithm can handle (either independent instances, also called examples, sequences or graphs of instances)
  • the type of training data available (e.g. instances with assigned nominal target attribute, no labels at all, a partial sorting of sets of instances)
  • and the learning goal
algorithms can be nicely partitioned by the learning problem that they solve. Based on that very first step of identifying exactly what the problem setting is, deciding which algorithm to use becomes much easier. Based on that taxonomy I came up with the above graphic giving a first overview of which tasks can be solved with machine learning:

Boxes in dark blue are what in general is called supervised learning, yellow unsupervised and light blue semi supervised - based on the amount of labeled training data available. Red boxes indicate settings with the goal of knowledge discovery. Green are any ranking problems.

Apache Mahout Hackathon Berlin

2011-03-21 21:39
Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town the idea was born to meet some day, work on Mahout. So why not just announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to host a Hackathon - and quickly got their ok for the event.

As a result the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees - arriving at varying times: I guess 11a.m. simply is way too early to get up for your average software developer on a Saturday. I got a few people surprised by the venue - especially those who were attending a Hackathon for the very first time and had expected c-base to be some IT company ;)

We started the day with a brief collection of ideas that everyone wanted to work on: Some needed help to use Mahout - topics included:

  • How to use Apache Mahout collaborative filtering with complex models.
  • How to use Apache Mahout via a web application?
  • How to use classification (mostly focussed on using Naive Bayes from within web applications).
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?


Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java API friendly configuration scheme for Mahout clusterings.
  • Complete the distributed SVD recommender.


Quickly teams of two to three (and more) people formed. First several user side questions could be addressed by mixing more experienced Mahout developers with newbie users. Apart from Mahout specifics also more basic questions of getting involved even by simply contributing to the online documentation, answering questions on the mailing lists or just providing structured access to existing material that users generally have trouble finding.

Another topic that is being overlooked all too when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself: Being deeply involved with free software projects dealing with patches, integration of issue tracker and svn with the project mailing lists all seems very obvious. However even this seemingly basic setup sometimes looks confusing and complex to regular users - that is very common but not limited to people who are just starting to work as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started hacking more sophisticated tasks - working on the first project patches. On Sunday only the really hard core developers remained - leading to a rather focussed work on Mahout improvements which in the end led to first patches sent in from the Mahout Hackathon.

Apache Mahout Hackathon Berlin

2011-03-21 21:39
Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town the idea was born to meet some day, work on Mahout. So why not just announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to host a Hackathon - and quickly got their ok for the event.

As a result the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees - arriving at varying times: I guess 11a.m. simply is way too early to get up for your average software developer on a Saturday. I got a few people surprised by the venue - especially those who were attending a Hackathon for the very first time and had expected c-base to be some IT company ;)

We started the day with a brief collection of ideas that everyone wanted to work on: Some needed help to use Mahout - topics included:

  • How to use Apache Mahout collaborative filtering with complex models.
  • How to use Apache Mahout via a web application?
  • How to use classification (mostly focussed on using Naive Bayes from within web applications).
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?


Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java API friendly configuration scheme for Mahout clusterings.
  • Complete the distributed SVD recommender.


Quickly teams of two to three (and more) people formed. First several user side questions could be addressed by mixing more experienced Mahout developers with newbie users. Apart from Mahout specifics also more basic questions of getting involved even by simply contributing to the online documentation, answering questions on the mailing lists or just providing structured access to existing material that users generally have trouble finding.

Another topic that is being overlooked all too when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself: Being deeply involved with free software projects dealing with patches, integration of issue tracker and svn with the project mailing lists all seems very obvious. However even this seemingly basic setup sometimes looks confusing and complex to regular users - that is very common but not limited to people who are just starting to work as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started hacking more sophisticated tasks - working on the first project patches. On Sunday only the really hard core developers remained - leading to a rather focussed work on Mahout improvements which in the end led to first patches sent in from the Mahout Hackathon.

Apache Mahout Hackathon Berlin

2011-03-21 21:39
Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town the idea was born to meet some day, work on Mahout. So why not just announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to host a Hackathon - and quickly got their ok for the event.

As a result the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees - arriving at varying times: I guess 11a.m. simply is way too early to get up for your average software developer on a Saturday. I got a few people surprised by the venue - especially those who were attending a Hackathon for the very first time and had expected c-base to be some IT company ;)

We started the day with a brief collection of ideas that everyone wanted to work on: Some needed help to use Mahout - topics included:

  • How to use Apache Mahout collaborative filtering with complex models.
  • How to use Apache Mahout via a web application?
  • How to use classification (mostly focussed on using Naive Bayes from within web applications).
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?


Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java API friendly configuration scheme for Mahout clusterings.
  • Complete the distributed SVD recommender.


Quickly teams of two to three (and more) people formed. First several user side questions could be addressed by mixing more experienced Mahout developers with newbie users. Apart from Mahout specifics also more basic questions of getting involved even by simply contributing to the online documentation, answering questions on the mailing lists or just providing structured access to existing material that users generally have trouble finding.

Another topic that is being overlooked all too when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself: Being deeply involved with free software projects dealing with patches, integration of issue tracker and svn with the project mailing lists all seems very obvious. However even this seemingly basic setup sometimes looks confusing and complex to regular users - that is very common but not limited to people who are just starting to work as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started hacking more sophisticated tasks - working on the first project patches. On Sunday only the really hard core developers remained - leading to a rather focussed work on Mahout improvements which in the end led to first patches sent in from the Mahout Hackathon.

Apache Mahout Hackathon Berlin

2011-03-21 21:39
Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town the idea was born to meet some day, work on Mahout. So why not just announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to host a Hackathon - and quickly got their ok for the event.

As a result the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees - arriving at varying times: I guess 11a.m. simply is way too early to get up for your average software developer on a Saturday. I got a few people surprised by the venue - especially those who were attending a Hackathon for the very first time and had expected c-base to be some IT company ;)

We started the day with a brief collection of ideas that everyone wanted to work on: Some needed help to use Mahout - topics included:

  • How to use Apache Mahout collaborative filtering with complex models.
  • How to use Apache Mahout via a web application?
  • How to use classification (mostly focussed on using Naive Bayes from within web applications).
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?


Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java API friendly configuration scheme for Mahout clusterings.
  • Complete the distributed SVD recommender.


Quickly teams of two to three (and more) people formed. First several user side questions could be addressed by mixing more experienced Mahout developers with newbie users. Apart from Mahout specifics also more basic questions of getting involved even by simply contributing to the online documentation, answering questions on the mailing lists or just providing structured access to existing material that users generally have trouble finding.

Another topic that is being overlooked all too when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself: Being deeply involved with free software projects dealing with patches, integration of issue tracker and svn with the project mailing lists all seems very obvious. However even this seemingly basic setup sometimes looks confusing and complex to regular users - that is very common but not limited to people who are just starting to work as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started hacking more sophisticated tasks - working on the first project patches. On Sunday only the really hard core developers remained - leading to a rather focussed work on Mahout improvements which in the end led to first patches sent in from the Mahout Hackathon.