On Reading Code

2012-08-02 15:14

“If you don’t have time to read, you don’t have the time or the tools to write.” –Stephen King


Quite a while ago GeeCON published the video of Kevlin Henney's talk "Cool Code". This keynote is great to watch for everyone who loves to read code - not the kind you encounter in real-world enterprise systems, but the kind that truly teaches you lessons:

GeeCON 2012: Kevlin Henney - Cool Code from GeeCON Conference on Vimeo.


ApacheCon returns to Europe

2012-08-01 20:41
In November ApacheCon will come back to Europe. The event will take place in Sinsheim, inviting foundation members, project committers, contributors and users to meet, discuss and have fun during the one-week event.



Several meetups will be held the weekend before the main conference kicks off. Watch out for announcements on your favourite project mailing list.

ApacheCon is still open for submissions until August 3rd - head over to the call for submissions for more information. The conference is split into several tracks that are handled individually: Apache Daily (tools, frameworks and components used on a daily basis), Apache Java Enterprise projects, Big Data, Camel in Action, Cloud, Linked Data, Lucene, Modular Java Applications, NoSQL Database, OFBiz (the Apache enterprise automation project), OpenOffice and finally Web Infrastructure (covering HTTPD, Tomcat and Traffic Server, the heart of many Internet projects).

Make sure to mark the date in your calendar to meet the people behind the ASF projects and to learn more about how the foundation works and what sets Apache projects apart from others. Join us for a week of fun and dense talks on all things Apache.


The Apache Feather logo is a trademark of The Apache Software Foundation.

FrOSCon 2012

2012-07-31 20:25
On August 25th/26th the Free and Open Source Conference (FrOSCon) will again kick off in Sankt Augustin, Germany.



The event is completely community-organised and hosted by the FH Sankt Augustin. It covers a broad range of free software topics such as Arduino microcontrollers, git goodies, politics, strace, OpenNebula, Wireshark and others.

Three highlights that are on my schedule:



Looking forward to interesting talks and discussions at FrOSCon.

O'Reilly Strata coming to London

2012-07-30 20:05
O'Reilly Strata is coming to London. The first edition of Strata back in 2011 brought Big Data developers, designers, scientists and decision makers together to discuss all things scalable. This year in October the conference comes to Europe: O'Reilly Strata EU will take place in London.

Date: October 1st - 2nd 2012

Venue: Hilton London Metropole, 225 Edgware Road, London W2 1JU, UK

The schedule covers a great many use cases and war stories that involve big data and data-driven development. Both days are packed with deeply technical as well as strategy-level presentations that can help drive your projects.

Having been on the program committee, I got a glimpse of the diversity and high quality of the submissions received. Choosing the best wasn't easy, but there's only so much content you can squeeze into two conference days.

Looking forward to London.

PS: If you have any interesting war stories and anti-patterns involving big data to share, consider adding your input online.

Book: Search Patterns

2012-07-28 20:41
I got the book months ago during FOSDEM - the O'Reilly book table is always a pretty dangerous meeting point for me. Search Patterns - Design for Discovery is one of those small, deceptively beautiful books. It manages to explain effective search engine design by focusing on end user needs while still going into some detail on the basics of search engine backends.

We use search engines on a daily basis, not only for finding content on the web but also for navigating shopping sites, discovering news content and even finding articles on blogs and open source project pages. Many discovery tasks can easily be expressed as a search problem and, as a result, be tackled with by now standard off-the-shelf software like Apache Lucene - or even its commercial counterparts from the enterprise search market. Still, search is often perceived as simply a small box that users type (typically one- or two-term) queries into and that in return shows a list of some ten links.

After setting the stage for search in the first chapter, the book goes into some more detail in "The anatomy of search". In a very approachable way it explains all the components, from user constraints and the graphical interface to the basics of retrieval and evaluating search performance in terms of precision and recall. The third chapter shows some behavioural patterns that make discovery easier for users - from incrementally constructing the answer and progressively disclosing more and more detail to being predictable.
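For readers who have not met the two evaluation measures before: precision is the share of returned results that are relevant, recall is the share of relevant documents that were returned. The following is only a minimal sketch of that arithmetic for a single query, not something taken from the book; the document ids and the function name are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids the search engine returned (hypothetical data)
    relevant:  ids a human judged relevant for the query (hypothetical data)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four results returned, three documents are relevant, two of them overlap.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d7"]
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall of two thirds). Returning more results tends to raise recall at the cost of precision, which is why the two are usually reported together.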

Finally the design patterns as identified by the authors are introduced. Pretty obvious to those working in the field but well explained to those not intimately familiar with the topic:


  • Though perceived as a mere convenience to type less by users, autocomplete can actually help guide the user's search in case of ambiguities and can help avoid imprecise results.
  • Expected as it might be by users, presenting the best result first actually goes a long way towards building credibility for a search engine. More precise queries, e.g. as a result of autocomplete, help here. So do strong ranking criteria that build up a compelling ranking function used by default (even though others might be offered as alternatives for users to explore more and different results).
  • Federated search has both advantages (integrating otherwise isolated silos of knowledge) and disadvantages (its speed being dominated by the slowest connected search engine).
  • Faceted navigation is pretty much standard for any major search engine. Letting the user start with a broad query that returns an overwhelming number of results and then guiding them in refining that query is one major way of driving searches.
  • Offering personalisation tends to be one beloved feature though it is particularly hard to implement and needs a good deal of user data to work well. Usually there are features that require less work to get done that are more promising to start with.
  • Pagination is just as much a standard that users expect, though its implementation can differ. We are used to clicking the next button, but this may not make much sense and may just interrupt the user's flow. Much more appealing - but sometimes also confusing - can be interfaces that simply extend the result page when the user scrolls to its end.
  • Structured results provide a way to give the user more than just an outlink - triggered by specific searches it may be possible to directly answer the user's question instead of linking to content that answers it.
  • Actionable results are a way for the user to get active - either by voting on results, bookmarking them or sharing them with others.
  • Unified discovery is about accepting that search always plays a role in a bigger context and has to play well with the discovery mode the user is in: When searching for "apple" while browsing the category "electronics" it's rather unlikely that I am looking for the fruit. Similarly search should take context into account and support me seamlessly when switching from discovery to directed search and back to discovery mode.
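To make the faceted navigation and "apple in electronics" points a bit more concrete, here is a small sketch of a broad query being refined through a facet. It is a toy in-memory illustration, not how Lucene or any real engine implements facets; the catalogue, field names and helper functions are all made up.

```python
from collections import Counter

# Hypothetical mini catalogue; entries and field names are illustrative only.
PRODUCTS = [
    {"name": "Apple laptop", "category": "electronics"},
    {"name": "Apple juice", "category": "groceries"},
    {"name": "Apple phone", "category": "electronics"},
    {"name": "Pear juice", "category": "groceries"},
]


def search(query, filters=None):
    """Match the query as a case-insensitive substring of the name, then apply facet filters."""
    filters = filters or {}
    hits = [p for p in PRODUCTS if query.lower() in p["name"].lower()]
    return [p for p in hits if all(p[field] == value for field, value in filters.items())]


def facet_counts(hits, field):
    """Count how many hits fall into each value of the given facet field."""
    return Counter(p[field] for p in hits)


broad = search("apple")                                  # broad query, mixed results
categories = facet_counts(broad, "category")             # counts shown next to the results
refined = search("apple", {"category": "electronics"})   # user picks one facet value
```

The broad query returns both the juice and the devices; the facet counts tell the user how the results split up, and selecting "electronics" narrows the list without the user having to rephrase the query.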


The book concludes by going into some detail on example search engines and presenting some features that are not yet commonplace but might change the world by employing search in new and creative ways.

Easy to read, well written, several nice examples to make the technical points simpler to understand. Definitely a good read for domain experts planning to build a search engine, designers trying to understand the basics of building effective search engines and engineers struggling for words to explain why a seemingly little box can cause a whole lot of pain when done wrong but a whole lot of joy when done right.

Teddy in Sweden

2012-07-25 19:22
Some pictures taken all the way up in northern Sweden:



These pictures were taken in mid-June. That means what looks like Teddy sitting in the afternoon sun was actually taken 20min before midnight, some 40km south of the Arctic Circle at Camp Frevisören - an incredible spot to start the day in a canoe:



(That little peninsula that stretches into the ocean.)

If you ever travel that far north, make sure to stop by Hulkoff.se. We got the tip only a few days before we left, when asking a Swedish friend where to go to see midsummer. Though he is not from that very area, he recommended going there if we got the chance - and that turned out to be a great idea: Not only is the restaurant and conference venue nicely located just a few km before Finland - they also serve very tasty meals!

PS: In case you're wondering what that monkey in the pictures is - it's Teddy's new friend "Herr Nielson" - the little squirrel monkey that is the best friend of the strongest girl in the world.

Recsys meetup Berlin

2012-07-25 01:31
Planning a meetup in Berlin: 8 people register, a table for 14 people is booked, 16+ people arrive - all of that even if no pre-defined topic or talk is announced. Seems like building recommender systems is a hot topic currently in Berlin.

Thanks to Zeno Gantner from MyMediaLite for organising the event - looking forward to the next edition.

Apache Hadoop Get Together Berlin

2012-07-23 20:41
As seen on Xing - the next Apache Hadoop Get Together is planned to take place in August:

When: August 15th, 6 p.m.

Where: Immobilien Scout GmbH, Andreasstr. 10, 10243 Berlin


As always there will be slots of 30min each for talks on your Hadoop topic. After each talk there will be time for discussion.

It is important to indicate attendance. Only registered visitors will be permitted to attend.

Register here: https://www.xing.com/events/hadoop-get-together-1114707


Talks scheduled thus far:

Speaker:
Dragan Milosevic

Session:
Robust Communication Mechanisms in zanox Reporting Systems

An annoying number of times we wanted to improve only one particular component in our distributed reporting system, but had to update almost everything because of an RPC version mismatch in the communication between the updated component and the rest of our system. To mitigate this problem and to significantly simplify the integration of new components, we extended the RPC protocol we use to perform a version handshake before the actual communication starts. This RPC extension is accompanied by serialisation/deserialisation methods that are backward compatible: they can successfully deserialise any serialised older version of the exchanged objects. Together, these extensions make it possible for us to operate multiple versions of frontend and backend components side by side, and to decide autonomously what should be updated or improved in our distributed reporting system, and when.
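The abstract does not say how the handshake or the compatible serialisation is implemented, so the following is only a rough sketch of the general idea and not zanox's actual code. The version numbers, the JSON wire format, the report fields and the "EUR" default are all assumptions made up for illustration.

```python
import json

# Hypothetical: protocol versions this component knows how to read.
SUPPORTED_VERSIONS = {1, 2}


def handshake(client_versions, server_versions):
    """Pick the highest protocol version both sides support, or fail early."""
    common = set(client_versions) & set(server_versions)
    if not common:
        raise ValueError("no common protocol version")
    return max(common)


def serialize_report(report, version):
    """Write a report in the negotiated version (the field set is made up)."""
    payload = {"version": version, "clicks": report["clicks"]}
    if version >= 2:
        payload["currency"] = report["currency"]  # field assumed to be added in v2
    return json.dumps(payload)


def deserialize_report(data):
    """Read any known version, filling defaults for fields older versions lack."""
    payload = json.loads(data)
    if payload["version"] not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported version: %r" % payload["version"])
    return {
        "clicks": payload["clicks"],
        "currency": payload.get("currency", "EUR"),  # assumed default for v1 data
    }


# An old frontend (speaks v1 only) talks to a new backend (speaks v1 and v2).
version = handshake({1}, {1, 2})
wire = serialize_report({"clicks": 10, "currency": "USD"}, version)
report = deserialize_report(wire)
```

Because both sides agree on a version before any payload is exchanged, the new backend can be deployed without touching the old frontend; the reader side only has to know sensible defaults for fields that older versions never sent.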


Two other talks are planned and I will provide you with further information soon.

A big Thank You goes to Immobilien Scout GmbH for providing the venue at no cost for our event and for sponsoring the videotaping of the presentations.

Looking forward to seeing you in Berlin,

David
