ApacheConNA: On delegation

2013-05-10 20:19


In her talk on delegation Deb Nicholson touched upon a really important topic in
OSS: Your project may live longer than you are willing to support it yourself.


The first important point about delegation is to delegate - and to not wait
until you have to do it. Soon you will realise that mentoring and delegation
actually are a way to multiply your resources.


In order to delegate you need people to delegate to. To find those it can be
helpful to understand what motivates people to work in general as well as on
open source in particular: Sure, fixing a given problem and working on great
software projects may be part of it. Just as important, though, is recognition,
both individually and in groups of people.


Keeping that in mind, ``Thanking'' is actually a license to print free money in
the open source world. Do it in a verbose manner to be believable, do it in
public and in a way that makes your contributors feel a little bit of glory.


Another way to lead people in is to help out socially: Facilitate connections,
suggest connections, introduce people. Depending on the diversity of the project
you are working on you may be in a much larger network and have access to many
more corporations and communities than any peer who is not active. Use that
potential.


Also when leading OSS projects keep an eye on people being rude: Your project
should be accessible to facilitate participation.


In case of questions treat them as a welcome opportunity to pull a new
community member in: Answer quickly, answer on your list, delegate to middle
seniors to pull them in. Have training missions for people who want to get
started and don't know your tooling yet. Have prepared documents to provide
links to in case questions occur.


In Apache we tend to argue people should not fall victim to volunteeritis.
Another way to put that is to make sure to avoid the licked cookie syndrome:
When people volunteer to do a task and never re-appear that task is tainted
until explicitly marked as ``not taken'' later on. One way to automate that is
to have a fixed deadline after which tasks are automatically marked as free to
take and tackle by anyone.
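
A minimal sketch of such a deadline, assuming a hypothetical in-memory task
list and a two-week limit (neither is from the talk; a real project would hook
this into its issue tracker):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed deadline; pick whatever fits your project's pace.
CLAIM_TTL = timedelta(days=14)


@dataclass
class Task:
    title: str
    claimed_by: Optional[str] = None      # None means "free to take"
    claimed_at: Optional[datetime] = None


def release_stale_claims(tasks: List[Task], now: datetime,
                         ttl: timedelta = CLAIM_TTL) -> List[Task]:
    """Mark tasks as free again once their claim is older than ttl."""
    released = []
    for task in tasks:
        if task.claimed_by is None or task.claimed_at is None:
            continue  # nobody licked this cookie
        if now - task.claimed_at > ttl:
            task.claimed_by = None
            task.claimed_at = None
            released.append(task)
    return released
```

Run regularly (e.g. from a nightly job), this returns the tasks that were just
freed, so they can be announced on the list as open for anyone to pick up.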


When it comes to the question of when to write documentation: There really is
no point in time that should stop you from contributing docs - all the way from
just above getting started level (writing the getting started docs for those
following you) up to the ``I'm an awesome super-hacker'' mode for those trying
to hack on similar areas.


Especially when delegating to newbies make sure to set the right expectations:
How long is it going to take to fix an issue, what is the task complexity, tell
them who is going to be involved, who is there to help out in case of
roadblocks.


In general make sure to be a role model for the behaviour you want in your
project: Ask questions yourself, step back when you have taken on too much,
appreciate people stepping back.


Understand the motivation of your newcomers - try to talk to them one on one
to understand their motivation and help to align work on the project with their
life goals. When starting to delegate, start with tasks that seem too small to
delegate at all to get new people familiar with the process - and to get
yourself familiar with the feeling of giving up control. Usually you will need
to pull tasks apart that before were done by one person. Don't look for a
person replacement - instead look for separate tasks and how people can best
perform these.


Make visible and clear what you need: Is it code or reviews? Documentation or
translations, UX helpers? Incentivise what you really need - have code sprints,
gamify the process of creating better docs, put the logo creation under a
challenge.


All of this is great if you have only people who all contribute in a very
positive way. What if there is someone whose contributions are actually
detrimental to the project? How to deal with bad people? They may not even do
so intentionally... One option is to find a task that better suits their
skills. Another might be to find another project for them that better fits
their way of communicating. Talk to the person in question, address head on
what is going on. Talking around or avoiding that conversation usually only
delays and enlarges your problem. One simple but effective strategy can be to
tell people what you would like them to do in order to help them find out that
this is not what they want to do - that they are not the right people for you
and should find a better place.


More on this can be found in material like ``How assholes are killing your
project'' as well as the ``Poisonous people talk'' and the book ``Producing
open source software''.


On the how of dealing with bad people make sure to criticise privately first,
check in a backchannel of other committers for their opinion - otherwise you
might be lonely very quickly. Keep to criticising the behaviour instead of the
person. Most people really do not want to be a jerk.

ApacheConNA: First keynote

2013-05-09 20:13


All three ApacheCon keynotes were focussed around the general theme of open
source communities. The first one, given by Theo, had very good advice for the
engineer striving not only to work on open source software but to become an
excellent software developer:




  • Be loyal to the problem instead of to the code: You shouldn't be
    addicted to any particular programming language or framework and refuse to
    work with and get familiar with others. Avoid high specialisation and seek cross
    fertilisation. Instead of addiction to your tooling you should seek to
    diversify your toolset to use the best for your current problem.

  • Work towards becoming a generalist: Understand your stack top to bottom -
    starting with your code, potentially passing the VM it runs in, down to the
    hardware layer. Do the same to requirements you are exposed to: Data being 1s
    old may be just good enough to be ``always current'' when thinking of a news
    serving web site. Try to understand the real problem that underpins a certain
    technical requirement that is being brought up to you. This deep understanding
    of how your system works can make the difference in fixing a production issue
    in three days instead of twelve weeks.




The last point is particularly interesting for those aiming to write scalable
code: Software and frameworks today are intended to make development easier -
with high probability they will break when running at the edge.


What is unique about the ASF is the great opportunity to meet people with
experience in many different technologies. In addition there is an unparalleled
level of trust in a community as diverse as the ASF. One open question that
remains is how to leverage this potential successfully within the foundation.


Apache Hadoop Get Together Berlin

2013-05-08 23:41
This evening I joined the group over at Immobilienscout 24 for today's Hadoop Get Together. David Obermann had invited Dr. Falk-Florian Henrich from CeleraOne to talk about their real-time analytics on live data streams.



Their system is being used by the New York Times Springer's Die Welt for traffic analysis. The goal is to identify recurring users that might be willing to pay for the content they want to read. The trade-off here is to keep readers interested long enough to make them pay in the end, instead of scaring them away with a restrictive pay wall which would immediately lead to way less ad revenues.

Currently CeleraOne's system is based on a combination of MongoDB for persistent storage, ZeroMQ for communicating with the revenue engine and http/json for connecting to the controlling web frontend. The live traffic analysis is all done in RAM, while long term storage ends up in MongoDB.

The second speaker was Michael Hausenblas from MapR. He spends most of his time contributing to Apache Drill - an open source implementation of Google's Dremel.






Being an Apache project Drill is developed in an open, meritocratic way - contributors come from various different backgrounds. Currently Drill is in its early stages of development: They have a logical plan, a reference interpreter, a basic SQL parser. There is a demo application. As data backends they support HBase.

For most of the implementation they are trying to re-use existing libraries, e.g. for the columnar storage Drill is looking into either using Twitter's Parquet or Hive ORC file format.

In terms of contributing to the project: There is no need to be a rockstar programmer to make valuable contributions to Apache Drill: Use cases, documentation, test data are all valuable and appreciated by the project.

For more information check out the slide deck (this is an older version - this night's edition will most likely be published soon).



If you missed today's event make sure to get enlisted in the Hadoop Get Together Xing Group so next time you get a notification.

One thing to note though: When registering for the event - please make sure to free your ticket if you cannot make it. I had a few requests from people who would have loved to attend today who didn't get a ticket but would most likely have fit into the room.

ApacheConNA: Meet the indian tribe

2013-05-08 20:10
ApacheCon is the ``User Conference of the Apache Software Foundation''. What
should that mean? If you are going to ApacheCon you have the chance of meeting
committers of your favourite projects as well as members of the foundation
itself. Though there are a lot of talks that are interesting from a technical
point of view the goal really is to turn you into an active member of the
foundation yourself. This is true for the North American version even more than
for the European edition.


Though why should you as a general user of Apache software be interested in
attending then? Pieter Hintjens put it quite nicely in an interview on his
latest ZeroMQ book with O'Reilly:




If you are using free software in particular in commercial setups you really do
want to know how the project is governed and what it takes to get active and
involved yourself. What would it take to move the project into a direction that
fits your business needs? How do you make sure features you need are actually
being added to the project instead of useless stuff?


ApacheCon is the conference to find out how Apache projects work internally,
the place to be to meet active people in person and put faces to names. Lots of
community building events focus on getting newbies in touch with long term
contributors.

How to get your submission accepted at Berlin Buzzwords

2013-05-07 11:21
Disclaimer: Intentionally posting on my private blog - these are my own criteria, not general advice from the review committee.

Berlin Buzzwords is in its fourth year. Probably the most tedious task of all is having to select talks to make it into the final schedule. With roughly 120 submissions and roughly 30 slots to fill the result is that three quarters of all submissions have to be rejected. Last year I shared some details on how we do talk ranking once reviewers have provided their input.

Now that the mechanics of ranking are clear, people have asked me what goes into the reviews themselves. Here I can only speak for myself: After doing reviews ourselves during the first two years, Simon, Jan and I decided to spread the work of reviewing submissions among a larger team of people. As nearly all of them had attended Berlin Buzzwords in the past already (or had at least followed the conference remotely) we could assume they were roughly familiar with what kind of content would be a good fit. As a result the review guidelines that we send out tend to be rather light:

Berlin Buzzwords is a conference from geeks for geeks: The goal is to get the people actively working in the field together to meet and exchange ideas. Content should have some technical depth - in particular pure marketing talks and obvious product placements without further technical value are not welcome. We usually invite both, interesting case studies as well as talks highlighting the technical details a project is built upon.



In the end judgement is up to the individual reviewer - so I can speak only for myself when listing what you should do to get your talk accepted.


  • Be on topic. There's always a handful of submissions that look and sound like pure marketing, product placement or simply aren't related to software engineering at all. Those tend to be easy to spot and weed out.
  • Tell us what you are talking about. An abstract is there to provide some detail on your presentation - don't be just funny, promising overly generic content. In order to decide whether or not your talk is relevant please provide some details on which direction you'll be heading.
  • Don't be too detailed in the abstract either - there's no need to list the content of every slide. Make sure the abstract correctly summarizes your talk. Making it catchy and nice to read usually helps if the content is solid.
  • We try to find those speakers that have not only an interesting topic to talk about but are also a pleasure to listen to, who can successfully get their point across. We cannot know every potential speaker in person though. As a result it helps if you list which conferences you've spoken at in the past; any videos of previous talks are helpful as well. As a general piece of advice: Choosing Berlin Buzzwords as your first conference to speak at ever usually is a great path to disaster. Get some practice at local meetups like the Berlin Hadoop Get Together, the data science day, the Java User Group Berlin Brandenburg, the RecSys Stammtisch Berlin or the MongoDB User Group Berlin to name just a few.
  • Make sure your talk is novel - submitting the same topic in 2012 and 2013 is a great way to ensure getting rejected. Also it is fine to submit a talk you have given at another conference earlier. However if everyone in the Buzzwords audience is very likely to have watched the exact same version of your presentation earlier already, we are less likely to accept your talk.
  • Finally: When drafting your bio make sure to include details that explain why you are the perfect expert to talk about the topic at hand. As much as I'd like to I don't know every project's committer by name. Provide some help by pointing out explicitly what your contributions have been or in what context you have used the technology you are presenting. Don't be shy to list that you are a co-founder of a successful project. Not only does this information help with selecting talks, it also provides some background for the audience to judge the claims you make.


Two words on the role of free software at Buzzwords: There is no explicit requirement to only talk about software that is publicly available under a free software license. However, if some project or framework is presented, being open source helps to raise its applicability for the audience. Most projects discussed at Berlin Buzzwords are developed openly. In order to get the maximum out of these projects it pays to know how they work internally, how to get active yourself, how to contribute. As a result discussions and talks on project governance are generally welcome.

A parting note: With way more than half of all submissions to reject, making a final decision will always be hard. Being rejected doesn't necessarily mean that your proposal was bad. Following the above advice may raise chances of being accepted - however it is no guarantee. We could raise the number of accepted talks by extending the conference by another track or even another day - at the cost of raising the ticket price substantially. However we want not only "big corp representatives" but a diverse audience, attendees that get active themselves, that help shape the conference:


There's plenty of space and time to get active in addition to the main conference program. Use the time and space to shape the conference.

Keepers of secrets - FOSDEM 09

2013-02-20 20:49

The closing keynote was given by Leslie Hawthorn whom I had the pleasure of meeting last year during Berlin Buzzwords. In her talk she shared insights into a topic commonly encountered in open source leadership that is way less often talked about than should be the case: Being in the role of a community leader, people will talk to you about all sorts of confidential information and ask you to not share that information with others, no matter how beneficial that might be for both parties.

Essentially if you've never been a community leader – it is much less about technical skills and way more about strategy, marketing, development events and unpaid therapy really.

Leslie first introduced the types of secrets:


  • There are lots of one-on-one communications. There are several small group conversations. After all this is what makes humans human. However no matter how much a community trusts the people meeting in small groups, ultimately someone will feel betrayed, someone will suspect evil things being drafted in those discussions – even though the conversation really may just involve the quality of the beer they had yesterday.
  • Being social entities we ultimately need input from our peers. This may mean that we require input on things we know perfectly well we are not supposed to discuss with anyone.
  • There are secrets that are only secrets when told to the wrong person. There is information that is shared publicly – as in “on a website that requires no authentication whatsoever” - but that due to the nature of how information is discovered by certain people will never make it to the right person anyway.
  • Some things are innocuous.
  • Some things are blindingly apparent, but aren't told anyway.


All of this becomes all the more interesting once you become a community leader. Your ultimate goal is to foster empathy and inclusion. You have to understand not only what you communicate, but also how to say certain things.

One example: Assume there is a contributor in a critical code path that is having a hard time privately and appears less and less often online. He told you the reason why, but asked you to not talk about it for whatever reason. On the other hand the community – being uninformed as they are – is losing trust in the community member, blaming him for stopping progress. How should you react? Well, the three solution paths are extremely obvious but that doesn't make them any easier:

  • Encourage disclosure.
  • Ask for permission to disclose parts yourself.
  • Encourage the community to talk to the individual directly.

The worst you can do is to ignore the issue. Still that is what many people do, simply because it is the most comfortable solution. Go out of your comfort zone – your goal should be to make your project thrive.

What about that one person that just doesn't get that they are hurting the project? The good-hearted person whose actions slow down the project? People on your project will get cranky and waste cycles on herding volunteer work if you avoid dealing with this person. There is no manual on dealing with frustrations and feelings in open source projects - though Poisonous people is a great intro to the topic by Brian Fitzpatrick and Ben Collins-Sussman, and so is their book “Team Geek” published by O'Reilly.



Though these issues are messy and make you feel uncomfortable – do deal with them as quickly as you can, otherwise they will kill your project. Either correct the educational issues of the person in question, suggest other ways to be effective and ultimately be willing to kindly but sincerely ask the person to move on.

We have negotiations each day – most of them we do not notice as they happen in the comfort zone of “I like the person and our interests are very well aligned.” The more uncomfortable ones that we actually remember are the ones involving either constellations where we do not like the people involved but are well aligned, where we like the people involved but aren't well aligned or in the extreme case neither like the people involved nor are we well aligned. Especially in the uncomfortable situations it makes sense to remember that negotiations really come in up to six stages:

  • Being willing to openly ask for what you need
  • Asking for what you need.
  • Finding common ground and reaching agreement
  • If impossible, finding the best alternative for both
  • If still impossible, agreeing to not having reached agreement.


Value honesty above all, but really do not be a tactless jerk. Diplomacy in order to reach your goals is ok: Ultimately you have to decide whether you want to be right or whether you want to win.

To summarise make sure you care about your project – the people in it will need most love when you have most reason to hate them.

One final recommendation after a question about leader burnout from the audience: Noticing burnout is as easy as observing that each morning you wake up with that “oh no, I don't want to do this, I want to walk away” kind of feeling wrt. stuff that formerly used to be a lot of fun to do. First counter measure: RUN AWAY! Take vacation, turn off your electronics, hug a tree – get away from what is turning you down. After returning make sure you involve your peers in your work. If you cannot get on with your former pet project, find a successor. Nothing will kill your project faster than a burnt out leader dragging the project down. The reason for your burnout really can be as simple as having seen the same negative things over and over again so you do not want to deal with them yet again and having seen the same positive things over and over again so they no longer give you any reward for your work. It may just be time to move on and do something else.

On making Libre Office suck less – a major refactoring effort - FOSDEM 08

2013-02-19 20:47
Libre Office is currently in a phase of code cleanup and refactoring that turns the whole code base upside down. What that means is that people need tooling to keep quality from going down and to allow for new features going in without too much risk. The project made good experiences with using gerrit for code review of patches, tinderbox for fast integration testing, strict whitespace checks to avoid unintended mistakes, and clang compiler plugins. They have less process, which allows for change anywhere in any part of the code base.

There is an easy hacks page for people to get started quickly. I know that kind of thing from the Hadoop issue tracker and really appreciate having this to get new developers comfortable with the code base and all the tooling around. They apply reply-header mangling to allow for responses to go back to posters on their mailing list w/o prior subscription. They moved from their own dmake that wasn't industry standard to standard make tooling to build the project. They are in the process of translating all the German comments – shout-out to all German speaking readers of this blog: Help the Libre Office developers understand the code better by providing your German speaking skills for translation.

Some anecdotes: They found 4+ String classes in the code base and managed to get rid of one of them only recently. They are busy killing dead code, kicking out string macros, fixing cpplint warnings, refactoring code that was written pre-STL into clean STL code, getting rid of obsolete libraries, fixing the windows installer, killing proprietary translation services and replacing that with an open one, getting rid of cargo-cult componentisation, getting rid of code duplication e.g. in the import filter implementation. They are reducing structure sizes for calc, switching to a layout based frontend, optimising the red lining for writer. The goal is to really have no no-go areas.

In order to retain quality in this fluid setup they opted for a drastic increase in unit- and integration test coverage, using bug documents as source for tests. Through the Bugzilla assistant they made it way easier even for non-experts and end-users to submit bug reports.

They are going for time-based, 6-monthly releases. Due to a long build time they are keeping track of all past binary builds for bi-section purposes – currently in git which most likely isn't the ideal choice.
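
Bisecting over such an archive of builds boils down to a binary search. A minimal sketch, assuming the builds are ordered oldest to newest, the oldest one is known to be good, the newest one is known to be bad, and a hypothetical is_bad check (e.g. opening a bug document and looking at the result) is available. This is an illustration only, not the project's actual tooling:

```python
from typing import Callable, Sequence, TypeVar

Build = TypeVar("Build")


def first_bad_build(builds: Sequence[Build],
                    is_bad: Callable[[Build], bool]) -> Build:
    """Return the oldest build that shows the bug.

    Assumes builds[0] is good and builds[-1] is bad, and that the bug,
    once introduced, stays present in all later builds.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid    # bug already present: look at older builds
        else:
            good = mid   # still fine: look at newer builds
    return builds[bad]
```

With pre-built binaries each step only costs one test run instead of a full rebuild, and the number of steps grows with the logarithm of the number of archived builds.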

What works is putting graphs of bugs created vs. fixed over time in front of developers to keep the number of bugs low.

Within version 4.0 Libre Office is shipping:

  • better interop features for word documents with comments
  • RTF drawing imports
  • RTF improved formulae import
  • Docx annotation support
  • CMIS support for better interaction with Sharepoint, Alfresco and Nuxeo
  • More import filters e.g. for Microsoft publisher
  • Visio is now completely supported
  • Support for arbitrary XML to spreadsheet mappings
  • Conditional formulae
  • Stock option pricing support
  • Android remote control support for slides
  • Libre Logo integration for schools
  • Image rendering, smoothing, re-sizing and scaling was improved
  • Better support for right-to-left written languages such as Arabic
  • Style previews for fonts
  • Better unity integration


There even is an Android port in the works!

E17 - FOSDEM 07

2013-02-18 20:46
I'm really glad the NoSQL room was all packed in the afternoon – otherwise I'd have missed an amazing talk by people behind Enlightenment – a window manager that is older than Gnome, nearly older than KDE and has been my favourite choice for years and years (simply because they have sensible default configuration options: focus follows mouse, virtual desktops that allow for desktop switching when moving the mouse close to the screen edges, menu opening when clicking anywhere on the desktop background, options for remembering window placement and configuration on re-boot etc).

Finally in December last year they actually did release E17 after more than a decade of work. They now feature a tiling module, split desktops per screen, launchers, taskbars, systrays (*brrr*), screenshotting and multiple sharing options, custom layout modules for desktop and mobile.

There's a full fledged file manager that is also used as file selector. There is a compositor with wayland client support, that works decently even on old or slow hardware (think raspberry pi).
Their main goal is not to build a window manager that even your grandma can use. Rather they focus on stuff for the geeks that just works, is efficient, has lots of eye candy and when run on a nexus7 instead of unity saves 200MB of RAM.

Their main goal is to be a base for touch and mobile development. The number of desktops is shrinking, giving way to more and more mobile devices. Fortunately the project is now sponsored (as in paid developers) by Samsung as part of their Tizen efforts. E17 has been working as part of Tizen for years now, the only part missing is a product running the software available for purchase.

The goals for E18 (to be released end of 2013 – hear, hear) include going beyond the desktop, to polish things up, provide more default profiles for diverse devices, optimise battery and memory consumption, run without swap space, avoid going to memory instead of the cache to avoid draining the battery of mobile devices. There will be image and font sharing across processes, faster software rendering, async rendering with more threads. There are even thoughts to deal with different finger size issues on touch devices.

On using the composite manager as default: It made the code and optimisation a whole lot easier, though there are still issues with multiple screens that all switch compositing off in case of full screen games that cannot run with it turned on.

There will be work to integrate better with wayland, support for physics and sounds in themes, more compositing signals, improved gadget infrastructures, easier content sharing options – and all the cool stuff users can think of.