Talk:Open Source Metrics


Background

The Open Forum Foundation, a DC-based nonprofit, works on a range of projects to improve citizen engagement in government and is currently working with OpenTheGovernment.org to encourage the US to meet the commitments it made in its National Action Plan as part of the Open Government Partnership. Two of the US commitments are to open source the platforms behind data.gov and We The People. As part of this work, we want to gauge how successfully the government has accomplished these goals, while also providing the government with a roadmap of what it needs to do to really do it right.

Along with the Open Government Partnership (OGP), a cross-national effort, they are looking at items 1.1 and 3.2 on this page: http://makeopengovernmentwork.wordpress.com/ (text in black is the governments' commitments; text in green is recommendations put together by the team).

Interesting people to ping

Who else has been working on open source metrics lately, and would be good to ping?

Peter Loewen?

On the specific types of questions you're asking, Pia Waugh had a test she applied and wrote about a few years ago: http://flosshub.org/content/foundations-openness

Simon Phipps also worked on an openness index for this type of question: http://blogs.computerworlduk.com/simon-says/2011/02/the-open-by-rule-governance-benchmark/index.htm

In terms of people working on metrics: Rich Sands and Abhay Mujumdar are working on adding some metrics around mailing lists and bug trackers to Ohloh, and the Libresoft folks (http://libresoft.es/) are developing free software to parse and store mailing lists, bug trackers and source code repositories and get quantitative information out of them. Ross Turk was doing some forum metrics with Talend last time we met, and Karsten Wade is doing a lot of metrics work inside Red Hat. There's also a metrics working group mailing list that Karsten started - I don't know exactly who's on it, and it hasn't been very active: metrics-wg@theopensourceway.org

And then there are Dave Neary, Dawn Foster and Sumana Harihareswara. Wikimedia actually has a significant metrics team working on data analysis. It's probably worth talking to Donnie Berkholz and Stephen O'Grady from RedMonk too - Donnie did a lot of metrics work on Gentoo a while back. And one more for the road: Paul Adams from KDE and Germán Poo-Caamaño from GNOME have both done a significant amount of metrics and visualisation work on their respective projects.

Diederik van Liere of the Wikimedia Foundation

Some academics tinkering (or who have tinkered) around this area that might be interesting to bounce things by:

  • Finne Boonen
  • Don Davis
  • James Howison

...there are more, but their work is either (1) qualitative rather than quantitative, or (2) they are extremely well-established professors rather than graduate students or young faculty, and thus possibly harder to get feedback from in the initial stages.

Also, Leonard Richardson (husband of Sumana Harihareswara, who's been mentioned in this thread already) has a wicked knack for finding interesting statistics and would be a fun brainstormer to throw into the mix.

References

Any publications, blog posts, etc. that you've made, or that you know of, that are relevant to this?

Instrument development

If we could develop metrics for what it means to actually open source a project, so that it's meaningful and useful to a community beyond its original developers, that would be a very useful guideline for agencies to work with, especially among OGP members, but also across governments generally. They're hungry for it.

A lot of these metrics need some kind of qualitative analysis - basically, someone looks at the project and scores it on a scale (it may be 0 or 1, or it may be 1 to 10). --Dave
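
To make that kind of scoring concrete, here is a minimal sketch in Python of how per-project judgements could be recorded and rolled up into a comparable number. The criteria names and scales are made up for illustration - they are not an agreed instrument.

 # Minimal sketch of a qualitative scorecard: a reviewer assigns each
 # criterion a score on its own scale, and the scores are normalised so
 # that projects can be compared. Criteria and scales are illustrative only.
 CRITERIA = {
     # name: (minimum score, maximum score)
     "open_source_license": (0, 1),         # pass/fail
     "public_master_repo": (0, 1),          # pass/fail
     "public_dev_forum_in_use": (1, 10),    # judgement call
     "public_bug_tracker_in_use": (1, 10),  # judgement call
 }

 def normalised_score(scores):
     """Average each criterion's score as a fraction of its scale (0.0 to 1.0)."""
     fractions = []
     for name, (low, high) in CRITERIA.items():
         value = scores[name]
         if not low <= value <= high:
             raise ValueError(f"{name}: {value} is outside {low}-{high}")
         fractions.append((value - low) / (high - low))
     return sum(fractions) / len(fractions)

 # Example: one reviewer's judgement of one (hypothetical) project.
 print(normalised_score({
     "open_source_license": 1,
     "public_master_repo": 1,
     "public_dev_forum_in_use": 6,
     "public_bug_tracker_in_use": 3,
 }))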

Naturally, I have some thoughts on this topic too. I'll insert a few of them here just as a way of kicking off the discussion -- these are meant to start discussion rather than set its parameters. There are some obvious, standard tests that I suspect we'd all agree on:

  1. Source code is available under an open source license
  2. In a public, version-controlled repository...
    1. and this repository is the master repository that the most active developers (e.g., the original release devs) use for their daily commits, i.e., it is not a second-class copy to which changes are grudgingly pushed occasionally while the real development master lies hidden behind some firewall.
  3. With a public discussion forum for developers...
    1. that they actually use, instead of sending private emails amongst themselves to make development decisions.
Once we have 2(a), you could probably do this with MailingListStats. --Dave
    2. Not crucial whether the forum is email-based or web-based or both, but it's better if it supports both, and not ideal if it doesn't support email.
It depends on the project and community type. For a core developer community, email is a must - Dawn can testify that active developers don't use forums, and active forum users don't use mailing lists. --Dave
  4. With (possibly) a separate discussion forum for user questions...
Yes, it can be. The important thing to measure is the time lapse between question and answer, I think. It would also be interesting to look at the time lapse between first question and first answer - are new forum users being given the confidence to answer questions? (See the sketch after this list.) --Dave
    1. but if no developers ever listen and respond on that list, that's a bad sign
  5. With a public bug tracker...
Again, cross-reference 2(a) with something like direct database queries or Bicho for this - you can see how active the active coders are in the bug tracker. (See the sketch further below, after the IRC discussion.) --Dave
    1. and it should be the master bug tracker, the one actually used by the most active developers
  6. With (possibly) a real-time chat room forum (e.g., IRC)...
    1. not crucial, but always a good sign.
    2. I don't think there's been enough research on activity levels and activity types in real-time forums (which, after all, have somewhat more interesting timing data associated with their responses than non-real-time forums like mailing lists do), but correct me if there has actually been research on this.
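
On the question/answer time-lapse metric mentioned under item 4, here is a minimal sketch of how it might be computed, assuming the forum or list archive has already been exported as (message id, in-reply-to, timestamp) tuples by something like MailingListStats; the tuple format itself is an assumption, not any tool's actual output.

 # Minimal sketch: given messages as (msg_id, in_reply_to, timestamp) tuples,
 # measure the lag between each thread-starting question and its first direct
 # reply. The tuple format is an assumption - adapt it to however your
 # archive is actually exported.
 from datetime import datetime

 def first_response_lags(messages):
     """Return {question_id: timedelta to first reply} for answered questions."""
     questions = {m_id: ts for m_id, parent, ts in messages if parent is None}
     first_reply = {}
     for m_id, parent, ts in messages:
         if parent in questions and (parent not in first_reply or ts < first_reply[parent]):
             first_reply[parent] = ts
     return {q: first_reply[q] - questions[q] for q in first_reply}

 # Toy data: one question answered after five hours, one never answered.
 msgs = [
     ("q1", None, datetime(2013, 1, 1, 9, 0)),
     ("r1", "q1", datetime(2013, 1, 1, 14, 0)),
     ("q2", None, datetime(2013, 1, 2, 9, 0)),
 ]
 lags = first_response_lags(msgs)
 print("time to first answer:", lags)
 print("fraction of questions answered:",
       len(lags) / sum(1 for _, parent, _ in msgs if parent is None))

The same shape of calculation would work for "first question to first answer by a new user" - you would just restrict the replies counted to posters who haven't answered anything before.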

I'm not convinced IRC is useful beyond a certain scale. Successful IRC channels have a number of characteristics in common with successful neighbourhoods: good behaviour is reinforced, bad behaviour is discouraged through social norms, and there is a mix of uses so that there are always enough people around to reinforce the positive culture. All too often, IRC channels end up being like public parks - hang-outs for people behaving badly - so the people who wander through have bad experiences and stay away, which reinforces the cycle. The IRC channels that work very well are those where a distributed, worldwide developer community uses the channel as a work tool for real-time communication while avoiding the trap of using it as a replacement for more public mailing list posts: something like a mailing list for co-workers to hang out in, where anyone else is also welcome (like some of the Mozilla channels). --Dave
Have a look at SuperSeriousStats - it may help you do some of that analysis. --Dave
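
Picking up Dave's suggestion under item 5 to cross-reference 2(a) against the bug tracker (via Bicho or direct database queries), here is a rough sketch of the shape of that check. It assumes the committer email addresses and the bug-tracker commenter addresses have already been extracted elsewhere (e.g. from git log output and a tracker dump); the function name and inputs are illustrative.

 # Rough sketch: what fraction of the most active committers also show up
 # in the public bug tracker at all? Both input lists are assumed to have
 # been extracted elsewhere (e.g. `git log --format=%ae` and a Bicho dump).
 from collections import Counter

 def committer_tracker_overlap(commit_authors, tracker_commenters, top_n=10):
     """Fraction of the top-N committers who appear in the bug tracker."""
     top_committers = [email for email, _ in Counter(commit_authors).most_common(top_n)]
     tracker = set(tracker_commenters)
     present = [email for email in top_committers if email in tracker]
     return len(present) / len(top_committers), present

 # Toy example: three committers, only one of whom ever touches the tracker.
 overlap, who = committer_tracker_overlap(
     commit_authors=["a@example.org"] * 40 + ["b@example.org"] * 25 + ["c@example.org"] * 5,
     tracker_commenters=["a@example.org", "d@example.org"],
     top_n=3,
 )
 print(f"{overlap:.0%} of the top committers are active in the bug tracker: {who}")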

That's all obvious, and it's more a set of "pass/fail tests" than "metrics".
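
Some of those pass/fail tests could be at least partially automated. A rough sketch follows, under the assumptions that the project is already checked out locally and that its repository URL is known; the license filename list is just a common convention, and a human still has to read the license to confirm it's actually open source.

 # Rough sketch of automating two of the pass/fail checks above:
 # (1) is there a recognisably named license file in the source tree, and
 # (2) is the stated repository publicly reachable without credentials?
 import os
 import subprocess

 LICENSE_FILENAMES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING", "COPYING.txt"}

 def has_license_file(checkout_dir):
     """True if a conventionally named license file sits at the top of the tree."""
     return any(name in LICENSE_FILENAMES for name in os.listdir(checkout_dir))

 def repo_publicly_reachable(repo_url):
     """True if `git ls-remote` can list refs without prompting for credentials."""
     result = subprocess.run(
         ["git", "ls-remote", "--exit-code", repo_url],
         capture_output=True, timeout=60,
         env={**os.environ, "GIT_TERMINAL_PROMPT": "0"},  # fail instead of prompting
     )
     return result.returncode == 0

 if __name__ == "__main__":
     print("license file present:", has_license_file("."))
     print("repo publicly reachable:", repo_publicly_reachable("https://example.org/project.git"))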

The only non-obvious point I would want to make at this stage is that, once the above standard tests are met, the best place to measure project health & rate of usage growth is *not* in the discussion forums, but rather in the bug tracker. I've written a bit about this at http://www.rants.org/2010/01/10/bugs-users-and-tech-debt/. Short summary: "The bug report rate is a proxy for the user acquisition rate." and (paraphrased) "there may be more to be learned about a project from watching interactions in its bug tracker than interactions in just the conversational forums".
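
To illustrate the "bug report rate as a proxy for user acquisition rate" point, here is a minimal sketch that bins bug-report creation dates by month. It assumes the tracker's reports can be exported to a CSV with a `created` column in ISO date format; the filename and column name are assumptions, not any particular tracker's schema.

 # Minimal sketch: monthly bug-report counts as a rough proxy for the user
 # acquisition rate. Assumes a CSV export with a `created` column in ISO
 # format; the filename and column name are assumptions, not a real API.
 import csv
 from collections import Counter
 from datetime import datetime

 def monthly_report_rate(csv_path):
     """Return {(year, month): number of bugs filed that month}."""
     counts = Counter()
     with open(csv_path, newline="") as fh:
         for row in csv.DictReader(fh):
             created = datetime.fromisoformat(row["created"])
             counts[(created.year, created.month)] += 1
     return dict(sorted(counts.items()))

 if __name__ == "__main__":
     for (year, month), n in monthly_report_rate("bugs.csv").items():
         print(f"{year}-{month:02d}: {n} new reports")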

From David:

  1. Are there at least two installations being run by separate government entities (and likely many more)?
  2. How many vendors offer support?
  3. Some metrics around governance and the development roadmap

To close off: I actually think that for the vast majority of governments, the issue of open source versus closed source is really not that relevant; they are much more interested in SLAs, support levels, depth/quantity of vendors, and reliability of the product. I think the best thing to do is to stop talking about open source as a distinct class of product and instead talk about how to measure it along these lines against proprietary software. Only a small number of governments are actually going to create and share code; many governments have very few (if any) actual software developers. My own work in this space in Vancouver was entirely focused on figuring out how to put open source solutions on an equal playing field with proprietary solutions - and on almost every issue, it boiled down to a question of sustainability, which, of course, I believe open source often does better on. --David

Mel:

My guess is that this is going to need to be a quantitative *and* qualitative measurement, taken over time (rather than a snapshot-in-time numerical answer, although http://www.theopensourceway.org/wiki/How_to_tell_if_a_FLOSS_project_is_doomed_to_FAIL does a pretty decent job of the basics and is hilarious).

Instrument design (which is what you're talking about here) relies on a couple of assumptions, so it might be helpful to lay those out first:

  • how much in the way of resources is it acceptable for the instrument to consume for a single run? (which is a roundabout way of saying "do we need to be able to run it in 30 seconds with a shell script, or can we have a random observer spend 4 hours doing assessment for this, or do you need a panel of multiple FOSS experts for a 2-hour block, or...")
  • what's the scope of projects/aspects this needs to be able to tackle? (only software projects of a certain type? content and hardware also? primarily assessing code health, or do you want separate metrics for documentation/translation/marketing/QA/etc subgroups? ok to start with an English-only variant of the test first, then figure out i18n? etc.)
  • who will be the consumers of the end information given by this instrument, and what sorts of decisions are they trying to make? (so you say governments -- but what are they trying to figure out?)