I feel at ease. For once I am not confused. At least I am less confused than I was earlier.
You remember the sequence of posts I wrote about opensource and gatekeepers? [Those new to the conversation can find them here, here, here, here and here, in chronological order. Alternatively you can enter "gatekeeper" into the search box in the sidebar and get the same results. But it's not important.]
I’ve been tracking what Aaron Swartz has been saying about Wikipedia in Raw Thought, his blog. Now why would I do this? Simple. In a strange kind of way I think that Wikipedia is the Drosophila of social software, much like a recent Scientific American article suggested that chess could be the Drosophila of cognitive science. You can measure things in Wikipedia, conduct experiments, analyse what’s happening, predict results, compare actuals against predictions. Wikipedia provides the nearest thing to a live laboratory for the study of social software. For now, anyway.
In a recent post, headlined Who Writes Wikipedia, Swartz has some interesting analysis of what happens there. I quote from the post:
- Wales decided to run a simple study to find out: he counted who made the most edits to the site. “I expected to find something like an 80-20 rule: 80% of the work being done by 20% of the users, just because that seems to come up a lot. But it’s actually much, much tighter than that: it turns out over 50% of all the edits are done by just .7% of the users … 524 people. … And in fact the most active 2%, which is 1400 people, have done 73.4% of all the edits.” The remaining 25% of edits, he said, were from “people who [are] contributing … a minor change of a fact or a minor spelling fix … or something like that.”
They expected to find an extreme Pareto Law in operation. They looked for it. And they were not disappointed. They found it.
There were critics of the findings, who felt that counting the raw number of edits was not a useful indicator, that there should be a measure related to that which was created, the body text. [Notice I didn't say Content. Bad word.] And Wales indicated this would happen over time.
When I first heard this, I couldn’t get my head around the Pareto-versus-Long-Tail tension. Surely Wikipedia should behave Long-Tail rather than Pareto? And Wisdom-Of-Crowds Emergence surely cannot depend on a small gatekeeper fraternity, because that is hard to scale? But I’m used to the gatekeeper mentality, so I just stayed quiet. And confused.
Swartz was curious, and did a little bit of homegrown investigation. Here are his early observations:
- Wales seems to think that the vast majority of users are just doing the first two (vandalizing or contributing small fixes) while the core group of Wikipedians writes the actual bulk of the article. But that’s not at all what I found. Almost every time I saw a substantive edit, I found the user who had contributed it was not an active user of the site. They generally had made less than 50 edits (typically around 10), usually on related pages. Most never even bothered to create an account.
So there was some reason to look at what the critics (of the original findings) were saying. Maybe analysis at number-of-edits level was not particularly useful; maybe the amount of original text created would be a better yardstick.
Here’s what Swartz found, admittedly in a simple “homegrown” experiment:
- To investigate more formally, I purchased some time on a computer cluster and downloaded a copy of the Wikipedia archives. I wrote a little program to go through each edit and count how much of it remained in the latest version. Instead of counting edits, as Wales did, I counted the number of letters a user actually contributed to the present article.
- If you just count edits, it appears the biggest contributors to the Alan Alda article (7 of the top 10) are registered users who (all but 2) have made thousands of edits to the site. Indeed, #4 has made over 7,000 edits while #7 has over 25,000. In other words, if you use Wales’s methods, you get Wales’s results: most of the content seems to be written by heavy editors.
- But when you count letters, the picture dramatically changes: few of the contributors (2 out of the top 10) are even registered and most (6 out of the top 10) have made less than 25 edits to the entire site. In fact, #9 has made exactly one edit — this one! With the more reasonable metric — indeed, the one Wales himself said he planned to use in the next revision of his study — the result completely reverses.
Read the article for yourself. I have no idea who Swartz is, never met him, never even heard of him until I decided (following the gatekeeper debate) to try and understand what makes Wikipedia tick. That’s all.
One swallow does not make a summer. Swartz’s findings are based on a relatively small sample; and whenever you study anything remotely connected with society, there is a high risk that you bring in cultural bias and what Stephen Jay Gould called Reification (in The Mismeasure of Man). I am not, therefore, suddenly extrapolating Swartz’s findings into a resolution of the MidEast crisis.
What I am doing is sharing those findings with you. They support Wisdom of Crowds. They support Emergence. They support micromarkets and microbrands. They support Long Tail rather than Hit Culture. They support markets being conversations. All these are about people and relationships and access and empowerment, but in different fields and with different perspectives.
And most importantly, they suggest that the success of social software is based on everyone doing something about whatever it is they are passionate about. Everyone. Passion.
There’s an important principle in there somewhere, which (I hope) will be borne out by more intensive and formalised study:
Given enough eyeballs, all bugs are indeed shallow….provided there is code to look at. The value starts with the created output, not with the bug.
Gatekeepers can and should be eyeballs that help us all remove bugs and improve the pool of ….code….information….ideas….whatever.
What worried me about gatekeepers, something I could not articulate well before this, was the ability of the gatekeepers to restrict creativity. Particularly when we are yet to grow out of the Blame Culture and Hierarchy and Command and Control.
Without creativity there is nothing, no shallow bugs, no bugs, no nothing. Nothing.
Let’s not forget that as we experiment with, build, adapt and evolve governance models for social software.
To put it in Swartz’s words:
- Even if all the formatters quit the project tomorrow, Wikipedia would still be immensely valuable. For the most part, people read Wikipedia because it has the information they need, not because it has a consistent look. It certainly wouldn’t be as nice without one, but the people who (like me) care about such things would probably step up to take the place of those who had left. The formatters aid the contributors, not the other way around.
The formatters aid the contributors, not the other way round. Great stuff.