Thinking about web statistics

This post was triggered in part by a story in today’s Wall Street Journal, where an apparently biased survey suggested that millions of teenagers were buying alcohol online in the US.

The phrase “Lies, damned lies, and statistics” appears to be well over a century old, so I’m not treading new ground here; I think it’s more like combing through fossil remains. There are many attributions; the earliest published reference is in the late nineteenth century, and received wisdom suggests that the originator of the phrase was Disraeli, while it was given currency by Twain.

It is instructive to watch what’s happening with web statistics; the more I look at it, the more I feel that we need new ways of measuring what happens on the web.

I think the web is disaggregating something that has always been a centralised process of collecting relevant numbers; I think the web is often disintermediating the central specialised body that produces the numbers as well. This makes a lot of people uncomfortable, since they lose the ability to preview and massage the numbers. And I think something else is going on; Chris Anderson’s Long Tail is not something that sits well with central-minded people either.

Let’s take a few examples:

Some months ago, RageBoy was commenting on how the geographical distribution of his readership seemed to vary according to the tool used, with some tools showing hits in the most outrageous places. And remember this is RageBoy I’m talking about, so I mean outrageous when I say outrageous. [Couldn’t find the original post. Chris?] I have seen some evidence to support this, and all I can say is that we’re not very good at this right now. Sure, new and better tools are coming along, but my perception is that Web 1.0 does not make this easy; the infrastructure can be gamed by accident or design.

Maybe a week ago, Miss Rogue commented that it was all a farce anyway, talking about how ranking and hits were being gamed; when she comes back from the film, she’ll probably say these numbers are like Counting Snakes On a Plane :-)

And then today Doc Searls gave a Harry Frankfurt response to some noise on the web to do with A-list bloggers and traffic and hits and search engine optimisation and all that jazz. I quote:

Nick also says,
As the blogosphere has become more rigidly hierarchical, not by design but as a natural consequence of hyperlinking patterns, filtering algorithms, aggregation engines, and subscription and syndication technologies, not to mention human nature, it has turned into a grand system of patronage operated – with the best of intentions, mind you – by a tiny, self-perpetuating elite. A blog-peasant, one of the Great Unread, comes to the wall of the castle to offer a tribute to a royal, and the royal drops a couple of coins of attention into the peasant’s little purse. The peasant is happy, and the royal’s hold over his position in the castle is a little bit stronger.
Bullshit.
Want to succeed in the blogosphere, or the Web in general? Easy. Do search engine optimization. Here’s how:

  1. Write quotable stuff about a lot of different subjects.
  2. Do it consistently, for months if not years.
  3. Link a lot, as a way of giving credit and of sending readers to other sources of whatever it is you write about.
That’s it.
I can’t promise royalty, because there isn’t any. But I can promise a rewarding relationship with the readers you’ll get, regardless of how many there are.

Wonderful stuff.

Update. Here’s Hugh on the Carr piece:

  • There are basically two rules of blogging:
  • 1. Nobody is going to read your blog unless there’s something in it for them.
  • 2. Nobody is going to link to your blog unless there’s something in it for them.
  • These two rules apply to us all, A-List and Z-List alike. If you don’t like these rules, you’re better off finding an ecology whose rules you like better. Life is short.

These are serious issues; I will come to the reasons shortly.

But in the meantime. Maybe many of us know that the numbers are not that reliable, but maybe many of us don’t care too much about it. I look at my Technorati rank, sure. And I learn something about how it works and what it means. And yes I get a kick out of being in the top 10K, but not that big a kick. Because I don’t blog for my Technorati ranking. What I really use Technorati for is first and foremost to find stuff in the blogosphere. And then maybe learn a little about the wisdom as well as the madness of crowds, by looking at what appears to be popular, but only at the tag level. It is rare that I delve deeper. And I also use Technorati to find out who’s linking to me. If markets are conversations (which they are) and blogs are the opensourcing of ideas (which they are) then it seems to make sense to find out just who you’re talking to. Relationships not transactions. Covenant not contract.

Now to the meat. And why I wrote this post.

Traditional thinking, pre-web, pre-Long-Tail, liked to use surveys and sampling techniques and normal distributions and a bunch of other stuff in order to define something they called audiences and traffic. Traffic they liked to measure in order to figure out something called hits. Hits that denoted their incredible ability to market something called content.

Sampling. Traffic. Hits. Content. Stanchions of the past. Pillars that are the lychgate to the churchyard of an obsolescing age.

They just don’t get micromarkets and microconversations and non-broadcast-mode and non-centrally administered and not content and not audience and not hits.

An age that is not yet obsolete. An age where attempts will be made to maintain, even strengthen, these stanchions.

And how will this strengthening take place?

With numbers. Numbers that you and I know are, shall we say, weak. Numbers that will nevertheless be used to “educate” people, particularly those that create legislative support. Which translates into lock-ins and protection and “advertising” and annuity revenue streams and all that jazz.

So next time you see numbers that tell you just how many gazillion illegal downloads happened while you read this sentence, how many gazillion dollars it will take to provide the infrastructure for all this live TV that is clogging up the tubes and slowing down someone’s internet, how many multigazillion illegal copies of software already exist on consumer desktops, or for that matter how much of the internet is dedicated to filesharing, next time you see all this, don’t be surprised. Don’t ask “How can it be?”. You have to be able to measure the problem in order to get the protection.

When you see bloggers being called A-List against their wishes, don’t ask “How can this be?”. You have to have hits.

When you see web sites with unbelievable links and hits and distribution, don’t ask “How can this be?”. You have to have audiences and traffic.

But don’t worry. The Emperor Has No Clothes On. People will get wise to this. As the numbers get better. Which they will.

Patently not patents?

I used to be bemused, confused, even slightly irritated, decades ago, when I read stories about broad all-encompassing patents given on things that were patently public domain for millennia. Examples are attempts to patent the curative properties of turmeric, or the name and style and quality of basmati rice. Read this and related stories if you want to know more.

Some of you may be aware of the recent debates and lawsuit involving Blackboard and Desire2Learn, and possibly dragging in opensource providers Moodle and Sakai. You can find the BBC coverage quoting Michael Geist here.

I quote from the BBC article:

Interestingly, open source and internet tools are emerging as the first line of defence against the Blackboard patent and lawsuit. Angry educators have launched an online petition calling on Blackboard to drop the lawsuit and to agree to forego any future patent suits.

I am not sure whether open source information has been used as a defence before, but this becomes a case to watch and to learn from.

One of the other sites referred to, noedupatents, looks interesting, but I have not yet had time to research it.

This whole story is yet another reason why the current IPR regime needs changing.

Dannie, Clarence, any comments?

Musing about winners and losers

I’m still reading Pip Coburn’s The Change Function, where, amongst other things, he tips a number of winner and loser technologies.

It must have been over twenty years ago when I read William Murray’s Tip on a Dead Crab. In those days, if you were into sporting mystery, the only choices you had were Dick Francis or Dick Francis. There wasn’t even a Stephen Dobyns around. [Yes I did take my mystery reading seriously in those days, and still do.]
One of the things I really liked about Murray’s story was the story behind the title. Forgive my memory, it’s been a long time, but what I remember goes like this. Somewhere, maybe it was set in Australia, they used to bet on crabs. The bet was simple. Each crab had a number on its “back”. All the crabs were in a basket. The basket was emptied in the centre of a large circle on the ground. After n minutes, the crab nearest to the centre at the time was declared the winner.

I don’t know much about horse racing, but I was always under the impression that professional gamblers were interested in horses guaranteed to lose; there were never any guarantees about winning. So it was with a wry smile I read about Tip on a Dead Crab, since the tip, despite being on a dead creature, actually guaranteed a win.
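For anyone who wants to see why the tip was a sure thing, here is a minimal sketch of the crab race as I remember it. It assumes every crab starts at the centre, where the basket is emptied, and that live crabs wander at random. The function name, the crab numbers and the step sizes are all made up for illustration; none of them come from Murray's book.

```python
import math
import random


def crab_race(n_crabs=8, dead_crab=3, minutes=10, seed=0):
    """Return the number of the crab nearest the centre after `minutes`.

    Hypothetical model: all crabs start at the centre (0, 0); each live
    crab takes one random step per minute; the dead crab never moves.
    """
    rng = random.Random(seed)
    positions = {crab: (0.0, 0.0) for crab in range(1, n_crabs + 1)}

    for _ in range(minutes):
        for crab in list(positions):
            if crab == dead_crab:
                continue  # a dead crab stays exactly where it landed
            x, y = positions[crab]
            angle = rng.uniform(0.0, 2.0 * math.pi)  # random direction
            step = rng.uniform(0.5, 1.5)             # assumed step length
            positions[crab] = (x + step * math.cos(angle),
                               y + step * math.sin(angle))

    # The winner is whichever crab is closest to the centre at the end.
    return min(positions, key=lambda crab: math.hypot(*positions[crab]))


winner = crab_race(dead_crab=3)
```

Under these assumptions the dead crab sits at distance zero for the whole race, so `winner` should be crab 3 whatever the seed. A live crab can only do as well by wandering back to the exact centre, which is vanishingly unlikely.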

And it was with all this in mind that I read Pip’s statements on future winners and losers. Which were the more valuable tips, the winners or the losers? I’m still working my way thru the book, so comments will only follow later.

Just thinking aloud. What would be more valuable for an enterprise CIO? I think it depends on the specific technologies and their entry/exit costs, but it made me think.

Musing about collaboration

My father, and his father before him, were financial journalists; and for a while so was I, until my father died suddenly in 1980.

They had an unusual approach to vertical integration as practised in those days. They wrote pretty much everything in the journal (with the help of a faithful few), edited it, printed and published it. Every week, around 32 pages of comment, all in English. And all this in Calcutta from 1928 to 1980. They owned the journal, the press, an ad agency and even a restaurant for the print workers.

The “flagship” part of this journal was a weekly 1500 word essay called Clive Street Gossip. Clive Street was the financial heartland of India for many years, until the “political” capital was rudely moved to the Lutyens-designed New Delhi; I think it was in 1911. [Before you say it: I was definitely not around at the time, however old I may seem]. Over the next forty years or so, the importance of Clive Street (and of Bengal, from a financial viewpoint) slowly waned as India lurched towards Independence, and by then it was Bombay that became the financial capital of the country.

Now Clive Street is primarily to be found on eBay, in vintage postcards, interspersed amongst the depictions of the Black Hole and the Great Eastern Hotel. But that was in another country, and besides...

Clive Street Gossip was written by Eavesdropper, the pen-name adopted by my father and his father. I wrote precisely one column using that name. I cannot be sure what images the column name evokes in you, but the reality was quite prosaic. They spoke of markets and of conversations, and of the social life of that information, in something that vaguely resembled a weekly blog with three or four posts every week.

And that’s what put the food on the family table.

So you can imagine what my early years were like. And why I studied economics, why I read voraciously, why I was so struck by The Social Life of Information and The Cluetrain Manifesto. Why I still continue to be struck by them.

But all this was largely before the Information Age; I’d never seen a real computer until 1980, except for a wondrous afternoon playing some form of Star Something on a Commodore PET in the late 1970s.

What has entranced me since then is the magic of collaboration, the sheer unadulterated joy of co-creation. That may have been influenced at least in part by my Calcutta upbringing: I have often wondered whether it is even possible to do something alone in Calcutta. Anything. [You’re right, I have very fond memories of the place where I spent 23 unbroken years].

So when I think about information, about the internet, about identity and privacy and confidentiality, about patents and copyrights and digital rights and intellectual property, it is always in the context of collaboration.

And currently, I am wrestling with two issues, particularly now that social software is available at affordable price points in open architectural models, and I can see the possibility of the collaborative magic happening.

One, is there such a thing as group selection, in a Darwinian natural selection sense? Do groups have adaptive capacities? Do social organisms evolve on a natural-selection basis? The concept is not new, but fell way out of favour in the Sixties, and never really resurfaced. And I think it would be really useful to model enterprise behaviour in the context of group selection, both within as well as beyond the enterprise boundary.

Two, was anything ever truly invented by one person in isolation? Here I am not referring to the serendipity aspect, where more than one person “simultaneously” invents something. What I mean is the impact of the group operating around and sometimes under the guidance and tutelage of the “inventor”, or sometimes in partnership.

My interest in the concept of group selection comes from a number of drivers:

  • One, I think it is a good way to understand cellular social organisms ranging from modern church movements through to social or professional networks, even terrorist organisations.
  • Two, it best explains why we have this counterproductive Blefuscu-versus-Lilliput polarisation about everything that matters nowadays. Groups will tend to attack others and protect their own.
  • Three, it allows me to consider and integrate the concepts and issues raised in Emergence and in Linked and in The Tipping Point, amongst others. On relationships and networks and flow.
  • Four, it lets me work in that hard-to-find space where religion and science aren’t necessarily in conflict.
  • And five, it helps me understand, define and defend altruism.

To move this forward, I am currently reading some of the works of David Sloan Wilson, amongst others; if anyone knows of other works they would be prepared to recommend, I’m all ears. Or should that be eyes? Conversations. Ears.

I am also looking as deeply as I can into the process of invention and of music/literature creation. Who did the inventing or creating. Who helped. How they learnt from mistakes. What the original idea was, and what was patented or published. What type of patent. What the time delta was between original thought, the experiments and failures, the final patent or product. Were Lennon and McCartney collaborators? An author and his/her editor? A husband and wife? Is a family a group that behaves selectively? No Man is an Iland.
To move this forward, I am busy acquiring hardcopy of original patents and a whole pile of literature ranging from scientific papers through to biographies and autobiographies. And reading them. Again, if anyone out there has some pointers to give me, I’d be grateful.

I think these two aspects are crucial to our understanding of collaboration. Does group selection exist, and if so how does it work? Is invention or music/literature/art creation a solo process or is it truly collaborative and only superficially solo?

More on Project ROIs

Following on from a variety of posts in the blogosphere (including his own), Dennis Howlett has written a serious and considered kernel for what could be a really worthwhile discussion on Project ROIs. Please do read it and respond accordingly; Dennis makes an open invitation calling for participation.

In a strange kind of way, maybe we can “prove” the value of blogging, the “ROI” of blogging, by using blogs to develop a sensible way of measuring project ROI….