Four pillars: The disaggregation and reaggregation of search

Brendon Mclean tipped me the wink on Splunk. A search engine explicitly for logs and message queues and database transactions and the like, “IT information”. Some time ago Chris Locke had told me about Krugle. A search engine for code and related technical documentation. Dohop, from Iceland, concentrates on building the best travel search engine. Dibdabdoo is all about hand-finished web laundering for kids, using human judgement to validate kid-friendly content.

So. While Google go serious on Appliance and One Box, and people like FAST come at the enterprise in a different way, there are people spending time and energy building specialised search. And I’m still trying to work out why.

So I tried to see what attributes search could have. For example:

  • The space being covered: a disk or server or many of them, at one’s desk, behind a firewall, everywhere, the web proper.
  • The type of thing being covered: text or file or image or music or whatever, as narrow or as broad as needed
  • The way the space is covered or indexed or checked for changes.
  • The way the searcher interacts with the engine and the engine with the searcher, including personalisation and relevance heuristics

Early search was all about the space being covered and the way it was covered or made relevant. And as I understand more about the Splunks and Krugles of this world, the bulk of today’s innovation seems to be about the “type of thing being covered”, with a little bit on the interaction between searcher and engine. iTunes search became Spotlight this way, I guess.

I wonder. I promised Steve Patrick and Phil Dawes I would never start a “semantic web project” at the bank, because our own internal equivalent of industry body and standards body and vendor would kick into overdrive to kill it every which way, a sort of natural antibody ever-present in large organisations, whereas what I wanted was a Steven Johnson emergence.

Maybe this, the emergence of the Krugles and the Splunks, is how some parts of the semantic web will come to be. The data that Tim Berners-Lee wants to see migrating to the web may not always get there via standards like RDF, however hard we try. Because standards are meat and drink to lock-in specialists, about as meaningful and as useful as governments and regulators in preventing lock-in.

But a million different Krugles and Splunks covering different areas deeply and doing it in such a way that information ecosystems can evolve? Some sort of high-cohesion-loose-coupling approach to layered search. Open on standards and agnostic on platforms and opensource in approach. [Opensource free as in freedom, not as in gratis, in case people think otherwise]. Guerilla and emergent in business model and approach. Maybe.

I could be talking absolute tosh, but isn’t that partly what blogs are for? To start the snowballs rolling and to see what happens. If I have to go by the progress made by the zillions of standards bodies in IT, I’d rather back the guerilla approach.

Tim Berners-Lee in the New Scientist

I was reading the New Scientist over the weekend, and came across this interview with Tim Berners-Lee. [My apologies, but unless you have a subscription the magazine won’t let you get past the article stub].

I’ll paraphrase what the interview said; any and all errors and misinterpretations are mine and mine alone. Where I have quoted directly from the article, this has been made clear.

  • Web was about putting documents and images online; semantic web is about putting data online.
  • We can publish articles and papers now, but not the underlying data. We need the data.
  • To publish this data we need a mark-up language for data. So we created RDF.
  • RDF lets you put data on the web and make connections so we have one big database.
  • When we free this data magical things can and will happen.
  • Some get the power of this; many don’t; the life sciences guys are good at getting it.
  • Privacy and data protection are issues, but nowhere near as much as people make out
  • Web did not fulfil potential for showing the “how”, stayed on the “what”
  • As HTML became a truly powerful presentation medium, looking improved and editing died
  • Blogs and wikis are helping change that, though we have much to learn about social software
  • “We have to learn about how people like to make groups and learn about the social systems involved in collaborations as well as the technical side of things”
  • “The internet was designed not to care what was done with it. It just moved packets of information from one place to another: the fundamental properties that make the internet work could not be held to ransom”
  • “The internet is all about division between layers”
  • “The web tries not to prefer one sort of information over another”
  • “The web needs to be the way it is to work”
  • “Before the web, and even now, a lot of the systems were being designed to be completely consistent. The way we’ve traditionally done that is to make top-down hierarchical systems, whether in organisations or in programming. This has always been considered a good thing. The maxims of top-down, structured programming are “information-hiding” so that modules don’t see into each other but are black boxes tied together at the edges.
  • “The maxim of the web, however, is if you have something important, give it a label and then people will link to it.
  • “…by trying to constrain ourselves to use hierarchical systems, we’ve reached the limit of scale”
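
To make the “one big database” point concrete for myself, here is a minimal sketch. It assumes nothing more than subject-predicate-object statements held in plain Python sets; it is not real RDF syntax, nor any RDF library’s API. The example.org identifiers and the names `publisher_a`, `publisher_b`, `match` and `regions_for_dataset` are all made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# Every identifier below is hypothetical, invented for this sketch.
DATASET = "http://example.org/dataset/rainfall"
STATION = "http://example.org/station/7"
REGION = "http://example.org/region/north"

# Two independent publishers, each putting their own data online.
publisher_a = {
    (DATASET, "measuredAt", STATION),
    (DATASET, "label", "Rainfall readings"),
}
publisher_b = {
    (STATION, "locatedIn", REGION),
    (STATION, "label", "Station seven"),
}

# Because both use the same label (identifier) for the station,
# joining the two sources is nothing more than a set union.
merged = publisher_a | publisher_b


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def regions_for_dataset(triples, dataset):
    """Follow dataset -> station -> region across whatever data is present."""
    stations = [o for (_, _, o) in match(triples, subject=dataset, predicate="measuredAt")]
    return sorted(
        o
        for station in stations
        for (_, _, o) in match(triples, subject=station, predicate="locatedIn")
    )


if __name__ == "__main__":
    # Neither publisher can answer this alone; the merged data can.
    print(regions_for_dataset(publisher_a, DATASET))
    print(regions_for_dataset(merged, DATASET))
```

The only design choice worth noting is that nothing here is hierarchical: the connection exists purely because both sources gave the station the same label, which is the “give it a label and then people will link to it” maxim in miniature.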

Lots of good stuff. More later.

Not quite Four Pillars: Using technology to remember things or find lost things

I was intrigued to see this story about an RFID-enabled purse that lets you know what’s not in it. While the specific story is unnecessarily sexist, the principle has potential. RFID-enabled checklists.

And it made me think about something else.

I’ve lost an iPod nano and an iPod shuffle. At home. I know they’re both there somewhere. But where I know not. Again, I am less worried about these two iPods gone astray, they will resurface sometime. But wouldn’t it be nice to have a way of finding your (submerged) next-generation iPod? Is there a way already?

Butler, Ribstein and Sarbanes-Oxley

[Now how on earth did I move from Technorati rankings to Sarbanes-Oxley in one Saturday step? Easy when you know how. File Not Found to SOx via 404….]

The latest Economist, in an article entitled The Trial of Sarbanes-Oxley, reminded me of this document. It’s written by an economist and a law professor and well worth a read for those who are interested in such esoteric things. But then I’m told Einstein never wore SOx……

One paragraph in the Economist article stood out to me.

“Much of the blame for this should be pinned on accounting firms, which, despite being seen by the public as big offenders in the Enron and WorldCom scandals, have emerged as the big beneficiaries from SOX. According to Joe Grundfest, a former SEC commissioner, the audit industry has several incentives to “push Section 404 compliance to a point of socially inefficient hyper-vigilance”. To avoid further damage to their reputations, and to minimise the risk that they will be sued over accounting irregularities, audit firms are adopting the most prudent possible interpretations of the Section 404 rules – rules that are vague and open to argument. And, as Mr Grundfest points out, the “more onerous the requirements of Section 404, the more money the audit profession can earn” by selling its services.”

Again, for those who are interested, please read Michael Power’s pamphlet The Risk Management of Everything, where he pretty much predicts the SOx debacle in style. Note to myself: must arrange to have lunch with Prof Power again soon. [An aside: I bought the pamphlet after reading a synopsis of his PD Leake lecture in 2004. Then, the only way to get the document was via Demos. Now Demos itself points you to Amazon, with no difference in price or conditions. Interesting.]

“Incumbent to watch”

I was re-reading The Next Net 25 on Business 2.0 at a more leisurely pace than the first time around; you can find the whole article here.

In the article, all Gaul is divided into five parts:

  • Social media
  • Mashup and filters
  • The new phone
  • The webtop
  • Under the hood

An interesting list, one that I want to look at more carefully in the context of Four Pillars.

Against each of those classifications, they name five companies to watch. Interesting as well.

Also against each classification, they named an “incumbent to watch”. Here’s the list of ITWs (I know, I just couldn’t resist the TLA :-) )

  • Social media: Yahoo
  • Mashup and filters: Google
  • The new phone: Skype
  • The webtop: Microsoft
  • Under the hood: Amazon

There’s something obsolete-making about being called the “Incumbent to Watch”. It feels a lot like being called The Establishment during the 60s.

[Photo: _39996929_13_ap.jpg]

I can’t help hearing Terry saying “I coulda been a contender”. The question is, who’s Charlie?

And thank you BBC for the photo.