Four pillars: The disaggregation and reaggregation of search

Brendon Mclean tipped me the wink on Splunk, a search engine explicitly for logs and message queues and database transactions and the like: “IT information”. Some time ago Chris Locke had told me about Krugle, which searches for code and related technical documentation. Dohop, from Iceland, concentrates on building the best travel search engine. Dibdabdoo is all about hand-finished web laundering for kids, using human judgement to validate kid-friendly content.

So. While Google go serious on Appliance and One Box, and people like FAST come at the enterprise in a different way, there are people spending time and energy building specialised search. And I’m still trying to work out why.

So I tried to see what attributes search could have. For example:

  • The space being covered: a disk or server or many of them, at one’s desk, behind a firewall, everywhere, the web proper.
  • The type of thing being covered: text or file or image or music or whatever, as narrow or as broad as needed.
  • The way the space is covered or indexed or checked for changes.
  • The way the searcher interacts with the engine and the engine with the searcher, including personalisation and relevance heuristics.

Early search was all about the space being covered and the way it was covered or made relevant. And as I understand more about the Splunks and Krugles of this world, the bulk of today’s innovation seems to be about the “type of thing being covered”, with a little bit on the interaction between searcher and engine. iTunes search became Spotlight this way, I guess.

I wonder. I promised Steve Patrick and Phil Dawes I would never start a “semantic web project” at the bank, because our own internal equivalents of industry body and standards body and vendor would kick into overdrive to kill it every which way, a sort of natural antibody ever-present in large organisations. What I wanted instead was a Steven Johnson emergence.

Maybe this, the emergence of the Krugles and the Splunks, is how some parts of the semantic web will come to be. The data that Tim Berners-Lee wants to see migrating to the web may not always get there via standards like RDF, however hard we try. Because standards are meat and drink to lock-in specialists, about as meaningful and as useful as governments and regulators in preventing lock-in.

But a million different Krugles and Splunks covering different areas deeply and doing it in such a way that information ecosystems can evolve? Some sort of high-cohesion-loose-coupling approach to layered search. Open on standards, agnostic on platforms and open source in approach. [Open source free as in freedom, not as in gratis, in case people think otherwise.] Guerilla and emergent in business model and approach. Maybe.
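
For what it's worth, here is one way the "high cohesion, loose coupling" idea might look in code. It is only a toy sketch in Python, not how Splunk or Krugle or anyone else actually works: the names `Hit`, `SearchEngine`, `KeywordEngine` and `federated_search` are all made up for illustration, and the scoring is deliberately naive. Each specialised engine knows its own corpus deeply (cohesion), and the layer on top knows nothing about an engine beyond a single `search` method (coupling).

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    source: str   # which specialised engine produced the hit
    title: str
    score: float  # relevance in [0, 1], as judged by that engine alone


class SearchEngine(Protocol):
    """The only thing the aggregation layer knows about an engine."""
    name: str

    def search(self, query: str) -> list[Hit]: ...


class KeywordEngine:
    """Hypothetical stand-in for a specialised engine with one narrow corpus."""

    def __init__(self, name: str, corpus: list[str]) -> None:
        self.name = name
        self._corpus = corpus

    def search(self, query: str) -> list[Hit]:
        terms = query.lower().split()
        hits = []
        for doc in self._corpus:
            words = doc.lower().split()
            matched = sum(1 for term in terms if term in words)
            if matched:
                # Score is the fraction of query terms found in the document.
                hits.append(Hit(self.name, doc, matched / len(terms)))
        return hits


def federated_search(engines: list[SearchEngine], query: str) -> list[Hit]:
    """Reaggregate: ask every engine, then merge the hits by score."""
    merged = [hit for engine in engines for hit in engine.search(query)]
    return sorted(merged, key=lambda hit: hit.score, reverse=True)


# Two made-up specialised engines, each covering one "type of thing".
logs = KeywordEngine("logs", ["disk full error on server", "login ok"])
code = KeywordEngine("code", ["def parse error handler", "readme"])
results = federated_search([logs, code], "error handler")
```

Adding a third engine means writing one more class with a `search` method and nothing else changes, which is about as much of the emergent, ecosystem idea as a sketch this small can carry.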

I could be talking absolute tosh, but isn’t that partly what blogs are for? To start the snowballs rolling and to see what happens. If I have to go by the progress made by the zillions of standards bodies in IT, I’d rather back the guerilla approach.

Let me know what you think.
