Why write my own indexing and search engine
January 30, 2007
Like any developer, I used to trust only the code I wrote myself. All that has changed with OpenSource and its widespread use.
However Iam always on the lookout to write something that is better than what is available in OpenSource – personally a means to justify the act of writing an application :)
The growing volume of data in an enterprise can be a liability or an asset, depending on how you see it. Access to this data converts it to useful information.
How does one access information easily? Do we really care about the millions of hits that Google returns? I dont think we go beyond the first couple of pages.
I define “Effective search” to address the above issue – I need to get to the information of interest fast, period.
OpenSource indexing and search frameworks are far behind the commercial ones like a Google search appliance or the Verity or other search engines.
Looking around, Lucene turned out to be a good fit for my index. The catch is I still required parsers, readers and data sources to make it complete.
This led me to write Ferret. It doesnot re-invent the wheel i.e wherever possible.
The good news is that it can index file systems & web sites(secure inranets and public sites). The best part is that it is highly customizable – I can add a datasource to index databases for e.g or add parsers to new file types.
The recent announcement on availability of Omnifind led me to evaluate it and of course compare with Ferret. After some extensive study of its features, Iam still to find out if I will be able to recommend it to a client when I cannot customize many aspects except the look & feel maybe. Also it beats me why I cannot schedule an indexing operation or atleast provide API to invoke the indexer! Omnifind suits the “indexing for dummies” needs but not for any active deployment within a coprorate portal for e.g.
For now, Ferret does all this and has found a client :)