Monday 14 January 2008

Carrot2: New Open Source Search Results Clustering Engine

My previous post told you about the new "Wikia Search Engine", which has been available on the Internet since last week. Today I learned from Mary Ellen Bates's newsletter that there is another new search engine for info pros to explore in times of desperate need. This is what Mary says about Carrot2:
Carrot2 (http://demo.carrot2.org), an open source search-results-clustering engine, just recently out in beta. In a nutshell, it takes search results, analyzes them and, on the fly, creates groups of the most common concepts or terms from those results. Since this is all done by algorithms rather than by humans, expect the odd result every once in a while, but I found the clusters to be consistently useful.
Carrot2's default is to search the web using eTools.ch, a Swiss meta-search engine that queries 10 search engines, including Google, Yahoo, Ask and MSN. However, since eTools only returns the top 20 results from each search engine, I prefer not to use eTools search results. Instead, you can click a tab to limit your search to Google, Yahoo, MSN, Wikipedia, PubMed and a few other finding tools. Because clustering is a computationally intensive process, Carrot2 limits the search results by default to the top 100 results from any of the search engines. However, you can click the Show Options link and set Carrot2 to search and sort up to 400 results. (Note that increasing the number of search results also increases the number of results from each search engine when using the eTools meta-search engine from 20 to 40.)
Geek that I am, I find it even more intriguing that, under that "Show Options" link is a pull-down menu that lets you select which of six different sorting algorithms you want to use. The clustering results are dramatically different (although keep in mind that the search results themselves stay the same -- only the clusters change). With my "social capital" search, I was able to see a variety of groupings of my search results, and identify some of the key writers and terms.
Carrot2 may not be your day-to-day search tool, but it is tremendously useful for those searches in which it is difficult to sift the wheat from the chaff.
www.BatesInfo.com/tip.html.
P.S. I have just had a look at this search engine. It is not difficult to use and explore. I will add a direct link on this blog, so it will be easier for you to access when you need it.
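
For readers curious about what "creates groups of the most common concepts or terms" can mean in practice, here is a minimal Python sketch of the general idea. It is not Carrot2's actual algorithm: it simply counts which words appear in the most result snippets and groups the results under those words. The function names, the stopword list and the four sample snippets are all invented for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "is", "as", "by", "with"}


def tokenize(text):
    """Lowercase a snippet and return its distinct non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def cluster_by_common_terms(snippets, query_terms=(), max_clusters=3):
    """Group snippet indices under the terms that occur in the most snippets."""
    # The query words appear in almost every result, so they make poor labels.
    ignored = {t.lower() for t in query_terms}
    term_sets = [tokenize(s) - ignored for s in snippets]

    # Count in how many snippets each term occurs (document frequency).
    doc_freq = Counter(term for terms in term_sets for term in terms)

    # Keep terms shared by at least two snippets, most frequent first,
    # with ties broken alphabetically so the output is stable.
    candidates = sorted(
        ((term, n) for term, n in doc_freq.items() if n >= 2),
        key=lambda pair: (-pair[1], pair[0]),
    )

    clusters = {}
    for term, _ in candidates[:max_clusters]:
        clusters[term] = [i for i, terms in enumerate(term_sets) if term in terms]
    return clusters


# Invented sample snippets for a "social capital" search.
snippets = [
    "Social capital and community networks",
    "Putnam on social capital in America",
    "Community trust and social capital",
    "Networks of trust: Putnam revisited",
]

clusters = cluster_by_common_terms(snippets, query_terms=("social", "capital"))
for label, members in clusters.items():
    print(label, "->", [snippets[i] for i in members])
```

Note that a result can land in more than one group, and that the search results themselves never change; only the grouping does, which matches what Mary describes when she switches between algorithms.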
