The Secret Agent Man

How Mata Hari Works

Mata Hari uses specialized agents, or bots, to query each of the more than 140 search engines in its list of resources. A separate agent is needed for each engine because support for Boolean operators and query syntax differs from one search engine to another. To make developing and modifying these agents relatively simple, The WebTools Company has created its own fifth-generation language, called Virtual IQ (VIQ). Eventually, the company plans to make VIQ available to customers.
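To illustrate why each engine needs its own agent, here is a minimal Python sketch (not VIQ, whose syntax is not described in this article) that renders one Boolean query in two different syntaxes. The engine names, the EngineSyntax class, and the operator spellings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineSyntax:
    """How one (hypothetical) search engine spells the Boolean operators."""
    and_op: str
    or_op: str
    not_prefix: str


# Illustrative syntaxes only; they are not taken from any real search engine.
ENGINES = {
    "engine_a": EngineSyntax(and_op=" AND ", or_op=" OR ", not_prefix="NOT "),
    "engine_b": EngineSyntax(and_op=" & ", or_op=" | ", not_prefix="!"),
}


def render_query(required, optional, excluded, syntax):
    """Render the same logical query in one engine's own syntax."""
    clauses = list(required)
    if optional:
        # Optional terms form one parenthesized OR group.
        clauses.append("(" + syntax.or_op.join(optional) + ")")
    # Each excluded term becomes a negated clause.
    clauses.extend(syntax.not_prefix + term for term in excluded)
    return syntax.and_op.join(clauses)


if __name__ == "__main__":
    for name, syntax in ENGINES.items():
        print(name, "->", render_query(["spy", "agent"], ["bot", "robot"], ["movie"], syntax))
```

In a design like this, supporting a new engine means adding one more syntax description rather than rewriting the query logic, which is presumably the kind of convenience a dedicated agent language is meant to provide.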

You can also use Mata Hari with databases on local networks; however, custom bots must be written in VIQ to query them. (Currently, the only way to get these agents is to have The WebTools Company build them for you.) Applying Mata Hari to the databases on your intranet opens up some very interesting data-mining possibilities. For instance, you could use the program's analytical power to quickly create custom reports that would normally require complicated SQL statements. This could have enormous payoffs in time saved and in the information available to management.

Mata Hari scores documents for relevance by averaging the results of several statistical methods, as shown in Exhibit 5. These are the standard Boolean, Vector Space Model (VSM), Extended Boolean Information Retrieval (EBIR), and modified EBIR (mEBIR) methods. The combination of these techniques enables the program to deliver results that are significantly superior to those of conventional searches.
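The averaging step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each method is assumed to return a score between 0 and 1, the average is assumed to be a plain arithmetic mean, and the function names and sample scores are hypothetical rather than taken from Mata Hari.

```python
from statistics import mean


def combined_score(method_scores):
    """Average the per-method relevance scores for one document.

    method_scores maps a method name to a score, assumed to lie in [0, 1].
    """
    return mean(list(method_scores.values()))


def rank_documents(scores_by_doc):
    """Return document ids sorted from most to least relevant."""
    return sorted(
        scores_by_doc,
        key=lambda doc_id: combined_score(scores_by_doc[doc_id]),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up scores for two documents, for illustration only.
    scores = {
        "page_1": {"boolean": 1.0, "vsm": 0.5, "ebir": 0.75, "mebir": 0.75},
        "page_2": {"boolean": 1.0, "vsm": 0.25, "ebir": 0.5, "mebir": 0.25},
    }
    for doc_id in rank_documents(scores):
        print(doc_id, combined_score(scores[doc_id]))
```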

Search engines are queried with the standard Boolean method (i.e., conjunctions, disjunctions, and negation). Mata Hari can even perform Boolean searches on engines that lack support for Boolean operators: according to Jerry Tardif, it conducts the Boolean operations itself on the search results as they are returned from these engines. Once the pages associated with the results are located and downloaded into a local database, Mata Hari applies the VSM, EBIR, and mEBIR methods to compute relevance scores.
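One way to picture Boolean operations applied to results from an engine that has no Boolean support is a client-side filter over the returned titles and snippets. The sketch below is an assumption about the general approach, not Mata Hari's actual code; the result fields (title, snippet) and function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(text, required=(), any_of=(), excluded=()):
    """Apply AND (required), OR (any_of), and NOT (excluded) to one text.

    Query terms are assumed to be lowercase single words.
    """
    words = tokens(text)
    if not all(term in words for term in required):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    return not any(term in words for term in excluded)


def boolean_filter(results, **query):
    """Keep only the results whose title plus snippet satisfy the query."""
    return [r for r in results if matches(r["title"] + " " + r["snippet"], **query)]


if __name__ == "__main__":
    # Made-up results, as if returned by an engine without Boolean support.
    results = [
        {"title": "Secret agent bots", "snippet": "Search agents for the web"},
        {"title": "Agent movie reviews", "snippet": "A spy movie about a secret agent"},
        {"title": "Gardening tips", "snippet": "Growing roses"},
    ]
    kept = boolean_filter(results, required=["agent"], any_of=["bots", "search"], excluded=["movie"])
    print([r["title"] for r in kept])
```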

VSM represents documents and queries as vectors of term weights and measures how close those vectors are in order to identify similarities between documents. EBIR scores Boolean operators with distance computations rather than strict true/false matching, blending VSM with Boolean logic for better results than either method can achieve separately. To improve the accuracy of scoring further, Mata Hari employs a modified EBIR (mEBIR) that accounts for the frequency with which query terms appear in documents. Mata Hari also offers a "Relevance Feedback" feature that lets you select up to 32 documents to serve as a template; the program then re-scores and re-ranks all the other pages against them. The result is a new ranking that indicates the pages most like the originals.
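As a rough illustration of these scoring ideas, the Python sketch below shows a cosine similarity of the kind used in vector space models, the p-norm formulas commonly given for the extended Boolean model, and a simple feedback re-ranking. The article does not give Mata Hari's formulas, so everything here is an assumption: the term weights (term frequency divided by the count of the document's most frequent term), the value p = 2, the use of a summed template vector for feedback, and all function names.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphanumeric words in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def term_weights(doc_counts, terms):
    """Assumed weighting: term frequency scaled into [0, 1] by the top count."""
    top = max(doc_counts.values(), default=0)
    return [doc_counts[t] / top if top else 0.0 for t in terms]


def pnorm_or(weights, p=2.0):
    """Extended Boolean OR: normalized distance from the all-zeros point.

    weights must be a non-empty list of values in [0, 1].
    """
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)


def pnorm_and(weights, p=2.0):
    """Extended Boolean AND: one minus the normalized distance from all-ones.

    weights must be a non-empty list of values in [0, 1].
    """
    return 1.0 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)


def rerank_by_feedback(selected_texts, candidate_texts):
    """Re-rank candidates by similarity to the summed vector of selected texts."""
    template = sum((term_counts(t) for t in selected_texts), Counter())
    return sorted(
        candidate_texts,
        key=lambda text: cosine_similarity(template, term_counts(text)),
        reverse=True,
    )


if __name__ == "__main__":
    doc = term_counts("secret agent bots search the web for secret data")
    weights = term_weights(doc, ["secret", "agent"])
    print("AND score:", pnorm_and(weights), "OR score:", pnorm_or(weights))
    print(rerank_by_feedback(["search agent bots"],
                             ["cooking recipes for pasta", "secret agent search bots"]))
```

Unlike a strict Boolean match, the p-norm scores degrade gradually: a document containing only one of two ANDed terms still receives a nonzero score, which is what allows partially matching pages to be ranked rather than discarded.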
