The Secret Agent Man
How Mata Hari Works
Mata Hari uses specialized agents, or bots, to query each of the more than 140 search
engines in its list of resources. A separate agent is needed for each engine because
Boolean operators and syntax are supported differently from one search engine to another.
To make developing and modifying these agents relatively simple, The WebTools Company has
developed its own fifth-generation language called Virtual IQ (VIQ). Eventually,
the company plans to make VIQ available to customers.
You can also use Mata Hari with databases on local networks; however, custom bots must
be written in VIQ to query these databases. (Currently, the only way to get these agents
is to have The WebTools Company build them for you.) The ability to apply Mata Hari
to the databases on your intranet opens up some very interesting data-mining
possibilities. For instance, you could use the analytical power of the program to quickly
and easily create custom reports that normally would require complicated SQL statements.
This could pay off handsomely by saving time and giving managers more of the information
they need.
Mata Hari scores documents for relevance by averaging several advanced statistical
computations, as shown in Exhibit 5. These include the standard Boolean, Vector
Space Model (VSM), Extended Boolean Information Retrieval (EBIR), and modified EBIR
(mEBIR) methods. Combining these techniques enables the program to deliver results that
are significantly superior to conventional searches.
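As a rough illustration of the averaging step, the sketch below combines per-method
scores for a single document. It is not Mata Hari's actual code: the function name, the
choice of which methods feed the average, the assumption that each score lies between 0
and 1, and the sample values are all hypothetical.

```python
from statistics import mean


def combined_relevance(scores):
    """Average per-method relevance scores (each assumed to lie in [0, 1])."""
    if not scores:
        raise ValueError("at least one method score is required")
    return mean(list(scores.values()))


# Hypothetical per-method scores for one document.
doc_scores = {"vsm": 0.62, "ebir": 0.70, "mebir": 0.78}
print(round(combined_relevance(doc_scores), 2))
```

A plain average gives each method equal weight, so no single technique dominates the
final score.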
Search engines are queried with the standard Boolean method (i.e., conjunction,
disjunction, and negation). Amazingly, Mata Hari can even perform Boolean searches on
engines that lack support for Boolean operators. According to Jerry Tardif, Mata Hari does
this by applying the Boolean operations to the search results as they are returned from
these engines. Once the pages associated with the results are located and downloaded into
a local database, Mata Hari applies the VSM, EBIR, and mEBIR methods to compute relevance
scores.
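The idea of applying Boolean logic to results after they come back can be sketched as
follows. This is only an illustration under stated assumptions, not how Mata Hari or VIQ
is implemented: the nested-tuple query format and the function names tokenize, matches,
and boolean_filter are invented for this example, and matching is done on whole words in
the returned text.

```python
import re


def tokenize(text):
    """Return the set of lowercase words found in a result's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, words):
    """Evaluate a nested-tuple Boolean query against a set of words.

    A query is either a term (str) or a tuple such as
    ("and", q1, q2, ...), ("or", q1, q2, ...), or ("not", q).
    """
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "and":
        return all(matches(a, words) for a in args)
    if op == "or":
        return any(matches(a, words) for a in args)
    if op == "not":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")


def boolean_filter(query, results):
    """Keep only the results whose text satisfies the Boolean query."""
    return [r for r in results if matches(query, tokenize(r))]


# Hypothetical result snippets from an engine with no Boolean support.
results = [
    "Intelligent agents for web search",
    "Search engine marketing tips",
    "Agents and spies in fiction",
]
query = ("and", "agents", ("not", "fiction"))
print(boolean_filter(query, results))  # ['Intelligent agents for web search']
```

Because the filtering happens on the client side, the engine itself only needs to accept
plain keywords.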
VSM represents documents and queries as vectors of term weights and measures the
similarity between them. EBIR is based on mathematical distance computations that soften
the strict Boolean operators, blending VSM with Boolean logic for better results than
either method can achieve separately. To improve scoring accuracy further, Mata Hari
employs a modified EBIR (mEBIR) that accounts for how frequently query terms appear in
documents. Mata Hari also offers a "Relevance Feedback" feature that lets you select up
to 32 documents to serve as a template; it then re-scores and re-ranks all the other pages
against them. The result is a new ranking that shows which pages are most like the
selected originals.
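To make the relevance-feedback idea concrete, here is a minimal sketch that re-ranks
pages by cosine similarity, a common vector space measure. It is an assumption-laden
illustration rather than Mata Hari's algorithm: the raw term-count weighting, the way the
selected documents are merged into one template vector, and the names term_vector,
cosine, and rerank are all hypothetical. Only the limit of 32 selected documents comes
from the description above.

```python
import math
import re
from collections import Counter

MAX_SELECTED = 32  # limit stated for the Relevance Feedback feature


def term_vector(text):
    """Represent a document as a vector of raw term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(selected, candidates):
    """Score candidates against a template built from the selected documents."""
    if not 1 <= len(selected) <= MAX_SELECTED:
        raise ValueError(f"select between 1 and {MAX_SELECTED} documents")
    template = Counter()
    for doc in selected:
        template.update(term_vector(doc))  # sum the term counts
    scored = [(cosine(template, term_vector(doc)), doc) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical example: one selected page and two other pages.
selected = ["search agents query engines"]
candidates = ["cooking recipes", "agents query search engines daily"]
for score, doc in rerank(selected, candidates):
    print(f"{score:.2f}  {doc}")
```

Pages that share more vocabulary with the selected documents rise to the top, which
mirrors the "most like the originals" behavior described above.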