Best of the Bots
Launch Queries on an Intranet or Across the Web
By Don Barker

Mata Hari v1.10, from The WebTools Company, is the latest release of a powerful metasearch tool that now lets you compose sophisticated query statements to gather the exact information you want from the Web. Like the original seductive World War I spy and exotic dancer, Mata Hari is quick on its feet. It is so fast, according to Jerry Tardif, the President of The WebTools Company, that it beats all the competition in delivering qualified search results. In addition to blazing speed, this client software lets you construct precise search statements to gather information from more than 140 search engines simultaneously.

A recent study by NPD Online Research found that 77% of Web users employ multiple search engines to find information. In another report, researchers noted that even the best search engines index only approximately one third of the Web, but coverage improves dramatically when multiple search sites are used. Thus, metasearch tools have become a necessity in today's Web environment, where users must wade through over 400 million pages to locate relevant information.

However, most metasearch bots do little more than add to information overload because they don't provide a way to tailor your queries or filter out duplicates and documents in foreign languages. Mata Hari v1.10's Universal Search Power solves these problems by letting you use a vast range of Boolean operators in a single easy-to-use interface that works with over 140 search engines to narrow or broaden your searches. It also filters out unwanted documents in other languages and pages from selected top-level domains (e.g., .mil or .gov). Since each search engine varies in the way and the extent to which it supports Boolean operators, this is quite a feat.
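To make the duplicate and domain filtering concrete, here is a minimal Python sketch of how results from several engines might be merged. It is not Mata Hari's code: the function names, the URL normalization rule, and the sample engine names and URLs are all assumptions made for illustration, and language filtering is left out.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (host, comparison_key) for a URL.

    Assumed normalization: lowercase host, drop a leading "www.",
    ignore the scheme, and strip a trailing slash from the path.
    """
    parts = urlsplit(url if "//" in url else "//" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return host, key

def merge_results(per_engine, blocked_tlds=()):
    """Merge per-engine URL lists, dropping duplicates and blocked domains."""
    seen = set()
    merged = []
    for urls in per_engine.values():
        for url in urls:
            host, key = split_url(url)
            if host.endswith(tuple(blocked_tlds)):
                continue  # page comes from a filtered top-level domain
            if key in seen:
                continue  # another engine already returned this page
            seen.add(key)
            merged.append(url)
    return merged

# Hypothetical results from two engines.
sample = {
    "engine_a": ["http://www.example.com/asteroids/", "http://example.gov/neo"],
    "engine_b": ["http://example.com/asteroids", "http://example.org/impact"],
}
filtered = merge_results(sample, blocked_tlds=(".gov", ".mil"))
```

In this sketch the first engine to report a page wins, so the order of the merged list depends on the order in which the engines are listed.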

Mata Hari is a relatively painless download (1.5 MB) and install. Its Main Screen lets you compose two structured queries for simultaneous searching of multiple search engines, as shown in Figure 1. The tabs across the top of the Main Screen allow you to specify filters and "groups" on the Internet to search (e.g., Computers, Education, Health-Medicine, Jobs, Mailing Lists, Movies, News, SciFi, and UseNet). Mata Hari also lets you create your own custom filter and engine groups. The filtering for Mata Hari v1.10 is not as granular as BullsEye (i.e., Mata Hari doesn't offer a simple way to eliminate secondary domain names -- you have to modify the configuration file), but it does offer the ability to limit searches by page size and the date last modified.

The Query 1 and Query 2 input boxes let you enter highly structured search statements composed of this exhaustive list of Boolean operators and syntax:

  • AND -- stipulates that terms on both sides of this operator must be present somewhere in a document, or data, for it to be scored as a result, or hit
  • OR -- specifies that terms on EITHER side of the operator are sufficient to be returned as a hit
  • AND NOT -- indicates that terms appearing after this operator eliminate the document, or data, from being reported as a result
  • NEAR, BEFORE, AFTER -- specify the number of words that can appear between two search terms and their order in a document for it to be counted as a hit (placing NOT in front of each of these operators requires inverse behavior to be accepted as a hit)
  • Wildcards (stemming) -- beginning characters that must match the same beginning characters in a document's words in order for it to be scored a hit
  • Phrases -- series of words, enclosed by double quotes, that indicate a document must contain the series exactly as shown for it to be scored as a hit
  • Parentheses -- specify the sequence in which operators should be evaluated, starting with the innermost ones and working out
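The operators above can be illustrated with a small Python sketch that tests a single document against each kind of condition. This is only an illustration of the general semantics described in the list, not Mata Hari's implementation; the helper names (tokens, has_term, has_phrase, near) and the sample sentence are hypothetical. AND, OR, and AND NOT map onto Python's own and, or, and not.

```python
import re

def tokens(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_term(words, term):
    """Exact word match, or prefix match when the term ends in '*' (wildcard)."""
    term = term.lower()
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def has_phrase(words, phrase):
    """True if the phrase's words appear consecutively and in order."""
    target = tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def near(words, a, b, max_gap, ordered=False):
    """True if a and b occur with at most max_gap words between them.

    With ordered=True, a must also come before b (the BEFORE case).
    """
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    for i in pos_a:
        for j in pos_b:
            if i == j:
                continue
            if abs(j - i) - 1 <= max_gap and (not ordered or i < j):
                return True
    return False

doc = tokens("An asteroid could strike the Earth, scientists say. "
             "Near-Earth objects are tracked.")

# Earth AND asteroid AND NOT movie
topic = has_term(doc, "Earth") and has_term(doc, "asteroid") and not has_term(doc, "movie")
# hit OR strik* (wildcard covers strike, striking, ...)
impact = has_term(doc, "hit") or has_term(doc, "strik*")
```

An AFTER operator would be the same check as BEFORE with the two terms swapped, and the NOT forms would simply negate the returned value.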

As you can see, the support for Boolean operators and syntax in Mata Hari is unsurpassed by any other metasearch tool. In fact, Mata Hari's Universal Search Power even brings Boolean search functionality to search engines that lack native support for these operators. If you mark the Strict Boolean adherence checkbox on the Main Screen, "...all selected engines will perform as Boolean engines whether they are Boolean engines or not. The same is true of filtering, phrase searching, plain-text searching, use of special search characters, etc." says Jerry Tardif.

As a consequence, Mata Hari lets you construct the most precise metasearch statements possible. For example, let's say you want to find out the latest information about the chances of an asteroid hitting Earth. You begin by choosing the keywords and phrases that most accurately describe the topic (e.g., Earth, asteroid, hit, hitting, strike, striking, etc.). Next, you combine these search terms, with the necessary Boolean operators and syntax, to form a search statement that includes everything you want to find while excluding the pages you don't want to see. In this instance, the query will be "Earth AND asteroid AND NOT movie AND hit OR hitting OR strike OR striking," as shown in Figure 2.

Structured queries like this one are the most effective way to search the Internet, but they can be difficult to construct properly. Fortunately, Mata Hari provides a handy feature that allows you to look at the actual Boolean expression that will be sent. Just click the button at the end of the Query input box to display the structured search statement, as shown in Figure 3. The parentheses indicate the order, or precedence, in which the search terms will be processed by the search engines. Processing starts with the innermost search terms and moves out (for details on how to construct structured queries, see The WebTools Company's award-winning search tutorial at http://www.thewebtools.com/searchgoodies/tutorial.htm).

For example, the search terms "Earth AND asteroid AND NOT movie" have the most parentheses around them, so this group will be processed first and together. As a result, search engines will look for documents that contain both "Earth" and "asteroid" but not "movie." The subsequent search terms connected by the Boolean OR operator indicate that at least one of the terms "hit," "hitting," "strike," or "striking" should also be present in the pages.
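Why the grouping matters can be seen in a short Python sketch that evaluates the same terms under two readings. The first follows the interpretation described above (the topic group, plus at least one impact term). The second is a purely hypothetical strict left-to-right reading, included only for contrast; the exact expression Mata Hari displays in Figure 3 is not reproduced here, and the function names and sample sentences are made up.

```python
IMPACT_TERMS = ("hit", "hitting", "strike", "striking")

def matches_grouped(words):
    """(Earth AND asteroid AND NOT movie) AND (hit OR hitting OR strike OR striking)."""
    topic = "earth" in words and "asteroid" in words and "movie" not in words
    impact = any(term in words for term in IMPACT_TERMS)
    return topic and impact

def matches_left_to_right(words):
    """Hypothetical reading with no grouping: apply each operator in turn."""
    result = "earth" in words and "asteroid" in words
    result = result and "movie" not in words
    result = result and "hit" in words
    for term in ("hitting", "strike", "striking"):
        result = result or term in words  # a lone OR term rescues any page
    return result

on_topic = set("an asteroid may strike earth".split())
off_topic = set("striking workers picket the movie studio".split())
```

Under the grouped reading only the on-topic sentence matches; under the left-to-right reading the off-topic one slips through because "striking" alone satisfies the final OR.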

Of course, adding more terms like "slam," "slamming," "collide," "colliding," etc. would return additional useful results. Unfortunately, the current version of Mata Hari limits the length of structured queries, so your only recourse is to construct a second query using the other search terms. This is inconvenient and it will hopefully be resolved in the next release.

Once you are satisfied with the structure of your query, you simply click the Search button on the Main Screen to begin the search. After a moment, the Search Progress Details window appears, as shown in Figure 4. This window keeps you updated about the status of the search. The top portion contains a series of counters that track the locating and downloading of pages. Just beneath these indicators, a pane shows which search engines have finished, while the lowest pane displays details about the pages found.

When all the search engines' results are retrieved, Mata Hari stores, analyzes, and displays them in the Results Details window, as illustrated in Figure 5. The Results Details window scores the results based on the likely usefulness of each page, with 100 being the highest rank and 0 the lowest. Since the pages are stored in a local database, you can use the built-in Local Viewer to display pages or view them in your own browser offline. If you double-click on a page in the Results Details window, your browser opens and downloads a fresh copy of the selected page from the Web.

Mata Hari provides a highly useful feature in the Results Details window called Local HTML. This option enables you to create a searchable database from your existing bookmarks, which lets you quickly find specific references when your bookmark list grows unwieldy. In addition, you can use the Local HTML feature to capture a directory page from a service like Yahoo! so you can more easily search all the links shown.
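The general idea of searching a saved bookmark or directory page can be sketched with Python's standard html.parser module. This is an assumption-laden illustration, not how Local HTML is implemented: the LinkCollector class, the search_links function, and the sample markup are hypothetical, and the search is a plain substring match on link titles and URLs.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (title, url) pairs from the <a href="..."> links in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def search_links(html, keyword):
    """Return links whose title or URL contains the keyword (case-insensitive)."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    kw = keyword.lower()
    return [(t, u) for t, u in collector.links if kw in t.lower() or kw in u.lower()]

# Hypothetical fragment of an exported bookmark file.
bookmarks = (
    '<DL><DT><A HREF="http://example.com/neo">Near-Earth Objects</A>'
    '<DT><A HREF="http://example.org/films">Movie listings</A></DL>'
)
hits = search_links(bookmarks, "earth")
```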

Mata Hari also offers a number of sophisticated statistical techniques to further refine your search results. They include the ability to select up to 32 pages and have Mata Hari rerank the rest of your results to identify the other pages most like those marked. The Intersection and Union options on the Query X Engines tab let you evaluate information from the intersection of two related queries and fine-tune search engines to produce the best results. 
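Assuming the Intersection and Union options behave like ordinary set operations on the result lists of Query 1 and Query 2 (the review does not spell out the details), the idea can be sketched in Python as follows. The combine function and the sample URLs are hypothetical.

```python
def combine(results_1, results_2, mode):
    """Combine two ranked URL lists, keeping first-seen order.

    mode="intersection": pages returned by both queries.
    mode="union": pages returned by either query, without repeats.
    """
    if mode == "intersection":
        second = set(results_2)
        return [url for url in results_1 if url in second]
    if mode == "union":
        seen = set()
        combined = []
        for url in results_1 + results_2:
            if url not in seen:
                seen.add(url)
                combined.append(url)
        return combined
    raise ValueError("mode must be 'intersection' or 'union'")

# Hypothetical results for two related queries.
query_1 = ["http://example.com/impact", "http://example.org/neo"]
query_2 = ["http://example.org/neo", "http://example.net/collision"]

both = combine(query_1, query_2, "intersection")
either = combine(query_1, query_2, "union")
```

The intersection narrows the results to pages that satisfy both queries, while the union is one way to work around the query length limit by splitting a long list of terms across two queries.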

The tabs across the bottom of the Results Details window display data about the scoring, search terms, and statistics for the downloaded pages. The scoring list provides specifics about the ranking of each page, while the search terms list is handy for culling long result sets or identifying query terms to refine subsequent searches. The statistics list tabulates useful summary data for evaluating the quality of the results.

You can use the Results Details window to select pages for inclusion in an HTML report. The selected pages can be annotated to highlight important information. The report can then be passed along to others via email or the Web.

Mata Hari is one of a new breed of metasearch bots (called lexibots) that promise to make Web searching a much easier, more accurate, and less frustrating process. It is fast and requires only a small footprint on your computer. Perhaps most interesting of all, Mata Hari is designed to work not only with the Internet but also with local databases and HTML pages, which opens up some intriguing possibilities for intranet users. Mata Hari is so tempting that even the CIA has succumbed to its charms.


 The WebTools Company
 Download 30 day FREE evaluation copy of Mata Hari 1.1

 Searchbots

 Classification Direct Internet Access: http://www.searchbots.com/
