Best of the Bots
Launch Queries on an Intranet or Across the Web
By Don Barker
Mata Hari v1.10, from The
WebTools Company, is the latest release of a powerful metasearch tool that now lets
you compose sophisticated query
statements to gather the exact information you want from the Web. Like the original
seductive World War I spy and exotic dancer, Mata Hari is quick on its feet. So fast,
according to Jerry Tardif, the President of The WebTools Company, that it beats all the
competition in delivering qualified search results. In addition to blazing speed, this
client software lets you construct precise search statements to gather information from
more than 140 search engines simultaneously.
A recent study by NPD Online Research found that 77% of Web users employ multiple
search engines to find information. In another report,
researchers noted that even the best search engines index only about one third of
the Web, but coverage improves dramatically when multiple search sites are used. Thus,
metasearch tools have become a necessity in today's Web environment, where users must wade
through over 400 million pages to locate relevant information.
However, most metasearch bots do little more than add to information overload because
they don't provide a way to tailor your queries or filter out duplicates and documents in
foreign languages. Mata Hari v1.10's Universal Search
Power solves these problems by letting you use a vast range of Boolean operators in
a single easy-to-use interface that works with over 140 search engines to narrow or
broaden your searches. It also filters out unwanted documents in other languages and pages
from selected top-level domains (e.g., .mil or .gov). Since each search engine varies in the way
and the extent to which it supports Boolean operators, this is quite a feat.
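To make the duplicate and domain filtering concrete, here is a minimal Python sketch of how a metasearch client might merge result lists from several engines. It is not Mata Hari's actual code: the function names, the URL normalization rule, and the sample URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Build a comparison key: lowercase host plus path without a trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

def merge_results(result_lists, blocked_suffixes=(".mil", ".gov")):
    """Merge per-engine URL lists, dropping duplicates and blocked domains.

    blocked_suffixes is a hypothetical filter setting, not a Mata Hari option name.
    """
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            host = (urlsplit(url).hostname or "").lower()
            if host.endswith(tuple(blocked_suffixes)):
                continue  # page comes from a filtered top-level domain
            key = normalize(url)
            if key in seen:
                continue  # another engine already returned this page
            seen.add(key)
            merged.append(url)
    return merged

# Made-up result lists standing in for two search engines.
engine_a = ["http://example.com/asteroids/", "http://example.mil/report"]
engine_b = ["http://EXAMPLE.com/asteroids", "http://example.org/impact"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

The first copy of each page wins, so the order of the engine lists decides which URL spelling is kept.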
Mata Hari is a relatively painless download (1.5 MB) and install. Its Main
Screen lets you compose two structured queries for simultaneous searching of
multiple search engines, as shown in Figure 1. The
tabs across the top of the Main Screen allow you to specify filters and "groups"
on the Internet to search (e.g., Computers, Education, Health-Medicine, Jobs, Mailing
Lists, Movies, News, SciFi, and UseNet). Mata Hari also lets you create your own custom
filter and engine groups. The filtering for Mata Hari v1.10 is not as granular as BullsEye (i.e., Mata Hari
doesn't offer a simple way to eliminate secondary domain names -- you have to modify the
configuration file), but it does offer the ability to limit searches by page size and the
date last modified.
The Query 1 and Query 2 input boxes let you enter highly structured search statements
composed of this exhaustive list of Boolean operators and syntax:
- AND -- stipulates that the terms on both sides of this operator must be
present somewhere in a document, or data, for it to be scored as a result, or hit
- OR -- specifies that terms on EITHER side of the operator are
sufficient to be returned as a hit
- AND NOT -- indicates that any term appearing after this operator
eliminates the document, or data, from being reported as a result
- NEAR, BEFORE, AFTER -- specify the
number of words that can appear between two search terms and their order in a document for
it to be counted as a hit (placing NOT in front of each of these operators requires
inverse behavior to be accepted as a hit)
- Wildcards (stemming) -- beginning characters that must match the same beginning
characters in a document's words in order for it to be scored a hit
- Phrases -- series of words, enclosed by double quotes, that indicate a
document must contain the series exactly as shown for it to be scored as a hit
- Parentheses -- specify the sequence in which operators should be
evaluated, starting with the innermost ones and working out
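The operators above can be sketched as small Python predicates over a tokenized document. This is only an illustration of the semantics described in the list, not Mata Hari's implementation; the helper names, the trailing `*` wildcard notation, and the sample sentence are assumptions.

```python
import re

def tokenize(text):
    """Split text into lowercase words, discarding punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_term(words, term):
    """Plain term match, or a stem match when the term ends in '*' (assumed syntax)."""
    term = term.lower()
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def has_phrase(words, phrase):
    """True if the words of the phrase appear consecutively and in order."""
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def near(words, a, b, distance):
    """True if a and b occur with at most `distance` words between them, in either order."""
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(i != j and abs(i - j) - 1 <= distance for i in pos_a for j in pos_b)

def before(words, a, b, distance):
    """Like near(), but a must come first. AFTER is the same test with a and b swapped."""
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(0 < j - i <= distance + 1 for i in pos_a for j in pos_b)

doc = "An asteroid strike on Earth is unlikely this century."
words = tokenize(doc)

# AND, AND NOT, and OR map directly onto Python's boolean operators.
is_hit = (has_term(words, "Earth") and has_term(words, "asteroid")
          and not has_term(words, "movie")
          and (has_term(words, "hit") or has_term(words, "strik*")))
print(is_hit)
```

Placing `not` in front of `near()` or `before()` gives the inverse behavior mentioned for NOT NEAR and NOT BEFORE.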
As you can see, the support for Boolean operators and syntax in Mata Hari is
unsurpassed by any other metasearch tool. In fact, Mata Hari's Universal Search Power even
brings Boolean search functionality to search engines without native support for these
operators. If you mark the Strict Boolean adherence checkbox on the Main Screen,
"...all selected engines will perform as Boolean engines whether they are Boolean
engines or not. The same is true of filtering, phrase searching, plain-text searching, use
of special search characters, etc." says Jerry Tardif.
As a consequence, Mata Hari lets you construct the most precise metasearch statements
possible. For example, let's say you want to find out the latest information about the
chances of an asteroid hitting Earth. You begin by choosing the keywords and phrases that most accurately
describe the topic (e.g., Earth, asteroid, hit, hitting, strike, striking, etc.). Next,
you combine these search terms, with the necessary Boolean operators and syntax, to form a
search statement that includes everything you want to find while excluding the pages you
don't want to see. In this instance, the query will be "Earth AND asteroid AND NOT
movie AND hit OR hitting OR strike OR striking," as shown in Figure 2.
Structured queries like this one are the most effective way to search the Internet but
they can be difficult to construct properly. Fortunately, Mata Hari provides a handy
feature that allows you to look at the actual Boolean expression that will be sent. Just
click the button at the end of the Query input box to display the structured search
statement, as shown in Figure 3. The parentheses
indicate the order, or precedence, in which the search terms will be processed by the
search engines. Processing starts with the innermost search terms and moves out (for
details on how to construct structured queries, see The WebTools Company's award-winning
search tutorial at http://www.thewebtools.com/searchgoodies/tutorial.htm).
For example, the search terms "Earth AND asteroid AND NOT movie" have the
most parentheses around them, so this group will be processed first and together. As a
result, search engines will look for documents that contain both "Earth" and
"asteroid" but not "movie." The subsequent search terms connected by
the Boolean OR operator indicate that at least one of the terms "hit,"
"hitting," "strike," or "striking" should also be present
in the pages.
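A short Python sketch shows why the grouping matters. The `grouped` function follows the reading described above, where the Earth/asteroid/movie terms form one group and the OR terms form another. The `ungrouped` function applies Python's own precedence rules, in which `and` binds more tightly than `or`, to the same terms with no parentheses. The function names and sample sentences are made up for illustration, and nothing here claims to reproduce how any particular search engine parses the query.

```python
def words_of(text):
    """Lowercase the text and split it into a set of words, ignoring basic punctuation."""
    cleaned = text.lower().replace(".", " ").replace(",", " ").replace("?", " ")
    return set(cleaned.split())

def grouped(text):
    """(Earth AND asteroid AND NOT movie) AND (hit OR hitting OR strike OR striking)"""
    w = words_of(text)
    topic = "earth" in w and "asteroid" in w and "movie" not in w
    impact = any(t in w for t in ("hit", "hitting", "strike", "striking"))
    return topic and impact

def ungrouped(text):
    """Same terms with no parentheses; Python evaluates 'and' before 'or'."""
    w = words_of(text)
    return ("earth" in w and "asteroid" in w and "movie" not in w and "hit" in w
            or "hitting" in w or "strike" in w or "striking" in w)

relevant = "Could an asteroid strike Earth?"
off_topic = "A striking movie poster."

print(grouped(relevant), grouped(off_topic))      # True False
print(ungrouped(relevant), ungrouped(off_topic))  # True True
```

Without the grouping, any page that merely contains "striking" slips through, which is why being able to inspect the parenthesized expression before searching is useful.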
Of course, adding more terms like "slam," "slamming,"
"collide," "colliding," etc. would return additional useful results.
Unfortunately, the current version of Mata Hari limits the length of structured queries,
so your only recourse is to construct a second query using the other search terms. This is
inconvenient and it will hopefully be resolved in the next release.
Once you are satisfied with the structure of your query, you simply click the Search
button on the Main Screen to begin the search. After a moment, the Search Progress Details
window appears, as shown in Figure 4. This window
keeps you updated about the status of the search. The top portion contains a series of
counters that track the locating and downloading of pages. Just beneath these indicators,
a pane shows which search engines have finished, while the lowest pane displays details
about the pages found.
When all the search engines' results are retrieved, Mata Hari stores, analyzes, and
displays them in the Results Details window, as illustrated in Figure 5. The Results Details window scores the results
based on the likely usefulness of each page, with 100 being the highest rank and 0 the
lowest. Since the pages are stored in a local database, you can use the built-in Local
Viewer to display pages or view them in your own browser offline. If you double-click on a
page in the Results Details window, your browser opens and downloads a fresh copy of the
selected page from the Web.
Mata Hari provides a highly useful feature in the Results Details window called Local
HTML. This option enables you to create a searchable database from your existing
bookmarks, which lets you quickly find specific references when your bookmark list grows
unwieldy. In addition, you can use the Local HTML feature to capture a directory page from
a service like Yahoo! so you can more easily search all the links shown.
Mata Hari also offers a number of sophisticated statistical techniques to further
refine your search results. They include the ability to select up to 32 pages and have
Mata Hari rerank the rest of your results to identify the other pages most like those
marked. The Intersection and Union options on the Query X Engines tab let you evaluate
information from the intersection of two related queries and fine-tune search engines to
produce the best results.
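The article does not say which statistical techniques Mata Hari uses for reranking, so the following Python sketch substitutes a common stand-in: bag-of-words cosine similarity between each remaining page and the combined text of the marked pages. The function names and the sample pages are hypothetical.

```python
import math
import re
from collections import Counter

def vector(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rerank(pages, marked_ids):
    """Order the unmarked pages by similarity to the marked ones, most similar first.

    pages maps a page id to its text; marked_ids is the set the user selected.
    """
    profile = Counter()
    for pid in marked_ids:
        profile.update(vector(pages[pid]))
    rest = [pid for pid in pages if pid not in marked_ids]
    return sorted(rest, key=lambda pid: cosine(vector(pages[pid]), profile), reverse=True)

# Made-up page texts; "a" plays the role of the page the user marked.
pages = {
    "a": "asteroid impact risk earth",
    "b": "asteroid movie review and box office",
    "c": "near earth asteroid impact probability",
    "d": "recipe apple pie",
}
ranked = rerank(pages, {"a"})
print(ranked)
```

A real tool would likely weight rare words more heavily than common ones, but the idea of ranking by closeness to a user-selected sample is the same.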
The tabs across the bottom of the Results Details window display data about the
scoring, search terms, and statistics for the downloaded pages. The scoring list provides
specifics about ranking of each page while the search terms list is handy for culling
through long results sets or to identify query terms to refine subsequent searches. The
statistics list tabulates useful summary data to evaluate the quality of the results.
You can use the Results Details window to select pages for inclusion in an HTML report.
The selected pages can be annotated to highlight important information. The report can
then be passed along to others via email or the Web.
Mata Hari is one of a new breed of metasearch bots (called lexibots) that promise to
make Web searching a much easier, more accurate, and less frustrating process. It is fast
and requires only a small footprint on your computer. And perhaps most interesting of all,
Mata Hari is designed to work not only with the Internet but also with local databases and HTML
pages, which opens up some very interesting possibilities for intranet users. Mata Hari is
so tempting that even the CIA has succumbed to its charms.
The WebTools Company
Download 30 day FREE evaluation copy of Mata Hari 1.1
Searchbots
Classification Direct Internet Access: http://www.searchbots.com/