BOT2001 Report
In Search Of Search Bots

By Brian Proffitt


For those of us who can't find our car keys in the morning, facing something like the Internet can be a daunting task. With the vast proliferation of information on the Internet, is it any wonder the number of search bots has grown by leaps and bounds in recent years?

But which one to pick out of so many tools? And, if you are of a mind to program one of your own, how do you go about it?

Intelliseek Inc., a Cincinnati-based software company, is the maker of the client-side search bots BullsEye and BullsEye Pro, two excellent examples of how a stand-alone search engine can work. So when Sundar Kadayam, the cofounder and CTO of Intelliseek Inc., stands up and starts giving a lesson on how to put together an effective search bot, you'd better listen.

Kadayam began his talk to the BOT2001 audience with a definition of what he calls bots and agents.

Quoting IBM Fellow Ted Selker, Kadayam explained that "An agent is a software thing that knows how to do things that you could do if you have the time."

Agents, in order to be at their best, need to be autonomous, adaptive, and collaborative, Kadayam went on to say.

Bots, then, are agents that can be sent off on a mission. There are many different kinds of bots, Kadayam said, but he homed in on search bots, which he described as "...bots that specifically help people find information."

The need for search bots is greater than ever, as technologies have converged in recent years to "create an environment where a user can be drowning in information, yet starving in knowledge," Kadayam said. So great is this need that searching is currently the second most popular use of the Internet, after e-mail.

Search bots need to do more than just go out and seek answers to user queries. Kadayam outlined six major task areas a search bot must undertake on behalf of the user:

  • Expand search coverage to include distributed sources and the "invisible Web"
  • Guarantee freshness by eliminating dead links and removing stale hits whose content no longer matches
  • Improve relevance by adding quality metrics and incorporating user feedback
  • Analyze results, filtering out irrelevant documents and clustering or categorizing the rest to aid visualization of the data
  • Report and collaborate by annotating results, generating reports, and working with other agents
  • Track and alert by continuously monitoring sources and alerting the user when key items are found
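The freshness task lends itself to a small illustration. The Python sketch below assumes each hit has already been re-fetched, so its HTTP status and current page text are known. The `Hit` class and `fresh_hits` function are hypothetical, and the rule that every query term must still appear on the page is a deliberately crude stand-in for a real content-matching test.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    url: str
    status: int    # HTTP status from re-checking the link (assumed already fetched)
    content: str   # current text of the page

def fresh_hits(hits, query):
    """Drop dead links and stale hits whose current content no longer matches the query."""
    terms = [t.lower() for t in query.split()]
    kept = []
    for hit in hits:
        if hit.status != 200:
            continue  # dead link (404, 500, ...)
        text = hit.content.lower()
        if not all(term in text for term in terms):
            continue  # stale: the page changed and no longer matches
        kept.append(hit)
    return kept

# Usage with made-up data:
hits = [
    Hit("http://a.example/", 200, "An introduction to search bots"),
    Hit("http://b.example/", 404, ""),
    Hit("http://c.example/", 200, "Cooking recipes"),
]
print([h.url for h in fresh_hits(hits, "search bots")])  # ['http://a.example/']
```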

To accomplish these goals, Kadayam explained, every search bot should go through a five-stage process when handling a query.

The first step is selecting the best information sources, and of all the steps, this one is the most knowledge-intensive. To do it effectively, your bot needs knowledge of the information sources, the query, the user, and the user's environment.

This knowledge can be acquired explicitly: a centralized query broker, or information sources that collaboratively report on themselves, can supply knowledge of the sources, and the user can simply be asked for everything else. The drawback of this approach is the difficulty of maintaining the huge database of information it is bound to create.

Source selection can also be done implicitly, by gathering information from the user's past and present actions. The caution here is that the user profile such implicit methods create must be kept absolutely secure.
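The explicit and implicit approaches can be combined in a simple scoring function. This sketch assumes each source self-reports a set of topic keywords (the explicit route) and that the user's past queries are available as a profile (the implicit route). `SOURCE_TOPICS`, `select_sources`, and the 2:1 weighting are illustrative assumptions, not anything Kadayam described.

```python
from collections import Counter

# Hypothetical explicit knowledge: topics each source reports that it covers.
SOURCE_TOPICS = {
    "news_archive": {"politics", "business", "sports"},
    "tech_journal": {"software", "agents", "internet"},
    "patent_db": {"patents", "software", "engineering"},
}

def select_sources(query, history, k=2):
    """Rank sources by overlap with the query plus the user's past queries."""
    query_terms = set(query.lower().split())
    # Implicit knowledge: how often each term appears in the user's past queries.
    profile = Counter(t for past in history for t in past.lower().split())

    def score(source):
        topics = SOURCE_TOPICS[source]
        explicit = 2 * len(topics & query_terms)    # the current query weighs most
        implicit = sum(profile[t] for t in topics)  # past interest adds a little
        return explicit + implicit

    return sorted(SOURCE_TOPICS, key=score, reverse=True)[:k]

# Usage:
print(select_sources("software agents", ["internet search", "agents"]))
# ['tech_journal', 'patent_db']
```

Keeping only the top `k` sources also serves the later advice not to contact dozens of sources per query.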

The second step of the process is sending the query and receiving the results. To be effective here, a search bot should query multiple sources simultaneously, keep a list of alternate sources in case a site is offline or too slow, and adapt to changes on the Web. At the same time, Kadayam emphasized, the search bot should not overload the server, the Internet, or the user by contacting dozens of sources for each query.
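One way to query several sources at once while keeping alternates on hand is sketched below. Sources are modeled as plain Python callables (an assumption; real ones would make network requests), `query_all` is a hypothetical name, and `cancel_futures` requires Python 3.9 or later. The thread-pool size caps how many requests run at once, and a single time budget keeps one slow site from stalling the whole query.

```python
import concurrent.futures as cf
import time

def query_all(sources, query, timeout=1.0, max_workers=4):
    """Query all primary sources at once; fall back to an alternate on error or timeout.

    sources: list of (primary, alternate) pairs. Each is a callable that takes the
    query string and returns a list of results; alternate may be None.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max_workers)  # caps simultaneous requests
    deadline = time.monotonic() + timeout                   # one time budget for the batch
    results = []
    try:
        pending = [(pool.submit(primary, query), alt) for primary, alt in sources]
        for future, alt in pending:
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results.extend(future.result(timeout=remaining))
            except Exception:  # source offline, failing, or too slow
                if alt is not None:
                    try:
                        results.extend(alt(query))
                    except Exception:
                        pass  # skip this source rather than stall the user
    finally:
        pool.shutdown(wait=False, cancel_futures=True)  # don't wait on stragglers
    return results

# Simulated sources for the usage example:
def fast(q):
    return [f"fast:{q}"]

def slow(q):
    time.sleep(0.5)
    return [f"slow:{q}"]

def offline(q):
    raise ConnectionError("site not online")

def backup(q):
    return [f"backup:{q}"]

print(query_all([(fast, None), (slow, backup), (offline, backup)], "bots", timeout=0.2))
# ['fast:bots', 'backup:bots', 'backup:bots']
```

Alternates are called one at a time here for simplicity; a fuller version might submit them to the pool as well.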

In the third step, post-processing the results, care should be taken to filter, arrange, and analyze the results so that the user can quickly see relevant, well-organized results. Search bots should not spend so much time on post-processing, though, that they lose the user in the process.
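A minimal post-processing pass might deduplicate, filter, rank, and group results in a single loop. In this sketch, word overlap between the query and a result's title stands in for a real relevance metric, and grouping by domain is a simple substitute for true clustering or categorization; `post_process` and the result format are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse

def post_process(results, query):
    """Deduplicate, filter, rank, and group results.

    results: list of dicts with "url" and "title" keys.
    Returns {domain: [results, best-scoring first]}.
    """
    terms = set(query.lower().split())
    seen = set()
    groups = defaultdict(list)
    for r in results:
        if r["url"] in seen:
            continue  # same page returned by more than one source
        seen.add(r["url"])
        score = len(terms & set(r["title"].lower().split()))
        if score == 0:
            continue  # title shares no word with the query
        groups[urlparse(r["url"]).netloc].append((score, r))
    return {
        domain: [r for _, r in sorted(items, key=lambda item: item[0], reverse=True)]
        for domain, items in groups.items()
    }

# Usage with made-up results:
results = [
    {"url": "http://a.example/2", "title": "Bots overview"},
    {"url": "http://a.example/1", "title": "Search bots explained"},
    {"url": "http://a.example/1", "title": "Search bots explained"},
    {"url": "http://b.example/x", "title": "Gardening tips"},
]
grouped = post_process(results, "search bots")
print({d: [r["url"] for r in rs] for d, rs in grouped.items()})
# {'a.example': ['http://a.example/1', 'http://a.example/2']}
```

Every step is a single pass or a sort, which keeps post-processing cheap enough not to keep the user waiting.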

The fourth step is presenting the results, which should be done in a visually pleasing manner. The best search bots give the user some control over how their information is displayed. Try not to overload the user, Kadayam warned, and don't expect to please everyone all the time.
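User control over display can be as simple as a few saved preferences. In this sketch, `DisplayPrefs` and `render` are hypothetical names. The `max_items` cap is one way to avoid overloading the user, with a summary line noting how many results were held back.

```python
from dataclasses import dataclass

@dataclass
class DisplayPrefs:
    """Hypothetical settings the user can change."""
    sort_by: str = "score"   # "score" or "title"
    max_items: int = 10      # cap on results shown at once
    show_urls: bool = True

def render(results, prefs=None):
    """results: list of (score, title, url) tuples. Returns display lines."""
    if prefs is None:
        prefs = DisplayPrefs()
    if prefs.sort_by == "title":
        ordered = sorted(results, key=lambda r: r[1].lower())
    else:
        ordered = sorted(results, key=lambda r: r[0], reverse=True)
    lines = [f"{title} ({url})" if prefs.show_urls else title
             for _, title, url in ordered[:prefs.max_items]]
    hidden = len(results) - len(lines)
    if hidden > 0:
        lines.append(f"... and {hidden} more")
    return lines

# Usage:
results = [
    (1, "Bots overview", "http://a.example/2"),
    (3, "Agent basics", "http://c.example/"),
    (2, "Search bots explained", "http://a.example/1"),
]
print(render(results, DisplayPrefs(max_items=2)))
# ['Agent basics (http://c.example/)', 'Search bots explained (http://a.example/1)', '... and 1 more']
```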

Finally, a good search bot will update the results for the user at a later date. Use spiders and tracking programs for this functionality, but be careful not to overload the user.
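The tracking step reduces to comparing each new run of a saved query with what the user has already seen. How the spider re-runs the query is left out; the sketch assumes some scheduler periodically passes in the latest result URLs. `check_for_updates` is a hypothetical name, and capping alerts per run is one interpretation of not overloading the user.

```python
def check_for_updates(current_urls, seen_urls, max_alerts=3):
    """Compare a fresh run of a saved query with what the user has already seen.

    Returns (alerts, updated_seen). At most max_alerts new URLs are reported per
    run; the rest stay unseen and surface on a later run.
    """
    new = [u for u in current_urls if u not in seen_urls]
    alerts = new[:max_alerts]
    return alerts, seen_urls | set(alerts)

# Usage: two scheduled runs of the same saved query.
seen = {"http://a.example/1"}
run = ["http://a.example/1", "http://b.example/", "http://c.example/", "http://d.example/"]
alerts, seen = check_for_updates(run, seen, max_alerts=2)
print(alerts)  # ['http://b.example/', 'http://c.example/']
alerts, seen = check_for_updates(run, seen, max_alerts=2)
print(alerts)  # ['http://d.example/']
```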

If these guidelines are followed, your search bot will be a well-used one.

And what about the future? Kadayam predicts smarter bots that will learn about users from their patterns of behavior and will actively seek out and collaborate with other search agents. And, he added, this technology should become even more transparent to the users of tomorrow.