BruinBot Crawler
In the WebArchive project we are building a prototype Web search
engine that lets users ask for different versions of pages collected
at different points in time. BruinBot is the crawler we have
developed here at UCLA, and we use it to download the parts of the
Web that are important for our research. BruinBot operates by
following links on the Web to discover pages, which we then download
for our search engine.
If BruinBot has recently visited your site, it is because we consider
the content you provide both interesting and appropriate for our
research. During our downloads we do our best to be courteous to the
sites we crawl, and we adhere to the rules they define in their
robots.txt files. At present we download one page every 2 seconds
from a given Web site, and we re-download each site once a week.
If you do not want our (or any other) crawler to visit a particular
portion of your site, you can say so by placing a simple robots.txt
file at the root of your site, following the Robots Exclusion
Protocol described at:
http://www.robotstxt.org/wc/exclusion.html
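
For example, a minimal robots.txt along the following lines would
keep BruinBot out of one directory while leaving the rest of the site
open to other crawlers. The /private/ path is purely illustrative
(substitute the portion of your site you want excluded), and we
assume here that BruinBot identifies itself as "BruinBot" in its
User-agent header:

    # Keep BruinBot out of the /private/ directory (example path only)
    User-agent: BruinBot
    Disallow: /private/

    # Any other crawler may visit the entire site
    User-agent: *
    Disallow:

Place this file at the top level of your Web server (e.g.
http://www.example.com/robots.txt), where crawlers look for it before
fetching any pages.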
Please direct any feedback regarding our crawler to
ntoulas at cs dot ucla dot edu