Data Sets
The following datasets are available:

Real-Web dataset containing hash values of the content of 353,739 web
pages collected over a period of six months (Feb. 1999 - July 1999).
[ history.all.gz ]

The same Real-Web dataset formatted in three columns (web_site, web_page, change_history).
The change history is a sequence of bits: 1 means that the page changed
between the respective visits and 0 means that it remained the same
(e.g., 10000 means that the page was found changed on our second visit, i.e.,
in March).
[ history.all.norm.gz ]
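
A minimal sketch of reading the three-column file in Python. It assumes the columns are separated by whitespace, which the description above does not state, so adjust the split if the file uses another delimiter. The function names and the sample record are hypothetical and are not taken from the dataset.

```python
import gzip


def parse_history_line(line):
    """Split one record into (web_site, web_page, change_history).

    Assumes whitespace-separated columns (an assumption, not documented).
    """
    site, page, history = line.split()
    if set(history) - {"0", "1"}:
        raise ValueError(f"unexpected character in change history: {history!r}")
    return site, page, history


def changed_intervals(history):
    """Return the 1-based positions of the bits that are set to 1."""
    return [i for i, bit in enumerate(history, start=1) if bit == "1"]


def read_histories(path):
    """Yield parsed records from a gzip-compressed text file, skipping blank lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_history_line(line)


# Hypothetical record for illustration only.
site, page, history = parse_history_line("example.com /index.html 10000")
print(changed_intervals(history))          # [1]
print(history.count("1") / len(history))   # 0.2
```

The fraction of 1 bits gives a rough per-page change rate over the collection period.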

Synthetic dataset containing info for 300,000 pages in three columns (web_site,
web_page, change_history) over 200 visiting cycles. The change frequency
of the pages follows a normal distribution.
[ synthetic.all.norm.gz ]
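
The procedure used to generate the synthetic file is not described here. As a rough illustration only, the sketch below shows one plausible reading: each page gets a per-visit change probability drawn from a normal distribution, and each of the 200 visiting cycles is an independent trial. The function name and the mean and standard deviation values are hypothetical.

```python
import random


def synthetic_history(rng, mean=0.3, std=0.1, cycles=200):
    """Return a change-history bit string for one simulated page.

    mean and std are illustrative values, not the parameters of the dataset.
    """
    # Draw the page's change probability and clamp it to the valid range [0, 1].
    p = min(1.0, max(0.0, rng.gauss(mean, std)))
    # One bit per visiting cycle: 1 if the page changed, 0 otherwise.
    return "".join("1" if rng.random() < p else "0" for _ in range(cycles))


rng = random.Random(0)  # fixed seed so the run is repeatable
history = synthetic_history(rng)
print(len(history))  # 200
```

Clamping slightly distorts the tails of the distribution; a truncated normal would be an alternative if that matters.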