Data Sets

The following datasets are available:

Real-web dataset containing hash values of the content of 353,739 web
pages collected over a period of six months (Feb. 1999 - July 1999).
[ history.all.gz ]

The same real-web dataset formatted in three columns (web_site, web_page,
change_history). The change history is a sequence of bits: 1 means that the
page changed between the two respective visits, and 0 means that it remained
the same (e.g. 10000 means that the page changed the second time we visited
it, i.e. in March). [ history.all.norm.gz ]

Synthetic dataset containing information for 300,000 pages in three columns
(web_site, web_page, change_history) over 200 visiting cycles. The change
frequency of the pages follows a normal distribution.
[ synthetic.all.norm.gz ]
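
As a rough illustration of how the three-column files might be read, the sketch below parses records of the form web_site, web_page, change_history and computes the fraction of visits on which a page changed. It assumes the columns are separated by whitespace and that the change history is a string of 0 and 1 characters; the page does not state the delimiter, so adjust the split if the files differ. The names PageHistory, parse_history_lines, and change_rate are hypothetical helpers, not part of the datasets.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class PageHistory(NamedTuple):
    """One record of the three-column format (hypothetical helper type)."""
    web_site: str
    web_page: str
    change_history: str  # e.g. "10000": 1 = changed, 0 = unchanged


def parse_history_lines(lines: Iterable[str]) -> Iterator[PageHistory]:
    """Yield records from lines, assuming whitespace-separated columns."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip lines that do not have exactly three columns
        site, page, bits = fields
        if not bits or set(bits) - {"0", "1"}:
            continue  # skip histories that are empty or not pure bit strings
        yield PageHistory(site, page, bits)


def change_rate(history: str) -> float:
    """Fraction of recorded visit intervals in which the page changed."""
    return history.count("1") / len(history)


def read_history_file(path: str) -> Iterator[PageHistory]:
    """Stream records from a gzip-compressed file such as history.all.norm.gz."""
    with gzip.open(path, "rt") as handle:
        yield from parse_history_lines(handle)
```

For example, list(parse_history_lines(["1 10 10000"])) gives a single record whose change_history is "10000", and change_rate("10000") is 0.2 because one of the five recorded intervals shows a change.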