Zombse

The Zombie Stack Exchanges That Just Won't Die


Coping with spam in web archivers

There is a known problem with the Wayback Machine (WM) regarding domain name ownership: often the domain changes hands, and the new owner either puts a spam blog on it or adds a robots.txt that forbids crawling. At that point the WM promptly deletes the entire history of the website.

So, are there any options for combating this problem? For example, are there archive crawlers that don't retroactively delete content?

EDIT: This question is an offshoot of "Preserving website content". This one is about how to deal with spam, while the other is about which websites we could or should preserve.

wizzard0
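To make the "don't retroactively delete" idea concrete, here is a minimal sketch of a crawler policy that applies robots.txt only prospectively. It uses the standard library's `urllib.robotparser` to evaluate the robots.txt in force at capture time. The `Archive` class, its `capture`/`history` methods and the user-agent string are hypothetical illustrations, not part of the Wayback Machine or any real archiver. A robots.txt added later by a new domain owner blocks future captures but leaves earlier snapshots untouched. Spam placed on the domain would still be captured if robots.txt allows it, so this addresses only the exclusion half of the problem.

```python
from dataclasses import dataclass, field
from urllib.robotparser import RobotFileParser


@dataclass
class Archive:
    """Hypothetical in-memory archive mapping URL -> list of (timestamp, content)."""
    snapshots: dict = field(default_factory=dict)

    def capture(self, url, timestamp, content, robots_lines,
                agent="hypothetical-archiver"):
        """Store a snapshot only if the robots.txt in force *now* allows it.

        Existing snapshots are never modified or removed, so a newer
        exclusion only prevents future captures.
        """
        parser = RobotFileParser()
        parser.parse(robots_lines)  # parse() also marks the rules as loaded
        if not parser.can_fetch(agent, url):
            return False  # skip this capture but keep the history
        self.snapshots.setdefault(url, []).append((timestamp, content))
        return True

    def history(self, url):
        """Return all stored snapshots, regardless of the current robots.txt."""
        return list(self.snapshots.get(url, []))
```

For example, a capture made under an empty robots.txt survives a later `Disallow: /` added by a new owner:

```python
archive = Archive()
url = "http://example.com/page"
archive.capture(url, "2010-01-01", "original content", [])
archive.capture(url, "2015-01-01", "new owner's page",
                ["User-agent: *", "Disallow: /"])
print(archive.history(url))  # [('2010-01-01', 'original content')]
```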

Comments