Zombse

The Zombie Stack Exchanges That Just Won't Die


When is it legal/ethical to preserve website content?

There is a known problem with the Wayback Machine regarding domain name ownership: often a domain changes hands, and the new owner either puts a spam blog on it or adds a robots.txt that forbids crawling, and then the Wayback Machine promptly deletes the entire website history.

Is it legal to crawl and store personal copies of websites to protect against this particular problem, or to build such an engine?

wizzard0


Answer by Andy Jackson

Whether you as an individual have the right to take a copy of a website, for what purposes, and whether you can redistribute it, will be down to your local copyright laws. I expect fair use/fair dealing exceptions mean you can copy but not distribute, but YMMV.

Also bear in mind that there is more than one web archive, each with different policies, so you could check other archives for holdings or nominate the site for archiving elsewhere.

EDIT

I notice you included the ethics issue when you updated the title. This is a very good point, and one that has not really been explored in detail as far as I know (although this work on Search Engines and Ethics might be a good starting point). Certainly, personally speaking, I'm not sure large national web archives should be archiving Facebook as a matter of course (for example), any more than we should be archiving everyone's email or digitising everybody's letters. Such personal information should be 'opt in' only, IMO.


Answer by mopennock

The ethical issues around web archiving are interesting, and they can vary from case to case. To judge whether a particular case is ethical, you might want to consider:

There have been a few papers published in recent years:

The latter is particularly relevant given the context stated in your question.
